are you talking about something like Acquisio?
no
All right fuckers, here you go
keyword grouping
I've got a dataset loaded up in the textbox but you are welcome to use your own keywords if you prefer.
I've noticed a couple of random phrases floating around in the output, so it's not perfect yet. Anything you guys could point out would be appreciated. I am happy that there are very few leftover words after all the grouping is done.
For all the nerds following this thread: We were really over-thinking things and missed something obvious. These algorithms give scattered results regardless of how high you turn the filter up ... for example, program and problem will almost always be considered similar words.
The solution: filter out the most popular 2 & 3 word phrases first, so you can pull those keywords out of the results before dropping a levenshtein on whatever is left. It also speeds things up quite a bit.
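Rough sketch of that phrase filter, in case it helps anyone following along. This is not the code from the tool; count_phrases(), extract_phrase_group() and the sample keywords are made up for illustration, and it assumes keywords are plain space-separated strings.

```php
<?php
// Hypothetical helper: count every run of $n consecutive words across all keywords.
function count_phrases(array $keywords, int $n): array
{
    $counts = [];
    foreach ($keywords as $keyword) {
        $words = preg_split('/\s+/', strtolower(trim($keyword)), -1, PREG_SPLIT_NO_EMPTY);
        for ($i = 0; $i + $n <= count($words); $i++) {
            $phrase = implode(' ', array_slice($words, $i, $n));
            $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
        }
    }
    arsort($counts); // most frequent phrase first
    return $counts;
}

// Hypothetical helper: split keywords into [those containing $phrase, everything else].
function extract_phrase_group(array $keywords, string $phrase): array
{
    $matched = [];
    $rest = [];
    $pattern = '/\b' . preg_quote($phrase, '/') . '\b/i';
    foreach ($keywords as $keyword) {
        if (preg_match($pattern, $keyword)) {
            $matched[] = $keyword;
        } else {
            $rest[] = $keyword;
        }
    }
    return [$matched, $rest];
}

// Made-up sample data.
$keywords = [
    'cheap car insurance',
    'car insurance quotes',
    'best car insurance rates',
    'home insurance quotes',
    'dog training tips',
];

$top = array_key_first(count_phrases($keywords, 2)); // requires PHP 7.3+
[$group, $leftover] = extract_phrase_group($keywords, $top);
```

Only $leftover would go on to the fuzzy matching, which is where the speed-up comes from: the expensive comparisons run on a much smaller list.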
For the record, each grouping follows the pattern
1. filter all matches for the most common 2&3 word phrases
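2. match possible adgroups by accepting any word > 80% similar using
PHP:
// levenshtein() returns an edit distance, so convert it to a similarity percentage
$lev_match = levenshtein($haystack, $needle);
$percent = (1 - $lev_match / max(strlen($haystack), strlen($needle))) * 100;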
3. cycle through all words with each adgroup (from #2) and pull out all matches > 80% similar using similar_text
4. dump all that shit into $big_array and then run #2 & #3 again
5. take whatever leftover words you have and parse out the most common 1 word phrases. Match the phrase for each word and assign it to an adgroup.
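For anyone who wants to play with steps 2 and 3, here is a simplified single-pass sketch. It is an assumption about how the pieces could fit together, not the actual tool: lev_similarity() and group_words() are hypothetical names, the first word seen becomes the seed of a new group, and the second pass from step 4 is left out.

```php
<?php
// Hypothetical helper: Levenshtein similarity as a percentage (100 = identical).
function lev_similarity(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    return $longest === 0 ? 100.0 : (1 - levenshtein($a, $b) / $longest) * 100;
}

// Greedy grouping: a word joins the first group whose seed it resembles,
// by either levenshtein() or similar_text(); otherwise it starts a new group.
function group_words(array $words, float $threshold = 80.0): array
{
    $groups = [];
    foreach ($words as $word) {
        $placed = false;
        foreach (array_keys($groups) as $seed) {
            $seed = (string) $seed; // numeric keys come back as ints
            similar_text($seed, $word, $sim); // $sim receives a percentage
            if (lev_similarity($seed, $word) > $threshold || $sim > $threshold) {
                $groups[$seed][] = $word;
                $placed = true;
                break;
            }
        }
        if (!$placed) {
            $groups[$word] = [$word];
        }
    }
    return $groups;
}

// Made-up sample data.
$groups = group_words(['insurance', 'insurence', 'insurances', 'training', 'trainings', 'quotes']);
```

With this sample the misspelling and the plurals land with their base word, and 'quotes' ends up alone in its own group, which is the kind of leftover step 5 is meant to mop up.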
The other functions would come in handy for matching up the misspellings ... I chose to just ignore this part for now.
I've run about a dozen niches and they all turn out relatively well. Like I said, there are still some imperfections but if I were setting up a campaign, I'd be happy to start from these groups rather than from scratch.
I'll probably not develop this particular version any further, unless you guys give some awesome suggestions. The final product that I'll be using for Campaign Sniper's AI will factor in high volume phrases for the ad grouping instead of just common words in the set. It'll also use AI to pull ad copy for each ad group.
Edit: I almost forgot, I put the smallest word in each ad group as the ad group name ... don't read anything special into the naming conventions.
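The naming is about as dumb as it sounds. A minimal sketch, assuming "smallest" means the fewest characters and ties go to whichever entry comes first; adgroup_name() is a made-up name, not a function from the tool.

```php
<?php
// Hypothetical helper: use the shortest entry in the ad group as its name.
function adgroup_name(array $group): string
{
    $name = '';
    foreach ($group as $entry) {
        if ($name === '' || strlen($entry) < strlen($name)) {
            $name = $entry;
        }
    }
    return $name;
}

$name = adgroup_name(['trainings', 'training', 'trainning']);
```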