Has anyone figured out how to scrape the new Google Keyword Tool and is willing to share? :)

One thing to keep in mind is that you can bundle multiple keywords into a single request, depending on what you are trying to do. I am using the API to query numbers for existing keywords, so I can send a bulk request for 500 keywords at a time. From my understanding, that counts as one query but 500 "items" at 0.1 credits per item, meaning I'll use up 55 credits for that one operation, which costs $0.01375.
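For what it's worth, here's how those numbers seem to break down; the split between a flat per-query charge and the per-item rate is my guess to make the figures line up, only the totals are what the pricing actually showed:

Code:
<?php
// Back-of-the-envelope check of the figures above. The 5-credit per-query charge is a
// guess to reconcile the 50 credits the items alone would cost with the 55 quoted.
$items            = 500;
$credits_per_item = 0.1;
$query_overhead   = 5;       // assumed flat charge for the query itself
$credits          = $query_overhead + $items * $credits_per_item;  // 55
$cost_per_credit  = 0.00025; // implied by $0.01375 / 55 credits
printf("%.0f credits, roughly \$%.5f per request\n", $credits, $credits * $cost_per_credit);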
 


acidie: I also just tried logging in and checking the output of the keyword tool, and it seems to be nearly identical to the external tool. How do you grab the data from there? Do you use uBot or something like that, or do you use a script (PHP/Perl/Ruby)?

Thanks!
 
How do you grab the data from there? Do you use uBot or something like that, or do you use a script (PHP/Perl/Ruby)?

I use PHP with CURL, but the process is the same for any language.

Basically it breaks down like this:

NOTE: All requests should save the returned cookie data and send any previously saved cookie data.

1) Log in to Google. I use this URL 'https://www.google.com/accounts/ServiceLoginAuth', but there are a few different ones.

2) Request this URL "https://adwords.google.com/um/StartNewLogin?sourceid=awo&subid=ww-en-et-gaia" and follow all the redirects; there are 6 (this sets the cookies needed for AdWords).

3) When the last redirect completes and the HTML data is returned, extract the values for __u= and __c=. I use a regex, but any method will do.
Code:
u=(.*?)&__c=(.*?)&
4) Request this URL "https://adwords.google.com/o/Targeting/Explorer?stylePrefOverride=2&__u=&__c=" adding the values extracted for 'u' and 'c'. The URL should look something like "https://adwords.google.com/o/Targeting/Explorer?stylePrefOverride=2&__u=1654564568&__c=8794964564"

5) Extract the value for the token; again, I use a regex
Code:
token:'(.*?)'
6) Send a POST request to the URL 'https://adwords.google.com/o/Targeting/file/DownloadAll' with the POST data outlined below.

7) (OPTIONAL) Decompress the data and format it with a CSV function.

You only have to go through steps 1 to 5 once per session, so if you have already scraped a keyword recently, you can just do step 6 with a new keyword without repeating the login steps. (A rough PHP/cURL sketch of steps 1 to 5 follows below.)
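If it helps, here is a minimal PHP/cURL sketch of steps 1 to 5. The login form field names (Email/Passwd) and the redirect handling are assumptions on my part, so treat it as an outline rather than drop-in code:

Code:
<?php
// Rough sketch of steps 1 to 5: log in, keep the cookies, pull out __u, __c and the token.
$cookieJar = tempnam(sys_get_temp_dir(), 'gkw');

function fetch($url, $cookieJar, $post = null) {
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,       // follow the redirects in step 2 automatically
        CURLOPT_COOKIEJAR      => $cookieJar, // save the returned cookie data
        CURLOPT_COOKIEFILE     => $cookieJar, // send any previously saved cookie data
    ));
    if ($post !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
    }
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}

// 1) Log in. The field names below are assumptions; check the actual login form.
fetch('https://www.google.com/accounts/ServiceLoginAuth', $cookieJar,
      'Email=you@example.com&Passwd=yourpassword');

// 2) + 3) Hit the AdWords entry point, let cURL follow the redirects, then grab __u and __c.
$html = fetch('https://adwords.google.com/um/StartNewLogin?sourceid=awo&subid=ww-en-et-gaia', $cookieJar);
preg_match('/u=(.*?)&__c=(.*?)&/', $html, $m);
list(, $u, $c) = $m;

// 4) + 5) Load the Targeting Explorer with those values and extract the token.
$html = fetch("https://adwords.google.com/o/Targeting/Explorer?stylePrefOverride=2&__u=$u&__c=$c", $cookieJar);
preg_match("/token:'(.*?)'/", $html, $m);
$token = $m[1];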

For the POST data you can modify some of the values to change the data returned but, unless specified, changing any of the other parameters will break this method and cause Google to return an error.

POST data:

$u is the __u value you extracted in step 3
$c is the __c value you extracted in step 3
$format can be CSV, GZIPPED_CSV, CSVFOREXCEL, XML or TSV
$county is an uppercase two-letter country code: US, GB, DE, etc.
$language is a lowercase two-letter language code: en, de, etc.
$keyword is the keyword (do not URL-encode the keyword; leave a space as a space, don't change it to %20 or +)
$token is the token you extracted in step 5

This is the POST string for keywords

Code:
__u=$u&__c=$c&format=$format&selector=5|1|37|https://adwords.google.com/o/Targeting/|0BC91872D430CA67A5237BB62909C258|106|17t|110|en_US|ul|12g|11f|17k|11i|KEYWORD|COMPETITION|GLOBAL_MONTHLY_SEARCHES|AVERAGE_TARGETED_MONTHLY_SEARCHES|TARGETED_MONTHLY_SEARCHES|IDEA_TYPE|AD_SHARE|EXTRACTED_FROM_WEBPAGE|SEARCH_SHARE|KEYWORD_CATEGORY|NGRAM_GROUP|12x|19z|hh|$county|13a|hu|$language|139|17q|bm|13h|16x|17o|bl|$keyword|1|2|3|4|18|5|0|6|7|50|0|8|0|0|9|0|10|9|11|12|11|13|11|14|11|15|11|16|11|17|11|18|11|19|11|20|10|3|-8|11|21|11|22|10|4|23|24|1|25|26|0|27|24|1|28|29|30|31|32|2|33|34|35|36|-29|37|0|0|0|&token=$token
For example, if you wanted gzipped CSV data for the keyword 'dog', with the location being 'United States', the language being 'English', the token being 'abcdef:1234567', the __c value '987654321' and the __u value '147852369', the POST string would look like:

Code:
__u=147852369&__c=987654321&format=GZIPPED_CSV&selector=5|1|37|https://adwords.google.com/o/Targeting/|0BC91872D430CA67A5237BB62909C258|106|17t|110|en_US|ul|12g|11f|17k|11i|KEYWORD|COMPETITION|GLOBAL_MONTHLY_SEARCHES|AVERAGE_TARGETED_MONTHLY_SEARCHES|TARGETED_MONTHLY_SEARCHES|IDEA_TYPE|AD_SHARE|EXTRACTED_FROM_WEBPAGE|SEARCH_SHARE|KEYWORD_CATEGORY|NGRAM_GROUP|12x|19z|hh|US|13a|hu|en|139|17q|bm|13h|16x|17o|bl|dog|1|2|3|4|18|5|0|6|7|50|0|8|0|0|9|0|10|9|11|12|11|13|11|14|11|15|11|16|11|17|11|18|11|19|11|20|10|3|-8|11|21|11|22|10|4|23|24|1|25|26|0|27|24|1|28|29|30|31|32|2|33|34|35|36|-29|37|0|0|0|&token=abcdef:1234567
This is the POST string for mobile keywords

Code:
__u=$u&__c=$c&format=$format&selector=5|1|38|https://adwords.google.com/o/Targeting/|0BC91872D430CA67A5237BB62909C258|106|17t|110|en_US|ul|12g|11f|17k|11i|KEYWORD|COMPETITION|GLOBAL_MONTHLY_SEARCHES|AVERAGE_TARGETED_MONTHLY_SEARCHES|TARGETED_MONTHLY_SEARCHES|IDEA_TYPE|AD_SHARE|EXTRACTED_FROM_WEBPAGE|SEARCH_SHARE|KEYWORD_CATEGORY|NGRAM_GROUP|12x|19z|hh|$county|13a|hu|$language|13d|139|17q|bm|13h|16x|17o|bl|$keyword|1|2|3|4|18|5|0|6|7|50|0|8|0|0|9|0|10|9|11|12|11|13|11|14|11|15|11|16|11|17|11|18|11|19|11|20|10|3|-8|11|21|11|22|10|5|23|24|1|25|26|0|27|24|1|28|29|30|31|32|33|2|34|35|36|37|-30|38|0|0|0|&token=$token
This is the POST string for a website

Code:
__u=$u&__c=$c&format=$format&selector=5|1|35|https://adwords.google.com/o/Targeting/|0BC91872D430CA67A5237BB62909C258|106|17t|110|en_US|ul|12g|11f|17k|11i|KEYWORD|COMPETITION|GLOBAL_MONTHLY_SEARCHES|AVERAGE_TARGETED_MONTHLY_SEARCHES|TARGETED_MONTHLY_SEARCHES|IDEA_TYPE|AD_SHARE|EXTRACTED_FROM_WEBPAGE|SEARCH_SHARE|KEYWORD_CATEGORY|NGRAM_GROUP|12x|19z|hh|$county|13a|hu|$language|139|17q|bm|13i|19p|$website|1|2|3|4|18|5|0|6|7|50|0|8|0|0|9|0|10|9|11|12|11|13|11|14|11|15|11|16|11|17|11|18|11|19|11|20|10|3|-8|11|21|11|22|10|4|23|24|1|25|26|0|27|24|1|28|29|30|31|32|2|33|0|31|34|35|0|&token=$token
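To round off the walkthrough, here is a hedged sketch of steps 6 and 7 in the same PHP/cURL style. It reuses the fetch() helper, $u, $c and $token from the sketch above; the placeholder substitution and the gzdecode()/str_getcsv() handling are my additions, so adjust to taste:

Code:
<?php
// Steps 6 and 7: POST to DownloadAll, decompress and parse. $template is the full keyword
// POST string shown earlier, pasted in with its $u/$c/$county/$language/$keyword/$format/
// $token placeholders left intact.
$template = '...'; // paste the keyword POST string from above here

$post = str_replace(
    array('$county', '$language', '$keyword', '$format',     '$token', '$u', '$c'),
    array('US',      'en',        'dog',      'GZIPPED_CSV', $token,   $u,   $c),
    $template
);

$raw = fetch('https://adwords.google.com/o/Targeting/file/DownloadAll', $cookieJar, $post);

// 7) (optional) gunzip and split into rows; with format=CSV you can skip the gzdecode().
$csv  = gzdecode($raw);            // PHP 5.4+; on older versions try gzinflate(substr($raw, 10))
$rows = array_map('str_getcsv', explode("\n", trim($csv)));
print_r($rows[0]);                 // header row, should include Keyword, Competition, Global Monthly Searches...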
 
acidie: great step by step outline! Thanks a lot. +rep for that one... saved me so much time. Will try all of this later on tonight.

One question though, would this return one keyword at a time? Or does it return suggestions for a seed keyword? Can I submit 100 keywords at a time?

Thanks!
 
One question though, would this return one keyword at a time? Or does it return suggestions for a seed keyword?

This will return up to 800 keywords based on a search of one term.

So as an example, if you search for the term 'dog' you will get up to 800 results. Google caps the number of results returned at 800, much like they cap the number of results in the SERPs at 1,000.

Can I submit 100 keywords at a time?

You can only search one term at a time (although you could modify the packet to send several), but that term can contain spaces and/or multiple words, so for example 'cheap dog clothes' can be used, but you can't use multiple terms like:

Code:
dog
dogs
dog clothes
dogs clothes
cheap dog clothes
Unfortunately my understanding of the data being sent (it's GWT-RPC) is limited, and therefore I can't easily modify the data to encompass the extra terms.

But having said that, I wouldn't search more than one term anyway, since the more terms you use, the more those terms have to compete with each other to show results.

In other words, if you have 1 term the maximum results can be 800.
If you have 2 then it's 400 per term on average (400 x 2 = 800)
If you have 4 then it's 200 per term on average (200 x 4 = 800)
If you have 100 then it's 8 per term on average (8 x 100 = 800), etc

Personally I would rather take longer to get the results, since each term has to be searched individually, and get more results than speed up the process and lose results.
 
I made a uBot script in 15 mins that scrapes the GKWT. It can also be integrated with a decaptcha service to automate everything.

They do include CPC costs, you just have to edit the columns to display them.

If anyone wants the uBot scraper, lemme know.
 
The best way to do this is to scrape your "seed" words. You want shit people are actually buying; then you will only run it through the tool once or twice.

Some more hints:

Code:
$url = "https://adwords.google.com/select/VariationsTool?" .
                "adgroupid=0&campaignid=0&adgroupIntegrated=false&" .
                "skipLogin=true&currencyCode=USD&maxCpcOverride=&" .
                "synonyms=true&suggest=true&excludedWords=&" .
                "targetLanguages=en&targetCountries={$country}&" .
                "allowExisting=true&showAdult=false&showTrademark=false&" .
                "keywords={$keyword}&captchaAnswer=";
You don't have to log in; you just have to handle the captcha and cookies. I use regular expressions to parse the raw results rather than zip files or whatever. This gives you local/global volume and average CPC. (A rough sketch of that request follows below.)
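Here is roughly what that request looks like, reusing the $url built above and a cookie jar like the one in the earlier sketch. The parsing pattern at the end is a placeholder, not the real markup; inspect the actual response and adjust it yourself:

Code:
<?php
// Rough sketch of the VariationsTool approach.
$ch = curl_init($url);
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_COOKIEJAR      => $cookieJar,  // captcha/session cookies get stored here
    CURLOPT_COOKIEFILE     => $cookieJar,
));
$html = curl_exec($ch);
curl_close($ch);

// If Google serves a captcha, solve it (manually or via a decaptcha service) and
// re-request the same URL with &captchaAnswer= filled in.

// Placeholder pattern, NOT the real markup:
preg_match_all('/<td[^>]*>(.*?)<\/td>/s', $html, $cells);
print_r($cells[1]);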

If you are a real baller, you then pull page rank, backlinks, etc. for competitors and scrape whois for exact matches. Eventually you'll run into the problem of having so much fucking data you can't run a query to get results out :D
 
Another thing to look at is Yahoo's API (BOSS). It's free and you can abuse the hell out of it without getting banned.
 
How would you use the Yahoo API to scrape the Google KW tool???
You can't. That's why I said 'another thing' as in 'in addition to using Google you can try using Yahoo to get keywords or backlinks'. Please PM me if you need more clarification.
 
You can't. That's why I said 'another thing' as in 'in addition to using Google you can try using Yahoo to get keywords or backlinks'. Please PM me if you need more clarification.

OK, I was just confused why this even came up in the context of Google scraping thread...

Acidie has great tips though it would be nice to see the complete code :)
 
OK, I was just confused why this even came up in the context of Google scraping thread...

Acidie has great tips though it would be nice to see the complete code :)

Allow me to blow your mind:

Here's a Python script that will scrape Bing related-search results for unique keywords:

Code:
>>> kws = get_keywords('apple pie', 50)
gathered  8  so far
gathered  16  so far
gathered  21  so far
gathered  29  so far
gathered  33  so far
gathered  41  so far
gathered  42  so far
gathered  43  so far
gathered  47  so far
gathered  47  so far
gathered  55  so far
>>> for kw in kws:
	print kw

	
Apple Pie Punch Everclear Recipes
French Apple Pie
Apple Pie Mixed Drink
Canned Apple Pie Filling Recipe
How to Make Moonshine Whiskey
Apple Pie Filling Recipe
Caramel Apple Pie Recipe
Apple Pie Alcohol
Best Dutch Apple Pie
Hot Apple Pie Hillbillies
Simple Apple Pie Recipe
Dutch Apple Pie Topping
Dutch Apple Pie Amish
Apple Pie Schnapps Recipe
Free Dutch Apple Pie Recipe
Best Apple Pie Recipe
Easy to Make Whiskey Still
Hot Apple Pie Mixed Drink
Dutch Apple Pie Filling Recipe
Hot Apple Pie Country Band
Make Apple Pie
Apple Pie Everclear
Hot Apple Pie Alcohol Drinks
Comstock Apple Pie Filling
Homemade Apple Pie Filling Recipe
Homemade Apple Pie
Homemade Apple Pie Liquor
Whiskey Mash Recipe
Moonshine Restaurant Austin Texas
Dutch Apple Crumb Pie Recipe
Apple Pie Liqueur
Make Corn Whiskey
Hot Apple Pie Moonshine
Hot Apple Pie Lyrics
Dutch Apple Pie
Homemade Dutch Apple Pie Recipe
Make Your Own Whiskey
Apple Dutch Easy Pie Recipe
Apple Pie Recipe
Hot Apple Pie
Betty Crocker Apple Pie
Recipes Using Apple Pie Filling
Apple Pie Everclear Recipe
Apple Pie Wine Recipe
Easy Dutch Apple Pie
Hot Apple Pie Shot
Vodka Everclear Hot Apple Pie
Make Whiskey Still
Everclear Apple Pie Wine
Build Whiskey Still

Here's the actual script:
Code:
import bingapi   # Python wrapper around the Bing API (see notes below)
import random


bing = bingapi.Bing('xxxx', 'en-us')   # your Bing developer key and market


def get_keywords(phrase, amount):
    gathered_keywords = set()  # holds the scraped keywords (a set, so no duplicates)

    while len(gathered_keywords) < amount:
        # pull the related-search suggestions for the current phrase
        query = bing.do_related_search(phrase)['SearchResponse']['RelatedSearch']['Results']
        keywords = [keyword['Title'] for keyword in query]
        gathered_keywords.update(keywords)

        # pick one of the keywords we already have as the next seed phrase
        phrase = random.choice(list(gathered_keywords))

        print 'gathered ', len(gathered_keywords), ' so far'

    return list(gathered_keywords)[:amount]

Notes:
Uses this class: Python Wrapper on Bing API - The Usware Blog - Django Web Development. You will need to edit it to accept a market parameter (en-us) or it won't work.

Using random.choice is horribly inefficient; iterate over each keyword you receive instead.

You need a Bing developer key.
 
though it would be nice to see the complete code

The problem with posting code is that it's useless to people who don't use the language it's written in.

I could post PHP code to log in and scrape, but for people who write in Perl, Python, Ruby, etc., it's going to be harder to understand.

Whereas a straight step-by-step guide is language-neutral.
 
This thread delivers. Lots of great information here. Oh, and please don't give out complete code, as it will be rendered useless once some newb gets his hands on it and uses it stupidly.
 
To give you an example of what I do:

Scrape Google keywords
Pull Google search results for the keywords
Evaluate PR for the top competitors
For keywords with volume greater than X and competition PR < Y, pull whois for .com .net .org
Rank them by quality
Next up is evaluating the number of Yahoo backlinks instead of PR

This then dumps out exact-match domains for keywords with whatever minimum level of search volume you want in low-competition niches. It's formatted in a nice report with volume, CPC, PR for the competition, and the URLs for all the competitors. So you just skim through it and buy the domains that aren't trademarks. (A stripped-down sketch of the pipeline follows below.)
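For anyone who wants to see the shape of it, here is a stripped-down sketch. Every helper in it (get_keyword_data(), top_competitor_pr(), domain_available()) is hypothetical glue you would write yourself around the scrapers discussed in this thread:

Code:
<?php
// Hypothetical sketch of the pipeline above; the helper functions are placeholders.
$min_volume = 1000; // X: minimum monthly searches
$max_pr     = 3;    // Y: maximum PR of the top competitors

foreach (get_keyword_data() as $kw) {                // rows from the keyword scraper
    if ($kw['volume'] < $min_volume) continue;
    if (top_competitor_pr($kw['keyword']) >= $max_pr) continue;

    $base = str_replace(' ', '', strtolower($kw['keyword']));
    foreach (array('.com', '.net', '.org') as $tld) {
        if (domain_available($base . $tld)) {        // whois lookup behind the scenes
            echo $base . $tld . "\t" . $kw['volume'] . "\t" . $kw['cpc'] . "\n";
        }
    }
}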

Anyone got a solution for auto-evaluating trademarks? A very large percentage of the keywords I get with exact-match domains are product names (cocacolaglasses.com). If I could pre-filter these out I could get more usable domains.
 
Also, any suggestions on how else to rank/filter domains for quality? Given pretty conservative whois usage, I can only evaluate so many in a day.

Low competition plus medium/low volume tosses out a lot of garbage. Maybe high CPC with a minimum volume?