Has anyone figured out how to scrape the new Google Keyword Tool and is willing to share? :)

moltar

Violating Airspace
Jul 8, 2009
Canada
It seems like the new tool is returning more relevant results. Has anyone deciphered the calls and returned data yet?

Thanks!
 


Working on a similar project for a client. Here's the thing, the Google API is fucking crazy cheap, $0.25 per 1000 API calls. Why deal with scraping, using proxies and solving captchas when just dishing out for the API is probably way cheaper unless you have a great source of free proxies?
 
Yes (sort of), create an account and export the data as a CSV (they even offer gzip, which is nice). Path of least resistance.

I looked at the JavaScript and was just about to start reversing it (I did actually start, but nothing more than looking at the first function call and the structure of the data returned) and then I realized that Google will just hand the formatted data to you on a plate if you have an account.
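For anyone going the export route, reading the gzipped CSV is only a few lines. This is a minimal sketch, not the actual export format: the column names in the usage note and the UTF-8 encoding are assumptions, so check them against a real download first.

```python
import csv
import gzip
import io


def read_keyword_export(raw: bytes) -> list[dict]:
    """Decompress a gzipped CSV export and return one dict per row.

    Assumes the file is UTF-8 and that its first line is a header row;
    both are guesses, not something confirmed in this thread.
    """
    with gzip.open(io.BytesIO(raw), mode="rt", encoding="utf-8", newline="") as fh:
        return list(csv.DictReader(fh))
```

Usage would be something like `rows = read_keyword_export(open("export.csv.gz", "rb").read())`, where the file name is hypothetical, and then `rows[0]["Keyword"]` if the header really has a column by that name.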

There is the issue of Google tracking what you're doing based on the account, but if that really matters then requests can be spread over multiple accounts, IPs, etc.

But for anyone that actually wants to attempt this, from my very, and I mean very, brief look at the code, it looks like the data is chunked and then reassembled.

So, for example, the data for the term 'cats' is 24,900,000. I assume what Google is doing is chunking the data like [2,4,9,0,0,0,0,0] and then reassembling it.

As I say, I looked at this for like two seconds, so I could be completely off the mark. Take my assumptions with a grain of salt.

The JavaScript also contains everything that the page needs to render the data, so if you're not careful you can end up chasing functions that change font size or move a div. I suspect that out of the block of code they use, there are only 2 or 3 functions that actually convert the data to something meaningful; the rest is just junk.
 
the Google API is fucking crazy cheap, $0.25 per 1000 API calls.

Actually I like this solution better than mine.

Do you know if they will rate limit you if you start banging the absolute crap out of the API? And by absolute crap I mean 1,000 requests a minute, although not over an extended period of time.

I assume they don't care as long as the bills are paid.
 
If you were going to scrape it, all you would do is watch the header data sent with the AJAX calls to see what data is passed where, and what is returned. The returned data is probably just standard XHTML generated by a backend script. That's normally how these things work; it isn't fucking rocket science.

You could, as always, totally avoid all that by using something like uBot.

Anyways as for caps on API usage, no idea.

To get started with the API you need a My Client Center (MCC), go here: Google AdWords - My Client Center

API development token takes 1 - 2 weeks to get approved according to Google. I however was approved in about 3-4 days.
 
*facepalm* I can't believe I missed this the first time I looked at the code, but the sent data is GWT RPC (Google Web Toolkit) and the returned data is in GWT JSON with a JavaScript function call to 'OK'.

There is a PHP GWT server, but you need a client to pull the data out of AdWords, and as far as I know there isn't one. So barring manually rewriting the JavaScript in PHP (assuming you're using PHP), you're fucked.

Personally I wouldn't bother, either use the API or login.

Anyways as for caps on API usage, no idea.

To get started with the API you need a My Client Center (MCC), go here: Google AdWords - My Client Center

API development token takes 1 - 2 weeks to get approved according to Google. I however was approved in about 3-4 days.

Cool thanks
 
When you perform a search, the following is posted to: https://adwords.google.com/o/Targeting/g?__u=1000000000&__c=1000000000


Code:
5|1|54|https://adwords.google.com/o/Targeting/|0BC91872D430CA67A5237BB62909C258|_|invoke|3|15s|19y|15n|15o|g|TiAction (Search.SearchInput.KEYWORD_IDEAS.RelatedToKeyword)|ATbciIPQBuWx_WecIzve5Ihro3s:1276948076390|19s|10g|15v|15x|c|h|i|106|17t|110|en_US|ul|12g|11f|17k|11i|KEYWORD|COMPETITION|GLOBAL_MONTHLY_SEARCHES|AVERAGE_TARGETED_MONTHLY_SEARCHES|TARGETED_MONTHLY_SEARCHES|IDEA_TYPE|AD_SHARE|EXTRACTED_FROM_WEBPAGE|SEARCH_SHARE|KEYWORD_CATEGORY|NGRAM_GROUP|12x|19z|hh|US|13a|hu|en|139|17q|bm|13h|16x|17o|bl|banana man|1|2|3|4|1|5|5|6|2016479548|32818055562133504|6|7|1|8|0|9|0|0|10|2|1000000000|0|1000000000|0|11|12|13|1|14|15|16|-4|0|0|0|17|5|0|0|0|0|0|0|0|0|0|18|0|19|1|0|0|0|0|0|0|0|0|0|7|0|0|0|20|21|235|22|0|23|24|50|0|25|0|0|26|0|27|9|28|29|28|30|28|31|28|32|28|33|28|34|28|35|28|36|28|37|27|3|-22|28|38|28|39|27|4|40|41|1|42|43|0|44|41|1|45|46|47|48|49|2|50|51|52|53|-43|54|0|0|0|

I searched for "banana man". I really don't have the time to work this one out lol. Might do later.
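For what it's worth, the posted payload looks like the usual pipe-delimited GWT RPC request layout: a version, a flags field, a string-table size (54 here, and "banana man" is the 54th string), the strings themselves, and then numeric tokens that mostly look like 1-based indexes into that table. A rough sketch of splitting it apart, assuming that layout and assuming no string contains an escaped pipe (the function names are mine, not anything from Google's code):

```python
def parse_gwt_rpc_request(payload: str) -> dict:
    """Split a pipe-delimited GWT RPC request into header, string table and data.

    Assumed layout: version|flags|string_count|<strings...>|<data tokens...>|
    Escaped pipes inside strings are not handled in this sketch.
    """
    fields = payload.split("|")
    if fields and fields[-1] == "":
        fields.pop()  # the payload ends with a trailing pipe
    version, flags, count = (int(f) for f in fields[:3])
    strings = fields[3:3 + count]
    if len(strings) != count:
        raise ValueError("payload is shorter than its declared string table")
    return {
        "version": version,
        "flags": flags,
        "strings": strings,
        "data": fields[3 + count:],
    }


def lookup(parsed: dict, index: int) -> str:
    """Return the string for a 1-based string-table index."""
    return parsed["strings"][index - 1]
```

Running this on the payload above should put KEYWORD, COMPETITION, GLOBAL_MONTHLY_SEARCHES and friends in the string table, which at least tells you which columns are being requested. Working out what each data token means (index, literal number, or encoded long) is the part that still needs reversing.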
 
Working on a similar project for a client. Here's the thing, the Google API is fucking crazy cheap, $0.25 per 1000 API calls. Why deal with scraping, using proxies and solving captchas when just dishing out for the API is probably way cheaper unless you have a great source of free proxies?

I am not 100% sure, but last time I looked at the API, you could only query one keyword at a time. While scraping you can do 100 at a time. Is that still true? If I send a query for 100 keywords, will I pay for each one or per request?

Actually I like this solution better than mine.

Do you know if they will rate limit you if you start banging the absolute crap out of the API? And by absolute crap I mean 1,000 requests a minute, although not over an extended period of time.

I assume they don't care as long as the bills are paid.

That's another thing, I know that in the past they did rate limit you.

If you were going to scrape it, all you would do is watch the header data sent with the AJAX calls to see what data is passed where, and what is returned. The returned data is probably just standard XHTML generated by a backend script. That's normally how these things work; it isn't fucking rocket science.

Sounds pretty easy in theory, but in practice, this version of the keyword tool is all messed up. The first one took me 20 minutes to figure out and implement. With this one I keep staring at it and going back and forth with Firebug, and so far it's not looking very good.

*facepalm* I can't believe I missed this the first time I looked at the code, but the sent data is GWT RPC (Google Web Toolkit) and the returned data is in GWT JSON with a JavaScript function call to 'OK'.

Not sure what you mean here. Please elaborate?

Thanks to all who replied.
 
Seems like if speed isn't a huge concern, automating the browser is the way to go. Without usage limits and if speed is a concern, the API sounds like the best setup.
 
I am not 100% sure, but last time I looked at the API, you could only query one keyword at a time. While scraping you can do 100 at a time. Is that still true? If I send a query for 100 keywords, will I pay for each one or per request?

Looking at the payment structure of the API request, it turns out not to be a straight 1:1. In other words, they charge you not only for the request made (which costs you 5 API call credits) but also for the data returned (at a rate of 0.1 API credits per keyword).

It roughly works out to $25 per 900,000 keywords returned. Not a lot, but it can start to add up if you're just browsing keywords or researching a lot of them.
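To make the arithmetic concrete, here is a small sketch using only the numbers quoted in this thread: $0.25 per 1,000 credits, 5 credits per request, and 0.1 credits per keyword returned. Those rates are taken from the posts above, not from a current price list, and the function name is made up for illustration.

```python
def estimate_api_cost(requests: int, keywords_per_request: int,
                      credits_per_request: float = 5.0,
                      credits_per_keyword: float = 0.1,
                      dollars_per_1000_credits: float = 0.25) -> float:
    """Estimate the dollar cost of a batch of keyword requests.

    Defaults are the rates quoted in the thread; treat them as assumptions.
    """
    credits = requests * (credits_per_request
                          + credits_per_keyword * keywords_per_request)
    return credits / 1000.0 * dollars_per_1000_credits
```

The cost per keyword depends heavily on how many keywords each request returns, because the fixed 5 credits are spread over the results. With only a handful of keywords per request the per-request charge dominates; with hundreds per request the total approaches the 0.1 credits per keyword floor, which is roughly where a figure like $25 per 900,000 keywords comes from.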

That's another thing, I know that in the past they did rate limit you.

Which, if they still do, makes the API useless, at least in my opinion.

Not sure what you mean here. Please elaborate?

The data being sent by the JavaScript is in GWT RPC (remote procedure call) format, which is a serialization format that Google uses. GWT was written in Java (not JavaScript) initially, but people have made ports to other languages.

The data being returned is in GWT JSON format.

What this means is that if you want to retrieve the data easily, you have to find a GWT client that is capable of sending GWT RPC calls to a server and then interpreting the returned GWT JSON.
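As a rough idea of what the "interpreting" half might involve, here is a sketch that assumes the response body starts with a `//OK` status prefix followed by a JSON-compatible array whose last three elements are the string table, a flags value and the protocol version. That layout is my understanding of GWT RPC in general and has not been verified against this tool; real responses can also contain JavaScript that a strict JSON parser rejects.

```python
import json


def parse_gwt_rpc_response(body: str) -> dict:
    """Split a GWT RPC style response into version, flags, strings and data.

    Assumes: a '//OK' prefix, then an array that json.loads can parse,
    ending in [..., string_table, flags, version]. All of this is a guess.
    """
    status, rest = body[:4], body[4:]
    if status != "//OK":
        raise ValueError("unexpected status prefix: %r" % status)
    values = json.loads(rest)
    return {
        "version": values[-1],
        "flags": values[-2],
        "strings": values[-3],
        "data": values[:-3],
    }
```

Even if this parses cleanly, the data tokens still have to be mapped back to fields, which is the part that needs the reversed JavaScript.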

I looked at a few versions, and there is a PHP server API but not one for the client. So that means, if you use PHP, that either you write an interpreter yourself or use alternative methods.

Again, I would recommend using the login method if you want to bypass the API. But if you're trying to build this into a tool for mass use, then that won't work very well, unless you run a server as a go-between (a proxy, so to speak) which handles the login and data retrieval on behalf of the user.

I know I'm repeating myself, but seriously look at logging into Google as a solution; they will hand you the data nicely formatted in CSV.

It took me less than 10 minutes to write the code to do that, but it would take me about a day to reverse the JavaScript and have a working client.

For me, it's a waste of time to chase the JavaScript, but then I might be missing something crucial.

Is there a reason you just want to scrape the keywords without having an account?
 
I think it can't show all the data that SEMrush provides anyway, for example,
and there are a lot of tools better than the keyword tool. But that's only my opinion :)
 
If anyone is interested in purchasing a subscription to a script/tool I have please email Sales@BannerBlindness.com.

The tool will allow you to auto scrape the Google KW suggestion tool. Put in one root keyword and you will daisy chain out a sick list of KWs :)

Go fuck off. People are discussing something completely different here, and what you are selling has been shared free here before.
 

I'm doing that now. I have a feeling they are going to discontinue it soon though. I just want to be proactive and move over to the new tool before they do that. Also, the new tool seems to have better results.

Looking at the payment structure of the API request, it turns out not to be a straight 1:1. In other words, they charge you not only for the request made (which costs you 5 API call credits) but also for the data returned (at a rate of 0.1 API credits per keyword).

It roughly works out to $25 per 900,000 keywords returned. Not a lot, but it can start to add up if you're just browsing keywords or researching a lot of them.

Shit that's even worse than I thought it was. Ya that's no good at all. I have millions of keywords I need to check numbers on.



Thank you for the explanation! I'll look into it for sure. I'm just doing a lot of volume and I'm afraid they'll block my account if I log in and scrape. Has anyone else done that and gotten away with it?
 
I think it can't show all the data that SEMrush provides anyway, for example,
and there are a lot of tools better than the keyword tool. But that's only my opinion :)

Where do all these sites get the data from? I always thought they just scraped Google as well.
 
I'm just doing a lot of volume and I'm afraid they'll block my account if I log in and scrape.

I should elaborate, I don't mean use your personal account, I mean create a whole bunch of disposable ones that are only used for scraping.

If they are blocked or terminated, it doesn't matter, just create more. They are free after all. :)

The only problem I see with this method is IP rate limiting, but then that's always an issue with scraping.

Where do all these sites get the data from? I always thought they just scraped Google as well.

They do; they all aggregate the data from the same places.
 
I've calculated AdWords API costs and it doesn't seem to be as expensive as I thought. I'll give that a try, and then scrape as an alternative if it doesn't work out well. Or maybe both at the same time :)
 
Looking at the payment structure of the API request, it turns out not to be a straight 1:1. In other words, they charge you not only for the request made (which costs you 5 API call credits) but also for the data returned (at a rate of 0.1 API credits per keyword).

It roughly works out to $25 per 900,000 keywords returned. Not a lot, but it can start to add up if you're just browsing keywords or researching a lot of them.

This I had not seen; everything I saw was 1 API credit per API call. I did find the price list though. That changes everything.