Has anyone figured out how to scrape new Google Keyword Tool and willing to share? :)

moltar · Jun 18, 2010

It seems like the new tool is returning more relevant results. Has anyone deciphered the calls and returned data yet?

Thanks!

dchuk · Jun 18, 2010

this is relevant to my interests...preferably not Perl though

Rage9 · Jun 18, 2010

Working on a similar project for a client. Here's the thing, the Google API is fucking crazy cheap, $0.25 per 1000 API calls. Why deal with scraping, using proxies and solving captchas when just dishing out for the API is probably way cheaper unless you have a great source of free proxies?

acidie · Jun 18, 2010

Yes (sort of), create an account and export the data as a CSV (they even offer gzip, which is nice). Path of least resistance.
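Reading such an export back takes a few lines of standard-library Python. This is a sketch under assumptions. The export is taken to be UTF-8, comma-separated and gzip-compressed, with a header row. The column names in the demo (`Keyword`, `Global Monthly Searches`) are hypothetical, so check them against a real file.

```python
import csv
import gzip
import io


def read_keyword_export(source):
    """Read a gzip-compressed CSV export into a list of dicts (one per row).

    `source` may be a path or a binary file object. Assumes UTF-8 text with a
    header row; the real export's encoding and columns are not confirmed here.
    """
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as text:
        return list(csv.DictReader(text))


if __name__ == "__main__":
    # Tiny in-memory export with hypothetical column names, for the demo only.
    raw = "Keyword,Global Monthly Searches\ncats,24900000\n"
    buffer = io.BytesIO(gzip.compress(raw.encode("utf-8")))
    for row in read_keyword_export(buffer):
        print(row["Keyword"], int(row["Global Monthly Searches"]))
```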

I looked at the JavaScript and was just about to start reversing it (I did actually start, but nothing more than looking at the first function call and the structure of the data returned) and then I realized that Google will just hand the formatted data to you on a plate if you have an account.

There is the issue of Google tracking what you're doing based on the account, but if that really matters then requests can be spread over multiple accounts, IPs, etc.

But for anyone who actually wants to attempt this, from my very (and I mean very) brief look at the code, it looks like the data is chunked and then reassembled.

So, for example, the data for the term 'cats' is 24,900,000. I assume what Google is doing is chunking the data like [2,4,9,0,0,0,0,0] and then reassembling it.

As I say, I looked at this for like two seconds, so I could be completely off the mark. Take my assumptions with a grain of salt.
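The Python below makes that guess concrete by showing the hypothesized digit round trip. It only illustrates the speculation above. It is not a decoder for the tool's actual traffic, which later posts identify as GWT RPC.

```python
def chunk_digits(value: int) -> list[int]:
    """Split a non-negative integer into its decimal digits (hypothesized encoding)."""
    return [int(d) for d in str(value)]


def reassemble_digits(digits: list[int]) -> int:
    """Rebuild the integer from a list of decimal digits."""
    result = 0
    for d in digits:
        result = result * 10 + d
    return result


if __name__ == "__main__":
    digits = chunk_digits(24_900_000)
    print(digits)                     # [2, 4, 9, 0, 0, 0, 0, 0]
    print(reassemble_digits(digits))  # 24900000
```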

The JavaScript also contains everything that the page needs to render the data, so if you're not careful you can end up chasing functions that change font size or move a div. I suspect that out of the block of code they use, there are only 2 or 3 functions that actually convert the data to something meaningful; the rest is just junk.

acidie · Jun 19, 2010

Rage9 said:
the Google API is fucking crazy cheap, $0.25 per 1000 API calls.

Actually I like this solution better than mine.

Do you know if they will rate limit you if you start banging the absolute crap out of the API? And by absolute crap I mean 1,000 requests a minute, although not over an extended period of time.

I assume they don't care as long as the bills are paid.

Rage9 · Jun 19, 2010

If you were going to scrape it, all you would do is watch the header data of the AJAX calls to see what data is passed where and what is returned. The returned data is probably just standard XHTML generated by a backend script. That's normally how these things work; it isn't fucking rocket science.
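Here is a minimal Python sketch of the replay step. It assumes you have copied the URL, POST body and request headers from the browser's network panel (Firebug's Net panel, for example). The URL and body in the demo are hypothetical placeholders. `build_replay_request` is an illustrative helper, not part of any library.

```python
import urllib.request


def build_replay_request(url, body, headers):
    """Rebuild an AJAX POST captured in the browser so it can be re-sent.

    `url`, `body` and `headers` are whatever was copied from the network
    panel; nothing here knows anything about Google's endpoints.
    """
    return urllib.request.Request(
        url, data=body.encode("utf-8"), headers=dict(headers), method="POST"
    )


if __name__ == "__main__":
    request = build_replay_request(
        "https://example.com/ajax/search",  # hypothetical placeholder URL
        "q=banana+man",                     # hypothetical captured body
        {"Content-Type": "application/x-www-form-urlencoded"},
    )
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would actually send it; session cookies
    # from the browser may also need to be copied into the headers.
```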

You could, as always, totally avoid all that by using something like uBot.

Anyways as for caps on API usage, no idea.

To get started with the API you need a My Client Center (MCC), go here: Google AdWords - My Client Center

The API developer token takes 1-2 weeks to get approved, according to Google. I, however, was approved in about 3-4 days.

acidie · Jun 19, 2010

*facepalm* I can't believe I missed this the first time I looked at the code, but the sent data is GWT RPC (Google Web Toolkit) and the returned data is GWT JSON with a JavaScript function call to 'OK'.
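On the response side, standard GWT RPC responses start with `//OK` (or `//EX` when the server throws), followed by a JavaScript array. The minimal Python sketch below assumes this endpoint uses the standard GWT layout. In that layout, the string table, flags and protocol version are the last three array entries, and the payload tokens before them are stored in reverse reading order. The sample response in the demo is made up, not captured from AdWords. `parse_gwt_response` is a hypothetical helper name.

```python
import json


def parse_gwt_response(text):
    """Split a GWT RPC response into (version, flags, string_table, tokens).

    Assumes the standard GWT layout: '//OK' + a JavaScript array whose last
    three entries are the string table, the flags and the protocol version;
    everything before them is payload, written in reverse reading order.
    """
    if text.startswith("//EX"):
        raise ValueError("server returned a serialized exception")
    if not text.startswith("//OK"):
        raise ValueError("not a GWT RPC success response")
    # Only works when the array is plain JSON; very large GWT responses may be
    # split with '].concat([', which this sketch does not handle.
    values = json.loads(text[len("//OK"):])
    *payload, string_table, flags, version = values
    tokens = payload[::-1]  # the client reads the payload from the end
    return version, flags, string_table, tokens


if __name__ == "__main__":
    # Made-up response for illustration only.
    print(parse_gwt_response('//OK[2,1,["banana man","cats"],0,5]'))
```

This only gets you the raw tokens. Mapping them back to keyword rows still requires the field order of the serialized Java types, and that is the part that would have to be reversed from the JavaScript.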

There is a PHP GWT server, but you need a client to pull the data out of AdWords, and as far as I know there isn't one. So, barring rewriting the JavaScript in PHP (assuming you're using PHP) manually, you're fucked.

Personally I wouldn't bother, either use the API or login.

Rage9 said:
Anyways as for caps on API usage, no idea.

To get started with the API you need a My Client Center (MCC), go here: Google AdWords - My Client Center

The API developer token takes 1-2 weeks to get approved, according to Google. I, however, was approved in about 3-4 days.

Cool thanks

BackBanana · Jun 19, 2010

When you perform a search, the following is posted to: https://adwords.google.com/o/Targeting/g?__u=1000000000&__c=1000000000

Code:

5|1|54|https://adwords.google.com/o/Targeting/|0BC91872D430CA67A5237BB62909C258|_|invoke|3|15s|19y|15n|15o|g|TiAction (Search.SearchInput.KEYWORD_IDEAS.RelatedToKeyword)|ATbciIPQBuWx_WecIzve5Ihro3s:1276948076390|19s|10g|15v|15x|c|h|i|106|17t|110|en_US|ul|12g|11f|17k|11i|KEYWORD|COMPETITION|GLOBAL_MONTHLY_SEARCHES|AVERAGE_TARGETED_MONTHLY_SEARCHES|TARGETED_MONTHLY_SEARCHES|IDEA_TYPE|AD_SHARE|EXTRACTED_FROM_WEBPAGE|SEARCH_SHARE|KEYWORD_CATEGORY|NGRAM_GROUP|12x|19z|hh|US|13a|hu|en|139|17q|bm|13h|16x|17o|bl|banana man|1|2|3|4|1|5|5|6|2016479548|32818055562133504|6|7|1|8|0|9|0|0|10|2|1000000000|0|1000000000|0|11|12|13|1|14|15|16|-4|0|0|0|17|5|0|0|0|0|0|0|0|0|0|18|0|19|1|0|0|0|0|0|0|0|0|0|7|0|0|0|20|21|235|22|0|23|24|50|0|25|0|0|26|0|27|9|28|29|28|30|28|31|28|32|28|33|28|34|28|35|28|36|28|37|27|3|-22|28|38|28|39|27|4|40|41|1|42|43|0|44|41|1|45|46|47|48|49|2|50|51|52|53|-43|54|0|0|0|

I searched for "banana man". I really don't have the time to work this one out, lol. Might do it later.
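The body follows the GWT RPC request layout. It starts with the protocol version (5), the flags (1) and the number of strings (54). Next comes the string table itself, followed by integer tokens. A token that names a string is a 1-based index into that table, and 0 means null. The minimal Python sketch below pulls the body apart. It assumes no string contains a literal `|`, since escaping is not handled. `parse_gwt_request` and `resolve` are hypothetical helper names.

```python
def parse_gwt_request(body):
    """Split a GWT RPC request body into (version, flags, string_table, tokens).

    Layout: version|flags|string count|strings...|tokens...| where tokens that
    name a string are 1-based indexes into the string table. Assumes no string
    contains a literal '|' (this sketch does not handle escaping).
    """
    parts = body.split("|")
    if parts and parts[-1] == "":
        parts.pop()  # the body ends with a trailing '|'
    version, flags, count = int(parts[0]), int(parts[1]), int(parts[2])
    string_table = parts[3:3 + count]
    tokens = parts[3 + count:]
    return version, flags, string_table, tokens


def resolve(string_table, token):
    """Resolve a string reference: 0 means null, n means string_table[n - 1]."""
    index = int(token)
    if index == 0:
        return None
    if not 1 <= index <= len(string_table):
        raise ValueError(f"token {token!r} is not a string-table reference")
    return string_table[index - 1]


if __name__ == "__main__":
    # Cut-down body built from the first four strings of the capture above.
    body = ("5|1|4|https://adwords.google.com/o/Targeting/|"
            "0BC91872D430CA67A5237BB62909C258|_|invoke|1|2|3|4|")
    version, flags, strings, tokens = parse_gwt_request(body)
    print(version, flags, [resolve(strings, t) for t in tokens])
```

Run on the full body above, the string table has 54 entries, and the search term `banana man` is the last one. The first four tokens resolve to the module base URL, the policy hash, `_` and `invoke`. That matches the usual GWT RPC preamble: module base URL, serialization-policy strong name, service interface, method name.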

moltar · Jun 19, 2010

Rage9 said:
Working on a similar project for a client. Here's the thing, the Google API is fucking crazy cheap, $0.25 per 1000 API calls. Why deal with scraping, using proxies and solving captchas when just dishing out for the API is probably way cheaper unless you have a great source of free proxies?

I am not 100% sure, but last time I looked at the API, you could only query one keyword at a time. While scraping you can do 100 at a time. Is that still true? If I send a query for 100 keywords, will I pay for each one or per request?

acidie said:
Actually I like this solution better than mine.

Do you know if they will rate limit you if you start banging the absolute crap out of the API? And by absolute crap I mean 1,000 requests a minute, although not over an extended period of time.

I assume they don't care as long as the bills are paid.

That's another thing, I know that in the past they did rate limit you.
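Whatever the actual limit is (nobody in the thread knows the current numbers), pacing requests on the client side is easy to add. Below is a minimal fixed-interval throttle in Python. The 1,000-per-minute figure in the demo is just the rate asked about above, not a documented Google limit, and `Throttle` is a hypothetical helper.

```python
import time


class Throttle:
    """Client-side pacing: at most `max_calls` calls per `period` seconds.

    Fixed-interval strategy: each wait() blocks until period / max_calls
    seconds have passed since the previous call. `clock` and `sleep` are
    injectable so the logic can be tested without real waiting.
    """

    def __init__(self, max_calls, period=60.0, clock=time.monotonic, sleep=time.sleep):
        if max_calls <= 0:
            raise ValueError("max_calls must be positive")
        self.interval = period / max_calls
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous call, None before the first

    def wait(self):
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last = now


if __name__ == "__main__":
    # 1,000 calls per minute is the rate asked about above, not a known limit.
    throttle = Throttle(max_calls=1000, period=60.0)
    for keyword in ["cats", "banana man"]:  # placeholder keywords
        throttle.wait()
        print("would send request for", keyword)
```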

Rage9 said:
If you were going to scrape it, all you would do is watch the header data of the AJAX calls to see what data is passed where and what is returned. The returned data is probably just standard XHTML generated by a backend script. That's normally how these things work; it isn't fucking rocket science.

Sounds pretty easy in theory, but in practice, this version of the keyword tool is all messed up. The first one took me 20 minutes to figure out and implement. This one I keep staring and going back and forth with Firebug, and so far it's not looking very good.

acidie said:
*facepalm* I can't believe I missed this the first time I looked at the code, but the sent data is GWT RPC (Google Web Toolkit) and the returned data is GWT JSON with a JavaScript function call to 'OK'.

Not sure what you mean here. Please elaborate?

Thanks to all who replied.

dchuk · Jun 19, 2010

Seems like if speed isn't a huge concern, automating the browser is the way to go. If speed is a concern and there are no usage limits, the API sounds like the best setup.

bigmoneyrob · Jun 20, 2010

you can still use the old tool if you want

https://adwords.google.com/select/KeywordToolExternal?forceLegacy=true

LosAngeles · Jun 20, 2010

If anyone is interested in purchasing a subscription to a script/tool I have please email Sales@BannerBlindness.com.

The tool will allow you to auto scrape the google KW suggestion tool. Put in one root keyword and u will daisy chain out a sick list of KW's

acidie · Jun 21, 2010

moltar said:
I am not 100% sure, but last time I looked at the API, you could only query one keyword at a time. While scraping you can do 100 at a time. Is that still true? If I send a query for 100 keywords, will I pay for each one or per request?

Looking at the payment structure of the API, it turns out not to be a straight 1:1. In other words, they charge you not only for the request made (which costs you 5 API call credits) but also for the data returned (at a rate of 0.1 API credits per keyword).

It roughly works out to be $25 per 900,000 keywords returned. Not a lot, but it can start to add up if you're just browsing keywords or researching a lot of them.
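The arithmetic is easy to reproduce. The Python sketch below uses the rates quoted in this thread as of June 2010: 5 units per request, 0.1 units per keyword returned, and $0.25 per 1,000 units. You have to supply the number of keywords returned per request yourself, since nobody here states it. `api_cost_usd` is a hypothetical helper.

```python
import math


def api_cost_usd(num_keywords, keywords_per_request,
                 units_per_request=5.0, units_per_keyword=0.1,
                 usd_per_1000_units=0.25):
    """Estimate API spend for a keyword-research job.

    Defaults are the rates quoted in this thread (June 2010): a flat charge
    per request plus a charge per keyword returned, priced per 1,000 units.
    """
    requests = math.ceil(num_keywords / keywords_per_request)
    units = requests * units_per_request + num_keywords * units_per_keyword
    return units / 1000 * usd_per_1000_units


if __name__ == "__main__":
    # Batch sizes are hypothetical; compare how they change the total.
    for batch in (800, 100):
        print(batch, round(api_cost_usd(900_000, batch), 2))
```

Take a hypothetical 800 keywords per request. Then 900,000 keywords take 1,125 requests and 95,625 units, about $23.91, which is in line with the rough $25 figure. At 100 keywords per request, the per-request charge pushes the same job to $33.75.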

moltar said:
That's another thing, I know that in the past they did rate limit you.

If they still do, that makes the API useless, at least in my opinion.

moltar said:
Not sure what you mean here. Please elaborate?

The data being sent by the JavaScript is in a format called GWT RPC (remote procedure call), a serialization format that Google uses. It was written in Java (not JavaScript) initially, but people have made ports to other languages.

The data being returned is in GWT JSON format.

What this means is that if you want to retrieve the data easily, you have to find a GWT client that is capable of sending GWT RPC calls to a server and then interpreting the returned GWT JSON.

I looked at a few versions, and there is a PHP server API but not one for the client. So, if you use PHP, either you write an interpreter yourself or use alternative methods.
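If you do end up writing the client yourself, the request side is mostly bookkeeping: a string table numbered in first-use order, plus `|`-joined tokens. The minimal Python sketch below makes a few assumptions. It uses the same layout as the captured AdWords body (version 5, flags 1) and assumes no `|` characters inside strings. `StringTable` and `build_gwt_request` are hypothetical helpers, and the real argument encoding for the keyword-ideas call is not covered.

```python
class StringTable:
    """Hands out 1-based string indexes in first-use order.

    That numbering is consistent with the captured AdWords body, where the
    module URL is 1, the policy hash 2, the service name 3 and the method 4.
    """

    def __init__(self):
        self.strings = []
        self._index = {}

    def ref(self, value):
        if value not in self._index:
            self.strings.append(value)
            self._index[value] = len(self.strings)
        return self._index[value]


def build_gwt_request(strings, tokens, version=5, flags=1):
    """Serialize version|flags|count|strings...|tokens...| (trailing '|' included)."""
    if any("|" in s for s in strings):
        raise ValueError("this sketch does not escape '|' inside strings")
    fields = [str(version), str(flags), str(len(strings)), *strings, *map(str, tokens)]
    return "|".join(fields) + "|"


if __name__ == "__main__":
    table = StringTable()
    tokens = [
        table.ref("https://adwords.google.com/o/Targeting/"),  # module base URL
        table.ref("0BC91872D430CA67A5237BB62909C258"),         # policy hash from the capture
        table.ref("_"),                                         # service interface
        table.ref("invoke"),                                    # method name
        0,                                                      # zero parameters, illustration only
    ]
    print(build_gwt_request(table.strings, tokens))
```

The output is only a preamble, and the server would not accept it as a real call. The part that would still have to be worked out from the JavaScript is the encoding of the keyword-ideas argument, which is the long token run after `invoke` in the captured body.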

Again, I would recommend using the login method if you want to bypass the API. But if you're trying to build this into a tool for mass use, that won't work very well, unless you run a server as a go-between (a proxy, so to speak) that handles the login and data retrieval on behalf of the user.

I know I'm repeating myself, but seriously, look at logging into Google as a solution; they will hand you the data nicely formatted in CSV.

It took me less than 10 minutes to write the code to do that, but it would take me about a day to reverse the JavaScript and have a working client.

For me, it's a waste of time to chase the JavaScript, but then I might be missing something crucial.

Is there a reason you just want to scrape the keywords without having an account?

Eeola · Jun 21, 2010

I think it can't show all the data that SEMrush provides, for example, and there are a lot of tools better than the Keyword Tool. But that's only my opinion.

gutterseo · Jun 21, 2010

LosAngeles said:
If anyone is interested in purchasing a subscription to a script/tool I have please email Sales@BannerBlindness.com.

The tool will allow you to auto scrape the google KW suggestion tool. Put in one root keyword and u will daisy chain out a sick list of KW's

Go fuck off. People are discussing something completely different here, and what you are selling has been shared free here before.

moltar · Jun 21, 2010

bigmoneyrob said:
you can still use the old tool if you want

https://adwords.google.com/select/KeywordToolExternal?forceLegacy=true

I'm doing that now. I have a feeling they are going to discontinue it soon though. I just want to be proactive and move over to the new tool before they do that. Also, the new tool seems to have better results.

acidie said:
Looking at the payment structure of the API, it turns out not to be a straight 1:1. In other words, they charge you not only for the request made (which costs you 5 API call credits) but also for the data returned (at a rate of 0.1 API credits per keyword).

It roughly works out to be $25 per 900,000 keywords returned. Not a lot, but it can start to add up if you're just browsing keywords or researching a lot of them.

Shit that's even worse than I thought it was. Ya that's no good at all. I have millions of keywords I need to check numbers on.

acidie said:
The data being sent by the JavaScript is in a format called GWT RPC (remote procedure call), a serialization format that Google uses. It was written in Java (not JavaScript) initially, but people have made ports to other languages.

Is there a reason you just want to scrape the keywords without having an account?

Thank you for the explanation! I'll look into it for sure. I'm just doing a lot of volume and I'm afraid they'll block my account if I log in and scrape. Has anyone else done that and gotten away with it?

moltar · Jun 21, 2010

Eeola said:
I think it can't show all the data that SEMrush provides, for example, and there are a lot of tools better than the Keyword Tool. But that's only my opinion.

Where do all these sites get the data from? I always thought they just scraped Google as well.

acidie · Jun 21, 2010

moltar said:
I'm just doing a lot of volume and I'm afraid they'll block my account if I login and scrape.

I should elaborate: I don't mean use your personal account; I mean create a whole bunch of disposable ones that are only used for scraping.

If they are blocked or terminated, it doesn't matter; create more. They are free, after all.

The only problem I see with this method is IP rate limiting, but then that's always an issue with scraping.

moltar said:
Where do all these sites get the data from? I always thought they just scraped Google as well.

They do; they all aggregate the data from the same places.

moltar · Jun 21, 2010

I've calculated AdWords API costs and it doesn't seem to be as expensive as I thought. I'll give that a try, and then scrape as an alternative if it doesn't work out well. Or maybe both at the same time.

Rage9 · Jun 21, 2010

acidie said:
Looking at the payment structure of the API, it turns out not to be a straight 1:1. In other words, they charge you not only for the request made (which costs you 5 API call credits) but also for the data returned (at a rate of 0.1 API credits per keyword).

It roughly works out to be $25 per 900,000 keywords returned. Not a lot, but it can start to add up if you're just browsing keywords or researching a lot of them.

This I had not seen; everything I saw was 1 API credit per API call. I did find the price list, though. That changes everything.
