Google AJAX Search APIs - pointers?

Status
Not open for further replies.

BrooklynBlue

New member
Dec 14, 2007
I'm not much of a coder, but savvy enough to whip up crude tools here and there when I need them.

Right now, I've got the innards of something in place that hits up Google's search APIs from server-side.

Everything looks to be working well; now it's just a matter of gluing it all together, plugging it into WordPress, and pushing the automate button.
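For reference, a server-side call to Google's AJAX Search API is basically just an HTTP GET. A minimal sketch in Python, assuming the documented v1.0 web search endpoint (adjust the endpoint and parameters if your setup differs):

```python
import json
import urllib.parse
import urllib.request

# Endpoint per Google's AJAX Search API docs (v1.0 web search service).
SEARCH_ENDPOINT = "http://ajax.googleapis.com/ajax/services/search/web"

def build_search_url(query, key=None):
    """Build the request URL; the key parameter is optional for basic queries."""
    params = {"v": "1.0", "q": query}
    if key:
        params["key"] = key
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)

def search(query, key=None):
    """Fetch one page of results and decode the JSON envelope (network call)."""
    with urllib.request.urlopen(build_search_url(query, key)) as resp:
        data = json.load(resp)
    return data.get("responseData", {}).get("results", [])
```

From there it's just looping over `results` and handing the fields to whatever does the posting.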

I don't have much practice with scraping/auto-posting with my own tools, particularly from the big G, so I'm curious what stumbling blocks I should anticipate or safeguards I should put in place - particularly in terms of getting banned/blacklisted. I don't have any throwaway shell accounts to work with right now, so I'd like to have some sense of thresholds to honor up front.

Words of advice ?
 


Well, I know you need two things to use the search API:
1) A website
2) An API key associated with that website

If for whatever reason Google notices your API key is doing heavy searches and suspects scraping, the key may get revoked and the project stops working. You'd also need a new API key for each domain the service will be accessed from (it won't work without one).

So the only thing I can think of is to cache your results and don't query too often.
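The caching idea can be as simple as a file per query with a freshness check. A minimal sketch (the `search_cache` directory name and 6-hour TTL are arbitrary choices for illustration; `fetch` stands in for whatever actually calls the API):

```python
import hashlib
import json
import os
import time

CACHE_DIR = "search_cache"
CACHE_TTL = 6 * 60 * 60  # seconds; tune to how stale you can tolerate

def cached_search(query, fetch):
    """Return cached results for `query`, calling `fetch(query)` only on a miss."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    name = hashlib.md5(query.encode()).hexdigest() + ".json"
    path = os.path.join(CACHE_DIR, name)
    # Serve from disk if the cached copy is still fresh.
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < CACHE_TTL:
        with open(path) as f:
            return json.load(f)
    # Miss: hit the API once, then store the result for next time.
    results = fetch(query)
    with open(path, "w") as f:
        json.dump(results, f)
    return results
```

Repeated runs inside the TTL never touch Google at all, which keeps your query volume on the key low.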
 
At least for the Search and Translate services, the API key is optional. I can run simple command-line queries using curl no problem. AdWords, however, does require the key.
 
Any particular reason you're using their AJAX API instead of just scraping the SERPs? If you're actually after the data Google returns, you might have an easier time doing a regular scrape.

Either way, you'll want to do everything you can to avoid leaving a "signature" that can be tracked to you.

- Don't query too often: sleep() for a variable amount of time between calls if you're doing it frequently.
- Don't reuse the same search terms: keep a random query list and pick from it at random to hide what you're doing.
- If you do use the API, try to get a few more keys and randomize key use.
- Switch up your user-agent every call.
- If you're going heavy-duty, find some proxies and make the calls come from different IPs. When I scraped the AdWords tool or Overture, I routed every call through Tor and Privoxy so it always looked like it came from a unique IP. This is very slow, though.

The gist of it: figure out what data you pass to G when you request data, and change as much of it as you can.
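The advice above could be sketched roughly like this in Python (the user-agent strings and decoy-query idea are placeholders; proxy rotation would bolt on via `urllib.request.ProxyHandler` or a local Tor/Privoxy setup):

```python
import random
import time
import urllib.request

# Placeholder pool of user-agent strings; swap in real ones you collect.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7) AppleWebKit/534 Safari/534",
    "Opera/9.80 (Windows NT 6.0) Presto/2.12 Version/12.14",
]

def polite_fetch(url, min_delay=5, max_delay=30):
    """Fetch a URL with a random user-agent after a variable pause."""
    time.sleep(random.uniform(min_delay, max_delay))  # vary the call interval
    req = urllib.request.Request(
        url, headers={"User-Agent": random.choice(USER_AGENTS)}
    )
    # For proxy rotation, build an opener with urllib.request.ProxyHandler
    # pointing at your proxy/Tor endpoint instead of calling urlopen directly.
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def pick_queries(pool, n=3):
    """Randomly pick a subset of queries (real ones plus decoys) per run."""
    return random.sample(pool, min(n, len(pool)))
```

None of this makes you invisible, but together the pieces stop every request from carrying the same fingerprint.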
 