Google and rate limiting

NathanRidley, Jul 26, 2008
I've found that when scraping data from Google, they use fancy rate-limiting algorithms that ban your IP address for progressively increasing intervals the more you violate their rate limits.

To avoid hitting the rate limits (or at least to make things more efficient), I rotate proxies in and out of a pool. If a proxy was working and then gets rejected by Google, I rotate it out and let it cool off for a short time. When it comes out of that cooldown, if it still gets rejected by Google, I bench it again, but for a progressively longer period with each consecutive failure. I've found that, with a decent number of proxies, this yields a fairly decent number of possible daily hits on Google.
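Roughly, the pool logic looks something like this (just a sketch; the timings and structure are illustrative, not my exact code):

```python
import heapq
import time

class ProxyPool:
    """Rotate proxies; bench rejected ones for progressively longer cooldowns."""

    def __init__(self, proxies, base_cooldown=300, max_cooldown=6 * 3600):
        self.available = list(proxies)            # proxies ready to use
        self.benched = []                         # heap of (ready_at, proxy)
        self.failures = {p: 0 for p in proxies}   # consecutive rejections per proxy
        self.base_cooldown = base_cooldown
        self.max_cooldown = max_cooldown

    def get(self):
        """Return the next usable proxy, reclaiming any whose cooldown expired."""
        now = time.time()
        while self.benched and self.benched[0][0] <= now:
            _, proxy = heapq.heappop(self.benched)
            self.available.append(proxy)
        if not self.available:
            return None                           # everything is cooling off
        proxy = self.available.pop(0)
        self.available.append(proxy)              # simple round-robin
        return proxy

    def report_success(self, proxy):
        self.failures[proxy] = 0                  # reset backoff once Google accepts it again

    def report_rejection(self, proxy):
        """Each consecutive rejection doubles the cooldown, up to a cap."""
        self.failures[proxy] += 1
        cooldown = min(self.base_cooldown * 2 ** (self.failures[proxy] - 1),
                       self.max_cooldown)
        if proxy in self.available:
            self.available.remove(proxy)
        heapq.heappush(self.benched, (time.time() + cooldown, proxy))
```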

Also, rotating the user agent that I send in the headers seems to help lengthen the time until proxies start getting rejected.
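Nothing fancy on my end, just picking a user agent per request from a list (the UA strings below are only placeholders, not a recommended set):

```python
import random

# Placeholder strings; in practice keep a larger, current list of real browser UAs.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7; rv:121.0) Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0 Safari/537.36",
]

headers = {"User-Agent": random.choice(USER_AGENTS)}  # sent with each request
```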

Does anyone else have any tips to maximise how much they get out of their proxies before Google's rate limiting comes into effect?
 


Get more proxies and use a longer delay between hits and you'll never get rate limited in the first place. If you do for some reason, your method is the standard approach for dealing with rate limiting.
 
The old "get more proxies" suggestion is a good one, though I have 500 and it's still not enough. Pity IP address blocks are such hot property...
 
If you're scraping other sites too, instead of rotating the proxies, rotate the sites you're scraping. This way you're still always scraping something, but there's more time between requests to any given site.
 
Also, queries that use operators such as inurl: or that contain 'powered by' get your IPs blocked a lot quicker than normal queries.
 
If you're scraping other sites too, instead of rotating the proxies, rotate the sites you're scraping. This way you're still always scraping something, but there's more time between requests to any given site.

That's what I do: I store a timestamp for each site/service I hit, so I can hit Google, Yahoo, Delicious, etc. in quick succession rather than just using a global sleep.
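Roughly like this, as a bare-bones sketch (the service names, delays, and scrape_* fetchers are just made-up examples):

```python
import time

MIN_DELAY = {"google": 20.0, "yahoo": 10.0, "delicious": 5.0}  # seconds, illustrative
last_hit = {}

def wait_for(site):
    """Sleep only as long as needed to respect this site's own minimum delay."""
    earliest = last_hit.get(site, 0.0) + MIN_DELAY[site]
    now = time.time()
    if now < earliest:
        time.sleep(earliest - now)
    last_hit[site] = time.time()

# Interleaving requests keeps you busy while each site's timer runs down:
# wait_for("google");    scrape_google(query)
# wait_for("yahoo");     scrape_yahoo(query)
# wait_for("delicious"); scrape_delicious(query)
```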
 
Hello

The old "get more proxies" suggestion is a good one, though I have 500 and it's still not enough. Pity IP address blocks are such hot property...
 
If Google blocks the IP, just enter the human verification code (the CAPTCHA) to get it unblocked...
 
Google's rate limiting is so strict that I've been limited when doing manual searches before! I think they need to loosen up their rules a little. I know they need to limit scrapers, but they have gone a little bit too far.
 
Google's rate limiting is so strict that I've been limited when doing manual searches before! I think they need to loosen up their rules a little. I know they need to limit scrapers, but they have gone a little bit too far.

Why? They don't make money off of scrapers; they make money from humans clicking ads.

The golden rule is: "Google owes us nothing"