I've done my fair share of scraping since Bofu started imparting some of his PHP knowledge to my dumb ass. However, I've never had to scrape a site like Google that will hit you with a captcha if your behavior doesn't seem... human.
How do you deal with that? I'm already spoofing my user agent but I know that's not enough. I could add some random sleep() times into the script if that would help. It would slow things down but I can live with that. What I really don't want to have to do is use a proxy for each new page.
And if it matters, I'm using file_get_contents, not cURL.
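For what it's worth, the two ideas you mention (a browser-like User-Agent and randomized sleeps) can both be done with plain file_get_contents via a stream context. This is only a sketch of that combination, not a guaranteed way past Google's captcha; the UA string and the commented-out `$urls` loop are placeholders:

```php
<?php
// Hedged sketch: file_get_contents with a spoofed User-Agent set
// through a stream context, plus a randomized pause between requests.
// Whether this is enough to look "human" to Google is another matter.

function make_context()
{
    // Browser-like headers; the UA string here is just an example.
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)\r\n" .
                        "Accept-Language: en-US,en;q=0.9\r\n",
        ],
    ]);
}

// Pick a pause of 3-8 whole seconds so the request timing
// isn't machine-perfect.
function random_delay(): int
{
    return random_int(3, 8);
}

// Usage (actual network calls, so shown here as comments):
// foreach ($urls as $url) {
//     $html = file_get_contents($url, false, make_context());
//     sleep(random_delay());
// }
```

The context replaces PHP's default `User-Agent` header for that one call, so you don't have to touch `user_agent` in php.ini, and `random_int()` keeps the delays from forming an obvious fixed interval.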