Scraper advice.

Status
Not open for further replies.

Jondoe0069

New member
Mar 21, 2007
United States
Does anyone know of a good scraper tutorial, or a tool or script that is easy to use and easy to learn to modify?

I need to scrape a site for keywords. They are all in well-organized tables, but each category is on a different page. Every <td> has a class, so scraping the site to get all the data in <td class="title"> shouldn't be that hard, right?

Any advice or direction is appreciated. Thanks!
 


Learn the following class:

Code:
class bot
{
    // cURL handle for the current request
    public $curl;

    // Apply the options shared by every request.
    function setup()
    {
        // Store and resend cookies so sessions persist across requests
        $cookieJar = 'cookies.txt';
        curl_setopt($this->curl, CURLOPT_COOKIEJAR, $cookieJar);
        curl_setopt($this->curl, CURLOPT_COOKIEFILE, $cookieJar);
        // Set the Referer header automatically when following redirects
        curl_setopt($this->curl, CURLOPT_AUTOREFERER, true);
        curl_setopt($this->curl, CURLOPT_FOLLOWLOCATION, true);
        // Return the response body as a string instead of printing it
        curl_setopt($this->curl, CURLOPT_RETURNTRANSFER, true);
    }

    // Fetch $url and return the response body, or false on failure.
    function get($url)
    {
        $this->curl = curl_init($url);
        $this->setup();
        return $this->request();
    }

    function request()
    {
        return curl_exec($this->curl);
    }
}

To use it just do something like this:

Code:
$bot = new bot();
$feeders = $bot->get('SOME WEBSITE');

Then play around with it. As you can tell, fetching pages is the easy part; what you do with the content of those pages is where things get tricky.
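For the <td class="title"> case in the question, here is a minimal sketch of the parsing step using PHP's built-in DOMDocument and DOMXPath. The function name extractTitles and the sample markup are made up for illustration, and the real site's HTML may need a different query.

```php
<?php
// Hypothetical helper: return the trimmed text of every <td class="title"> cell.
function extractTitles(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $titles = [];
    // Match cells whose class attribute contains the token "title", even next to other classes.
    $query = '//td[contains(concat(" ", normalize-space(@class), " "), " title ")]';
    foreach ($xpath->query($query) as $cell) {
        $titles[] = trim($cell->textContent);
    }
    return $titles;
}

// Stand-in for the string returned by $bot->get(...); this markup is invented.
$html = '<table><tr><td class="title">blue widgets</td><td class="count">12</td></tr>'
      . '<tr><td class="title">red widgets</td><td class="count">7</td></tr></table>';

print_r(extractTitles($html));
```

Since each category is on its own page, one way to use this would be to loop over the category URLs, call $bot->get() on each, and pass the result to extractTitles(), checking first that get() did not return false.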
 