Automatically Extracting Google URLs

Status
Not open for further replies.

schaum

Senior WickedFire Member
Jul 4, 2007
Hey guys, I'd like to automatically extract just the URLs from the SERPs.

For example, if I search "blue widgets", I want a tool that will make a text file containing the top 200 URLs from that search.

I don't have any PHP experience yet :(
Any ideas?

Thanks
 


Well, I was gonna say: learn PHP. That's the best option. If you learn some basics, you can build a scraper for this task in no time.
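For anyone taking the learn-PHP route, the first piece of such a scraper is working out which results pages to request. Here is a minimal sketch. It assumes Google accepts the `num` (results per page, at most 100) and `start` (offset) parameters the way it did in 2007, and `serp_page_urls` is a hypothetical helper name, not part of any library:

```php
<?php
// Hypothetical helper: build the results-page URLs needed to cover the top $total hits.
// Assumes Google's num/start parameters, with at most 100 results per page.
function serp_page_urls(string $query, int $total, int $perPage = 100): array {
    $urls = [];
    for ($start = 0; $start < $total; $start += $perPage) {
        // http_build_query URL-encodes the query (spaces become +)
        $params = ['q' => $query, 'num' => $perPage];
        if ($start > 0) {
            $params['start'] = $start;
        }
        $urls[] = 'http://www.google.com/search?' . http_build_query($params);
    }
    return $urls;
}

// Two pages of 100 cover the top 200 results
foreach (serp_page_urls('blue widgets', 200) as $url) {
    echo $url, "\n";
}
```

Each of those URLs is then fetched (for example with cURL), and the result links are pulled out of the returned HTML.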

BUT if you need this now... PM me. I already have a script written that does this, and I'll give you the URL where it's hosted so you can use it if you want.
 
I love scraping.

PHP:
<?php

// Require a query in the URL, e.g. txt_google.php?q=blue+widgets
if (empty($_GET['q']) || !is_string($_GET['q'])) {
    die("You did not enter a query in the URL.");
}

// URL-encode the query (spaces become +'s, other special characters %XX)
$query = urlencode($_GET['q']);

// Grab the SERPs; if page 1 is not a full 100 links, don't grab page 2.
// The class=l pattern matches Google's result-link markup at the time of writing.
$pattern = '/<a href="([^"]*?)" class=l/';
$serp1 = getpage("http://www.google.com/search?q=$query&num=100");
preg_match_all($pattern, $serp1, $links1);
if (count($links1[1]) == 100) {
    $serp2 = getpage("http://www.google.com/search?q=$query&num=100&start=100");
    preg_match_all($pattern, $serp2, $links2);
    $links = array_merge($links1[1], $links2[1]);
} else {
    $links = $links1[1];
}

// The hrefs come straight from the HTML, so decode entities such as &amp;
$links = array_map('html_entity_decode', $links);

// Save to a text file named after the query (including + signs).
// Keep only letters, digits, + and - so the query can't write outside this folder.
$name = preg_replace('/[^A-Za-z0-9+-]/', '', $query);
$filename = ($name === '' ? 'results' : $name) . '.txt';
file_put_contents($filename, implode("\n", $links));

// Output a link to the text file (escaped for HTML)
$safe = htmlspecialchars($filename, ENT_QUOTES);
echo "<a href='$safe'>$safe</a>";


// cURL function: returns the page body, or '' if the request fails
function getpage($url) {
    $useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.5";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $page = curl_exec($ch);
    curl_close($ch);
    return $page === false ? '' : $page;
}

?>
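The `class=l` regex only works while Google's markup stays exactly the same. A change in attribute order or quoting silently breaks it. A different technique from the one in the script is to let PHP's built-in DOM extension parse the page and select links by class. This is a minimal sketch, assuming result links still carry a class of `l`. `extract_result_links` is a hypothetical name:

```php
<?php
// Hypothetical helper: return the hrefs of <a> tags with a given class in an HTML page.
// Uses PHP's DOM extension instead of a regex, so attribute order and quoting don't matter.
function extract_result_links(string $html, string $class = 'l'): array {
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match class="l" exactly or as one of several space-separated classes
    $expr = "//a[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
    $links = [];
    foreach ($xpath->query($expr) as $a) {
        $links[] = $a->getAttribute('href'); // entities like &amp; are already decoded
    }
    return $links;
}

$sample = '<div><a class="l" href="http://example.com/a?x=1&amp;y=2">A</a>'
        . '<a href="/other">skip</a><a class=l href="http://example.org/">B</a></div>';
print_r(extract_result_links($sample));
```

The trade-off is speed and dependencies: DOM parsing is slower than a single regex and needs the DOM extension enabled, but it tolerates small markup changes.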
It uses cURL, so the server you run it on needs PHP's cURL extension installed.
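If you're not sure whether your host has it, a quick check tells you before the script dies with an undefined-function error. This is a minimal sketch; `curl_available` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: true if PHP's cURL extension is loaded
function curl_available(): bool {
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_available()) {
    echo "cURL is installed\n";
} else {
    echo "cURL is missing: enable the curl extension in php.ini (or ask your host)\n";
}
```

From a shell, `php -m` lists the loaded extensions, and `curl` should appear in that list.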

I put a live version on my site for testing. (It saves the results in a .txt file, which means it DOES record what you search for. Probably still best to search for goat porn on Google itself. ;))

Code:
http://tritelife.com/wf/txt_google.php?q=Your Query Here
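Browsers usually encode the spaces in `Your Query Here` for you. If you build that link yourself, for example to call the script from another page, the query should be URL-encoded. This is a minimal sketch using PHP's built-in `http_build_query` with the test URL posted above:

```php
<?php
// Build a request URL for the script with the query safely encoded
$base = 'http://tritelife.com/wf/txt_google.php';
$url = $base . '?' . http_build_query(['q' => 'blue widgets & more']);
echo $url, "\n"; // spaces become +, & becomes %26
```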
I've been meaning to create something like this for a while; here's the version I made for myself. It doesn't store the searches (except in the server's logs, so again, quit with the pron). Instead, it outputs the results and lets you highlight your own site. Additionally, it's pretty and valid. Very nice. :D

Google SERP Checker

Anyway, it was written pretty quickly. If you need any changes, or anyone spots a fault, let me know.
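The poster didn't share how the highlighting works. One simple approach is to compare each result's host name with your own domain. This is a minimal sketch; `highlight_own_site` and the `<strong>` markup are assumptions for illustration, not the checker's actual code:

```php
<?php
// Hypothetical helper: render results as an ordered list, bolding URLs on $domain
// (including subdomains such as www.$domain).
function highlight_own_site(array $links, string $domain): string {
    $domain = strtolower($domain);
    $suffix = '.' . $domain;
    $out = "<ol>\n";
    foreach ($links as $link) {
        // parse_url returns null/false when there is no host; cast to ''
        $host = strtolower((string) parse_url($link, PHP_URL_HOST));
        $isOwn = $host === $domain || substr($host, -strlen($suffix)) === $suffix;
        $safe = htmlspecialchars($link);
        $out .= $isOwn ? "<li><strong>$safe</strong></li>\n" : "<li>$safe</li>\n";
    }
    return $out . "</ol>\n";
}

$links = ['http://www.example.com/widgets', 'http://other.net/', 'http://example.com.evil.org/'];
echo highlight_own_site($links, 'example.com');
```

Matching on the host rather than searching the whole URL for the domain string avoids false hits like `example.com.evil.org`.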
 