Need Help Scraping KD

Status
Not open for further replies.

eliquid

Serpwoo.com
May 10, 2007
A/B Testing
Can't seem to get this to work 100%. I can grab the captcha image and the id and submit them, but I can't get to the result page. Any help would be great.

Code:
<base href="http://www.keyworddiscovery.com/" />
<?php
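// Read the posted fields without relying on @ error suppression
$submitted = isset($_POST['submitted']) ? $_POST['submitted'] : '';
$id = isset($_POST['id']) ? $_POST['id'] : '';
$token = isset($_POST['token']) ? $_POST['token'] : '';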


if(!$submitted){

//Keyword Discovery First 100 results
// The query is hardcoded below, so print it directly ($_POST['query'] is never set on the first load)
print "<br /><br /><b>Results from Keyword Discovery <a href='http://www.keyworddiscovery.com/search.html?query=louisville'>louisville</a></b><br />";

$signuppage = "http://www.keyworddiscovery.com/search.html?query=louisville";
$socket = curl_init();
CURL_SETOPT($socket, CURLOPT_URL, $signuppage);
CURL_SETOPT($socket, CURLOPT_SSL_VERIFYPEER, FALSE);
CURL_SETOPT($socket, CURLOPT_SSL_VERIFYHOST, 2);
CURL_SETOPT($socket, CURLOPT_HEADER, 1); 
CURL_SETOPT($socket, CURLOPT_REFERER, "http://www.keyworddiscovery.com/search.html");
CURL_SETOPT($socket, CURLOPT_FOLLOWLOCATION, 1);
CURL_SETOPT($socket, CURLOPT_RETURNTRANSFER, 1);
CURL_SETOPT($socket, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9");
CURL_SETOPT($socket, CURLOPT_COOKIEJAR, "cookies.txt");
CURL_SETOPT($socket, CURLOPT_COOKIEFILE, "cookies.txt");
$data = curl_exec($socket);

//Grab the id
preg_match('#id\" value\=\"([0-9]+?)\"#', $data, $output);
$id = $output[1];


//Grab the captcha image
preg_match('/captcha\/([a-z0-9\.]+?)\"/', $data, $output);
$token = $output[1];



echo "<form method='post' name=\"search_form\" action=''>";
echo "<input type='hidden' name='submitted' value='1'>";


echo "<div class='gmail'>
<table>
<tr><td>Image:</td><td><img src='http://www.keyworddiscovery.com/img/captcha/".$token."'></td></tr>
<tr><td>Captcha:</td><td><input type='text' name='security_code'></td></tr>
<tr><td> </td><td><input type='hidden' name='id' value='".$id."'><input type='submit' value='submit'></td></tr>
</table>
</div>
</form>";



}else{
$header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
$header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
$header[] = "Connection: keep-alive";
$header[] = "Keep-Alive: 300";
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
$header[] = "Accept-Language: en-us,en;q=0.5";
//$header[] = "Pragma: "; // browsers keep this blank.
$header[] = "Content-Type: application/x-www-form-urlencoded";
  
  
$socket = curl_init();
CURL_SETOPT($socket, CURLOPT_URL, "http://www.keyworddiscovery.com/search.html?query=louisville");
CURL_SETOPT($socket, CURLOPT_FOLLOWLOCATION, 1);
CURL_SETOPT($socket, CURLOPT_RETURNTRANSFER, 1);
CURL_SETOPT($socket, CURLOPT_HEADER, 1);
curl_setopt($socket, CURLOPT_HTTPHEADER, $header);
CURL_SETOPT($socket, CURLOPT_POST, 1);
CURL_SETOPT($socket, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11");
CURL_SETOPT($socket, CURLOPT_COOKIEJAR, "cookies.txt");
CURL_SETOPT($socket, CURLOPT_COOKIEFILE, "cookies.txt");
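// Send the code the user typed: the form field is named security_code, and $token is never posted back by the form
CURL_SETOPT($socket, CURLOPT_POSTFIELDS, http_build_query(array('query' => 'louisville', 'submit' => 'submit', 'security_code' => isset($_POST['security_code']) ? $_POST['security_code'] : '', 'id' => $id)));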
curl_setopt($socket, CURLOPT_REFERER, 'http://www.keyworddiscovery.com/search.html?query=louisville');
curl_setopt($socket, CURLOPT_ENCODING, 'gzip,deflate');
curl_setopt($socket, CURLOPT_AUTOREFERER, true);
$data1 = curl_exec($socket);

echo $data1;


}


?>
 


If you are manually solving the captcha anyway, then depending on your JavaScript kung fu you could always write a Greasemonkey script to scrape the content for you.
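Something along these lines might be a starting point. It is only a rough sketch: I am assuming the results come back as plain <table> rows, which I have not checked against the actual page, and the function names rowsToCsv and scrapeTableRows are made up for this example. You solve the captcha in the browser as usual, and once the result page loads the script dumps the table cells to the console as CSV.

```javascript
// ==UserScript==
// @name     KD results scraper (sketch)
// @match    http://www.keyworddiscovery.com/search.html*
// @grant    none
// ==/UserScript==

// Turn an array of rows (each an array of cell strings) into CSV text.
// Cells are trimmed, embedded double quotes are doubled, every cell is quoted.
function rowsToCsv(rows) {
  return rows
    .map(cells => cells
      .map(c => '"' + String(c).trim().replace(/"/g, '""') + '"')
      .join(','))
    .join('\n');
}

// Collect the text of every <td> in every table row of the given document.
// ASSUMPTION: the results are rendered as ordinary table rows; the selector
// will likely need adjusting to match the real markup.
function scrapeTableRows(doc) {
  return Array.from(doc.querySelectorAll('table tr'))
    .map(tr => Array.from(tr.querySelectorAll('td')).map(td => td.textContent))
    .filter(cells => cells.length > 0); // skip header-only or empty rows
}

// Only touch the page when actually running inside a browser.
if (typeof document !== 'undefined') {
  console.log(rowsToCsv(scrapeTableRows(document)));
}
```

From there you could copy the CSV out of the console, or swap the console.log for whatever you want to do with the rows.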
 