The WF PHP Functions War Chest



Ugly colors:
PHP:
function mysql_escape($dirty) {
     return mysql_real_escape_string(magic_quotes_gpc() ? stripslashes($dirty) : $dirty);
}

Not so ugly colors:
Code:
function mysql_escape($dirty) {
     return mysql_real_escape_string(magic_quotes_gpc() ? stripslashes($dirty) : $dirty);
}
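The ternary is the whole trick: if magic_quotes_gpc() already slashed the input, strip the slashes first so the data doesn't get double-escaped when mysql_real_escape_string() runs. The unslash step is easy to demo on its own (plain stripslashes(), no database needed):

```php
<?php
// What magic_quotes_gpc would hand you for the input O'Brien:
$quoted = "O\\'Brien";
// Strip the quoting before re-escaping, or you double-escape the data
echo stripslashes($quoted); // O'Brien
```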
 
This thread is an awesome idea.

This is a function to snag a youtube video based on keywords you send as a parameter:

PHP:
<?php

class youtube {

	function get_youtubevideo($theproduct) {
		// Build the YouTube GData search URL for the keyword
		$theproductencoded = urlencode($theproduct);
		$apicall = "http://gdata.youtube.com/feeds/videos?vq=$theproductencoded&max-results=1";
		// Initialize the cURL session
		$ch = curl_init();
		// Set curl to return the data instead of printing it to the browser
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		// Set the URL
		curl_setopt($ch, CURLOPT_URL, $apicall);
		// Execute the fetch
		$data = curl_exec($ch);
		// Close the connection
		curl_close($ch);

		$xml = new SimpleXMLElement($data);
		$contentcenter = "<div class=\"leftside\">";

		// Show the video if one was found
		if (!empty($xml->entry)) {
			$media = $xml->entry->children('http://search.yahoo.com/mrss/');
			$vidtitle = $media->group->title;
			$vid = $media->group->content->attributes()->url;
			$contentcenter .= "<h1>Video for " . str_replace("-", " ", ucwords($theproduct)) . "</h1>";
			// Rewrite the watch URL into the embeddable player URL
			$vid = str_replace("?v=", "v/", $vid);
			$contentcenter .= "<object width=\"425\" height=\"355\">
			<param name=\"movie\" value=\"$vid\"></param>
			<param name=\"wmode\" value=\"transparent\"></param>
			<embed src=\"$vid\" type=\"application/x-shockwave-flash\" wmode=\"transparent\" width=\"425\" height=\"355\"></embed>
			</object>";
		}
		// No entry in the feed, so report an error
		else {
			$contentcenter .= "<h2>Error: No video found</h2>";
		}
		$contentcenter .= "</div>";
		return $contentcenter;
	}
}

Usage:

PHP:
<?php
$theproduct = "Ford Mustang";

$videoshow = new youtube();

echo $videoshow->get_youtubevideo($theproduct);

?>

Enjoy!
 
Nice, Jeremy.
You can't know how timely that one is for me. Thanks.

Oh and +rep

::emp::
 
Who said anything about keeping anything? I came here, I shared, it then got posted elsewhere to be developed with credit given to others, and my own credit lost. I'm really not asking for much.

You did:

BackBanana said:
I dont think I'll bother in future if what I post gets credited to others in some other thread.

And what is your hangup about "credit"? Hang around long enough and you will notice that all the +rep -rep does not mean shit to the senior members.

I've been around long enough on here and in the internet dev business to see my code walk around the world several times, with credited people hanging left and right. (And yeah, having it in its own thread or somesuch does not help that any) And ya know what? It does not mean shit.
The code was useful for me and if someone changed it to the better, even more so.

::emp::

Right now you are just whining.
 
Jeremy said:
(quoted the YouTube video scraper post above)


Does this grab the first one with the tag or is it random?
 
An older one of mine.
Gets google pictures.

Code:
<html>
<body>
<h2>Google Image Crawler</h2>

    <form action="<?php echo $_SERVER['PHP_SELF']; ?>" method="get">
        <input type="text" name="query" size="30"/>
        <input type="submit" value="Get this!"/>
    </form>

<?php
    // Setting the variables
    $GooglePrefix = "http://images.google.com/images?q=";
    $query = isset($_GET['query']) ? urlencode($_GET['query']) : '';
    if ($query != '')
    {
        echo "Looking for ".$query."<br>";
        $CompleteURL = $GooglePrefix.$query;
        echo $CompleteURL;
        $res = webFetcher($CompleteURL); // we use the function webFetcher to get the page
        echo "<hr>";
        $resultURLs = do_reg($res, "/,\"http(.*)\",/U");
        // Displaying the images
        for ($i = 0; $i < count($resultURLs); $i++) // we use the length of the returned array to count
        {
            echo $i."<br>";
            $text = $resultURLs[$i]; // the match we are currently at
            if (!preg_match("/google/", $text)) // skip google's own urls
                echo "<img src=\"http".$text."\"><br>";
        }
        echo "done";
    }
    function do_reg($text, $regex) // returns all the found matches in an array
    {
        preg_match_all($regex, $text, $regxresult, PREG_PATTERN_ORDER);
        return $regxresult[1];
    }
    function webFetcher($url)
    {
        /* Does exactly what it is named after - fetches a page from the web, just give it the URL */
        $crawl = curl_init(); // the curl library is initiated, the following lines set the curl options
        curl_setopt($crawl, CURLOPT_URL, $url); // the URL is set
        curl_setopt($crawl, CURLOPT_RETURNTRANSFER, 1); // tells it to return the results in a variable
        $resulting = curl_exec($crawl); // curl is executed and the results stored in $resulting
        curl_close($crawl); // closes the curl handle
        return $resulting;
    }
?>
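That do_reg() helper is just a thin wrapper around preg_match_all that hands back the first capture group as a flat array, which makes it useful well beyond this crawler. Reproduced here so the demo runs standalone:

```php
<?php
// Returns every match of the regex's first capture group as a flat array
function do_reg($text, $regex)
{
    preg_match_all($regex, $text, $regxresult, PREG_PATTERN_ORDER);
    return $regxresult[1];
}

print_r(do_reg('a1 b2 c3', '/[a-z]([0-9])/'));
```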

::emp::
 
Wow, this thread kicks ass. I'm getting better with PHP but still not really good with it. This is really helping me understand some things I've been working on, just from looking at the code. Thanks, guys.
 
damn... excel

Here ya go, with ; as separator
Code:
#simple to use, just use yourscriptname.php?keywords
if ($_SERVER['QUERY_STRING']!='') {
  gsscrape($_SERVER['QUERY_STRING']);
  foreach ($kw as $keyword) {
  gsscrape($keyword);
  }
}
echo 'Url;Results<br>';
#all results are in array $kw...
foreach($kw as $keyword) {
    if ($keyword !='')
    { 
        $url = 'http://www.google.com/search?q='.urlencode($keyword);
        $html=getHttp($url);
        $abc = str_replace('of about <b>', '', strstr($html, 'of about <b>'));
        echo '<a href="'.$url.'">'.$keyword.'</a>;'.substr($abc, 0, strpos($abc, '</b>')).'<br />';
    }
}
?>
Just replace everything from #simple... down to the end of the script.
Copy the output, paste it into a .csv, import.

::emp::


Let's have it download the file for us as a CSV, naming it like keyword-google-results.csv.

PHP:
<?php
set_time_limit  (600);

function text_between($start,$end,$string) {
  if ($start != '') {$temp = explode($start,$string,2);} else {$temp = array('',$string);}
  $temp = explode($end,$temp[1],2);
  return $temp[0];
}

function getHttp($url)
    { 
        $userAgent = 'Firefox (WindowsXP) - Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6';
        
        // make the cURL request to $target_url
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
        curl_setopt($ch, CURLOPT_URL,$url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_AUTOREFERER, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);

        $html= curl_exec($ch);
        if (!$html) 
        {
            echo "<br />cURL error number:" .curl_errno($ch);
            echo "<br />cURL error:" . curl_error($ch);
            exit;
        }
        return $html;
    }

function gsscrape($keyword) {
  $keyword=str_replace(" ","+",$keyword);
  global $kw;
    $url='http://clients1.google.com/complete/search?hl=en&q='.$keyword;
    $data = getHttp($url);
    $data=explode('[',$data,3);
    $data=explode('],[',$data[2]);
    foreach($data as $temp) {
        $kw[]= text_between('"','"',$temp);
    }
}

#simple to use, just use yourscriptname.php?keywords
if ($_SERVER['QUERY_STRING']!='') {
  gsscrape($_SERVER['QUERY_STRING']);
  foreach ($kw as $keyword) {
  gsscrape($keyword);
  }
}

#all results are in array $kw...
$csv='';
foreach($kw as $keyword) {

    if ($keyword !='')
    { 
        $url = 'http://www.google.com/search?q='.urlencode($keyword);
        $html=getHttp($url);
        $abc = str_replace('of about <b>', '', strstr($html, 'of about <b>'));
        $csv .= $url.','.$keyword.','.str_replace(',','',substr($abc, 0, strpos($abc, '</b>')))."\r\n";
    }
}

header('Content-type: text/plain');
header('Content-Disposition: attachment; filename="'.$_SERVER['QUERY_STRING'].'-google-results.csv"');
header('Content-Length: '.strlen($csv));
echo $csv;

?>
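text_between() is the workhorse of that suggest scraper — it returns whatever sits between the first occurrence of $start and the $end that follows it. Reproduced standalone so you can poke at it:

```php
<?php
function text_between($start, $end, $string) {
  if ($start != '') {$temp = explode($start, $string, 2);} else {$temp = array('', $string);}
  $temp = explode($end, $temp[1], 2);
  return $temp[0];
}

// Pulls the result count out of a google results fragment, same as the script does
echo text_between('of about <b>', '</b>', 'Results 1 - 10 of about <b>1,230,000</b> for cars'); // 1,230,000
```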
 
Remove double keywords from a keyword list

PHP:
<html><body>
<?php
  if(isset($_POST["keywordlist"]))
  {
    $keywordlist = $_POST["keywordlist"];
    $keywords = explode("\n",$keywordlist);
    $done = Array();
    $donecount = 0;
    $numkw = count($keywords);
    for($i = 0; $i<$numkw;$i+=1)
    {
      if (!in_array(trim($keywords[$i]),$done))
      {
        $done[$donecount]=trim($keywords[$i]);
        $donecount += 1;
      }
    }
    for($a=0;$a<count($done);$a+=1)
    {
      echo strtolower($done[$a])."<br/>";
    }
  }
  else
  {
  ?>
  <form action="<?php echo $_SERVER['PHP_SELF']; ?>" method="post">
  <table width="100%" border="0">
  <tr><td></td><td><textarea name="keywordlist" cols="40" rows="20"></textarea></td></tr>
  <tr>
      <td> </td>
      <td><input type="submit" class="button" value="Remove Doubles" accesskey="s" /></td>
  </tr>
  </table>
  </form>
  <?php 
  }
  ?>
</body></html>
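For what it's worth, PHP's built-ins can do the dedupe part of that loop in a few lines: trim each line, drop empties, lowercase, and let array_unique() kill the doubles. A quick sketch (dedupe_keywords is my name for it, not from the post above):

```php
<?php
function dedupe_keywords($keywordlist)
{
    $keywords = array_map('trim', explode("\n", $keywordlist));
    $keywords = array_filter($keywords, 'strlen');                 // drop blank lines
    $keywords = array_unique(array_map('strtolower', $keywords));  // kill the doubles
    return array_values($keywords);                                // reindex
}

print_r(dedupe_keywords("Foo\nbar\nfoo\n\nBar"));
```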
 
Nice code, thanks BackBanana.
BTW, how do you make it work through a proxy?

Utilizing proxies is an important thing to deal with while scraping. If you get ip blocked by a site, your data dies ... plain and simple.

Proxies are used for far more than just scraping Suggest, so I'll submit a generic function that utilizes proxies so you can drop it into any script you need.

On BackBanana's script ported to this thread by emp, you'd just change the getHttp() function to something like below

PHP:
function curl_proxy($url,$proxy) {
  //$useragent = random_useragent();
  $cUrl = curl_init();
  curl_setopt($cUrl, CURLOPT_URL, $url);
  curl_setopt($cUrl, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($cUrl, CURLOPT_CONNECTTIMEOUT, 5);
  curl_setopt($cUrl, CURLOPT_TIMEOUT, 30);
  curl_setopt($cUrl, CURLOPT_PROXY, $proxy);
  curl_setopt($cUrl, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
  //curl_setopt($cUrl, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
  //curl_setopt($cUrl, CURLOPT_USERAGENT, $useragent);
  curl_setopt($cUrl, CURLOPT_FOLLOWLOCATION, TRUE);
  $PageContent = curl_exec($cUrl);
  curl_close($cUrl);
  return $PageContent;
}
A couple of things to note:
1. random_useragent() simply returns the useragent of a browser. I randomize mine (hence the function), but for some things it's not necessary. I left it commented out in the function above (since I didn't include it) but feel free to uncomment that line & the CURLOPT_USERAGENT if you want to use one ... it's necessary sometimes.

2. The function above is easily modified to accept either socks or http proxies. Comment or uncomment CURLOPT_PROXYTYPE to fit your needs.

3. Proxies go up and down regularly. If the script doesn't return anything, it's likely the proxy is down ... try another. Here are a few freebies (thanks MachineControl)

Below is an example of how to use the function.

PHP:
$text = curl_proxy('http://www.wickedfire.com','213.174.113.122:3128') ;
echo $text ;

BTW ::: Admins, the code font is looking much better than it was. Change the green to a yellow or tan and we'll be in business. Thanks for changing this for all of us phpreaks.
 
this thread makes me hard. totally agree w/ Fat Tom - nothing like seeing some working code to get a better understanding and generate ideas.
 
Now that we've talked about using proxies, let's do a scrape of value ... how about google serps?

First, I'll introduce a new function

PHP:
function extract_links($text) {
  preg_match_all('/<\s*a[^<>]*?href=[\'"]?([^\s<>\'"]*)[\'"]?[^<>]*>(.*?)<\/a>/si',
    $text,
    $match_array,
    PREG_SET_ORDER);
  $return = array() ;
  foreach ($match_array as $serp) {
    $full_anchor = $serp[0];
    $href = $serp[1];
    $anchortext = $serp[2];
    if ( (preg_match("/http:/i",$href)) &&
         (!preg_match("/cache/i",$href)) &&
         (!preg_match("/google.com/i",$href)) &&
         (!preg_match("/youtube.com/i",$href)) &&
         (!preg_match("/wikipedia.org/i",$href)) &&
         ($href[0]!= '/') ) {
      $anchor_array = array($href,$anchortext) ;
      array_push($return,$anchor_array) ;
    }
  }
  
  return $return ;
}
I use that function anytime I'm looking for links on a page. It basically ignores any HTML formatting and just returns all the links on a page in order, including the anchor text used.

By ignoring the formatting, that function cannot break (unless someone takes a page from diggbar) due to html changes.

Some of that preg_match shit is just in place to remove some links from the result. If we're scraping google, I don't want to see cached pages, youtube, wiki or google links because that's really not my competition (IMO).

Here's how to use the script [make sure to include curl_proxy() from above]

PHP:
$text = curl_proxy('http://www.google.com/search?q=angelina+jolie','213.174.113.122:3128') ;
$links = extract_links($text) ;
print_r($links) ;
The results are returned in a multi-dimensional array of the link location & anchor text.

The script is not perfect. For instance, some queries return news, images, video, etc. which bumps the serps down a notch or 3. That's google for you and you'll have to deal with it.

On a side note, I don't use this to scrape yahoo & msn as they have a useful api. I don't mind the extra work when scraping these engines ... at least they throw the programmers a bone worth gnawing on.

Now ... if you're scraping to find your position in the SERPs, you can add &num=100 to the query and set up a counter ( $i ++ ; ). When you find your domain name in the array, save your site's information & ranking (db/txt/output) and move on to the next term.
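That position-counter idea, sketched as a function (find_rank is my name for it; it expects the array shape extract_links() returns — array(href, anchortext) pairs):

```php
<?php
// Walk the scraped links and return the 1-based position of the first
// link whose href contains $domain; 0 if the domain never shows up
function find_rank($links, $domain)
{
    $i = 0;
    foreach ($links as $link) {
        $i++;
        if (stripos($link[0], $domain) !== false) {
            return $i;
        }
    }
    return 0;
}
```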
 
It's been a bit since I've used this, but I'm sure someone will find this useful...

Re-ordering sentences in a string for content creation/jumbling:
PHP:
function reorder($article)
{
    $main_explode = explode(".", $article);
    $total_sentences = count($main_explode);

    // Swap random pairs of sentences; rand() is capped at count-1
    // so we never index past the end of the array
    for ($x = 0; $x < ($total_sentences * 2); $x++)
    {
        $rand1 = rand(0, $total_sentences - 1);
        $rand2 = rand(0, $total_sentences - 1);

        $temp = $main_explode[$rand1];
        $main_explode[$rand1] = $main_explode[$rand2];
        $main_explode[$rand2] = $temp;
    }

    $new_article = '';
    for ($x = 0; $x < count($main_explode); $x++)
    {
        $new_article .= trim($main_explode[$x]).". ";
    }

    $new_article = str_replace(" .", ".", $new_article);

    return $new_article;
}
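Side note: that random swap loop is basically a shuffle, and PHP ships one. A minimal equivalent using shuffle() (reorder_shuffle is my name, not from the post):

```php
<?php
function reorder_shuffle($article)
{
    // Split on periods, trim each piece, and drop empty fragments
    $sentences = array_values(array_filter(array_map('trim', explode('.', $article)), 'strlen'));
    shuffle($sentences); // randomize the order in one call
    return implode('. ', $sentences) . '.';
}

echo reorder_shuffle("First sentence. Second sentence. Third sentence.");
```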
If anyone has WordNet (if not, I just "dropped a nugget"), here's a function to get a synonym per word. This will -KILL- your database with huge articles, but oh boy does it work well.

Get syn's from a body of text (strings, paragraphs, etc)
PHP:
function grab_syns( $body )
{

$bigarray = explode(" ", $body);

$final = null;

for($x = 0; $x < count($bigarray); $x++)
{

    if(strlen($bigarray[$x]) > 4)
{

    $tempme = get_syn($bigarray[$x]);
    $final .= "".$tempme." ";

}
else
{
    $final .= $bigarray[$x]." ";
}

}

$final = str_replace(" .", ".", $final);
$final = str_replace(" ,",",",$final);
$final = str_replace(" !","!",$final);
$final = str_replace(" ?","?",$final);

return $final;

}
Get individual per word (required for above)
PHP:
function get_syn($word)
{
    // Strip apostrophes so the word matches WordNet lemmas
    $word = str_replace("'", "", $word);

    // Peel off any trailing punctuation and remember it so we can
    // re-attach it to whatever we return
    $punct = '';
    foreach (array(".", "?", "!", ",") as $mark)
    {
        if (strpos($word, $mark) !== false)
        {
            $punct = $mark;
            $word = str_replace($mark, "", $word);
        }
    }

    // Pick one of the first three synonyms at random
    $thissyn = rand(0, 2);

    $safeword = mysql_real_escape_string($word);
    $query = "select synsetid, w2.lemma from sense
    left join word as w2 on w2.wordid=sense.wordid
    where sense.synsetid in
    (
    select sense.synsetid from word as w1
    left join sense on w1.wordid=sense.wordid
    where w1.lemma='$safeword'
    )
    and w2.lemma<>'$safeword' LIMIT ".$thissyn.",1";

    $dome = mysql_query($query) or die(mysql_error());
    $asdf = mysql_fetch_array($dome);

    // Fall back to the original word if no synonym was found
    if (isset($asdf['lemma']) && $asdf['lemma'] != '')
        return trim($asdf['lemma']).$punct;

    return $word.$punct;
}
Been a while since I've messed with these, but they worked when I used em. Can't see why they wouldn't now. Enjoy :)
 
geo-ip using MaxMind

Simple way of getting the country code so you can change links and whatnot accordingly.

geoip.php (or the top of your page)
PHP:
<?php
include("geoip.inc");
$gi = geoip_open("/path/to/data/or/web/root/GeoIP.dat", GEOIP_STANDARD);

$addr = $_SERVER["REMOTE_ADDR"];
$country = geoip_country_code_by_addr($gi, $addr);

geoip_close($gi);

/* 2 Letter Code found here Country Codes */
switch ($country)
{
    case "US":
        $mylink = "http://www.good-ol-usa.com";
        break;
    case "AU":
        $mylink = "http://www.link-to-Australia.com";
        break;
    default:
        $mylink = "http://www.link-to-somewhere-not-defined-above.com";
}

/* You can instead keep $country active to use in other parts of the site */
unset($country);
?>

You can then do something like <a href="<?=$mylink;?>">Text</a> further down the page to use the link.

GeoIP.dat (GeoLite binary format) from MaxMind - GeoLite Country | Open Source IP Address to Country Database

geoip.inc from Index of /download/geoip/api/php


I know there's a way to do the exact same thing above with google's API without needing an API key, but I'll leave that to someone else to post.