The WF PHP Functions War Chest

A great function that sorts the unique values of an array by how often they appear, returning a value => count map in ascending or descending order.

PHP:
function array_sort_by_count($_array,$order) {
  $count_array = array();

  foreach (array_unique($_array) as $value) {
    $count = 0;
  
    foreach ($_array as $element) {
      if ($element == $value) $count++;
    }
  
    $count_array[$value] = $count;
  }
    
  if (strtolower($order) == 'asc') { asort($count_array); } // asc
  elseif (strtolower($order) == 'desc') { arsort($count_array); } // desc

  return $count_array;
}

This little hack has come in handy MANY, MANY times
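
For what it's worth, here's a shorter route to the same result built on PHP's array_count_values() (a built-in that does the counting loop for you); the wrapper name is just for illustration:

PHP:
function array_sort_by_count2($_array, $order) {
  $count_array = array_count_values($_array); // value => frequency

  if (strtolower($order) == 'asc') { asort($count_array); }
  elseif (strtolower($order) == 'desc') { arsort($count_array); }

  return $count_array;
}

print_r(array_sort_by_count2(array('a','b','b','c','b'), 'desc'));
// b => 3 comes first; the order of the ties may vary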
 


Here's something I use when I have a lot of different files I've either scraped or downloaded from somewhere. Basically, give it a root directory ($dir) and it will traverse it and all subdirs and do stuff with the files based on their extension.

One weakness is that it uses string manipulation to find the extension instead of getting the true MIME type, but it works for me for now, so I haven't needed to change it.

Some of the code is straight out of the function reference and some of it I added/changed myself. I think when I put it together I was also looking to use it on remote sites that do not have DirectoryIndex set up in their Apache configs.



PHP:
function do_traverse($dir)
{
    try {
        $iter = new DirectoryIterator($dir);
        walk_dir($iter);
    }
    catch (Exception $e) {
        print_r($e);
    }
}


function get_extension($filename) // this really sucks, get finfo class
{
    $x = explode('.', $filename);
    return '.' . end($x);
}


function walk_dir(DirectoryIterator $iter, $depth = 0)
{
    $d = $f = 0; // dir/file counters - never returned, just handy when debugging
    $out = str_repeat(' ', ($depth * 5)) . $iter->getPathName() . '<br />';

    while ($iter->valid())
    {
        $node = $iter->current();

        if ($node->isDir() && $node->isReadable() && !$node->isDot())
        {
            $out .= walk_dir(new DirectoryIterator($node->getPathname()), $depth + 1);
            $d++;
        }
        elseif ($node->isFile())
        {
            $str_path = $node->getPath();
            $str_file = $node->getFilename();
            $out .= str_repeat(' ', (1 + $depth) * 5) . $node->getFilename() . '<br />';
            $f++;
            $ext = get_extension($str_file);

            //echo 'FILE IS '.$str_file.' PATH IS '.$str_path; // for debugging
            //echo 'EXTENSION IS '.$ext;                       // for debugging

            $rel_path = $str_path . "/" . $str_file;

            switch ($ext) // do different things with different files
            {
                case '.txt':
                    echo 'do stuff with text files';
                    break;
                case '.htm':
                case '.html':
                    echo 'do stuff with html files';
                    break;
                case '.gif':
                case '.bmp':
                case '.png':
                case '.jpeg':
                case '.jpg':
                    echo 'do stuff with images';
                    break;
                case '.php':
                    echo 'do stuff with scripts';
                    break;
                default:
                    echo 'unsupported filetype ' . $ext . '<br />';
            }
        } // end elseif
        echo '<br />';
        $iter->next();
    }
    return $out;
}
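
On the extension weakness mentioned above, here's a minimal sketch of two safer options - pathinfo() is built into PHP, and the finfo class gives you the true MIME type if the fileinfo extension is loaded:

PHP:
// safer extension handling, built in everywhere:
$ext = '.' . strtolower(pathinfo($str_file, PATHINFO_EXTENSION));

// true MIME type, assuming the fileinfo extension is available:
$finfo = new finfo(FILEINFO_MIME_TYPE);
$mime  = $finfo->file($rel_path); // e.g. 'image/jpeg' regardless of what the extension claims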
 
I want to share something with the community, as I just signed up a few days ago :)
(not a big deal, but it's what comes to mind, since I've seen some geolocation scripts posted here)
I found out that on small VPSs, even 10k hits per day would almost take the server down if you're using GeoIP databases (file system based).

So whenever I have to do some geolocation server-side - and I'm too lazy to set up a proper geolocation engine myself, or just don't have the time - I do this:

Code:
<?php
$country = trim(file_get_contents("http://api.hostip.info/country.php?ip=" . $_SERVER['REMOTE_ADDR']));
?>
Could it get any simpler? ;)
Using trim in case they ever send extra "unexpected" spaces. Then I usually do an in_array() check with $country and I'm done.

I haven't used this extensively (just for about 20k hits per day or so), though I don't think they have any limits around those numbers.

Way faster with cURL, if your server supports it.
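
For reference, a minimal sketch of the same lookup through cURL (assuming the cURL extension is loaded) - same endpoint, same result:

PHP:
$ch = curl_init("http://api.hostip.info/country.php?ip=" . $_SERVER['REMOTE_ADDR']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 2);        // don't let a slow API hang your page
$country = trim(curl_exec($ch));
curl_close($ch);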
 
Updated some code posted here last year by emp - http://www.wickedfire.com/design-de...rgeted-adword-ads-something-i-whipped-up.html

What it does is spider the Google results for a keyword and scrape each result page looking for AdSense code. You then know which sites have legitimate traffic and are probably less likely to have fraudulent clicks.

PHP:
<?php

if (isset($_REQUEST['query'])) {
		if ($_REQUEST['query'] != '') {
			/* Setting the variables.
			A Google query looks like this:
			    http://www.google.com/search?q=MYQUERY&start=MYSTART
			    |------GooglePrefix------|--query--|suffix|counter|
			so here are the variables I need for my query:
			*/
			$GooglePrefix = "http://www.google.com/search?q=";
			$query = urlencode($_REQUEST['query']);
			$GoogleCountSuffix ="&start=";
			// Enter any URLs you don't want to crawl into this array - sites that would never have adsense
			$dontcrawlarray = array('wikipedia.org','google.com','amazon.com');
			//-----------------------------
			echo "Results for " . $_REQUEST['query'] . "<br /><br />";
			/* Loop to get the Google result pages 
			While going through the loop, we build the query URL out of the parts and the loop counter 
			The results are stored in the $res variable.
			Basically, we get the complete source code for each result page, and store ALL of them in one looong string.
			*/
			$res = ''; // will collect the source of every result page
			$loop = 0;
			while ($loop <= 10)
			{
				$completeURL = $GooglePrefix . $query . $GoogleCountSuffix . $loop;
				$res .= webFetcher($completeURL); // we use the function webFetcher to get the page
				$loop = $loop + 10;
			}
			/* Now we use regular expressions to filter the URLs out of the result pages
			For this, the function "do_reg" is called, giving it the complete resultstring and the regular expression.
			The returned value (an array of matches) is stored in $regx
			*/
			$resultURLs = do_reg($res, "/h3.class=r.*(http.*)\"/U");
			
			
			/* Loop through the list of dontcrawlarray domains and remove them before we crawl */
			foreach ($dontcrawlarray as $url) {
				$resultURLs = array_ereg_search($url,$resultURLs);
				}
				
			/* Now we want to fetch all those URLs
			Again, we use a loop for this. Some more explanations in the loop itself.
			*/
				for ($i = 0; $i < count($resultURLs); $i++) //we use the length of the returned array to count.
				{
					$text = $resultURLs[$i]; //$text is set to the item in the result we are at
					$comp = webFetcher($text); //we get the page at the URL
					if (preg_match("/google_ad/", $comp, $matches))
					/* again, we use a regular expression function.
					This time, we are looking for "google_ad", a code snippet that tells us that google ads are used in the page.
					If found, this is true.
					*/
					{
						echo "$text<br />";
					}
				}
			}
	}
	
    function do_reg($text, $regex) // returns all the found matches in an array
    {
        preg_match_all($regex, $text, $regxresult, PREG_PATTERN_ORDER);
        return $regxresult[1];
    }
    
    function webFetcher($url)
    {
        /* This does exactly what it is named after - it fetches a page from the web, just give it the URL */
        $crawl = curl_init(); // the curl library is initiated, the following lines set the curl variables
        curl_setopt($crawl, CURLOPT_URL, $url); // the URL is set
        curl_setopt($crawl, CURLOPT_RETURNTRANSFER, 1); // tells it to return the results in a variable
        $resulting = curl_exec($crawl); // curl is executed and the results stored in $resulting
        curl_close($crawl); // closes the curl procedure
        return $resulting;
    }
	
      function array_ereg_search($val, $array) {
      /* This removes $val from $array if found - used to remove the dontcrawlarray URLs */
          $return = array();
          foreach ($array as $v) {
               if (stripos($v, $val) === false) $return[] = $v; // eregi() is deprecated; a case-insensitive substring check does the same job here
          }
          return $return;
      }
	
	
?>


HTML:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Untitled Document</title>
</head>

<body>

 <h2>Find Adsense</h2>
 <form action="<?php echo $_SERVER['PHP_SELF']; ?>" method="post">
 <div> Keyword : <input type="text" size="20" maxlength="50" name="query" /> </div>
 <div>
   <input type="submit" name="Submit" value="Go!" />
 </div>
 </form>

</body>
</html>

A few mods -
Added a do not crawl list so you don't waste time crawling URLs like wikipedia or google video/news
Added a keyword box
Google changed from using H2 to H3 for each result URL


The output list is all sites found with Adsense and is ready to be copied into your Adwords account to target those sites.

It can take a while to run since you're going to be scraping 10-20 URLs and some sites are slow. You'll probably want to adjust your php timeout setting in php.ini (or wherever it is).
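
If you'd rather not touch php.ini, a minimal per-script sketch - set_time_limit() is a PHP built-in, and CURLOPT_TIMEOUT caps each individual fetch:

PHP:
set_time_limit(0); // let this script run as long as it needs

// and inside webFetcher(), cap each single fetch:
curl_setopt($crawl, CURLOPT_TIMEOUT, 15); // give up on any one slow site after 15 seconds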
 
Since Google keyword suggest is active against scrapers and will get your IP banned, here's a site I think is a bit safer: the Free Keyword Suggestion Tool From Wordtracker.

I don't know if its search numbers are correct, but it will give you some keywords. It does detect scrapers, but only if you're using it aggressively, in which case you have to fill in a captcha on the site and you're free to go again. I don't know if it's any better than Google suggest, but I just thought I'd throw it in anyway.

Here's a very basic script I made that returns the words in an array without the numbers:

Code:
function keywords($word)
{
    $url = 'http://freekeywords.wordtracker.com/?seed='.$word.'&adult_filter=remove_offensive&suggest=Hit+Me';
    $data = file_get_contents($url);

    preg_match_all("/&adult_filter=remove_offensive\">([^`]*?)<\/a><\/td><\/tr>/", $data, $matches);

    $words = array(); // start empty so we always return an array, even with no matches
    foreach ($matches[1] as $keyword) { // capture group 1 already holds the keyword text
        $words[] = str_replace(" ", "+", $keyword);
    }
    return $words;
}
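
Usage is just:

Code:
print_r(keywords("poker")); // array of suggested keywords, spaces already converted to +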
 
Weighted Rotator Function

I use this baby all the time - on pretty much every site I do, or at least a variation of it anyway. It's a weighted rotator function for rotating URLs, tracking IDs, Prosper landing page IDs, or basically anything you want. Feed in an array, get the result - simple. The power is in the second parameter, which "weights" the results in favour of the first array item by a percentage integer. So set it at 50 for an array of 3 items and item 1 will show 50% of the time while items 2 and 3 show 25% each. It's been tested over thousands of clicks and is very accurate - well, my Prosper stats say it is anyway. As always, use at your own risk and feel free to modify/use/abuse at will :)

PHP:
function weighted_rotate($items, $weight = '') {
    if (count($items) > 1) {
        $rnum = rand(1, 100);
        if ($weight == '') $weight = round(100 / count($items)); // no weight passed: straight rotate
        if ($rnum > $weight) $selected = trim($items[rand(1, count($items) - 1)]); // one of the other items
        else $selected = trim($items[0]); // the weighted first item
    } else $selected = trim($items[0]);
    return $selected;
}

$array  = array("a","b","c","d","e"); // array of items (item 1 gets the weight)
$result = weighted_rotate($array, 50); // parameter 2 (50 in this case) controls the weight
// $result = weighted_rotate($array); // straight rotate, no weight 
echo $result;
I have a multidimensional array version as well, if anyone's interested, that does things like:

PHP:
$offers[] = array('google.com', 1, '');
$offers[] = array('yahoo.com',  1, '*');
// offers array( url, active, selected for weighting )
and is useful for flogs where you just load all your offers at the top of the page and then select which offers are active and which is the "star" without rewriting the code all the time :) I'll post it if anyone's interested.
 
Lightweight Yahoo Suggest Scraper

PHP:
function yahoo_suggest($keyword){
  $scrape = file_get_contents("http://sugg.search.yahoo.com/gossip-us-sayt/?output=yjsonp&nresults=10&l=1&command=".urlencode($keyword));
  preg_match_all("|\[\"([a-zA-Z0-9\s\']+)\",0\]|", $scrape, $matches);
  return $matches[1];
}
print_r(yahoo_suggest("monavie"));
Present downfalls: it only returns standard English letters and numbers. I really couldn't be bothered to tweak the regex to catch all ASCII characters - I haven't got the patience, and it's Sunday anyway, lol. Sorry!

I might note that I think Yahoo are catching on and not returning results for certain words - the keyword "online singles", for instance, doesn't have any results. I believe this is intentional but haven't done much research on it yet.
 
Remove double keywords from a keyword list

PHP:
<html><body>
<?php
  if(isset($_POST["keywordlist"]))
  {
    $keywordlist = $_POST["keywordlist"];
    $keywords = explode("\n",$keywordlist);
    $done = Array();
    $donecount = 0;
    $numkw = count($keywords);
    for($i = 0; $i<$numkw;$i+=1)
    {
      if (!in_array(trim($keywords[$i]),$done))
      {
        $done[$donecount]=trim($keywords[$i]);
        $donecount += 1;
      }
    }
    for($a=0;$a<count($done);$a+=1)
    {
      echo strtolower($done[$a])."<br/>";
    }
  }
  else
  {
  ?>
  <form action="<?php echo $_SERVER['PHP_SELF']; ?>" method="post">
  <table width="100%" border="0">
  <tr><td></td><td><textarea name="keywordlist" cols="40" rows="20"></textarea></td></tr>
  <tr>
      <td> </td>
      <td><input type="submit" class="button" value="Remove Doubles" accesskey="s" /></td>
  </tr>
  </table>
  </form>
  <?php 
  }
  ?>
</body></html>
Could this work with any list, and if so, how? I'm not good with PHP and coding, but I figured this is something people might like.
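
For what it's worth, a minimal sketch of the same de-dupe using PHP built-ins (array_map, array_filter, and array_unique all ship with PHP), and it works on any newline-separated list:

PHP:
$keywords = array_map('trim', explode("\n", $_POST["keywordlist"]));
$keywords = array_unique(array_filter($keywords)); // drop blanks, then drop doubles
echo strtolower(implode("<br/>", $keywords));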
 
Not sure if someone will have much use for this one, but it scrapes ColourLovers.com using a Palette ID (for example, using '285' will scrape COLOURlovers :: Palette / Sel/Arg/Ale/) and returns up to 5 hex codes used in that palette. Good for creating color schemes on the fly:

PHP:
<?php

function curlThis($url) {

  $ch = curl_init();
  $timeout = 1; // set to zero for no timeout
  curl_setopt ($ch, CURLOPT_URL, $url);
  curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
  curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
  $file_contents = curl_exec($ch);
  curl_close($ch);
  return $file_contents;
  
}

function scrapeColors($id) {

    $contents = curlThis('http://www.colourlovers.com/palette/' . $id . '/');
    
    $pattern = '@<strong>HEX:</strong> \#(?P<hex>.*)  @';
    preg_match_all($pattern, $contents, $matches);
    
    return $matches;
    
}

$colors = scrapeColors('9725'); // Insert Palette ID here
foreach ($colors['hex'] as $code) {
    echo "$code<br />";
}

?>
 
^^^ Would have been dead hot if you could specify a hex color & scale and it returned an array of colors... hmmm.

Nice code anyway, Tyler. Would have loved that in my site autogen days!
 
Detect if a user is using a transparent proxy:

Code:
$proxysigs = array(
'HTTP_FORWARDED',
'HTTP_X_FORWARDED_FOR',
'HTTP_CLIENT_IP',
'HTTP_VIA',
'HTTP_XROXY_CONNECTION',
'HTTP_PROXY_CONNECTION'
);
$isProxy = false;
foreach($proxysigs as $sig)
{
	if(isset($_SERVER[$sig]))
	{
		$isProxy = true;
		break;
	}
}
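
A quick usage sketch - what you do with $isProxy is up to you, this just branches on it:

Code:
if ($isProxy) {
    // e.g. send proxy traffic somewhere harmless
    header('Location: http://www.example.com/');
    exit;
}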
 
Pull the page title and meta tags (description & keywords) of any webpage:

Code:
<?php 

    $url = "http://www.yoursitehere.com";
    $fp = fopen( $url, 'r' ); // needs allow_url_fopen enabled to open remote URLs
    $content = "";
 
    while( !feof( $fp ) ) {
       $buffer = trim( fgets( $fp, 4096 ) );
       $content .= $buffer;
    }
   
    $start = '<title>';
    $end = '<\/title>';
    
    preg_match( "/$start(.*)$end/si", $content, $match ); // /i so an uppercase <TITLE> still matches
    $title = $match[ 1 ]; 
    $metatagarray = get_meta_tags( $url );
    $keywords = $metatagarray[ "keywords" ];
    $description = $metatagarray[ "description" ];  

    echo "<div><strong>URL:</strong> $url</div>\n";
    echo "<div><strong>Title:</strong> $title</div>\n";
    echo "<div><strong>Description:</strong> $description</div>\n";
    echo "<div><strong>Keywords:</strong> $keywords</div>\n"; 
?>
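
If the regex ever bites you (attributes on the title tag, odd markup), here's a minimal sketch using PHP's bundled DOMDocument class instead:

Code:
$doc = new DOMDocument();
@$doc->loadHTML($content); // @ silences warnings on sloppy real-world HTML
$nodes = $doc->getElementsByTagName('title');
$title = $nodes->length ? trim($nodes->item(0)->nodeValue) : '';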
 
Damn guys, there are too many quoted colors in this thread for me :1bluewinky: Can't we have a thread about raping pixels instead? That way I could contribute something myself :bootyshake:

Great scripts and shares here, keep em coming :)
 
My faster and cleaner version of maxmind geoip's php web service lookup function:

Code:
function ws_geoip_lookup($ip, $license_key = 'your license key')
{
	$host = 'geoip1.maxmind.com';
	$path = "/f?l=$license_key&i=$ip";
	$fp = fsockopen($host, 80, $errno, $errstr, 1) or die("Can not open connection to geoip server: $errno=$errstr");
	fputs($fp, "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n"); // HTTP wants \r\n line endings
	$buf = '';
	while(!feof($fp))
		$buf .= fgets($fp, 128);
	fclose($fp);
	$lines = explode("\n", $buf); // split() is deprecated, explode() does the same here
	$data = $lines[count($lines)-1];
	return explode(',', $data);
}
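
Usage sketch - the fields come back comma-separated, and exactly which fields you get depends on your MaxMind service level, so check their docs:

Code:
$fields = ws_geoip_lookup($_SERVER['REMOTE_ADDR'], 'YOUR_LICENSE_KEY');
print_r($fields); // e.g. country code first, per MaxMind's web service docs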
 
Here's two functions I wrote a long time ago. It was for ranking when to upload videos to YouTube based on quality (you can easily modify it to rank something else). Obviously not every video is as good as the next. So with a ranking algorithm (I have that if you want it), I assigned each video a value and then evenly distributed percentiles amongst the batch of videos. I stored the percentile in $array[$i]['percentile'].

Then I used these two functions to schedule when the videos would be posted. I figured it was best to do a boost of a bunch of good videos first, so I ranked them by putting $m number of high percentile videos at the top, and then evenly distributing the rest. So on one day I might post some good videos and also some shitty videos. Here are the functions:

Code:
<?php
/*
    * scheduler($arr, $n, $boost)
    * given an array with an iterating top index (most arrays),
      schedules a timestamp for each key
    * $arr = array, $n = videos per day, $boost = first day boost
*/
function scheduler($arr, $n, $boost = true)
{
    $times = array();
    $c = time();
    $day = 86400;
    $m = ( ($boost == true) && (count($arr) >= 25) ) ? (count($arr) - 25) / $n : count($arr) / $n;
    
    if( ($boost == true) && (count($arr) >= 25) ) // the boost only applies when there are at least 25 videos
    {
        for($i=0;$i<=25;$i++)
                $times[] = $c; // first-day boost: these get the current timestamp
    }
    
    for($i=0;$i<=$m;$i++)
    {
        for($j=1;$j<=$n;$j++)
            if(count($times) < count($arr)) { $times[] = $c + ($i*$day) + ($j*($day/$n)); } else return $times;
    }
    return $times;
}

/*
    * order_percentiles($arr, $n, $m)
    * given an array of format $arr[$k]['percentile'], orders it
    * so that the first $m entries are the highest percentiles, and after
    * that each day contains an entry from each percentile group
    * $arr = array, $n = videos per day, $m = initial # entries
*/
function order_percentiles($arr, $n, $m)
{
    $cn = count($arr);
    $days = ($cn - $m) / $n;
    $return = array();
    for($i=0;$i<=$m;$i++)
        $return[] = $i;
    
    $max_p = $arr[$m+1]['percentile'];
    $locations = array();
    for($i=1;$i<=$max_p;$i++)
    {
        $locations[$i] = array();
        foreach($arr as $k => $v)
        {
            if($arr[$k]['percentile'] == $i)
                $locations[$i][] = $k;
        }
    }
    
    $q = round($max_p / $n);
    $min = array(); $max = array();
    for($i=0;$i<$n;$i++)
        $min[] = (($i*$q) + 1);
    
    for($i=1;$i<$n;$i++)
        $max[] = $i*$q;
    $max[] = $max_p;
    
    for($i=0;$i<$days;$i++)
    {
        for($j=0;$j<$n;$j++)
        {
            for($l=$min[$j];$l<=$max_p;$l++)
            {
                if(count($locations[$l]) > 0)
                {
                    $p = $l;
                    break;                
                }
            }
            
            if(count($locations[$p]) == 0)
            {
                for($l=1;$l<=$max_p;$l++)
                {
                    if(count($locations[$l]) > 0)
                    {
                        $p = $l;
                        break;
                    }
                }
            }
    
            $return[] = $locations[$p][0];
            array_shift($locations[$p]);
            if(count($return) == count($arr))
                break;
        }    
    }

    foreach($return as $k => $r)
        $return[$k] = $arr[$r];
    return $return;
}
?>
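
A quick usage sketch with a hypothetical $videos array (the 'percentile' key is the one described above):

Code:
<?php
$videos = array(
    array('file' => 'a.flv', 'percentile' => 3),
    array('file' => 'b.flv', 'percentile' => 1),
    // ... one entry per video, percentiles assigned by your ranking algorithm
);
$ordered = order_percentiles($videos, 5, 25); // 5 per day, 25 in the first-day boost
$times   = scheduler($ordered, 5);            // matching upload timestamp for each video
?>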
 
Also, here's a scraper for YouTube and StupidVideos, along with the ranking algorithm I described above. I HAVE NOT TESTED THIS CODE IN A YEAR. I'm sure there are some HTML quirks, but the base code should give you a good starting point if you want a scraper. You'll need win-'s cURL class, which I included.
 

Attachments

  • lib.zip (9.7 KB)
Shit! Is this a wicked thread or what! I feel like parsing it. :stonedsmilie:

Ok, my small contribution. An email parser:

Code:
$res = preg_match_all( "/[a-z0-9]+([_\\.-][a-z0-9]+)*@([a-z0-9]+([\.-][a-z0-9]+)*)+\\.[a-z]{2,}/i", $htmldata, $matches );

if ($res)
{
    foreach (array_unique($matches[0]) as $email)
    {
        echo $email;
    }
}
else echo('Email parser fail');
Just get the $htmldata with cURL (you know the drill) and go fetch some emails. This wasn't inside a function, but you can easily wrap it in one if you like.
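
A minimal sketch of that wrapper (the function name is just for illustration), returning the unique addresses instead of echoing them:

Code:
function extract_emails($htmldata)
{
    // same regex as above
    preg_match_all("/[a-z0-9]+([_\\.-][a-z0-9]+)*@([a-z0-9]+([\.-][a-z0-9]+)*)+\\.[a-z]{2,}/i", $htmldata, $matches);
    return array_unique($matches[0]);
}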

Cheers
 
This isn't terribly useful, but if you want to stop complete newbs from ripping your landers, clean the output with this:

PHP:
function clean_output()
{
    // collapse tabs, newlines and runs of spaces so the page source becomes one unreadable line
    $html = strtr(ob_get_contents(), array("\t" => "", "\n" => "", "\r" => ""));
    for ($x = 0; $x < 5; $x++) {
        $html = str_replace("      ", "", $html);
        $html = str_replace("   ", "", $html);
        $html = str_replace(">  <", "><", $html);
        $html = str_replace(">    <", "><", $html);
        $html = str_replace("> <", "><", $html);
    }
    ob_end_clean();
    ob_start("ob_gzhandler");
    echo $html;
}
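
To use it, buffer the whole page and call it at the end - a minimal sketch:

PHP:
ob_start();        // at the very top of the lander
// ... your normal page output ...
clean_output();    // at the very bottom, after all output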
 
quick and dirty CSV to RSS/XML for WP import...

Code:
<?php
function get_csv($filename, $delim=',')
{
   $row = 0;
   $dump = array();
  
   $f = fopen ($filename,"r");
   $size = filesize($filename)+1;
   while ($data = fgetcsv($f, $size, $delim)) {
       $dump[$row] = $data;
       $row++;
   }
   fclose ($f);
  
   return $dump;
}

$myfile = "123.csv";
$xxx=0;
$mywritefile = "";

$test = get_csv($myfile);


foreach ($test as $mthis) { 
$xxx++;
$hourcount = ($xxx * 12);
$mincount = (12 * $xxx);
$futuredate = mktime(date("H")+$hourcount,date("i")+$mincount,date("s")+$mincount,date("m"),date("d"),date("Y")); // "H" = 24-hour clock, which is what mktime() expects

$mypubdate = date("r",$futuredate); // RFC 2822 format, which RSS <pubDate> requires

$mywritefile .= "<item>\n";
$mywritefile .= "<title>".$mthis[0]."</title>\n";
$mywritefile .= "<category>Your Category Tag</category>\n";
$mywritefile .= "<content:encoded>".$mthis[1]."<br />".$mthis[2]."<br />".$mthis[3]."<br />".$mthis[4]."</content:encoded>\n";
$mywritefile .= "<pubDate>".$mypubdate."</pubDate>\n";
$mywritefile .= "</item>\n\n\n";

}

echo $mywritefile;

?>
Then you can view the source and save the output to an XML file that you can import into WP. Nothing fancy, but I thought it might help one or two people...
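
Note the output is just the <item> elements - depending on your importer you may need to wrap them in a channel first. A minimal sketch (the title/link/description values are placeholders):

Code:
echo "<?xml version=\"1.0\"?>\n";
echo "<rss version=\"2.0\" xmlns:content=\"http://purl.org/rss/1.0/modules/content/\">\n";
echo "<channel>\n<title>My Feed</title>\n<link>http://example.com/</link>\n<description>Imported posts</description>\n";
echo $mywritefile; // the items built above
echo "</channel>\n</rss>\n";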
 