The WF PHP Functions War Chest



Alright, WickedFire is awesome and I've mostly been quiet, so here's my contribution... It could definitely stand some improvements from the more skilled developers, but it does the job. It pulls WordPress blogs from Google for your keywords, parses out the post ID and comment-posting URL, and randomly selects an author, email, destination URL and comment to post. It reports its status as it goes and provides a summary at the end.

It's slow because blogs were rejecting comments when not enough time had elapsed between the initial page pull and the comment post, so I added a 15-second sleep to satisfy them.

Usage: gatherblogs.php?keyword=acai
 

Attachment: gatherblogs.txt (10.5 KB)
Here is an even simpler way to geolocate users: geoPlugin to geolocate your visitors

Their API includes multiple ways of querying, and they also provide a PHP class...

I don't really care for that method (though some people might), since it depends on the host site being up for it to work. That, and not everyone may want the script's creator to know which sites it's being used on. (Not to mention that if a crapload of people use that service, it may slow down your own site when you're trying to geocode large amounts of traffic.)
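Along the lines of keeping the lookup on your own server, here's a minimal sketch of a local IP-to-country lookup: a binary search over a sorted list of IPv4 ranges. The function name ip_to_country() and the sample ranges are hypothetical (they're the reserved documentation blocks, not real allocations); real data would come from whatever IP-to-country database you load yourself, and this only handles IPv4.

```php
<?php
// Minimal sketch: look up a country code in a locally stored list of
// IPv4 ranges, so no third-party service is contacted.
// The sample ranges are hypothetical placeholders taken from the RFC 5737
// documentation blocks; load real data from your own IP-to-country source.

/**
 * @param string $ip     Dotted-quad IPv4 address.
 * @param array  $ranges [start_ip, end_ip, country_code] rows, sorted by
 *                       start_ip and non-overlapping.
 * @return string|null   Country code, or null if unknown or invalid.
 */
function ip_to_country(string $ip, array $ranges): ?string
{
    $n = ip2long($ip);
    if ($n === false) {
        return null; // not a valid IPv4 address
    }
    $lo = 0;
    $hi = count($ranges) - 1;
    // Binary search: discard the half that cannot contain $n.
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        [$start, $end, $cc] = $ranges[$mid];
        if ($n < ip2long($start)) {
            $hi = $mid - 1;
        } elseif ($n > ip2long($end)) {
            $lo = $mid + 1;
        } else {
            return $cc; // $start <= $n <= $end
        }
    }
    return null; // falls in a gap between ranges
}

$ranges = [
    ['192.0.2.0',    '192.0.2.255',    'US'], // hypothetical
    ['198.51.100.0', '198.51.100.255', 'CA'], // hypothetical
    ['203.0.113.0',  '203.0.113.255',  'ES'], // hypothetical
];

echo ip_to_country('198.51.100.42', $ranges), "\n"; // prints CA
```

Keeping the ranges sorted and non-overlapping is what makes the binary search valid; with a very large range list you might keep it in a database table indexed on the start address instead of a PHP array.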
 
@Karl Nice to see you chiming in. Someone could extend that code to language and city, e.g. Spanish speakers in Atlanta GA USA, French speakers in Montreal CA, etc. My own hacks are too lousy to post.

That'd take a few more lines. Most people just want to know the country (otherwise they've already got a pretty good system for anything more detailed). Also, as far as language goes, the browser usually passes that info to the server, so you don't always need geocoding for that. Of course, when it comes to spiders from around the world, you may wish to present them with content relevant to their region.
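For the language side, a minimal sketch of reading the browser's Accept-Language header and ranking the language tags by their q weights. parse_accept_language() is a hypothetical helper name; it lowercases tags, drops anything with q=0, and doesn't try to handle every malformed header.

```php
<?php
// Minimal sketch: rank the languages a browser says it accepts, using the
// q-values in the Accept-Language request header. In a live script the
// header is in $_SERVER['HTTP_ACCEPT_LANGUAGE'].
function parse_accept_language(string $header): array
{
    $langs = [];
    foreach (explode(',', $header) as $part) {
        $pieces = explode(';', trim($part));
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '') {
            continue; // empty entry, e.g. from an empty header
        }
        $q = 1.0; // default weight when no q= is given
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strncmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($q > 0) {
            $langs[$tag] = $q; // q=0 means "not acceptable"
        }
    }
    arsort($langs); // highest preference first
    return $langs;
}

// Fall back to a sample header when run from the command line.
$header = $_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? 'fr-CA,fr;q=0.9,en;q=0.8';
print_r(parse_accept_language($header));
```

The first key of the returned array is the visitor's preferred language, which you could map to content before falling back to a geo lookup.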
 
This thread is an awesome idea.

This is a function to snag a YouTube video based on keywords you send as a parameter:

PHP:
<?php

class youtube{

    function get_youtubevideo($theproduct){
        // Build the GData search URL (at most one result)
        $theproductencoded = urlencode($theproduct);
        $apicall = "http://gdata.youtube.com/feeds/videos?vq=$theproductencoded&max-results=1";
        // Initialize the cURL session
        $ch = curl_init();
        // Return the data instead of printing it to the browser
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, $apicall);
        // Execute the fetch, then close the connection
        $data = curl_exec($ch);
        curl_close($ch);

        // Initialize the output (appending to an undefined variable raises a notice)
        $contentcenter = "<div class=\"leftside\">";

        // Parse the XML only if the request succeeded; false on bad or empty input
        $xml = ($data !== false) ? @simplexml_load_string($data) : false;

        // Show the video if one was found, else print an error
        if ($xml !== false && !empty($xml->entry)) {
            $media = $xml->entry->children('http://search.yahoo.com/mrss/');
            $vid = (string) $media->group->content->attributes()->url;
            $contentcenter .= "<h1>Video for " . htmlspecialchars(str_replace("-", " ", ucwords($theproduct))) . "</h1>";
            // Original tweak: replace "?v=" with "v/" in the media URL, then escape it for HTML
            $vid = htmlspecialchars(str_replace("?v=", "v/", $vid));
            $contentcenter .= "<object width=\"425\" height=\"355\">
            <param name=\"movie\" value=\"$vid\"></param>
            <param name=\"wmode\" value=\"transparent\"></param>
            <embed src=\"$vid\" type=\"application/x-shockwave-flash\" wmode=\"transparent\" width=\"425\" height=\"355\"></embed>
            </object>";
        }
        else {
            // Request failed, XML didn't parse, or no entry was returned
            $contentcenter .= "<h2>Error: No video found</h2>";
        }
        $contentcenter .= "</div>";
        return $contentcenter;
    }
}
Usage:

PHP:
<?php
$theproduct = "Ford Mustang";

$videoshow = new youtube();

echo $videoshow->get_youtubevideo($theproduct);

?>
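Flash has been removed from current browsers, so the <object>/<embed> markup above won't play anymore. Here's a minimal sketch of building an <iframe> from a watch URL using the https://www.youtube.com/embed/VIDEO_ID form. youtube_embed_from_watch_url() is a hypothetical helper, and it assumes the video ID is carried in the v query parameter.

```php
<?php
// Minimal sketch: turn https://www.youtube.com/watch?v=VIDEO_ID into an
// <iframe> that uses the /embed/VIDEO_ID URL form.
function youtube_embed_from_watch_url(string $url, int $width = 425, int $height = 355): ?string
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string, so no ?v= parameter
    }
    parse_str($query, $params);
    $id = $params['v'] ?? '';
    // Video IDs use letters, digits, '-' and '_'; reject anything else
    // so nothing odd ends up inside the HTML attribute.
    if (!is_string($id) || !preg_match('/^[A-Za-z0-9_-]+$/', $id)) {
        return null;
    }
    return sprintf(
        '<iframe width="%d" height="%d" src="https://www.youtube.com/embed/%s" allowfullscreen></iframe>',
        $width,
        $height,
        $id
    );
}

echo youtube_embed_from_watch_url('https://www.youtube.com/watch?v=abc123XYZ_-'), "\n";
```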

Enjoy!

thank you ;)

I have added it to a site of mine
 
I'm feeling generous so let's do a little competition.

Here's how it's going to work:
1. You PM me what you want to see feature-wise. For example, "I want to scrape Google for no-follow WordPress blogs". Really, anything goes... When you have an excellent team, there are really no boundaries to what you can do.
2. I'll gather your PMs for a week and then post the 3 most popular requests here for everybody to consume.

And if you're wondering what I get out of this? It's really easy, actually: I get ideas. I've been in this business for so long and I have SUCH a big library of scrapers, scripts, etc. that I'm really looking for fresh ideas. And by giving me your ideas, you'll get free scripts. I think it's a good deal, since you probably wouldn't act on those ideas anyway, just like I don't act on 99% of mine ;)

Not to underestimate you guys in any way, but I'll be really surprised if I get fresh ideas, so give me your best :) I want to be positively surprised :)
 
It's been a bit since I've used this, but I'm sure someone will find this useful...

Re-ordering sentences in a string for content creation/jumbling:
PHP:
function reorder($article)
{
    // Split on periods, trim each piece, and drop empty fragments
    // (such as the empty string left after the final period).
    $sentences = array_values(array_filter(array_map('trim', explode(".", $article)), 'strlen'));

    $total_sentences = count($sentences);

    // Swap random pairs 2*N times. Valid indexes run from 0 to N-1.
    for ($x = 0; $x < ($total_sentences * 2); $x++)
    {
        $rand1 = rand(0, $total_sentences - 1);
        $rand2 = rand(0, $total_sentences - 1);

        $temp = $sentences[$rand1];
        $sentences[$rand1] = $sentences[$rand2];
        $sentences[$rand2] = $temp;
    }

    // Rebuild the text, re-adding the period each sentence lost in explode()
    $new_article = "";
    foreach ($sentences as $sentence)
    {
        $new_article .= $sentence . ". ";
    }

    return rtrim($new_article);
}
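The function above only splits on periods, so questions and exclamations get glued onto neighboring sentences. A minimal alternative sketch: split after ., ! or ? followed by whitespace, then let PHP's built-in shuffle() do the reordering. reorder_sentences() is a hypothetical name, and abbreviations like "Mr. Smith" will still be split in the wrong place.

```php
<?php
// Minimal sketch: split after ., ! or ? followed by whitespace (keeping the
// punctuation with its sentence), then shuffle with PHP's shuffle().
function reorder_sentences(string $article): string
{
    $sentences = preg_split('/(?<=[.!?])\s+/', trim($article), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    shuffle($sentences); // not cryptographically random, which is fine here
    return implode(' ', $sentences);
}

echo reorder_sentences("Acai is a berry. Is it healthy? Some say yes! Others disagree."), "\n";
```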
If anyone has WordNet (if not, I just "dropped a nugget"), here's a function to get a synonym for each word. This will -KILL- your database with huge articles, but oh boy does it work well.

Get synonyms for a body of text (strings, paragraphs, etc.)
PHP:
// $db is a PDO connection to your WordNet database
function grab_syns($body, PDO $db)
{
    $final = "";

    foreach (explode(" ", $body) as $token)
    {
        // Only tokens longer than 4 characters (punctuation included) are looked up
        if (strlen($token) > 4)
        {
            $final .= get_syn($token, $db) . " ";
        }
        else
        {
            $final .= $token . " ";
        }
    }

    // Pull punctuation back against the preceding word
    $final = str_replace(array(" .", " ,", " !", " ?"), array(".", ",", "!", "?"), $final);

    return $final;
}
Get a synonym for an individual word (required by the function above)
PHP:
function get_syn($word, PDO $db)
{
    // Strip apostrophes, then peel off one trailing . , ! or ? so it can be
    // re-attached to whatever word comes back
    $word = str_replace("'", "", $word);
    $punct = "";
    if (preg_match('/^(.*?)([.,!?])$/', $word, $m))
    {
        $word = $m[1];
        $punct = $m[2];
    }
    // Remove any other stray punctuation inside the token
    $word = str_replace(array(".", ",", "!", "?"), "", $word);

    // Find other lemmas that share a synset with $word.
    // Bound parameters replace string interpolation (no SQL injection).
    $query = "select w2.lemma from sense
    left join word as w2 on w2.wordid=sense.wordid
    where sense.synsetid in
    (
    select sense.synsetid from word as w1
    left join sense on w1.wordid=sense.wordid
    where w1.lemma = :word
    )
    and w2.lemma <> :word2 LIMIT 3";

    $stmt = $db->prepare($query);
    $stmt->execute(array(':word' => $word, ':word2' => $word));
    $syns = $stmt->fetchAll(PDO::FETCH_COLUMN);

    // Pick one of up to three candidates at random; a random OFFSET could
    // skip past the last row and return nothing. Keep the word if none found.
    $result = $syns ? trim($syns[array_rand($syns)]) : $word;

    return $result . $punct;
}
Been a while since I've messed with these, but they worked when I used 'em. Can't see why they wouldn't now. Enjoy :)

that's some hot shit man, +rep
do you think it's normal that it takes about 5 minutes to rewrite a 600-700 word article (using WordNet)?
I haven't tested it a lot, just a couple of tests
the db seems to have indexes so I wouldn't know how to tweak it

I use an AMD 4600+ with 2 GB of RAM
 
i just read through a lot of this again, and wow i understand it a lot better now, im really hard right. thanks guys
 

Yeah, that's the only problem: because the SQL queries are so complex, they take a while to run. The main thing I used this for was doing 100 runs of it, then about 1000 Markov-chain variations of each. That gives you 100 * 1000 (100,000) articles out of one, and they -should- all pass as unique. :)
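Since the same words tend to repeat across an article, one cheap way to cut down on those slow queries is to memoize lookups for the length of a run. A minimal sketch: make_cached_lookup() is a hypothetical helper that wraps any one-argument lookup (for example a closure around get_syn()) so each distinct token hits the database only once. The trade-off is that a repeated word always gets the same synonym within a run. The counting closure stands in for the database.

```php
<?php
// Minimal sketch: memoize a one-argument lookup so each distinct token is
// queried once per run. make_cached_lookup() is a hypothetical helper.
function make_cached_lookup(callable $lookup): callable
{
    $cache = [];
    return function (string $word) use ($lookup, &$cache) {
        if (!array_key_exists($word, $cache)) {
            $cache[$word] = $lookup($word); // first time: do the slow lookup
        }
        return $cache[$word]; // afterwards: served from memory
    };
}

// Stand-in for the database: counts how often it is actually called.
$calls = 0;
$fake = function (string $w) use (&$calls) {
    $calls++;
    return strtoupper($w);
};

$cached = make_cached_lookup($fake);
echo $cached('quick'), ' ', $cached('quick'), ' ', $cached('brown'), "\n"; // QUICK QUICK BROWN
echo "lookups: $calls\n"; // lookups: 2
```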
 
Definitely man. It took me a few hours to even find someone that remotely knew what they were talking about so I could find that query, had NO IDEA otherwise.

Yeah, the academic sites have the good stuff. Here are a couple of similar resources you can use to build databases:

University of South Florida Free Association Norms (good for finding related keywords)

eturner - omcsnet-wnlg (an expansion of WordNet)

That's the kind of stuff that forces you to use good database design, since you could potentially be working with massive amounts of data.
 
If you want to speed up the WordNet DB lookups in a rewriter, try checking out the Lucene index that was compiled for it (it's actually listed on their site too). It takes a while to figure out, but it speeds up the calls 'slightly'... and in this case any speed increase that doesn't cost more CPU is a good thing.
Direct download link (with all readme/install files):
http://eden.dei.uc.pt/~nseco/lwwn.tar.gz

It's worth a look.

If you want to put WordNet on steroids, throw that sucker in a Sedna DB and use XQuery calls. Here's an example using a 24 GB mirror of Wikipedia:
http://demiurg.dyndns.org:9999/
(It's a safe link. Don't believe me? Just google wikiXMLDB; that's the site, and it even has a sandbox that lets you toy around with their EC2-hosted demo.)

Sedna: Sedna XML Database

That's a project I've been putting off for a long time now... kind of a headache, but I guarantee there'd be a shit-ton of folks who'd be more than happy to pay for hosting/bandwidth/your Amazon cloud account if you were able to convert the sucker.
 
I didn't write the following, but it's bound to be useful for those always looking for proxies.

This is a function to allow PHP to utilize the anonymous Tor service (the drawback is you must install Tor on the server, so a no-go for you shared hosting folks)

Code:
<?php
function tor_wrapper($url)
{
    # Build a random, loosely browser-like User-Agent string
    $ua = array('Mozilla', 'Opera', 'Microsoft Internet Explorer', 'ia_archiver');
    $op = array('Windows', 'Windows XP', 'Linux', 'Windows NT', 'Windows 2000', 'OSX');
    $agent = $ua[rand(0, 3)] . '/' . rand(1, 8) . '.' . rand(0, 9) . ' (' . $op[rand(0, 5)] . ' ' . rand(1, 7) . '.' . rand(0, 9) . '; en-US;)';

    # Tor's SOCKS listener (default address & port)
    $tor = '127.0.0.1:9050';
    # Timeout in seconds; Tor circuits can be slow
    $timeout = 300;

    $ack = curl_init();
    curl_setopt($ack, CURLOPT_PROXY, $tor);
    # Port 9050 speaks SOCKS, not HTTP; SOCKS5_HOSTNAME also resolves DNS through Tor
    curl_setopt($ack, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5_HOSTNAME);
    curl_setopt($ack, CURLOPT_URL, $url);
    curl_setopt($ack, CURLOPT_HEADER, 1);          # include response headers in the output
    curl_setopt($ack, CURLOPT_USERAGENT, $agent);
    curl_setopt($ack, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ack, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ack, CURLOPT_TIMEOUT, $timeout);
    $syn = curl_exec($ack);
    # $info = curl_getinfo($ack); then $info['http_code'] holds the status
    curl_close($ack);
    return $syn;   # false on failure
}

# example:
$wrapped = tor_wrapper("http://www.sillysite.com/?page=1");
echo $wrapped;
?>
Stolen from 0x000000.com
 
If you use Tor you might find this useful: it forces Tor to load a new identity/proxy, which is good if you are scraping a website and don't want to keep using the same IP.

I don't use Tor anymore, so I'm not sure if it still works.

Code:
function tor_new_identity($tor_ip = '127.0.0.1', $control_port = 9051, $auth_code = '')
{
    $fp = fsockopen($tor_ip, $control_port, $errno, $errstr, 30);
    if (!$fp) return false; // can't connect to the control port

    // The control protocol expects a password as a quoted string
    fputs($fp, 'AUTHENTICATE "' . addcslashes($auth_code, '"\\') . "\"\r\n");
    $response = fread($fp, 1024);
    list($code) = explode(' ', $response, 2);
    if ($code != '250') { fclose($fp); return false; } // authentication failed

    // Ask Tor to use new circuits for subsequent connections
    fputs($fp, "SIGNAL NEWNYM\r\n");
    $response = fread($fp, 1024);
    list($code) = explode(' ', $response, 2);
    fclose($fp);
    return $code == '250'; // false if the signal failed
}

:)
 

That's awesome, I was wondering how that would be possible after I posted. Now if only Tor wasn't so fucking slow.
 
Google Suggest Scraper. The current version just prints each suggested keyword along with its hit count.
PHP:
<?php
    if (!isset($_GET['keyword']))
    {
        die("No Keyword");
    }
    $keyword = $_GET['keyword'];
    // Fetch once and use both capture groups instead of making two identical requests
    $out = scrapeResults($keyword, "all");
    $keywordList = $out[1];
    $hitList = $out[2];
    for ($i = 0; $i < count($keywordList); $i++)
    {
        // Escape scraped text before printing it into HTML
        echo htmlspecialchars($keywordList[$i]) . " - " . htmlspecialchars($hitList[$i]) . "<br>";
    }

    // $retType: "all" (full preg_match_all result), "keywords" or "hits"
    function scrapeResults($keyword, $retType)
    {
        $ch = curl_init();
        $useragent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1";
        curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, "http://clients1.google.com/complete/search?hl=en&q=" . urlencode($keyword));
        curl_setopt($ch, CURLOPT_REFERER, "http://www.google.com/webhp?complete=1&hl=en");
        $data = curl_exec($ch);
        curl_close($ch);

        $out = array(array(), array(), array()); // empty lists if the fetch fails
        if ($data !== false)
        {
            // Skip past the echoed query to the first "[[" so the regex only sees suggestion pairs
            $start = strpos($data, "[[");
            $data = substr($data, $start === false ? 0 : $start + 1);
            preg_match_all("/\[\"(.*?)\",\"(.*?) results/si", $data, $out);
        }

        if ($retType == "keywords")
        {
            return $out[1];
        }
        if ($retType == "hits")
        {
            return $out[2];
        }
        return $out; // "all" (and any other value)
    }

?>
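Splitting the parsing out of scrapeResults() makes it testable offline, and pairing each suggestion with its hit count as a number lets you sort by popularity. A minimal sketch, assuming the response contains pairs shaped like ["suggestion","1,234 results" (the same shape the regex above expects); parse_suggest_response() is a hypothetical helper, and the sample string is made up for illustration. It uses [^"]* instead of .*? so a match can't run across the echoed query into the first suggestion.

```php
<?php
// Minimal sketch: turn ["suggestion","N results pairs into
// suggestion => hits (int). The sample below is made up for illustration.
function parse_suggest_response(string $data): array
{
    $pairs = [];
    if (preg_match_all('/\["([^"]*)","([\d,]+) results/', $data, $m, PREG_SET_ORDER)) {
        foreach ($m as $match) {
            // Strip thousands separators so hits compare numerically
            $pairs[$match[1]] = (int) str_replace(',', '', $match[2]);
        }
    }
    return $pairs;
}

$sample = '["ford mustang",[["ford mustang gt","2,450,000 results","0"],["ford mustang 1965","812,000 results","1"]]]';
$suggestions = parse_suggest_response($sample);
arsort($suggestions); // most hits first
print_r($suggestions);
```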