PHP proxy checker using curl_multi

Rage9

Jan 7, 2008
I've been using mattseh's proxy finder lately, and after lengthy testing I've found you get lots of dead or non-anonymous proxies. Testing them one by one with back-to-back curl calls is very time-inefficient, so I wrote a nice little function that uses the curl_multi functions to run the requests simultaneously.

I set up a simple script on another server that just echoes 'Hi'. The function queries that URL through each proxy and compares the output against 'Hi' (of course you can change it to whatever you like).

I'm hung over as fuck, have fun.

Code:
<?php
function checkProxies($proxies){
    //$proxies is an array of proxies in format ip:port or ip:port,username:password
    $url = 'http://someplace.tld/testphp.php'; //url to query
    $return = 'Hi'; //expected reply

    $proxies = array_values(array_map('trim', $proxies)); //strip stray newlines and reindex from 0
    $count = count($proxies); //number of items in array
    echo 'Number of proxies in list: ' . $count . '<br />';

    $curl_arr = array();
    $proxylist = array(); //good proxies; stays empty if none pass
    $master = curl_multi_init(); //create multi curl resource

    for($i = 0; $i < $count; $i++) {
        $proxy = $proxies[$i]; //grab proxy from array
        $curl_arr[$i] = curl_init(); // create new curl resource
        curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, TRUE); //return the data, don't output it outright
        curl_setopt($curl_arr[$i], CURLOPT_HEADER, FALSE); //do not output the header info
        curl_setopt($curl_arr[$i], CURLOPT_URL, $url); //set our url to query
        curl_setopt($curl_arr[$i], CURLOPT_CONNECTTIMEOUT, 10); //give the proxy 10 seconds to accept the connection
        curl_setopt($curl_arr[$i], CURLOPT_TIMEOUT, 20); //cap the whole request so one slow proxy can't stall the batch
        curl_setopt($curl_arr[$i], CURLOPT_HTTPPROXYTUNNEL, TRUE); //tunnel through the proxy (uses CONNECT)

        $cproxy = explode(',', $proxy, 2); //$cproxy[0] is ip:port, $cproxy[1] (if present) is username:password
        curl_setopt($curl_arr[$i], CURLOPT_PROXY, $cproxy[0]); //set our proxy ip:port

        if(!empty($cproxy[1])) { //only set credentials if the line has them
            curl_setopt($curl_arr[$i], CURLOPT_PROXYUSERPWD, $cproxy[1]); //set username:password
        }
        curl_multi_add_handle($master, $curl_arr[$i]); //add the current curl resource handle to the master
    }

    $running = null;
    do {
        $status = curl_multi_exec($master, $running); //start/continue all transfers
        if($running) {
            curl_multi_select($master, 1.0); //wait for socket activity instead of spinning the CPU
        }
    } while($running > 0 && $status == CURLM_OK);

    echo 'Results: <br />';
    for($i = 0; $i < $count; $i++) {
        $rawdata = curl_multi_getcontent($curl_arr[$i]); //get returned data from curl handle

        if(trim((string)$rawdata) === $return){ //check the data returned vs what we expect
            echo $i . '. Good Proxy: ' . $proxies[$i] . '<br /><br />';
            $proxylist[] = $proxies[$i]; //it's a good proxy, add it to our list
        } else {
            echo $i . '. Bad Proxy: ' . $proxies[$i] . '<br /><br />';
        }
        curl_multi_remove_handle($master, $curl_arr[$i]); //detach the finished handle
    }
    echo 'Number of good proxies: ' . count($proxylist);

    curl_multi_close($master); //destroy the multi curl resource
    return $proxylist; //return an array of usable proxies
}

//start of main code
$proxies = file('proxies.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: array(); //one proxy per element, without newlines
$proxies = checkProxies($proxies); //$proxies will be a returned array of usable proxies

?>
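The same curl_multi pattern can be pulled out into a small reusable function. This is a minimal sketch, assuming PHP 7.1+ with the curl extension: it drives every handle with curl_multi_exec() and calls curl_multi_select() between passes, so the loop waits on socket activity instead of spinning. runAll() is a hypothetical name; it takes plain URLs, so you would add the proxy options to each handle the way the function above does.

```php
<?php
// Hypothetical helper: fetch several URLs at once with curl_multi and
// return each body keyed like the input array.
function runAll(array $urls, int $timeout = 10): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body as a string
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);     // cap the whole transfer
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers; select() blocks until there is activity
    // (or 1 second passes) rather than busy-looping.
    $running = 0;
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0);
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch); // body, or empty/null on failure
        curl_multi_remove_handle($mh, $ch);          // detach before closing the multi handle
    }
    curl_multi_close($mh);
    return $results;
}
```

Keeping the handles in an array keyed like the input means the results line up with whatever identifiers you passed in (for example, the proxy string itself).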
 


I don't see how your script checks the proxies for anonymity, though. For that you want to check the headers: have the script on your server echo them back, then parse that output.
 
Quote:
I don't see how your script checks the proxies for anonymity, though. For that you want to check the headers: have the script on your server echo them back, then parse that output.

That's a good idea, I'll see about adding that.
 
Just so you know, mattseh's tool already does this; that's what the judges.txt file is for. If you want to host your own judge, upload this to your server:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>AZ Environment variables 1.04</title>
</head>
<body>
<pre>
<?php
##########################################################################
#
# AZ Environment variables 1.04 © 2004 AZ
# Civil Liberties Advocacy Network
# http://clan.cyaccess.com http://clanforum.cyaccess.com
#
# AZenv is written in PHP & Perl. It is coded to be simple,
# fast and have negligible load on the server.
# AZenv is primarily aimed for programs using external scripts to
# verify the passed Environment variables.
# Only the absolutely necessary parameters are included.
# AZenv is free software; you can use and redistribute it freely.
# Please do not remove the copyright information.
#
##########################################################################

foreach ($_SERVER as $header => $value) {
    if (strpos($header, 'REMOTE') !== false || strpos($header, 'HTTP') !== false ||
        strpos($header, 'REQUEST') !== false) {
        echo $header . ' = ' . $value . "\n";
    }
}
?>
</pre>
</body>
</html>

Then list it as the only URL in the judges.txt file. You can also lower minutesperupdate in config.txt; it sets how often the tool queries the proxies against the judge.
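To use a judge like this from your own checker instead of from mattseh's tool, you would fetch the page through each proxy and parse the `NAME = value` lines it prints. A minimal sketch, assuming exactly the output format of the foreach above; parseJudgeOutput() is a hypothetical name. The resulting array can then be checked for keys such as HTTP_X_FORWARDED_FOR or HTTP_VIA.

```php
<?php
// Hypothetical helper: turn the judge page's "NAME = value" lines into an array.
// Assumes one variable per line, as printed by the AZenv foreach.
function parseJudgeOutput(string $html): array
{
    $vars = [];
    foreach (preg_split('/\r?\n/', strip_tags($html)) as $line) {
        $pos = strpos($line, ' = ');
        if ($pos === false) {
            continue; // not a variable line (title, blank line, ...)
        }
        $name = trim(substr($line, 0, $pos));
        if (preg_match('/^[A-Z0-9_]+$/', $name)) { // $_SERVER-style upper-case keys only
            $vars[$name] = trim(substr($line, $pos + 3));
        }
    }
    return $vars;
}
```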
 
You know, learning what the options in the config file do would be nice. I didn't know minutesperupdate was the judge thing; I thought it was the delay between fetching new proxies. Maybe I'm missing something here.
 
Quote:
Just so you know, mattseh's tool already does this; that's what the judges.txt file is for. If you want to host your own judge, upload this to your server

Yeah, I just re-read a message he sent me. It seems it does check the headers to see whether a proxy is anonymous. Allegedly.

I say allegedly because when I use the list, as fresh as I can get it from the output, the number of usable proxies is sooo small compared to the list size: no better than ~5%. It doesn't get much better over, say, an hour-long run. It makes me wonder whether it's really getting any data back from the judges.txt server.

Anyway, that's my 2 cents. I'll still do a little write-up on how to do it in PHP, because you never know when it'll come in handy.
 
Please send me some example proxies that are allegedly (see what I did there?) returning crap, and I'll investigate. I use the proxies my tool generates; yes, some fail, but with retries and a decent number of threaded connections, shit gets done. Who cares if a few are dead? I plan to post a full explanation of all the config.txt options soon.
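On the "decent number of connections" point: a checker that adds every proxy to one curl_multi handle opens a connection per proxy all at once, which gets heavy for long lists. A minimal sketch of capping that by checking in fixed-size batches, assuming a check function with the same shape as checkProxies() in this thread (array of proxies in, array of good ones out); checkInBatches() is a hypothetical name.

```php
<?php
// Hypothetical helper: run $check on fixed-size slices of the list so at most
// $batchSize connections are open at once. array_chunk() reindexes each slice
// from 0, which suits a checker that loops with $proxies[$i].
function checkInBatches(array $proxies, int $batchSize, callable $check): array
{
    $good = [];
    foreach (array_chunk($proxies, max(1, $batchSize)) as $batch) {
        $result = $check($batch);
        if (is_array($result)) { // tolerate a checker that returns null/false
            $good = array_merge($good, $result);
        }
    }
    return $good;
}
```

Usage would be something like `checkInBatches($proxies, 50, 'checkProxies')`; the batch size is a guess you would tune for your own box.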
 
For those wanting anonymity checking, here you go:

Target server script:
Code:
<?php
//proxy levels
//Level 3 Elite proxy: connection looks like a regular client
//Level 2 Anonymous proxy: no IP is forwarded, but the target site could still tell it's a proxy
//Level 1 Transparent proxy: the IP is forwarded and the target site can tell it's a proxy
//empty() avoids undefined-index notices when a header is absent

if(empty($_SERVER['HTTP_X_FORWARDED_FOR']) && empty($_SERVER['HTTP_VIA']) && empty($_SERVER['HTTP_PROXY_CONNECTION'])){
    echo '3';
} elseif(empty($_SERVER['HTTP_X_FORWARDED_FOR'])){
    echo '2';
} else {
    echo '1';
}
?>
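If you want to test the level logic without pushing real traffic through proxies, it helps to pull it into a function that takes a $_SERVER-style array. A minimal sketch applying the same three rules as the script above; proxyLevel() is a hypothetical name.

```php
<?php
// Hypothetical, testable version of the level logic: pass in $_SERVER
// (or a fake array) instead of reading the superglobal directly.
function proxyLevel(array $server): int
{
    $forwarded = !empty($server['HTTP_X_FORWARDED_FOR']);
    $via       = !empty($server['HTTP_VIA']);
    $proxyConn = !empty($server['HTTP_PROXY_CONNECTION']);

    if (!$forwarded && !$via && !$proxyConn) {
        return 3; // elite: looks like a regular client
    }
    if (!$forwarded) {
        return 2; // anonymous: proxy headers present, no forwarded IP
    }
    return 1;     // transparent: forwarded IP header present
}
```

On the judge itself this reduces to `echo proxyLevel($_SERVER);`.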
Proxy checking script (remove the echo statements for no output):
Code:
<?php
function checkProxies($proxies){
    //$proxies is an array of proxies in format ip:port or ip:port,username:password
    $url = 'http://www.url.tld/proxycheck.php'; //url to query

    $proxies = array_values(array_map('trim', $proxies)); //strip stray newlines and reindex from 0
    $count = count($proxies); //number of items in array
    echo 'Number of proxies in list: ' . $count . '<br />';

    $curl_arr = array();
    $proxylist = array(); //elite + anonymous proxies; stays empty if none pass
    $master = curl_multi_init(); //create multi curl resource

    for($i = 0; $i < $count; $i++) {
        $proxy = $proxies[$i]; //grab proxy from array
        $curl_arr[$i] = curl_init(); // create new curl resource
        curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, TRUE); //return the data, don't output it outright
        curl_setopt($curl_arr[$i], CURLOPT_HEADER, FALSE); //do not output the header info
        curl_setopt($curl_arr[$i], CURLOPT_URL, $url); //set our url to query
        curl_setopt($curl_arr[$i], CURLOPT_CONNECTTIMEOUT, 10); //give the proxy 10 seconds to accept the connection
        curl_setopt($curl_arr[$i], CURLOPT_TIMEOUT, 20); //cap the whole request so one slow proxy can't stall the batch
        curl_setopt($curl_arr[$i], CURLOPT_HTTPPROXYTUNNEL, TRUE); //tunnel via CONNECT (see the follow-up post about removing this line)

        $cproxy = explode(',', $proxy, 2); //$cproxy[0] is ip:port, $cproxy[1] (if present) is username:password
        curl_setopt($curl_arr[$i], CURLOPT_PROXY, $cproxy[0]); //set our proxy ip:port

        if(!empty($cproxy[1])) { //only set credentials if the line has them
            curl_setopt($curl_arr[$i], CURLOPT_PROXYUSERPWD, $cproxy[1]); //set username:password
        }
        curl_multi_add_handle($master, $curl_arr[$i]); //add the current curl resource handle to the master
    }

    $running = null;
    do {
        $status = curl_multi_exec($master, $running); //start/continue all transfers
        if($running) {
            curl_multi_select($master, 1.0); //wait for socket activity instead of spinning the CPU
        }
    } while($running > 0 && $status == CURLM_OK);

    echo 'Results: <br />';
    for($i = 0; $i < $count; $i++) {
        $rawdata = trim((string)curl_multi_getcontent($curl_arr[$i])); //get returned data from curl handle
        if($rawdata === '3'){
            //process elite proxy
            echo 'Elite proxy found: ' . $proxies[$i] . '<br /><br />';
            $proxylist[] = $proxies[$i]; //it's a good proxy, add it to our list
        } elseif($rawdata === '2'){
            //process anonymous proxy
            echo 'Anonymous proxy found: ' . $proxies[$i] . '<br /><br />';
            $proxylist[] = $proxies[$i]; //it's a good proxy, add it to our list
        } elseif($rawdata === '1') {
            //process transparent proxy
            echo 'Transparent proxy: ' . $proxies[$i] . ' - Skipping.<br /><br />';
        } else {
            echo 'Bad proxy, nothing usable returned: ' . $proxies[$i] . ' - Skipping.<br /><br />';
        }
        curl_multi_remove_handle($master, $curl_arr[$i]); //detach the finished handle
    }
    echo 'Number of good proxies: ' . count($proxylist);

    curl_multi_close($master); //destroy the multi curl resource
    return $proxylist; //return an array of usable proxies
}

//start of main code
$proxies = file('proxies.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: array(); //one proxy per element, without newlines
$proxies = checkProxies($proxies); //$proxies will be a returned array of usable proxies

?>
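Two details of the list format are easy to trip over: file() keeps each line's trailing newline unless FILE_IGNORE_NEW_LINES is passed, and a line with no credentials has no second element after explode(','). If you'd rather parse the list once up front, here is a minimal sketch assuming the same `ip:port,username:password` format; parseProxyLine() and loadProxies() are hypothetical names.

```php
<?php
// Hypothetical helper: parse one "ip:port" or "ip:port,username:password" line.
// Returns null for blank lines so they can be filtered out.
function parseProxyLine(string $line): ?array
{
    $line = trim($line); // drop newline and surrounding spaces
    if ($line === '') {
        return null;
    }
    $parts = explode(',', $line, 2); // split on the first comma only
    return [
        'proxy'   => $parts[0],                                                 // for CURLOPT_PROXY
        'userpwd' => (isset($parts[1]) && $parts[1] !== '') ? $parts[1] : null, // for CURLOPT_PROXYUSERPWD
    ];
}

// Hypothetical loader: read the file without newlines or empty lines.
function loadProxies(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }
    return array_values(array_filter(array_map('parseProxyLine', $lines)));
}
```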
 
I can't edit my post above, but removing the line:

curl_setopt($curl_arr[$i], CURLOPT_HTTPPROXYTUNNEL, TRUE);

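will cause more proxies to pass (for me it went from about 5% of the list to 30%-50%). That's likely because CURLOPT_HTTPPROXYTUNNEL makes curl ask the proxy for a CONNECT tunnel, and many proxies only allow CONNECT to SSL ports (Squid's default config does this). Inside a tunnel the proxy also doesn't touch the HTTP request, so it won't add the X-Forwarded-For or Via headers that the anonymity check looks for.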

I also apologize to mattseh: it was my own fault I wasn't getting many usable proxies back.
 
I actually saw this thread yesterday, AFTER I'd done this exact same thing (I didn't build from scratch; I revamped an older proxy system). +rep to Rage, as it would have saved me some time, and I'm sure it will help someone else.

Hey, what would you think about getting some code together (I'll help) that does quite a bit more? For instance: saving proxies to a database, making sure they're live, checking for bans (Google, etc.), GeoIP, all built on SQL. Logging uptime and response time for the proxies would obviously be hugely beneficial too.
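For the uptime and response-time logging idea, the bookkeeping per proxy is small. A minimal in-memory sketch, assuming each check reports success plus elapsed seconds (in a real checker the time could come from `curl_getinfo($ch, CURLINFO_TOTAL_TIME)`); ProxyStats is a hypothetical class, and persisting the numbers to SQL is left out.

```php
<?php
// Hypothetical per-proxy tracker for uptime and average response time.
class ProxyStats
{
    private $checks = 0;
    private $successes = 0;
    private $totalTime = 0.0;

    public function record(bool $ok, float $seconds): void
    {
        $this->checks++;
        if ($ok) {
            $this->successes++;
            $this->totalTime += $seconds; // only successful responses are timed
        }
    }

    // Fraction of checks that succeeded, 0.0 to 1.0.
    public function uptime(): float
    {
        return $this->checks ? $this->successes / $this->checks : 0.0;
    }

    // Mean response time of successful checks in seconds, or null if none.
    public function averageResponse(): ?float
    {
        return $this->successes ? $this->totalTime / $this->successes : null;
    }
}
```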

I love mattseh's proxy finder; it's been running for 3 days straight on my Ubuntu box. After hooking it up to a website and archiving the proxies, I went from ~30-50 live proxies at any given time to 300 live in just under a day.

Hit me up, or we can start the discussion here if it's something you (or other PHP/SQL programmers) are interested in collaborating on.
 
Just for completeness, do these variables help at all in your code, Rage?


  • REMOTE_ADDR
  • REMOTE_HOST
  • HTTP_X_FORWARDED_FOR
  • HTTP_VIA
  • HTTP_CLIENT_IP
  • HTTP_PROXY_CONNECTION
  • FORWARDED_FOR
  • X_FORWARDED_FOR
  • X_HTTP_FORWARDED_FOR
  • HTTP_FORWARDED
  • HTTP_REFERER
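One thing to keep in mind with this list: PHP exposes request headers in $_SERVER with an HTTP_ prefix (X-Forwarded-For arrives as HTTP_X_FORWARDED_FOR), so the unprefixed names will normally only appear if the web server sets them. A stricter test than "is the header set" is whether your own IP shows up anywhere. A minimal sketch; presentProxyVars() and leaksIp() are hypothetical names, and $realIp is assumed to be known to the checker.

```php
<?php
// Hypothetical helper: which of the listed variables are present and non-empty?
function presentProxyVars(array $server): array
{
    $names = [
        'REMOTE_ADDR', 'REMOTE_HOST', 'HTTP_X_FORWARDED_FOR', 'HTTP_VIA',
        'HTTP_CLIENT_IP', 'HTTP_PROXY_CONNECTION', 'FORWARDED_FOR',
        'X_FORWARDED_FOR', 'X_HTTP_FORWARDED_FOR', 'HTTP_FORWARDED', 'HTTP_REFERER',
    ];
    $found = [];
    foreach ($names as $name) {
        if (!empty($server[$name])) {
            $found[$name] = $server[$name];
        }
    }
    return $found;
}

// Hypothetical check: does the real IP appear as a whole address in any of them?
// The lookarounds stop 1.2.3.4 from matching inside 1.2.3.45.
function leaksIp(array $server, string $realIp): bool
{
    $pattern = '/(?<![0-9.])' . preg_quote($realIp, '/') . '(?![0-9.])/';
    foreach (presentProxyVars($server) as $value) {
        if (preg_match($pattern, (string) $value) === 1) {
            return true;
        }
    }
    return false;
}
```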
 
Quote:
Hey, what would you think about getting some code together (I'll help) that does quite a bit more? For instance: saving proxies to a database, making sure they're live, checking for bans (Google, etc.), GeoIP, all built on SQL. Logging uptime and response time for the proxies would obviously be hugely beneficial too.

Yeah, I'm down for it. I'm not used to collaborating with a bunch of other people on one project; I'm normally a lone wolf. So if you have any ideas on how to collaborate on this efficiently, I'm all ears.

Quote:
Just for completeness, do these variables help at all in your code, Rage?


  • REMOTE_ADDR
  • REMOTE_HOST
  • HTTP_X_FORWARDED_FOR
  • HTTP_VIA
  • HTTP_CLIENT_IP
  • HTTP_PROXY_CONNECTION
  • FORWARDED_FOR
  • X_FORWARDED_FOR
  • X_HTTP_FORWARDED_FOR
  • HTTP_FORWARDED
  • HTTP_REFERER

I've been wondering the same thing. I scoured the net for as much info as I could find on checking proxies by their headers.

I tested the headers against a bunch of random proxies, and what I wrote is basically what I came up with, combined with other info I found.
 
I've created a repository on bitbucket.org. If you're interested in helping out with the project erect proposed, PM me your Bitbucket username so I can give you access to it.
 
Quote:
Yeah, I'm down for it. I'm not used to collaborating with a bunch of other people on one project; I'm normally a lone wolf. So if you have any ideas on how to collaborate on this efficiently, I'm all ears.

I have very little experience co-oping too; I prefer flying solo myself. But this sounds like a good start, TBH, since most people here at WickedFire won't be anal enough to complain about the difference between ...

if ($a == $b) {
echo 'same';
}

if ($a == $b)
{
echo 'same';
}

Shoot the info over. I guess an overview of the pieces needed is the best place to start; then we'll divvy up the work. That might be best discussed here; I'm not sure whether they have wikis or whatever for planning development.

Like I said, though, this is perfect timing as I'm still tweaking my overhaul as we speak. We'll just offer the whole thing as a zip when we're finished.
 
Yeah, they have project wikis and a task system. I think the project is small enough that with 2-4 people we could crank it out in no time. Just create an account on Bitbucket, PM me your username, and I'll add you to the project; we can go from there.