OK, before I start: this tutorial is for complete newbs to PHP, and for people who may know a little but have no experience writing bots. If you already have programming knowledge you probably won't benefit from reading this, but I have seen a lot of people asking questions about automation recently and thought a primer might be useful. In this post I am not going to give out completed code, as you'd learn nothing from that. Hopefully, though, if you follow along and Google anything you don't understand, you should be able to write a working bot.
OK, to start with you're going to need a webserver with PHP and Curl installed, either a hosted one or one on your own computer. When starting out I recommend testing scripts on your local machine. The way I do it is by running a Linux distro with a LAMP stack (Linux, Apache, MySQL, PHP). Hint: if you don't understand what I just said, Google the following things:
"Installing Apache (Linux distro)"
"Installing LAMP (Linux distro)"
"Installing PHP Curl (Linux distro)"
Following all of those should take you roughly an hour, but once you're done, congratulations: you can now test all your PHP scripts on your local machine.
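Before moving on, it's worth confirming that PHP can actually see the Curl extension. Here is a minimal sketch of a check script. The function name curl_status() is made up for this example; extension_loaded() and curl_version() are standard PHP functions.

```php
<?php
// Hypothetical check script: does PHP see the Curl extension?
function curl_status(): array
{
    $status = [
        'php_version'  => PHP_VERSION,
        'curl_loaded'  => extension_loaded('curl'),
        'curl_version' => null,
    ];
    if ($status['curl_loaded']) {
        // curl_version() returns an array describing the libcurl build
        $status['curl_version'] = curl_version()['version'];
    }
    return $status;
}

$status = curl_status();
echo "PHP " . $status['php_version'] . "\n";
if ($status['curl_loaded']) {
    echo "Curl is loaded (libcurl " . $status['curl_version'] . ")\n";
} else {
    echo "Curl is NOT loaded: install/enable the PHP Curl extension, then restart Apache\n";
}
```

Run it from the command line with php, or drop it in your web root and open it in the browser. If Curl shows as not loaded, the extension probably isn't installed or enabled yet.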
I can't be bothered to explain the very basics of PHP here. Go to PHP Tutorial - Introduction and do the first few tutorials; it won't take you more than 1-2 hours to get the basic syntax of PHP down, for example what strings are, basic loops, functions, how to read files etc. This may all sound terribly complicated but it's really not. Get yourself a cup of coffee, close your other tabs and open your text editor of choice. Try messing around with the examples they give on that site and get your first scripts to do some stupid things.
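If you want one small exercise that touches strings, loops, functions and reading files all at once, here is a sketch that loads login details from a text file (something the login script in this post suggests trying if you're feeling adventurous). The one-account-per-line "email:password" format and the load_accounts() name are assumptions for this example, not something any site requires.

```php
<?php
// Hypothetical exercise: load "email:password" pairs from a text file,
// one account per line. Touches strings, loops, functions and file reading.
function load_accounts(string $path): array
{
    $accounts = [];
    // file() reads the whole file into an array with one element per line
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return $accounts; // the file could not be read
    }
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '' || strpos($line, ':') === false) {
            continue; // skip blank or malformed lines
        }
        // Split on the first colon only, so a password may itself contain colons
        [$email, $password] = explode(':', $line, 2);
        $accounts[] = ['email' => $email, 'password' => $password];
    }
    return $accounts;
}
```

You would call it with something like load_accounts("accounts.txt") and loop over the result with foreach, where accounts.txt is a hypothetical file sitting next to your script.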
OK, now that the boring shit is out of the way, hopefully you have an idea of what web service you would like to automate. As an example I'm going to use an article submission bot. The theory behind this applies to almost any other posting bot, but for the sake of this tutorial that's what I'm gonna do.
Before you even begin to code your new project you need an exact idea of what it has to do. How do you get that? By doing it manually, of course. So take a pen and paper, go submit a new article, and write down everything you have to do to post it. Your list should look something like this:
Go to Login page
Enter Username
Enter Password
Click Login
Click Submit Article
Fill out all the details associated with posting the article
Click Submit Article
View the page that says article submitted
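Once you have the list, notice that not every line becomes a request: typing your username and password happens in the browser, and only clicking Login sends anything to the server. Below is a rough sketch of the same list rewritten as the HTTP requests a bot would make. The host is the placeholder from the login example in this post; every path and field name here is hypothetical and should be replaced with what Live HTTP headers shows for your site.

```php
<?php
// Hypothetical request plan: the manual checklist rewritten as HTTP requests.
// Paths and field names are placeholders; copy the real ones from Live HTTP headers.
function request_plan(string $site): array
{
    return [
        // "Go to Login page": a plain GET; sites often set their session cookie here
        ['method' => 'GET', 'url' => "$site/login.html", 'fields' => []],
        // "Enter Username", "Enter Password" and "Click Login" add up to one POST
        ['method' => 'POST', 'url' => "$site/login.cgi",
         'fields' => ['email' => 'EMAIL', 'password' => 'PASSWORD']],
        // "Click Submit Article": GET the submission form
        ['method' => 'GET', 'url' => "$site/submit.html", 'fields' => []],
        // "Fill out all the details" and the second "Click Submit Article": one POST.
        // Its response (after any redirects) is the "article submitted" page to check.
        ['method' => 'POST', 'url' => "$site/submit.cgi",
         'fields' => ['title' => 'TITLE', 'body' => 'BODY']],
    ];
}

foreach (request_plan("http://www.articlesite.com") as $i => $step) {
    echo ($i + 1) . ". " . $step['method'] . " " . $step['url'] . "\n";
}
```

Eight manual steps collapse into four requests, and each POST is exactly what you will capture with Live HTTP headers in the next step.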
Not too many steps, and the list looks pretty good, but we need to be a lot more specific if we're going to automate this with a script. So next you're going to install a Firefox plugin called Live HTTP headers, which lets you see all of the information your browser sends to the remote server.
So now open up Live HTTP headers and do exactly the same thing again, logging in and posting an article. You want to do this in two steps. First, when you log in, copy the headers (the data shown in Live HTTP headers) to a text file. It will show information like this:
Code:
http://www.goarticles.com/cgi-bin/member.cgi
POST /cgi-bin/member.cgi HTTP/1.1
Host: www.goarticles.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.14) Gecko/2009090216 Ubuntu/9.04 (jaunty) Firefox/3.0.14
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://www.goarticles.com/ulogin.html
Cookie: CGISESSID=aseshidhere
Content-Type: application/x-www-form-urlencoded
Content-Length: 72
email=myemail&password=mypw&SUBMIT=Login
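If you save captures like this to a text file, a small parser can pull out the parts your script needs to copy. This is a hypothetical helper, not part of the original tutorial: it assumes only the request section is pasted in (no response headers) and that the form data is URL-encoded, as it is in the capture. parse_str() is the standard PHP function that decodes a string like email=myemail&password=mypw into an array.

```php
<?php
// Hypothetical helper: split a request copied from Live HTTP headers into parts.
// Assumes the layout of the capture: the URL on the first line, the request line
// ("POST /path HTTP/1.1"), "Name: value" header lines, and the url-encoded
// form data on the last line.
function parse_captured_request(string $raw): array
{
    $result = ['url' => null, 'method' => null, 'path' => null, 'headers' => [], 'fields' => []];

    foreach (preg_split('/\r\n|\r|\n/', trim($raw)) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if (preg_match('#^https?://#i', $line)) {
            $result['url'] = $line;                 // full URL the data is sent to
        } elseif (preg_match('#^(GET|POST)\s+(\S+)\s+HTTP/[\d.]+$#', $line, $m)) {
            $result['method'] = $m[1];              // e.g. POST
            $result['path'] = $m[2];                // e.g. /cgi-bin/member.cgi
        } elseif (preg_match('/^([A-Za-z-]+):\s*(.*)$/', $line, $m)) {
            $result['headers'][$m[1]] = $m[2];      // e.g. Referer, Cookie
        } elseif ($result['method'] === 'POST' && strpos($line, '=') !== false) {
            // "a=1&b=2" becomes ['a' => '1', 'b' => '2']
            parse_str($line, $result['fields']);
        }
    }
    return $result;
}
```

For example, parse_captured_request(file_get_contents("login_headers.txt")) would give you the method, path, Referer and form fields to copy into your script, where login_headers.txt stands for whatever file you saved the capture in.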
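The request above is just the very topmost section of data in Live HTTP headers, and normally it's all you need. From it you can learn which page the data gets sent to and by what method: in this case the method is POST, followed by the destination URL. You can also see that a cookie is involved, since the browser sends a session cookie (CGISESSID) along with the login (more on cookies below). Finally, email and password aren't the only data being sent: there is also "&SUBMIT=Login", which you never see in the browser but which must be sent.
Before I show a sample script for doing this I need to briefly touch on Curl. Curl is great: it takes care of cookies and posting the data. It can also pretend to be a browser by sending a user agent (the browser and OS you're using) and a referer (the page the data is being sent from).
In all my time using Curl I have rarely had to modify the Curl functions I use. If you've done the tutorials I mentioned above you should know what a function is and how to call it. Below are the Curl functions I use. This is not my own code; it was copied from Harry, who had the darkseoprogramming blog (now closed). There are two functions in the code below: the first one is used when you need to post data, and the second one just requests a page (useful when you want to check whether something submitted correctly).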
Code:
<?php
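// curlfunctions.php: both functions share one cookie file, so the session
// cookie the site sets is sent again on every later request.
// The "cookies" folder must already exist and be writable by PHP.
function post($page, $fields)
{
    $file_cookie = "cookies/cookies.tmp";
    $referer = "copy your referer from live HTTP headers here";
    $ch = curl_init($page);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);        // return the page as a string instead of printing it
    curl_setopt($ch, CURLOPT_COOKIEJAR, $file_cookie);  // write cookies to this file when the handle is closed
    curl_setopt($ch, CURLOPT_COOKIEFILE, $file_cookie); // send cookies stored in this file
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);        // follow redirects, e.g. after logging in
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    // Skip SSL certificate checks: convenient on a test box, but insecure
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)");
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);      // a string like "email=myemail&password=mypw&SUBMIT=Login"
    $response = curl_exec($ch);
    // To debug, uncomment the next line; curl_error() must be called before curl_close()
    //if ($response === false) { echo curl_error($ch); }
    curl_close($ch);
    return $response;                                   // the page HTML, or false if the request failed
}
function scrape_page($page)
{
    $file_cookie = "cookies/cookies.tmp";               // same cookie file as post()
    $referer = "copy your referer from live HTTP headers here";
    $ch = curl_init($page);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $file_cookie);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $file_cookie);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_REFERER, $referer);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)");
    // No CURLOPT_POST here, so this is a plain GET request
    $response = curl_exec($ch);
    //if ($response === false) { echo curl_error($ch); }
    curl_close($ch);
    return $response;
}
?>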
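Because scrape_page() returns the raw HTML of a page, you can use it to discover fields that aren't obvious in the browser, such as the SUBMIT=Login value in the captured request. The sketch below is a hypothetical helper built on PHP's DOMDocument class; it assumes the form uses ordinary input tags and ignores textarea, select, checkbox, radio and file fields.

```php
<?php
// Hypothetical helper: collect the name => value pairs that a page's <input>
// elements would submit, including hidden ones you never see in the browser.
// Textarea, select, checkbox, radio and file fields are ignored in this sketch.
function extract_input_fields(string $html): array
{
    $fields = [];
    if (trim($html) === '') {
        return $fields;                               // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);    // real-world HTML is rarely valid; keep warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        $type = strtolower($input->getAttribute('type'));
        if ($name === '' || in_array($type, ['checkbox', 'radio', 'file'], true)) {
            continue;
        }
        $fields[$name] = $input->getAttribute('value'); // "" when there is no value attribute
    }
    return $fields;
}
```

For example, extract_input_fields(scrape_page("http://www.articlesite.com/login.html")) would list the fields on a hypothetical login page. If the page has more than one form, fields from all of them end up in the same array, so check the names against what Live HTTP headers shows.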
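OK, our sample code for logging in will look like this (it expects the two functions above to be saved as curlfunctions.php):
Code: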
<?php
require_once("curlfunctions.php");
// The site you are posting to; include the http:// part so the URL is complete
$site = "http://www.articlesite.com";
// Your login details; repeat using appropriate variables for anything else the form needs.
// If you're feeling adventurous, get your script to load these from a file.
$email = "your email";
$password = "your password";
// We're now constructing a string the same as the one from Live HTTP headers.
// urlencode() escapes characters like & and = that would otherwise break the string.
$poststring = "email=" . urlencode($email) . "&password=" . urlencode($password) . "&SUBMIT=Login";
// The page the data is being sent to is $site . "/cgi-bin/member.cgi".
// post() returns the page we get after logging in, or false if the request failed.
$result = post($site . "/cgi-bin/member.cgi", $poststring);
// Now to check if we're logged in we need a regex; note the / delimiters around it.
// Replace "Welcome username" with text that only shows up once you're logged in.
$regex = "/Welcome username/";
// preg_match() checks whether the regex is found in the returned page;
// if a match is found it is saved in $matches.
$matches = [];
if ($result !== false) {
    preg_match($regex, $result, $matches);
}
// Now check whether it was found
if (empty($matches)) {
    echo "login failed";
} else {
    echo "success";
}
?>
Tinker with this till you get it to work. Remember, if it's not working, double check that your script is sending exactly the same data as Live HTTP headers showed. If you got this far you should be able to repeat the same thing to post the article; this time your code will be a bit longer and your regex different. That's basically all there is to a simple posting bot. If you have any questions post them below and I'll do my best to help you.