https://github.com/dabbamont/domquery
This isn't fully tested or well documented *yet*, but I'm using it as a code sample for a few companies that are considering me for engineering positions. If you've done any scraping or data mining in PHP, you'll see how powerful it is.
I've used it for a lot of scraping, and it makes the job EXTREMELY easy. I'll keep this short and sweet and just post an example of a scraping class I wrote quickly with it (it's also included in the source).
Using the GoogleSearch class:
PHP:
$search = new GoogleSearch("xml");
$results = $search->getResults();
print_r($results);
Returns:
Code:
Array
(
    [1] => Array
        (
            [keyword] => xml
            [position] => 1
            [title] => XML - Wikipedia, the free encyclopedia
            [url] => http://en.wikipedia.org/wiki/XML
            [description] => Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and ...
            [domain] => en.wikipedia.org
        )
    [2] => Array
        (
            [keyword] => xml
            [position] => 2
            [title] => XML Tutorial
            [url] => http://www.w3schools.com/xml/
            [description] => A well organized and easy to understand free tutorial with lots of examples and source code.
            [domain] => www.w3schools.com
        )
    [3] => Array
        (
            [keyword] => xml
            [position] => 3
            [title] => XML Introduction - What is XML?
            [url] => http://www.w3schools.com/xml/xml_whatis.asp
            [description] => XML was designed to transport and store data. ... XML stands for EXtensible Markup Language; XML is a markup language much like HTML; XML was designed ...
            [domain] => www.w3schools.com
        )
    [4] => Array
        (
            [keyword] => xml
            [position] => 4
            [title] => Extensible Markup Language (XML)
            [url] => http://www.w3.org/XML/
            [description] => Main page for World Wide Web Consortium (W3C) XML activity and information.
            [domain] => www.w3.org
        )
    [5] => Array
        (
            [keyword] => xml
            [position] => 5
            [title] => XML From the Inside Out -- XML development, XML resources, XML ...
            [url] => http://www.xml.com/
            [description] => XML.com, where the XML community shares XML development resources and solutions, features timely news, opinions, features, and tutorials; the Annotated ...
            [domain] => www.xml.com
        )
    [6] => Array
        (
            [keyword] => xml
            [position] => 6
            [title] => IBM developerWorks : XML tutorials, code, and forums
            [url] => http://www.ibm.com/developerworks/xml/
            [description] => Dec 11, 2012 ... The XML section on the developerWorks Web site is your resource for XML- related tools, samples, standards information, education, news and ...
            [domain] => www.ibm.com
        )
    [7] => Array
        (
            [keyword] => xml
            [position] => 7
            [title] => Microsoft XML Downloads - MSDN - Microsoft
            [url] => http://msdn.microsoft.com/en-us/data/bb190600.aspx
            [description] => Extensible Markup Language (XML): Library, learning resources, downloads, support, and community. Evaluate and find out how to install, deploy, and maintain ...
            [domain] => msdn.microsoft.com
        )
    [8] => Array
        (
            [keyword] => xml
            [position] => 8
            [title] => XML at The Apache Foundation
            [url] => http://xml.apache.org/
            [description] => Provides commercial-quality standards-based XML solutions for Java, C++ and Perl that are developed in an open and cooperative fashion. Includes XML and ...
            [domain] => xml.apache.org
        )
    [9] => Array
        (
            [keyword] => xml
            [position] => 9
            [title] => XML Tutorial - Introduction
            [url] => http://www.tizag.com/xmlTutorial/
            [description] => Learn the basics of XML with Tizag.com's XML beginner tutorial.
            [domain] => www.tizag.com
        )
    [10] => Array
        (
            [keyword] => xml
            [position] => 10
            [title] => The XML FAQ
            [url] => http://xml.silmaril.ie/
            [description] => FAQs maintained by Peter Flynn, part of the W3C's XML special interest group.
            [domain] => xml.silmaril.ie
        )
)
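Since the result is a plain array keyed by position, it's easy to post-process with PHP's built-in array functions. Here's a minimal sketch that counts how many positions each domain holds. The three rows are copied from the output above (descriptions left out for brevity), and $perDomain is just a name I picked for the example; array_column needs PHP 5.5+.

```php
<?php
// Three rows in the same shape getResults() returns (values taken from the output above).
$results = [
    1 => ["keyword" => "xml", "position" => 1, "title" => "XML - Wikipedia, the free encyclopedia",
          "url" => "http://en.wikipedia.org/wiki/XML", "domain" => "en.wikipedia.org"],
    2 => ["keyword" => "xml", "position" => 2, "title" => "XML Tutorial",
          "url" => "http://www.w3schools.com/xml/", "domain" => "www.w3schools.com"],
    3 => ["keyword" => "xml", "position" => 3, "title" => "XML Introduction - What is XML?",
          "url" => "http://www.w3schools.com/xml/xml_whatis.asp", "domain" => "www.w3schools.com"],
];

// Pull out the "domain" column, then count how often each domain appears.
$perDomain = array_count_values(array_column($results, "domain"));

// Sort by count, highest first, keeping the domain names as keys.
arsort($perDomain);

print_r($perDomain);
```

With these three rows, www.w3schools.com comes out on top with 2 positions and en.wikipedia.org follows with 1.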
The actual GoogleSearch class (built using DOMQuery):
PHP:
class GoogleSearch {
    public $keyword = "";
    private $results = [];

    public function __construct($keyword) {
        $this->keyword = $keyword;
    }

    // Returns up to $maxResults results, keyed by 1-based position.
    public function getResults($maxResults = 10) {
        // Only fetch if the cache doesn't hold enough results yet.
        if (count($this->results) < $maxResults) {
            // Each page is assumed to hold 10 results.
            $pages = (int) ceil($maxResults / 10);
            for ($curPage = 1; $curPage <= $pages; $curPage++) {
                $this->getPage($curPage);
            }
        }
        // Trim to the requested size, preserving the position keys.
        return array_slice($this->results, 0, $maxResults, true);
    }

    // Scrapes one results page and stores its rows in $this->results.
    private function getPage($num) {
        $start = ($num - 1) * 10;
        // Note: despite the class name, the request goes to optimum.net's search page.
        $doc = new DOMQuery\Doc("http://www.optimum.net/Search?q=" . urlencode($this->keyword) . "&p=$num");
        $results = $doc->find("#websearch > div");
        $current = $start;
        foreach ($results as $result) {
            $current++;
            $link = $result->find("a");
            $url = $link->attr("href");
            $this->results[$current] = [
                "keyword" => $this->keyword,
                "position" => $current,
                "title" => $link->text(),
                "url" => $url,
                "description" => $result->find("div")->text(),
                "domain" => parse_url($url, PHP_URL_HOST)
            ];
        }
    }
}
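For comparison, here's roughly what the same extraction looks like with only PHP's built-in DOMDocument and DOMXPath, which gives an idea of what the find() calls save you from writing. This is a sketch under assumptions, not how DOMQuery works internally: the HTML snippet is made-up markup shaped to match the "#websearch > div" selector, and extractResults is a hypothetical helper name.

```php
<?php
// Made-up markup (an assumption), shaped to match the "#websearch > div" selector.
$html = <<<'HTML'
<div id="websearch">
  <div><a href="http://en.wikipedia.org/wiki/XML">XML - Wikipedia, the free encyclopedia</a><div>Extensible Markup Language (XML) is a markup language ...</div></div>
  <div><a href="http://www.w3schools.com/xml/">XML Tutorial</a><div>A well organized and easy to understand free tutorial.</div></div>
</div>
HTML;

// Hypothetical helper: builds rows in the same shape as GoogleSearch::getResults().
function extractResults($html, $keyword, $start = 0) {
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $results = [];
    $position = $start;

    // XPath equivalent of the CSS selector "#websearch > div".
    foreach ($xpath->query('//*[@id="websearch"]/div') as $node) {
        $link = $xpath->query('.//a', $node)->item(0);
        if ($link === null) {
            continue; // skip blocks that don't contain a link
        }
        $desc = $xpath->query('.//div', $node)->item(0);
        $url = $link->getAttribute('href');
        $position++;
        $results[$position] = [
            "keyword" => $keyword,
            "position" => $position,
            "title" => trim($link->textContent),
            "url" => $url,
            "description" => $desc === null ? "" : trim($desc->textContent),
            "domain" => parse_url($url, PHP_URL_HOST),
        ];
    }
    return $results;
}

$rows = extractResults($html, "xml");
print_r($rows);
```

The row-building part is the same either way. The difference is that the selector has to be translated to XPath by hand and every lookup needs its own null check.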