Dogpile Search Spy - Watch Real Time Searches Being Made



Wondering if this is censored / doctored.

Most "live search" features in search engines are.

- stripping out swearwords
- stripping out "offensive" searches

I fondly remember having access to a real live-search backend. From 6pm on it got really interesting.

The frontend was doctored like you would not believe.

 
uncheck the box that says "omit adult terms"

lol can't believe how many porn searches I saw in like 30 seconds. Just think of all the men sitting there with their dicks in their hands, searching for things like "free black anal" which just scrolled by.
 
Wondering if this is censored / doctored.

you can turn adult terms on/off. some of my favs that just came through:

  • ebony anal
  • illegal young girls
  • pronhub
  • drunk and abused
  • freeanalvideo

conclusion: people are great
 
The real question being:

How can we scrape this?
Anyone?

 
Had someone code a cron-driven Perl script a couple of years ago to scrape Dogpile. At the end of each day it would zip up the log and name it by the day's date. This is the guts of it; it might be of use to someone to get working. If I remember where I archived the rest of the files I'll post them. They were just the archive & start/stop functions, so not vital to the actual scraping.
Filename - xmlscrape.cgi
Code:
#!/usr/bin/perl -X
## perl /home/xxxxxxx/public_html/kwscrape/pack.cgi
# queue a notification e-mail via sendmail
sub sendemail
{
    my ($t,$s,$b) = @_;
    
    if( open SENDMAIL, "|/usr/sbin/sendmail -oi -t -odq" ) {
        $data{from}='xxxxxxxx@gmail.com';
        $data{to}=$t;
        $data{subject}=$s;
        $data{text}=$b;
        
        print SENDMAIL 'From: ' . $data{from} . "\n";
        print SENDMAIL 'To: ' . $data{to} . "\n";
        print SENDMAIL "Content-type: text/plain;charset=windows-1251\n";
        print SENDMAIL 'Subject: ' . $data{subject} . "\n";
        print SENDMAIL "\n";
        print SENDMAIL $data{text};
        close (SENDMAIL);
    };
};


# count consecutive fetch failures; after 10 in a row, send a warning mail
sub got_error{
    $ke=0;
    if (open(E,"<scrape_error.log")){
        $ke=<E>;
        close(E);
    };
    if ($ke==10){
        sendemail('xxxxxxxxx@gmail.com','Fetching Error',"More than 10 fetching errors!");
        $ke=0;
    };
    $ke++;
    open(E,">scrape_error.log");
    print E $ke;
    close(E);
};

use LWP::UserAgent;
use HTTP::Request::Common;

$fwrite="log.txt";

if (open(C,"<stop.flg")){
    close(C);
    unlink("./stop.flg");
    die;
};


if (open(C,"<pack.flg")){
    close(C);
    unlink("./pack.flg");
    system("mv -f $fwrite pack.txt");
    system("./arch.cgi &");
}else{
    sleep(3);
};
#die;
#print "Content-type: text/html; charset=windows-1251\n\n";
#print "Started!";

chdir("/home/xxxxxxx/public_html/kwscrape/");

$ua = LWP::UserAgent->new;
$ua->agent('Mozilla/5.0');

$url='http://www.dogpile.com/info.dogpl/searchspy/inc/data.xml?filter=0';

($absurl) = $url =~ m!(http://.+?)/!si;
($otnurl) = $url =~ m!(http://.+)/!si;

# pull the current SearchSpy XML feed
$re = $ua->request(GET "$url");
if (!$re->is_success) {
    print "Error at getting $url!\n";
    got_error();
};
$mainresponse = $re->as_string;

# strip the XML down to a '#'-separated list of the search terms it contains
($cont) = $mainresponse =~ m!(<.*>)!;

$cont =~ s!<.*?>!#!sig;
$cont =~ s!#+!#!sig;
$cont =~ s!&.*?;!!sig;
($cont) = $cont =~ m!#(.*)#!si;

@words = split("#",$cont);

open(F,">>$fwrite");
for($i=0;$i<scalar(@words);$i++){
    print F "$words[$i]\n";
};
close(F);

system("./xmlscrape.cgi &");
 
My favorites so far ....

"i shot myself"

"when your kid wont friend you on facebook"

"big bra fanclub"

"assholes from canada"

"how to hunt and kill terrorists"

"goodluck bros"

"is little a word"
 
Here's a couple I spotted:

Can a woman get pregnant by a horse?

Yea, I've often wondered that...

3d Incest.
and yea, you don't want any of that 2 dimensional incest, that sucks, 3D's the way to go.

The human race - WTF
 
White Sluts Desire to Breed Black
I have small tits
bear ass scratcher
you jizz
easy way of anal
banana boobs

These came up in about a 2-minute span... too funny.
 
infotiger has this too, but the results are not as interesting. Here's a script to scrape it, though:
Code:
#!/usr/bin/env python3
import datetime
import time
import os
import random

log_path_base = '/home/user/infotiger'
outer_left = 'voyeur ***************** -->'
outer_right = '<!-- ************ /voyeur'
cont_left = 'blank">'
cont_right = '</a'
min_wait = 2
wait_factr = 2
file_searches = 0
directory_files = 0
approx_searches_per_file = 30000

def log_path():
    return log_path_base + '/' + str(datetime.date.today()) + '.' + str(directory_files)

def should_continue():
    return True

def random_wait():
    time.sleep(min_wait + (wait_factr * random.random()))

        
while should_continue():

    # time for a new file
    if file_searches > approx_searches_per_file:
        directory_files += 1
        file_searches = 0

    random_wait()
    f = os.popen("wget -qO - --no-cache --no-cookies header='Host: [URL="http://www.infotiger.com/"]Infotiger search engine homepage[/URL]' --header='User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.10) Gecko/2009042513 Ubuntu/8.04 (hardy) Firefox/3.0.10' --header='Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' --header='Accept-Language: en-us,en;q=0.5' --header='Accept-Encoding: gzip,deflate' --header='Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7' --header='Keep-Alive: 300' --header='Connection: keep-alive' --header='Content-Type: text/html; charset=utf-8' --header='Referer: [URL="http://en.wikipedia.org/wiki/Search_engine"]Web search engine - Wikipedia, the free encyclopedia[/URL]' --header='Pragma: no-cache' --header='Cache-Control: no-cache' http://www.infotiger.com/voyeur.html?filter=no")
    raw = f.read()
    f.close()

    # isolate voyeur section
    raw = raw.partition(outer_left)[2].partition(outer_right)[0]
    # create strings with the wanted content at the beginning
    left_chopped = [raw.partition(cont_left)[2]]
    n = 1
    while left_chopped[n-1] != "":
        left_chopped.append(left_chopped[n-1].partition(cont_left)[2])
        n += 1

    # chop off the extraneous text from the end
    cont = [left_chopped[0].partition(cont_right)[0]]
    n = 1
    lgth = len(left_chopped)
    while n < lgth:
        cont.append(left_chopped[n].partition(cont_right)[0])
        n += 1
    
    file_searches += len(cont) - 1
    output = '\n'.join(cont)
    f = open(log_path(), 'a')
    f.write(output)
    f.close()