Free Online URL Scraper (Need Feedback!)



It looks pretty. I noticed a couple of alignment issues with the tabs as you click them. Some show varying degrees of blue at the edges when the black overlay sits on top of them.

I'm not really sure how it is scraping URLs. Is it based on PR, SE rank, etc.? Even then, I am noticing different results. Is it based on country? I have several results that are from the Philippines. That wouldn't be targeted toward the results I would need.

I am guessing it is mainly to help people find sites to pull keywords from? It would be nice to be able to choose which country you want to target for the search engine.
 
Thanks for checking it out.

It's scraping the URLs using Google.com, and it's not currently technically possible to make it country-specific with the method I'm using, although you can always add "site:.nl" or whatever you want to the query to target it.

It's good for grabbing lists of pligg sites or whatever to feed other tools - I use it myself quite a lot now.
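
For example, the site: filter just goes straight into the query text - roughly like this (a sketch only; the keyword and TLD are made up and this isn't gScrape's actual code):

```ts
// Rough sketch only - not gScrape's code. The keyword and TLD are examples.
// The "site:" filter simply becomes part of the query text itself.
function buildQuery(keyword: string, tld?: string): string {
  const q = tld ? `${keyword} site:.${tld}` : keyword;
  return `https://www.google.com/search?q=${encodeURIComponent(q)}`;
}

console.log(buildQuery("powered by pligg", "nl"));
// https://www.google.com/search?q=powered%20by%20pligg%20site%3A.nl
```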
 
Heh, requiring FF 3.5 is really strict ;) Sadly I have 3.0.
If you want an audience on your site you really should try to make it compatible with at least 3.0. Many people have even older releases!

Wondering how you solved the Google ban hammer?
In my experience Google bans pretty fast if you scrape them (I used 150 proxies and the problem was gone, but that's not affordable for most).

Dunno if that works for you, but you can target specific languages by adding "&lr=" as a parameter to the search string.
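
Something like this, roughly (a sketch - lang_nl is just an example value for the lr parameter):

```ts
// Sketch: add Google's "lr" (language restrict) parameter to an existing
// search URL. lang_nl is just an example value (Dutch).
function withLanguage(searchUrl: string, lang: string): string {
  const url = new URL(searchUrl);
  url.searchParams.set("lr", `lang_${lang}`);
  return url.toString();
}

console.log(withLanguage("https://www.google.com/search?q=apple+pie", "nl"));
// https://www.google.com/search?q=apple+pie&lr=lang_nl
```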
 
Very nice tool - so far the results have been very good. I am signing up, and it is definitely not limited to finding a bunch of pligg sites ;)
 
Heh, requiring FF 3.5 is really strict ;) Sadly I have 3.0.
If you want an audience on your site you really should try to make it compatible with at least 3.0. Many people have even older releases!

Wondering how you solved the Google ban hammer?
In my experience Google bans pretty fast if you scrape them (I used 150 proxies and the problem was gone, but that's not affordable for most).

Dunno if that works for you, but you can target specific languages by adding "&lr=" as a parameter to the search string.

It has to be FF 3.5 for the implementation I've used, which answers your second question: it doesn't get banned :)

Also, I can't add URL variables to target by country because of the specific methods involved.


You should make people sign up to use it rather than having registration as an option - that way you build your list.

Yeah, I thought about that, but decided to just let anyone scrape 50 URLs immediately so they can see if the tool is any good - if they like it they're probably gonna subscribe anyway.


Very nice tool - so far the results have been very good. I am signing up, and it is definitely not limited to finding a bunch of pligg sites ;)

Awesome - glad you like it ;)


<edit>
@monkeyman: Yeah, I would normally use multi cURL or pipelining or something to suck the goodness out of Google, but I thought it would be less of a headache for a free public tool to take a different approach ;)
</edit>
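
For anyone curious, the multi-request idea looks roughly like this on the server side (sketched here with Node's fetch rather than multi cURL - not gScrape's actual approach, and Google throttles unproxied bulk requests like this pretty fast):

```ts
// Sketch of the "many requests in parallel" idea (what multi cURL does in
// PHP), written with Node 18+ fetch. Not gScrape's actual code - and Google
// will start blocking unproxied bulk requests like this fairly quickly.
async function fetchResultPages(query: string, pages = 3): Promise<string[]> {
  const base = `https://www.google.com/search?q=${encodeURIComponent(query)}`;
  const urls = Array.from({ length: pages }, (_, i) => `${base}&start=${i * 10}`);
  // Fire every request at once and collect the raw HTML of each result page.
  return Promise.all(urls.map(u => fetch(u).then(r => r.text())));
}
```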
 
I just noticed that if you have the NoScript plugin in FF the site will not function correctly.

To use the site you must add an exception for gscrape.

Just thought I should let you know.
 
@turbo: weird. What version of FF are you using?

@phrench: I'm collecting them in a weird way, but also I'm appending random keywords with each grab to extract more URLs overall from G, so they will be different from a normal search.

Is this what you would ideally want?

If you just search for, say, "apple pie", Google will only return 1k results max, but if you search for "apple pie website", then "apple pie dog", then "apple pie england", you can get pretty much all the URLs after a bunch of searches.
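
Roughly, in code (a sketch only - the modifier words and the crude regex link extraction are just placeholders, not how the tool actually does it):

```ts
// Sketch of the "append random words" trick: each modifier changes the
// result set, so the combined, de-duplicated list grows well past the
// ~1k URLs a single query will ever return.
const modifiers = ["website", "dog", "england"];

// Very rough link extraction - a real parser would be better.
function extractUrls(html: string): string[] {
  return [...html.matchAll(/href="(https?:\/\/[^"]+)"/g)].map(m => m[1]);
}

async function harvest(baseQuery: string): Promise<string[]> {
  const seen = new Set<string>();
  for (const word of modifiers) {
    const q = encodeURIComponent(`${baseQuery} ${word}`);
    const html = await fetch(`https://www.google.com/search?q=${q}`).then(r => r.text());
    for (const u of extractUrls(html)) seen.add(u);
  }
  return [...seen];
}
```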

Lemme know how you want it to work.
 
Looks like a good way to see what your competitors are researching :D

The aim's to build a list, not spy on people :)

Whip out Live HTTP Headers in FF and check what's going on - after the initial page loads there's no talking with my server.

All the scraping is done client side so it's totally private.
 
The aim's to build a list, not spy on people :)

Whip out Live HTTP Headers in FF and check what's going on - after the initial page loads there's no talking with my server.

All the scraping is done client side so it's totally private.


Ok, so I open the initial gScrape page ("Harvest Unlimited SE URLs online - FREE"). Funny how it thought the latest version of Safari on Snow Leopard was an "old browser" - it should probably just say 'incompatible browser'.

Why is it that when I click 'go', it contacts gscrape.com? But you said that after the initial page loads there's no talking with your server... (running in VMware behind Little Snitch, Little Snitch flags a connection to gscrape right after clicking 'go').
 
@kblessinggr:

You need to learn how to use the tools you've got :)

Try Live HTTP Headers in FF - it's probably easier to see there.
It doesn't contact the server with any of your data.

It does (of course) fetch the page images from the server, which is probably what you're seeing.

Read my earlier posts and you'll see my explanation of why I append random words...

I've made a note to change the error message to 'incompatible browser'. There's also some more in-depth browser version checking I need to do, as I think it's blocking some OK ones - and I know that with some work IE8 can run it.
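
Something like this for the version check, roughly (a sketch - the user-agent strings are just examples):

```ts
// Sketch of a slightly smarter check than "old browser": pull the Firefox
// version out of the user-agent string and compare it against the 3.5
// minimum, falling back to "incompatible browser" for everything else.
function isSupportedFirefox(ua: string): boolean {
  const m = ua.match(/Firefox\/(\d+)\.(\d+)/);
  if (!m) return false; // not Firefox at all
  const major = Number(m[1]);
  const minor = Number(m[2]);
  return major > 3 || (major === 3 && minor >= 5);
}

console.log(isSupportedFirefox("Mozilla/5.0 ... Firefox/3.5")); // true
console.log(isSupportedFirefox("Mozilla/5.0 ... Firefox/3.0")); // false
```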