Do people still want databases?

mgtarheels · May 15, 2011

dchuk said:
I'm gonna be way too busy painting my face green and yellow each weekend

Buncha bums. :ak:

-joe- · May 15, 2011

I think I've seen an IMDB database around already.

chatmasta · May 15, 2011

Those of you who want a Yelp scraper (honestly I'm kind of interested in this too)...how do you think I should set it up?

I'm thinking you give it your query (or queries) and it outputs a .csv. For every attribute it finds, it makes a new column for that attribute and fills it in whenever it can. That way you can sort by attributes, etc. It would also grab the Yelp business id (i.e. the slug from the URL of a business's page) and save a bunch of text files, each one named for a specific business, containing the reviews.

Thoughts?

Oh, also, does anyone know if I'm going to hit any kind of query or page limit? I know the API has one built in, but I don't know if the actual site checks.
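Here is a minimal Python sketch of the output layout described above. It assumes each scraped business arrives as a dict with a hypothetical `id` key (the Yelp slug), a `reviews` list, and whatever other attributes were found. The columns are the union of all attribute names, missing values are left blank, and reviews go to one text file per business id.

```python
import csv
from pathlib import Path

def write_results(businesses, csv_path, reviews_dir):
    """Write one CSV row per business and one review text file per business id.

    Assumes each business is a dict with a hypothetical 'id' key (the slug),
    an optional 'reviews' list of strings, and any other attribute keys,
    which may differ from business to business.
    """
    # Every attribute seen anywhere gets its own column, in first-seen order.
    columns = ["id"]
    for biz in businesses:
        for key in biz:
            if key not in columns and key != "reviews":
                columns.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        # restval="" leaves a cell blank when a business lacks that attribute;
        # extrasaction="ignore" keeps the 'reviews' list out of the CSV.
        writer = csv.DictWriter(f, fieldnames=columns, restval="",
                                extrasaction="ignore")
        writer.writeheader()
        for biz in businesses:
            writer.writerow(biz)

    # One text file per business, named after its id/slug.
    out = Path(reviews_dir)
    out.mkdir(parents=True, exist_ok=True)
    for biz in businesses:
        text = "\n\n".join(biz.get("reviews", []))
        (out / f"{biz['id']}.txt").write_text(text, encoding="utf-8")
    return columns
```

Because the column list is built from the data, sorting by any attribute in a spreadsheet works even when most businesses only have a few of them filled in.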

Korgo · May 15, 2011

I need restaurant owners' emails along with their names.

Tony. · May 15, 2011

wat__ said:
an imdb db or any movie star db would be really good.

Interested in something like this too

BeerNuts · May 15, 2011

chatmasta said:
Those of you who want a Yelp scraper (honestly I'm kind of interested in this too)...how do you think I should set it up?

It's really hard to stay out of this subject, but since this is ShS I hope you don't take offense.

I already have a simple Yelp scraper on the market in the BST as LocalScraper (LocalScraper.com), which scrapes simple data from 5 different sources. It's gotten zero attention and has sold one copy despite the insane price drop I gave it to drum up business. My own fault for marketing it like shit.

I have also had a few people contact me from its webpage about a more advanced Yelp scraper, and I've already completed one that grabs everything except the reviews and images and outputs it to CSV.

The first one I made took a search string like "Salons" and a location, then output all the businesses from the search. The new one (releasing in a day or two) scrapes an entire category or subcategory instead of working by keyword. I found that Yelp matches search terms against review text too, so if you search for "salon" you can get unrelated businesses just because one of their reviews mentions the word "salon".

In developing both of these bots I have never hit a Yelp wall on page count. I have done 300 pages in a single run on test projects without a problem. But I also have enough random delays in place to mimic a real user, and proxy support if that fails.
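One way to get the kind of random pacing described above, sketched in Python. `fetch` is a stand-in for whatever actually downloads a page, and the 2 to 6 second bounds are illustrative guesses, not numbers known to avoid a Yelp block.

```python
import random
import time

def crawl_pages(fetch, page_urls, min_delay=2.0, max_delay=6.0,
                sleep=time.sleep, rng=random):
    """Fetch pages one at a time with a random pause between requests.

    `fetch` is any callable that takes a URL and returns the page body.
    The delay bounds are illustrative defaults, not tuned values.
    """
    pages = []
    for i, url in enumerate(page_urls):
        if i > 0:
            # Uniformly random pause so requests don't arrive on a fixed beat.
            sleep(rng.uniform(min_delay, max_delay))
        pages.append(fetch(url))
    return pages
```

Passing `sleep` and `rng` in as parameters makes the pacing easy to test without actually waiting.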

Again, sorry for shitting on your thread here. But it's hard to see people looking for a product when you already have one for sale on the site and are about to release one that people are talking about.

chatmasta · May 15, 2011

Haha no worries dude I haven't even started working on it yet! Good to know though

chatmasta · May 15, 2011

Golf course DB is done. It's sweet. 19,000 courses. Reviews for every course that had them (50,000 reviews). Ratings, etc. Also includes a junction table containing data for how far each course is from nearby cities (so you can search for all courses within X miles of Y city). Right now I'm just putting together a readme.txt with some common queries (some are kind of difficult with the junction table).
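The "within X miles of Y city" lookup through a junction table could look something like this. The table and column names (`courses`, `cities`, `course_city_distance`, `miles`) are assumptions for illustration, not the actual schema of the database for sale, and SQLite via Python's `sqlite3` module stands in for whatever engine it actually ships in.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE courses (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cities  (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
-- Junction table: one row per (course, nearby city) pair.
CREATE TABLE course_city_distance (
    course_id INTEGER REFERENCES courses(id),
    city_id   INTEGER REFERENCES cities(id),
    miles     REAL,
    PRIMARY KEY (course_id, city_id)
);
"""

def courses_near(conn, city, state, max_miles):
    """Return (course name, miles) for every course within max_miles of a city."""
    sql = """
        SELECT c.name, d.miles
        FROM course_city_distance AS d
        JOIN courses AS c ON c.id = d.course_id
        JOIN cities  AS t ON t.id = d.city_id
        WHERE t.name = ? AND t.state = ? AND d.miles <= ?
        ORDER BY d.miles
    """
    return conn.execute(sql, (city, state, max_miles)).fetchall()
```

Because distances are precomputed per (course, city) pair, the query is a plain two-way join plus a `miles <= ?` filter; no geometry is needed at query time.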

PM me if you are going to want it. Price will likely be $39.

davidb · May 16, 2011

Do you take custom orders?

chatmasta · May 16, 2011

I'll take custom orders if the price is right - otherwise I'd be better off just making one for everybody.

BTW, the golf database is up for sale; check the link in my signature.

alvinnnn · May 21, 2011

yelp scraper

I also wrote a Perl script that gets the attributes as well as the reviews (in a separate text file). I keep getting blocked by Yelp, though. Does anyone have experience using proxies to get around that? Any recommendations?

mattseh · May 22, 2011

alvinnnn said:
I also wrote a perl script that gets the attributes as well as the reviews (in a separate text file). I keep getting blocked by yelp though. Anyone has experience with proxies to get around that? any recommendations?

Random time limits, randomise the order you hit the URLs, proxies are a good idea, realistic referrers never hurt.
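A sketch of two of those tips in Python: shuffling the URL order and attaching a plausible `Referer` header. The referrer list and `User-Agent` string are placeholders chosen for illustration. The function only builds `urllib.request.Request` objects and leaves sending them, and any delay between sends, to the caller.

```python
import random
import urllib.request

# Placeholder referrers, for illustration only.
REFERRERS = [
    "https://www.google.com/",
    "https://www.bing.com/",
]

def build_requests(urls, rng=random):
    """Return Request objects in random order, each with a plausible Referer."""
    order = list(urls)
    rng.shuffle(order)  # don't walk the site in a predictable sequence
    reqs = []
    for url in order:
        reqs.append(urllib.request.Request(url, headers={
            "Referer": rng.choice(REFERRERS),
            "User-Agent": "Mozilla/5.0",  # placeholder browser-like UA
        }))
    return reqs
```

Passing a seeded `random.Random` as `rng` makes the order reproducible when debugging.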

Green Arrow DZN · May 22, 2011

wat__ said:
made a chunk of change off that college db you had out a couple years ago.

an imdb db or any movie star db would be really good.

lyrics would be good

athlete stats databases would be good too...

The athlete DB is a good idea across all sports.

chatmasta · May 22, 2011

mattseh said:
Random time limits, randomise the order you hit the URLs, proxies are a good idea, realistic referrers never hurt.

Also, don't fire off 30,000 hits in a single hour. My IP is banned, haha.
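For scale: 30,000 hits in an hour is 30,000 / 3,600 ≈ 8.3 requests per second, or one request roughly every 0.12 s. A simple throttle that enforces a minimum gap derived from an hourly cap might look like the Python sketch below. The cap you pass in is your own choice; nobody in this thread knows Yelp's actual threshold.

```python
import time

class Throttle:
    """Enforce at least 3600 / max_per_hour seconds between successive calls."""

    def __init__(self, max_per_hour, clock=time.monotonic, sleep=time.sleep):
        self.interval = 3600.0 / max_per_hour
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous call, if any

    def wait(self):
        """Call before each request; sleeps only if the last one was too recent."""
        if self.last is not None:
            remaining = self.interval - (self.clock() - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()
```

For example, `Throttle(600)` spaces calls at least 6 seconds apart (3,600 / 600).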

vgeek · May 22, 2011

lol, you disappeared for like 2 years then randomly show up

chatmasta · May 22, 2011

I follow a pretty consistent pattern of disappearing during school haha

Berto · May 22, 2011

PMing you. Let's talk business. I need some scraping done within a few months, so maybe we can work something out. It will be a custom job and I'm going to want the source code so that I can continue to scrape. Was also gonna buzz MattSeh when it all goes down.

alvinnnn · May 22, 2011

yelp scraper

chatmasta said:
Also don't run 30,000 hits at once in an hour. My IP is banned haha

My IPs keep getting banned and I keep changing them.

I'm happy to share my script (Perl) for Yelp scraping if someone can help add proxy support, using a list of proxies and other tricks to avoid getting banned. Yelp is by far the strictest site I've ever scraped.
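The script mentioned above is Perl, but the proxy-rotation idea is language-agnostic; here is a minimal Python sketch using only the standard library. It assumes a list of proxy URLs such as `http://host:port`, and it treats HTTP 403/429 or a connection failure as a sign that a proxy is blocked or dead. That status-code mapping is a guess about how a ban shows up, not documented Yelp behavior.

```python
import urllib.error
import urllib.request

def make_opener(proxy_url):
    """Build an opener that routes HTTP and HTTPS traffic through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)

def fetch_with_rotation(url, proxies, opener_factory=make_opener, timeout=20):
    """Try each proxy in turn until one returns the page.

    Returns (proxy_used, body_bytes). A 403/429 or a connection error moves
    on to the next proxy (assumed ban signals); other HTTP errors are raised.
    """
    last_error = None
    for proxy in proxies:
        opener = opener_factory(proxy)
        try:
            with opener.open(url, timeout=timeout) as resp:
                return proxy, resp.read()
        except urllib.error.HTTPError as e:  # must precede URLError (subclass)
            if e.code not in (403, 429):
                raise  # a real error such as 404, not a ban
            last_error = e
        except urllib.error.URLError as e:  # proxy unreachable, timeout, etc.
            last_error = e
    raise RuntimeError(f"all proxies failed for {url}") from last_error
```

Combining this with random delays keeps each proxy's request rate low, so no single IP stands out.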

Fatbat · May 23, 2011

jannat8 said:
Oh, I should have added, with European courses too!

Somebody ban this bot/retard already. We need a new ban thread...

Rascagua · May 23, 2011

I could use a few databases... let me know what you get.
