Scraping categorized Wikipedia articles to MySQL

trizzy871

Mar 17, 2012
Hi all,

I want to build a MySQL database based on data scraped from certain categories on Wikipedia. For example, say I wanted to create a table of all economists and when they were born. I would want the table to be kept up-to-date with any changes on Wikipedia.

Any idea how much a freelance programmer would charge for something like this? I did some research, and I think it could be done with DBpedia.

Thanks
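
For reference, the DBpedia route mentioned above would usually mean sending a SPARQL query over HTTP. This is a minimal sketch under assumptions: the public endpoint URL (https://dbpedia.org/sparql), the ontology terms dbo:Economist and dbo:birthDate, and the format parameter should all be verified, and the function names are hypothetical. It only builds the request URL, so it runs offline.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint (assumption: check that it is still live).
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def economists_query(limit=100):
    """Build a SPARQL query for resources typed dbo:Economist that have a birth date.

    dbo:Economist and dbo:birthDate are assumed DBpedia ontology terms.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?name ?birthDate WHERE {{
  ?person a dbo:Economist ;
          rdfs:label ?name ;
          dbo:birthDate ?birthDate .
  FILTER (lang(?name) = "en")
}}
LIMIT {int(limit)}
""".strip()


def query_url(query):
    """Return a GET URL asking the endpoint for SPARQL JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(query_url(economists_query(limit=10)))
```

Fetching that URL (for example with urllib.request.urlopen) should return standard SPARQL JSON results, with one row per match in the results/bindings array. Keep in mind that DBpedia is built from periodic extractions of Wikipedia, so it may lag behind the live pages.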
 


Don't forget Google Base!
 
Thanks for the response. I know it's free, but I want to keep my database up-to-date with any changes on Wikipedia. So there still needs to be code that updates my DB.
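
On the "code that updates my DB" part: one common pattern is an upsert, where each sync run inserts new rows and updates existing ones in place. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL so it runs without a server (the MySQL counterpart is INSERT ... ON DUPLICATE KEY UPDATE), and it needs SQLite 3.24 or newer for ON CONFLICT ... DO UPDATE. The economists table, its columns, and storing the source revision ID are hypothetical choices for illustration.

```python
import sqlite3

# SQLite stands in for MySQL so the sketch runs without a server.
SCHEMA = """
CREATE TABLE IF NOT EXISTS economists (
    page_title  TEXT PRIMARY KEY,  -- hypothetical key: the wiki page title
    birth_date  TEXT,              -- ISO 8601 date string
    revision_id INTEGER            -- hypothetical: wiki revision we last synced from
)
"""

# Insert new pages; update existing ones only if the source revision is newer.
UPSERT = """
INSERT INTO economists (page_title, birth_date, revision_id)
VALUES (?, ?, ?)
ON CONFLICT(page_title) DO UPDATE SET
    birth_date = excluded.birth_date,
    revision_id = excluded.revision_id
WHERE excluded.revision_id > economists.revision_id
"""


def sync(conn, rows):
    """Apply one batch of (page_title, birth_date, revision_id) rows."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)


def main():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    sync(conn, [("Example Economist", "1900-01-01", 10)])
    # A later run sees a newer revision with a corrected date.
    sync(conn, [("Example Economist", "1900-01-02", 11)])
    return conn.execute("SELECT * FROM economists").fetchall()


if __name__ == "__main__":
    print(main())
```

Run on a schedule (from cron, for example), each pass rewrites only rows whose source revision is newer, so unchanged rows are left alone.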
 
@AdamC, how is Google Base relevant? According to Wikipedia, it was replaced by Google Merchant Center in 2010.
 
OK, so downloading it periodically would be feasible. But:
1. I actually want to pull from an official Wikimedia site other than Wikipedia (e.g. Wikispecies). Sorry I did not mention that. I could not find a database dump for any of those Wikimedia sites.
2. I still have to extract/parse the data I need from those pages and put it into a DB.

So, back to the original question: any idea how much a freelancer would charge for something like that?
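
Regarding point 1: even without a dump, Wikimedia wikis generally expose the MediaWiki Action API, which can list a category's pages directly (list=categorymembers) and returns a continuation block when there are more results. A minimal sketch, assuming Wikispecies serves the API at https://species.wikimedia.org/w/api.php; the category name, User-Agent string, and function names are placeholders. The fetch function is passed in so the paging logic can be tried offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed location of the Wikispecies MediaWiki Action API.
API_URL = "https://species.wikimedia.org/w/api.php"


def fetch_json(params):
    """GET the API with the given parameters and decode the JSON reply."""
    req = Request(
        API_URL + "?" + urlencode(params),
        # Placeholder contact details; Wikimedia asks for a descriptive User-Agent.
        headers={"User-Agent": "category-sync-sketch/0.1 (you@example.org)"},
    )
    with urlopen(req) as resp:
        return json.load(resp)


def category_members(category, fetch=fetch_json):
    """Yield page titles in a category, following API continuation."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,  # e.g. "Category:Example" (placeholder)
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        data = fetch(params)
        for member in data.get("query", {}).get("categorymembers", []):
            yield member["title"]
        if "continue" not in data:
            break
        # The reply carries the parameters needed to request the next batch.
        params = {**params, **data["continue"]}
```

Each title could then be fetched and parsed (point 2), and the extracted values written to the database.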
 
Well, if you are certain no direct queries can be performed and you will have to scrape, then it's a matter of a DB setup, some DOM traversal in your language of choice, and a few queries here and there.

Without better examples and more info it would be hard for me to say, but it sounds like something that would take me a couple of hours to code in PHP, so I'd charge for 6 hours.
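
The DOM traversal step could look something like this in Python using only the standard library's html.parser (PHP was mentioned above; this is just one option). It is a minimal sketch that assumes the data sits in a Wikipedia-style infobox table (class "infobox") with th label cells and td value cells. That markup, the class name, and the names InfoboxRows and parse_infobox are assumptions, and real templates vary between sites.

```python
from html.parser import HTMLParser


class InfoboxRows(HTMLParser):
    """Collect <th>label</th><td>value</td> pairs from tables whose class includes 'infobox'."""

    def __init__(self):
        super().__init__()
        self.rows = {}      # label -> value text
        self._depth = 0     # > 0 while inside an infobox table (counts nested tables)
        self._cell = None   # 'th' or 'td' while inside a cell
        self._label = None  # most recent <th> text, waiting for its <td>
        self._buf = []      # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and (self._depth or "infobox" in classes):
            self._depth += 1
        elif self._depth and tag in ("th", "td"):
            self._cell, self._buf = tag, []
        elif self._cell and tag == "br":
            self._buf.append(" ")  # keep <br> from gluing words together

    def handle_endtag(self, tag):
        if tag == "table" and self._depth:
            self._depth -= 1
        elif self._depth and tag == self._cell:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if tag == "th":
                self._label = text
            elif self._label:
                self.rows[self._label] = text
                self._label = None
            self._cell = None

    def handle_data(self, data):
        if self._cell:
            self._buf.append(data)


def parse_infobox(html):
    """Return {label: value} for the infobox rows found in an HTML page."""
    parser = InfoboxRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The resulting dictionary (for example rows.get("Born")) can then be cleaned up and written with a few INSERT or UPDATE queries.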
 
@AdamC OK, thanks. I'll probably PM you once I figure out exactly what I want, if you don't mind helping me out.

@droplister Thanks. I tried it out, and it's really easy to use. However, it doesn't have enough customization options to pull the exact data I need, because the regular expressions only apply to the text on the screen, not the HTML behind it.
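
To illustrate that limitation: a pattern that anchors on markup only works when it runs against the HTML source. The sample cell below is hypothetical, modelled on how Wikipedia's birth-date templates often include a hidden span with class "bday" holding an ISO date (an assumption worth checking against real page source). Stripping the tags removes the class attribute the pattern depends on, so the same regex finds nothing.

```python
import re
from html.parser import HTMLParser

# Hypothetical infobox cell: a readable date plus a hidden machine-readable one.
SAMPLE = ('<td>1 January 1900<span style="display:none"> '
          '(<span class="bday">1900-01-01</span>)</span></td>')

# Anchoring on the markup (class="bday") only works on the raw HTML.
BDAY_RE = re.compile(r'<span class="bday">(\d{4}-\d{2}-\d{2})</span>')


class TextOnly(HTMLParser):
    """Strip tags, keeping only character data (roughly what a text-only tool sees)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def find_bday(text):
    match = BDAY_RE.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(find_bday(SAMPLE))              # 1900-01-01 (found in the raw HTML)
    print(find_bday(strip_tags(SAMPLE)))  # None (no markup left to anchor on)
```

For anything beyond a single pattern, a real HTML parser (such as Nokogiri in Ruby or html.parser in Python) tends to be sturdier than regular expressions over raw markup.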
 
I've written a few scraping tools in Ruby that use open-uri and Nokogiri. PM me with the page you're looking at and I'll see what I can come up with.
 
I found that the data I am looking for is available on Freebase, so I am going to try to use that. Will post back if that doesn't work out! Thanks for all the suggestions.