Looking to build a screenscraper. Need some advice.

flashpoint

Sep 18, 2014
I need to scrape some information from a forum and then post it to reddit periodically.

It's been a few years, but I used to do this kind of thing in php. Is there any reason that I should consider using another language or service to accomplish this today?

Thanks.
 


if php is what you know, then why not?
 
I've been thinking about this as well. I would think one of those scraper-as-a-service tools (e.g. Kimono Labs) combined with IFTTT would do the trick, no? You should be able to stay in the free tier.
 
Really depends on what you are scraping. If there is a lot of dynamic content on the page, or complex JavaScript-driven navigation or other components, you may want to consider something like CasperJS. If the content is easy to scrape (i.e. it's relatively static), then really any language will do.
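
For the "relatively static" case, here is a minimal sketch of what the scraping side could look like using only Python's standard library. The markup is an assumption: it pretends each thread link on the forum looks like `<a class="thread-title">`, which will almost certainly differ on a real site. `TitleParser` and `extract_titles` are names made up for this example.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of every <a class="thread-title"> link.

    The class name "thread-title" is hypothetical; adjust it to the real markup.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "thread-title" in classes:
            self._in_title = True
            self.titles.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_title = False


def extract_titles(html):
    """Return the stripped text of all matching links in an HTML string."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]
```

In practice you would fetch the page first (for example with `urllib.request`) and pass the decoded body to `extract_titles`. This only works when the content is already in the HTML the server sends; it will not see anything that JavaScript adds later.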
 
Python is one good option.

But since you know PHP and that's what you're comfortable with, maybe stick with it. PHP can also do the trick.
 
PHP should do the trick for most sites. I find it easier to scrape content from sites built on a CMS (WordPress, Joomla, etc.) than from old sites, mostly because on old sites the base markup varies from page to page.
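
One way to cope with markup that varies from page to page is to try a list of candidate patterns in order and use the first one that matches. A rough sketch in Python (standard library only) is below. The tag/class pairs in `CANDIDATES` are invented for illustration, and `ClassTextParser` and `extract_posts` are hypothetical names, not part of any library.

```python
from html.parser import HTMLParser


class ClassTextParser(HTMLParser):
    """Collects the text inside every <tag class="css_class"> element."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.found = []
        self._depth = 0  # greater than 0 while inside a matching element

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Track nested elements of the same tag so we close at the right one.
            if tag == self.tag:
                self._depth += 1
        elif tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self._depth = 1
                self.found.append("")

    def handle_endtag(self, tag):
        if self._depth and tag == self.tag:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.found[-1] += data


# Hypothetical layouts, newest first; replace with what the real pages use.
CANDIDATES = [("div", "post-body"), ("td", "message")]


def extract_posts(html, candidates=CANDIDATES):
    """Return post texts using the first candidate pattern that matches."""
    for tag, css_class in candidates:
        parser = ClassTextParser(tag, css_class)
        parser.feed(html)
        parser.close()
        posts = [text.strip() for text in parser.found if text.strip()]
        if posts:
            return posts
    return []
```

The upside of keeping the patterns in a list is that when you hit a page with yet another layout, you add one entry instead of rewriting the parser.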

For sites that render content using JavaScript, Python has some extensions that get the job done.

@GUTTA - import.io looks promising
 
If you're not doing it in Haskell you're just wasting your life.
 
There are tools like Kimono Labs that may let you do this without extensive coding or hosting it yourself. You may want to check them out.