Program to rip/scrape entire blog into txt, database, or pdf?

I have looked around the web for a program that will download each and every blog post (not just the most recent posts) off a given blog and save them all in some exportable format (text, a database, or PDF).

Yes, I know RSS readers "download" blog posts, but the ones I have tested only get a certain number of posts (FeedDemon says it can get 200 posts, but it always gets a max of 20 and never the older ones).

I have found a couple of great blogs in a niche I'm going after, and I want to download all their posts instead of sifting through the categories and archives - especially since there are so many older posts. If I could just rip all the posts into some exportable file, it would save tons of time.

Does anyone know of such an app?

Thanks!
 


Maybe look into wget. It's a command-line utility that you can use to recursively download an entire site to your hard drive. You can tell it to only download certain file types (e.g., HTML), and there are many more options that could help you get at just the posts. Then you could probably convert the downloaded files with Ghostscript or some html2pdf utility, or even have a browser display each page and print it to a PDF driver. This will all take a good while, and I believe there are paid services that already do this, although many of them are focused on creating physical books out of blogs.
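For what it's worth, a starting-point command might look something like this (example-blog.com is just a placeholder, and you'd tune the options to the particular blog):

  wget --recursive --level=inf --no-parent --accept html,htm --wait=1 --convert-links http://example-blog.com/

--recursive follows links, --level=inf lifts the default depth limit of 5, --no-parent keeps it from wandering above the starting directory, --accept limits it to HTML pages, --wait pauses a second between requests so you don't hammer their server, and --convert-links rewrites the links so the saved pages work offline.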
 
If you feel like learning a little, you can do this with Mechanize and Hpricot pretty easily. You'll have to pick up a little Ruby in the process, but I'm sure it will do you good :)
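Just to give an idea, a rough sketch might look like the below. The URL and the div.post selector are made up - you'd have to look at the blog's actual markup for the real ones, and loop over the archive pages to get everything:

  require 'rubygems'
  require 'mechanize'
  require 'hpricot'

  # Placeholder URL - point this at the blog's archive page.
  agent = WWW::Mechanize.new
  page  = agent.get('http://example-blog.com/archives')

  # Parse the fetched HTML with Hpricot and dump each post's text to a file.
  doc = Hpricot(page.body)
  File.open('posts.txt', 'w') do |out|
    (doc/'div.post').each do |post|
      out.puts post.inner_text
      out.puts '-' * 40
    end
  end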
 
Web Scraper Plus is a pretty good program for this. It takes some setting up, but once you do, it will crawl the entire site and pull only the data you want.
 