rage9, I don't see why we need to write in the same language. I don't know Ruby, and php = lol. You use what you want, I use what I want, and the best code wins. I'm thinking something along the lines of yellowpages, which has lots of data to scrape. Maybe it reads a text file with one category per line, goes through all the pages of that category, and dumps the results to a CSV. I'd say the important things are the ability to recover from a page not loading, and proxy support. I can scale to a couple hundred threads, so good luck beating me on speed with php
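The spec above (category list in a text file, walk every page per category, dump to CSV, survive failed page loads, proxy support) can be sketched in a few dozen lines. This is just one possible shape in stdlib Python, not anyone's actual entry: the column names, the `page_url` builder, and the `parse_page` hook are all made-up placeholders, since the real parsing is site-specific. Proxy support would slot in via `urllib.request.ProxyHandler` in a custom opener.

```python
import csv
import time
import urllib.request


def fetch_with_retries(url, retries=3, delay=1.0, opener=None):
    """Fetch a URL, retrying on failure so one bad page load doesn't kill the run."""
    opener = opener or urllib.request.urlopen
    for attempt in range(retries):
        try:
            with opener(url, timeout=30) as resp:
                return resp.read()
        except OSError:  # covers URLError, timeouts, connection resets
            if attempt == retries - 1:
                raise
            time.sleep(delay * (attempt + 1))  # back off a little more each retry


def scrape_categories(category_file, page_url, parse_page, out_csv):
    """Read one category per line, walk its pages, dump the rows to a CSV."""
    with open(category_file) as f:
        categories = [line.strip() for line in f if line.strip()]
    with open(out_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["category", "name", "phone"])  # hypothetical columns
        for cat in categories:
            page = 1
            while True:
                html = fetch_with_retries(page_url(cat, page))
                rows = parse_page(html)  # site-specific parsing, not shown here
                if not rows:
                    break  # empty page means we've run out: next category
                for row in rows:
                    writer.writerow([cat, *row])
                page += 1
```

The `opener` parameter exists so the retry logic can be exercised without hitting the network, which is also handy for swapping in a proxy-aware opener later.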
Latency will most likely be the speed limitation if you're single threaded.
"Latency will most likely be the speed limitation if you're single threaded."
My PHP forks can chop up your multi threads! Scared yet? Should be! :rasta:
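For what it's worth, the latency point above is easy to demonstrate: if each request takes L seconds of mostly-waiting, one thread manages about 1/L requests per second, while N threads manage roughly N/L until bandwidth or the target site caps you. A minimal sketch with Python's stdlib thread pool, using a fake `fetch` (the 100 ms sleep and the URLs are made up, standing in for real network latency):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    # Stand-in for a real HTTP request: ~100 ms of latency, near-zero CPU.
    time.sleep(0.1)
    return url


urls = [f"http://example.com/page/{i}" for i in range(40)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=20) as pool:
    # pool.map keeps results in input order, even though fetches overlap.
    results = list(pool.map(fetch, urls))
elapsed = time.perf_counter() - start

# 40 requests at 100 ms each: ~4 s done serially, ~0.2 s with 20 workers.
print(f"{len(results)} pages in {elapsed:.2f}s")
```

Because the work is sleeping on I/O rather than burning CPU, the GIL doesn't matter here; threads, forks, or an event loop all hide latency the same way.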