Can I get some scraper advice (WordPress)?

IMHopeful

Wicked Fire Elite Member
Mar 8, 2010
Canada
I just broke the first page for a couple of different terms I've been targeting on a news/current events site that I have been working on for a couple of months. I want to add a couple of extra widgets with current news from some much larger authority sites.

My question is: I know scrapers can hog a lot of resources, but can anyone give me some approximate usage estimates? For instance: Let's say that I was scraping NYTimes "Most Popular" section; I'm wondering if that would hammer my server resources (currently shared), or if I would be OK if it was mainly text-based content? Also, maybe I would be completely farking myself if I added this type of feed? Any advice is good advice for me at this point.

My site is more niche oriented than NYTimes, but I mentioned it because of their constant content updates and popularity.

Thanks in advance. Thinking of using a plugin such as "WP Scraper" or similar.
 


If you're on a shared host, you may as well just go for it. No, it's not ideal on shared hosting, but let's be honest - the other 300 dicks on your server are likely hogging the resources anyway.
 
Scraping doesn't use that many server resources. I have pretty much constant scraping going on on one of my shared hosted sites.
 
If it's a WP widget, it's probably just grabbing a feed rather than scraping, so it will not cause any noticeable issues.
 
Thanks for the answers everyone. I'm definitely only using a shared host, but I appreciate the answers you guys gave.

I don't want to fuck with the server I'm on currently, since an awesome dude (blogspotter) was generous enough to float me an awesome coupon code for it, and their support and interface are perfect.

I didn't think a feed would be a problem, but it is a scraping plugin nonetheless, and I'm still too much of a newbie to know any better without learning the hard way, or getting the info from people that know better. :D

Thanks bros.
 
Don't scrape on demand; it will cause issues.

Instead, set up a cron job to save the feed locally [either a complete widget cache or data for an fopen()]. This way you'll only have one script run of overhead every 5 minutes or so, and that cached copy then powers every visitor's page load.
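
To make the cron idea concrete, here's a rough sketch of what that one script could look like. It's in Python rather than PHP purely for illustration, it assumes the source offers a plain RSS 2.0 feed, and the function names, the feed URL and the cache path are all made up for the example. Treat it as one possible shape, not the way any particular plugin does it.

```python
import html
import os
import sys
import tempfile
import urllib.request
import xml.etree.ElementTree as ET


def parse_feed_items(xml_data, limit=5):
    """Return up to `limit` (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if title and link:
            items.append((title, link))
        if len(items) >= limit:
            break
    return items


def render_widget(items):
    """Build a small HTML list, escaping feed text so it cannot inject markup."""
    rows = [
        '<li><a href="{}">{}</a></li>'.format(
            html.escape(link, quote=True), html.escape(title)
        )
        for title, link in items
    ]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>\n"


def write_cache(path, content):
    """Write to a temp file, then rename it over the old cache, so a page
    load never reads a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(content)
    os.chmod(tmp_path, 0o644)  # assumption: the web server needs read access
    os.replace(tmp_path, path)


def refresh_cache(url, path):
    """The only network request per cron run; visitors just read the file."""
    with urllib.request.urlopen(url, timeout=10) as response:
        xml_data = response.read()
    items = parse_feed_items(xml_data)
    if items:  # keep the previous cache if the feed came back empty
        write_cache(path, render_widget(items))


# Run as: python3 refresh_feed.py <feed-url> <cache-file>
if __name__ == "__main__" and len(sys.argv) == 3:
    refresh_cache(sys.argv[1], sys.argv[2])
```

A crontab entry along the lines of `*/5 * * * * python3 refresh_feed.py https://example.com/feed.xml /path/to/widget.html` (script name, URL and path are placeholders) would run it every 5 minutes. The widget then just includes the cached file, so a traffic spike never turns into a spike of outbound requests.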

On the other hand, if it's really for the user's benefit and not bots, you can hook up a JavaScript widget and call it a day. That will move all the resource usage client-side, which won't bother your shared server at all.
 