Using Google news to fill your Autoblog

smap · Oct 30, 2007

Emp - I'm better now.

I think it's actually been working as described for a while now, but I was doing one thing a little differently. Namely, instead of searching in news.g, I was clicking one of the category links on the left (e.g. 'Sports', 'Entertainment', etc.), then 'RSS'.

The category feed is a little different. In particular, one line in the resulting post HTML is very long and contains at least 2 matching URLs. Because the `.*` in the regex is greedy, it rips out everything from the first 'news.google.com' through all the good HTML in between to the last 'amp;cid='. So ... a bunch of good content gets ripped out.

I still have to use phpMyAdmin to fix up my regexes because " always ends up as \" in the DB, but I can live with that.
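For anyone who would rather script that cleanup than do it by hand, here is a minimal Python sketch. It assumes the only damage is a backslash added in front of quote characters when the pattern is saved, which this thread does not confirm. The function name `unescape_quotes` is made up for illustration and is not part of wp-o.

```python
def unescape_quotes(stored: str) -> str:
    # Assumption: only quote characters were escaped on save, so other
    # backslashes (for example a literal \. in the pattern) are left alone.
    return stored.replace('\\"', '"').replace("\\'", "'")


# What the pattern might look like after it has been saved to the DB.
stored_pattern = 'http:..news.google.com.*amp;url=(.*).amp;cid=[^\\"]*'
fixed_pattern = unescape_quotes(stored_pattern)
print(fixed_pattern)
```

If the backslashes in your DB were doubled as well, a full un-escape (what PHP's `stripslashes` does) would be the closer match than this quote-only version.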

nickycakes · Oct 30, 2007

Remember kids, you can run it through feedburner to clean up the url. *hint hint*

smap · Oct 30, 2007

nickycakes said:
Remember kids, you can run it through feedburner to clean up the url. *hint hint*

F! Why didn't you say so earlier! To hell with rewriting. I'm pretty convinced now it's not working 100% of the time for anyone. That "double url" issue I posted above happens not only in G news categories (as I mentioned) but also in searches *sometimes*, e.g. "news google com/news?hl=en&ned=us&q=celebrity&btnG=Search+News" <-- the first few entries will have links to "all xx related items" or whatever, which hoses the regex.

Wonder if and/or when Feedburner (owned by G) will scan for this kind of feed.

By the way - the latest version of wp-o doesn't seem to work with WPMU. Talking about the bleeding edge release from right on the homepage of devthought.com where it says, "Update 2: Newest version here. WP-o-Matic is slowly reaching perfection".

smap · Oct 30, 2007

I don't think Feedburner is a solution either. URLs inside the feed are still news.goog.

Here's a picture that kind of makes the problem easy to see (screenshot of Regex Buddy by the way).

[Screenshot: regexvn3.gif, hosted on ImageShack]

The entire table is the post wp-o creates (with no regex specified). The shaded areas show where the regex will replace Goog URLs: the light colors (e.g. light yellow) are the entire regex match, and the darker ones (e.g. dark yellow) are the group ($1) that would be used for the rewrite. Notice the one outlined in red at the bottom. That is where the regex grabs too much text because there are multiple news.google. URLs on one line.

My regex is weak so if someone knows how to take care of the issue in red... The screenshot is using this:

Code:

http:..news.google.com.*amp;url=(.*).amp;cid=[^"]*
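One possible way to handle the case outlined in red is to stop the match from crossing the closing quote of the link, and to make the quantifiers lazy. The Python sketch below reproduces the over-match on a made-up line with two links, then tries a bounded pattern. The sample HTML, its query parameters, and the names `GREEDY`, `BOUNDED` and `rewrite` are assumptions for illustration. It also assumes every Goog URL sits inside a double-quoted href, and I have not tested it inside wp-o itself.

```python
import re

# Hypothetical sample: two Google News redirect links on ONE line,
# with some good text in between.
LINE = (
    '<a href="http://news.google.com/news/url?sa=T&amp;ct=us/0-0&amp;'
    'url=http://example.com/story-one&amp;cid=111">One</a> good text '
    '<a href="http://news.google.com/news/url?sa=T&amp;ct=us/0-1&amp;'
    'url=http://example.org/story-two&amp;cid=222">Two</a>'
)

# The pattern from the post above: both .* are greedy.
GREEDY = re.compile(r'http:..news.google.com.*amp;url=(.*).amp;cid=[^"]*')

# Bounded variant: [^"] can never run past the end of the href value,
# and *? takes as little as possible.
BOUNDED = re.compile(
    r'http://news\.google\.com[^"]*?amp;url=([^"]*?)&amp;cid=[^"]*'
)


def rewrite(pattern, html):
    # Replace each matched Goog URL with the target URL captured in group 1.
    return pattern.sub(r'\1', html)


greedy_out = rewrite(GREEDY, LINE)
bounded_out = rewrite(BOUNDED, LINE)

print(greedy_out)   # first link and the text in between are swallowed
print(bounded_out)  # both links rewritten, text in between kept
```

The same bounded pattern should carry over to a PCRE-style regex box, since negated character classes and lazy quantifiers work the same way there, but treat that as untested.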

cashflowrusty · Oct 31, 2007

smap said:
Code:

http:..news.google.com.*amp;url=(.*).amp;cid=[^"]*

This worked perfect for me! The formatting for some of the articles (from E!) was off, but for the most part, it is building the links correctly!

Now, to add MY tracking script (still to be written) to redirect them...

smap · Oct 31, 2007

cashflowrusty said:
This worked perfect for me!

At least someone got something out of my .... 12 or so hours work.

I bagged rewriting and instead do a cursor:text with CSS over the news.g links. Helps a bit.

Mornington · Nov 9, 2007

nickycakes said:
I use google news quite a bit for drupal aggregators. It's pretty sweet.

What module(s) do you use for the aggregation?

nickycakes · Nov 9, 2007

Mornington said:
What module(s) do you use for the aggregation?

Aggregator

Bead · Nov 14, 2007

Has anyone played around with Feedwordpress and google rss feeds?

cachemoney · Dec 14, 2007

EMP,
I have this for my regex box:

Code:

/http...news.*amp;url=(.*)&cid.*\"{1}/

But if my RSS headline has an apostrophe ('), it gets replaced with this:

Code:

;#39;

What can I do to keep the apostrophe?

Here's an example of my Google News RSS:

Baseball's Mitchell Report: Steroid Use Doesn't Discriminate, But …
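One guess at what is going on: the feed delivers the apostrophe as a numeric HTML entity (`&#39;`), possibly double-encoded as `&amp;#39;`, and something along the way eats part of it. The thread does not confirm where the `;#39;` comes from, so take this as a sketch under that assumption. In Python, decoding the entities before the title is stored would look roughly like this (`decode_entities` is a made-up helper name):

```python
import html


def decode_entities(text: str, max_passes: int = 2) -> str:
    # Decode HTML entities, repeating so that double-encoded input
    # such as "&amp;#39;" also ends up as a plain apostrophe.
    for _ in range(max_passes):
        decoded = html.unescape(text)
        if decoded == text:
            break  # nothing left to decode
        text = decoded
    return text


single = decode_entities("Baseball&#39;s Mitchell Report")
double = decode_entities("Baseball&amp;#39;s Mitchell Report")
print(single)
print(double)
```

In PHP, which is what wp-o runs on, `html_entity_decode` with the `ENT_QUOTES` flag is the usual counterpart, though I have not checked where it would need to be hooked in.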

AdHustler · Dec 20, 2007

Will this software get you nailed for dup content? Does this have any SEO value whatsoever?

deejne · Dec 25, 2007

Thanks for the pointers...

machinecontrol · Dec 28, 2007

narsticle said:
Will this software get you nailed for dup content? Does this have any SEO value whatsoever?

You won't get penalized for dup content. Thousands of newspapers report AP news feeds without getting penalized.

I would say the biggest problem is that summary feeds scream spam. Full-text feeds are where it's at.

LazyD · Jan 12, 2008

I hate to bring up an old topic but....

Does this script take forever for anyone else? I actually stopped the script after it ran for 5 minutes with no sign of stopping...

cashflowrusty · Jan 14, 2008

Depends. I have run it with only one feed, and it 'seemed' to run forever. Actually, I think the refresh just was not working, as the number of posts had not gone up. I just clicked the wp-o-matic tab again and started with the next one.

Now, if you are loading up with 10-50 feeds, then it's going to take some time!

emp · Jan 14, 2008

Never stops for me either, but I also get the impression that the script simply does not do the refresh.

I normally let it run in the back for however long I forget about it and then click on "manage" and.. lo and behold! There are the new articles.

::emp::
