Ruby Regex(?) Question

abroms

New member
May 17, 2009
Hi all –

So I'm scraping a webpage with Ruby, and I get something like this as output:

Code:
<a onclick="(new Image()).src='/rg/find-title-1/title_popular/images/b.gif?link=/title/tt0317740/';" href="title/tt0317740/">The Italian Job</a>

Now, if all I want is the
Code:
title/tt0317740/
how can I go about extracting that? I've been trying with Nokogiri and random regex patterns I've found online, but I can't manage it.

Thanks!
 


freehanding this one --
Code:
# one capture group, so each match comes back as a one-element array
matches = mystr.scan(/href="([^"]+)"/)
matches.each { |x| puts 'Match:', x[0] }
edit -- just ran it through irb, you're good to go. notice the x[0]: because i used a capture group, String#scan returns an array of arrays (and each subarray is only 1 element long, because i used only 1 capture group)
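
To make the array-of-arrays point concrete, here is a minimal sketch run against the markup from the first post. The variable names (html, hrefs, whole) are just for illustration and are not from the thread.

```ruby
# Sample markup copied from the first post.
html = %q{<a onclick="(new Image()).src='/rg/find-title-1/title_popular/images/b.gif?link=/title/tt0317740/';" href="title/tt0317740/">The Italian Job</a>}

# With one capture group, scan returns one single-element array per match.
matches = html.scan(/href="([^"]+)"/)

# flatten turns the nested arrays into a plain list of href values.
hrefs = matches.flatten
puts hrefs

# Without a capture group, scan returns the whole matched text as plain strings.
whole = html.scan(/href="[^"]+"/)
```

If the nested arrays get in the way, flatten (or x[0] inside the block, as above) gets you back to plain strings.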
 

Nice :)

Only a slightly related side note: irb is really handy for these little situations. Rather than hacking at a script over and over again, just fire up irb and you can keep tweaking until you find what you're looking for.
 
Thanks tons for the help (+rep), but this brings up another problem. I get an error:

Code:
path_to_my_script:13: private method 'scan' called for #<Nokogiri...

with a bit more stuff after that. What's wrong?

Thanks!
 
the method is String#scan, i.e. the object you're calling it on needs to be a String. the error your interpreter threw seems to imply that you called SomeNokogiriObject.scan(args) instead; i'm not familiar with Nokogiri or wtf you're doing, but figure out how to get the body contents as a string, and then call ThatString.scan(args)

edit -- also, to use irb, open up a command prompt and type 'irb'. voila! you're in a ruby shell. type valid ruby code, see how it works, play with it in real time. hitting "enter" evaluates the line. not hard, dude :p
 
I just figured out what I was doing wrong. So I came here to post my error in case anyone else makes the same mistake. But uplinked just posted the problem.

I just needed to convert my long object to a string using .to_s, and then I was all set.
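
For anyone who lands here with the same error, a rough sketch of the fix. FakePage is a hypothetical stand-in for the Nokogiri object (so the snippet runs without the gem); the only assumption is that the real object, like this one, is not a String but returns its markup from to_s.

```ruby
# FakePage is a hypothetical stand-in for the parsed Nokogiri object;
# all that matters here is that it is not a String but responds to to_s.
class FakePage
  def initialize(markup)
    @markup = markup
  end

  def to_s
    @markup
  end
end

page = FakePage.new('<a href="title/tt0317740/">The Italian Job</a>')

# page.scan(...) would raise NoMethodError, since scan is a String method.
# Converting to a String first makes scan available.
hrefs = page.to_s.scan(/href="([^"]+)"/).flatten
puts hrefs
```

With Nokogiri itself, something like doc.css('a').map { |a| a['href'] } (where doc is the parsed document) should skip the regex entirely, though I haven't run that against this page.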

Also, the thing that was confusing me about irb was that you can only type one line at a time. I've got it now, haha.

Thanks a ton guys!