Looking for a great email extractor

avatar33

e-Hustler
Dec 5, 2009
Calgary, AB
...that can extract emails from a website (or through a search query) even from web pages that try to avoid being scraped by using formats like:
  • bob at gmail.com
  • bob at gmail dot com
  • bob [at] gmail [dot] com

Drop your suggestion!
 


Regular expressions FTW.

Code:
#!/usr/bin/perl
use strict;
use warnings;

# Sample text mixing plain and obfuscated addresses.
my $content=<<'END';
bezos at amazon.com blah blah blah chairman [at] unicef.org blah
blah jobs@apple.com filler filler <h2>whatever</h2>
sergey [at] google [dot] com blah blah blah <tag></tag>
im@info.biz text filler marissa.mayer "at" yahoo "dot" com
random text filler bill_gates@mail.msn.com <h2>whee</h2>
<span>blah</span> lastone [at] whatever [dot] com filler
END
my @tlds=qw(com net org info biz);
my @patterns;
foreach my $tld (@tlds) {
   my $t=quotemeta($tld);
   # Obfuscated form: user, then at / [at] / "at", then host, then
   # ".", [dot], "dot" or a bare " dot ", then the TLD.
   push(@patterns,'(\S+)\s+[\["]?at[\]"]?\s+(\S+(?:\.|\s*[\["]dot[\]"]\s*|\s+dot\s+)'.
               $t.')\b'
   );
   # Plain form: user@host.tld
   push(@patterns,'(\S+)@(\S+\.'.$t.')\b');
}
foreach my $p (@patterns) {
   foreach my $line (split(/\n/,$content)) {
       # No /g: strip one match per pass so $1/$2 belong to the match just removed.
       while ($line=~s/$p//i) {
           my $l=$1; my $r=$2;
           # Turn [dot], "dot" or a bare " dot " back into a real dot.
           $r=~s/\s*[\["]dot[\]"]\s*|\s+dot\s+/./gi;
           print "match! [$l\@$r]\n";
       }
   }
}
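
If you'd rather not build one pattern per TLD, here is a rough sketch of the other way round: normalise the obfuscated separators first, then match once. The extract_emails helper and its TLD list are my own illustration, not part of the script above, and it assumes that " at " / " dot " surrounded by whitespace are always separators, so ordinary prose like "look at example.com" will give false positives.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): rewrite obfuscated
# separators to "@" and ".", then pull addresses out with a single regex.
sub extract_emails {
    my ($text) = @_;
    # at, [at], (at) or "at" between whitespace becomes "@"
    $text =~ s/\s+[\[("]?at[\])"]?\s+/\@/gi;
    # dot, [dot], (dot) or "dot" between whitespace becomes "."
    $text =~ s/\s+[\[("]?dot[\])"]?\s+/./gi;
    # Same TLD list as the script above; extend it as needed.
    my @found = $text =~ /([\w.+-]+\@(?:[\w-]+\.)+(?:com|net|org|info|biz))\b/gi;
    return @found;
}

print "$_\n" for extract_emails('bob [at] gmail [dot] com, jobs@apple.com');
```

Run as is, it should print bob@gmail.com and jobs@apple.com on separate lines. The trade-off against the per-TLD patterns is that the whole text gets rewritten before matching, so it is simpler but looser.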
 
GSA Email Spider: 100% effective, with a number of smart settings and filters, plus you can add your own (highly unlikely that you will need anything more, but still).
