The Botting Q&A Thread: Ask Away

What is the best language to write a bot in? I'm talking about efficiency in writing the code and producing a final program. The only experience I have is with C#, but I will gladly move to a language that lets me make bots more easily. These are bots that will fill out online giveaways and the like.
 


UBot seems to be good, though I have no interest in it. A few of us use Ruby + Watir to do stuff, some use Splinter with Python, and yet others use your C#. Use what you know best until you're annoyed, and then try something new.
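Whichever toolchain you pick, a giveaway-form submit is ultimately just an HTTP POST. Here is a minimal sketch in Ruby (only because several posters here use it) that builds such a request with nothing but the standard library; the URL and field names are invented for illustration, so pull the real ones from the target form's HTML.

```ruby
require 'net/http'
require 'uri'

# Hypothetical giveaway form: the URL and field names below are made up.
uri = URI('http://example.com/giveaway/enter')

req = Net::HTTP::Post.new(uri)
req.set_form_data(
  'name'  => 'John Smith',
  'email' => 'john@example.com'
)

# Nothing is sent yet; we only built the request. Sending it would be:
#   Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
puts req.body  # the URL-encoded payload the server would receive
```

The point is that a "bot" for a simple form doesn't need a browser at all if the form has no JavaScript tricks; the whole submission fits in one encoded body.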
 
^^^ this

I used to write all my bots in PHP (arrrggghhhh, what was I thinking?), then I switched to C#. I would rather stick pins in my eyes than write code in that cluster fuck of a language.

inb4languagewar
 
Each to his own. I do 90% of my coding work in C# and think it's the best language out there, but I also recognize that personal tastes vary widely :) Language wars are stupid. Ultimately everyone should pick whatever they're most productive and comfortable with, and use it. After all, a language is just a tool. It's the end product that matters :)
 
I should clarify: I meant PHP is a cluster fuck of a language, not C#. I love C#. If you haven't already, take a look at Azure. Gives me half a chubb just thinking about it :)
 
Okay, it's been exactly 7 days since my post and I'm just about ready. I'm going through the last 10 chapters of Learn Ruby the Hard Way after I grab something to eat. I've been able to put in ~5-10 hours/week the past 2 weeks to go through this nice and slow.

If there are any newbies to programming like myself reading this, I recommend checking that guide out. Before I went through LRTHW I had tried dozens of others and got lost, confused, or unmotivated, or some combination of the three. Check out TryRuby as a primer, but I personally wouldn't recommend spending too much time on the interactive tutorials. I found I would just end up trying to beat the level rather than absorbing all the information I could.

Anyways, I've built an ultra-basic foundation of Ruby, so I think I'm ready to get started on the fun automation stuff. The goal is to use Ruby to become more efficient in SEO.

Unless some of the vets here suggest otherwise, I'm going to start with scraping. I had bookmarked these guides that looked pretty cool. I also spoke with Bofu a long while back when getting into this stuff was just a "that would be nice" idea and he dropped me this link. I'll probably revisit dchuk's beginner's guide to scraping with Ruby as well.

Let me know how I can keep this thread alive. I'll keep a list of things I'm having trouble with to split between this thread and StackOverflow.

:thumbsup:
 
Funny, I was thinking the same thing.

Actually, I have a question out of pure curiosity: how many people packet-sniff when writing a scraper/bot/whatever?

I use Wireshark for troubleshooting, if LiveHeaders is not enough, but
most of the stuff in Wireshark is just plain over my head. Usually, using
LiveHeaders *before* I write any code does the trick.
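For plain-HTTP bots there is also a middle ground between LiveHeaders and Wireshark: point the client at a throwaway local socket and print exactly what arrives on the wire. A self-contained Ruby sketch (Ruby only because much of this thread uses it):

```ruby
require 'socket'
require 'net/http'

# Throwaway local listener: capture exactly what the client sends,
# no Wireshark needed (works for plain HTTP only, not HTTPS).
server = TCPServer.new('127.0.0.1', 0)  # port 0 = pick any free port
port = server.addr[1]
captured = nil

listener = Thread.new do
  client = server.accept
  lines = []
  while (line = client.gets) && line != "\r\n"
    lines << line.chomp                 # request line + headers
  end
  captured = lines
  client.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
  client.close
end

Net::HTTP.get(URI("http://127.0.0.1:#{port}/"))
listener.join

puts captured  # the request line plus every header Net::HTTP actually sent
```

Swap the `Net::HTTP.get` line for whatever your bot does and you can diff its headers against what LiveHeaders showed the real browser sending.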


I am not a multi-language coder like some of the guys here. I use Perl
almost exclusively. Not because it's "the best", but because it's what
I got started with years ago, near the end of the dot-com bust.

Perl by itself is no more efficient than any other language; however,
there are modules on cpan.org that can do some pretty nice things.

One of those is Win32::IEAutomation. The beauty of automating IE (or any
full-featured browser) is that there is no need to deal with JavaScript,
and no need to deal with hidden input tags. IE handles all of that, and I
just need to plug in the same info that a human would type in.

If I had this code on my PC and double-clicked it, MSIE would open up
and load http://www.cpan.org/

Code:
# INITIALIZE:
use Win32::IEAutomation;
$ie = Win32::IEAutomation->new( visible => 1, maximize => 1);


# GET A PAGE:
$ie->gotoURL('http://www.cpan.org/');

Once the page is loaded, IEAutomation has a nice set of commands for
filling in text boxes and clicking. Very simple, short learning curve.

http://search.cpan.org/search?mode=all&query=IEautomation


Bompa
 
I wrote a few bots using multi-threaded Typhoeus/Nokogiri, but whenever I use a large number of URLs (a few thousand), Ruby eats up all my memory.

I tried making a loop to only do 100 URLs at a time and resetting all the variables in between; that delayed the problem a bit but didn't get rid of it.

It's code I don't want to post on a public forum at the moment, but if someone thinks he can help me I wouldn't mind sharing on Skype.

Any ideas?
 

It's probably because you're hanging on to the Nokogiri-parsed pages, and that's destroying your RAM over time. Look at my Arachnid library: it uses Typhoeus and hardly any RAM even on quite a few URLs (it also stores visited URLs in a bloom filter, which is much more efficient than a plain array; something you could also try).

http://www.github.com/dchuk/Arachnid
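To make the bloom-filter point concrete, here is a toy sketch of the idea. This is not Arachnid's actual implementation; a real one packs the bits into a string and sizes the array from the expected URL count, but the memory behavior is the same: constant, no matter how many URLs you feed it.

```ruby
require 'zlib'

# Toy Bloom filter: k "hash functions" simulated by salting CRC32.
# No false negatives; a small, tunable chance of false positives.
class BloomFilter
  def initialize(bits = 1 << 16, hashes = 4)
    @bits   = bits
    @hashes = hashes
    @bitmap = Array.new(bits, false)  # a real impl would pack this into a bit string
  end

  def add(url)
    indexes(url).each { |i| @bitmap[i] = true }
  end

  def include?(url)
    indexes(url).all? { |i| @bitmap[i] }
  end

  private

  # Map a URL to @hashes positions in the bit array.
  def indexes(url)
    (1..@hashes).map { |salt| Zlib.crc32("#{salt}:#{url}") % @bits }
  end
end

bf = BloomFilter.new
bf.add('http://example.com/a')
puts bf.include?('http://example.com/a')  # true: no false negatives
puts bf.include?('http://example.com/b')  # false, with overwhelming probability
```

Compared with pushing every visited URL into an array, the filter never grows: the cost of the occasional false positive (skipping a URL you never actually visited) is usually acceptable for a crawler.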
 
Hmm. I had a look at your script before I started coding and it looks quite similar, but I am doing three requests one after another per iteration.

Maybe it's something with the file open? Is it keeping all those past lines somewhere?

File.open(@source).each do |u|
end

I write the result directly in a text file

File.open(@successful, 'a+') do |f|
    f.puts url
end

Could that be a problem? Like generating a new file handle every time? Am I supposed to open it once and close it at the end, like I would do in PHP?

I feel stupid :/
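For what it's worth: `File.open` with a block does create and close a handle on every call, which is wasteful inside a loop but won't leak by itself, since the block always closes it. A self-contained sketch of the open-once pattern (the file names are throwaways written to a temp dir):

```ruby
require 'tmpdir'

result = nil
Dir.mktmpdir do |dir|
  src = File.join(dir, 'urls.txt')
  dst = File.join(dir, 'successful.txt')
  File.write(src, "http://a.example/\n\nhttp://b.example/\n")

  # Open the output once for the whole run; stream the input line by line.
  # File.foreach never loads the whole file, and only one handle per file
  # exists for the entire loop.
  File.open(dst, 'a') do |out|
    File.foreach(src) do |line|
      url = line.chomp
      out.puts url unless url.empty?
    end
  end

  result = File.read(dst)
end

puts result  # the two non-empty URLs, one per line
```

Neither file is ever held in memory, so this pattern can't be the source of a multi-gigabyte leak; the growth has to be coming from objects retained elsewhere (parsed pages, the request queue, and so on).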
 
I think I found it. The problem is that the hydra/Typhoeus queue just keeps growing until all the URLs are in there, using up all the memory. I'll look into it later; if I can't find a solution I will post the code.
 
Just started my travels in Ruby with Eloquent Ruby by Russ Olsen: http://www.amazon.com/Eloquent-Ruby-Addison-Wesley-Professional-Series/dp/0321584104

Hope to write something in a week.
 
Code:
def autoapprove_check
  hydra = Typhoeus::Hydra.new(:max_concurrency => @threads)
  File.open(@cfg.get('aa_working'), 'a+') do |f|
    File.open(@cfg.get('posts_from_rss')).each do |url|
      url = url.chomp
      begin
        blog_info = Typhoeus::Request.new(url, :timeout => 10000) # milliseconds
        blog_info.on_complete do |response|
          # Find the WordPress comment form. Note: xpath returns a NodeSet,
          # which is truthy even when empty, so check .first instead.
          wordpress = Nokogiri::HTML.parse(response.body).xpath('.//form[@id="commentform" and contains(@action, "wp-comments-post.php")]/@action')

          if wordpress.first
            post_id = Nokogiri::HTML.parse(response.body).xpath('.//input[@type="hidden"][@name="comment_post_ID"]/@value')

            blog_comment = Typhoeus::Request.new(wordpress.to_s,
              :params => {
                :comment_post_ID => post_id,
                :comment_parent  => 0,
                :author          => @cfg.get('harvester_default_anchor'),
                :email           => @cfg.get('harvester_default_email'),
                :url             => @cfg.get('harvester_default_website'),
                :comment         => @cfg.get('harvester_default_comment')
              },
              :method => :post,
              :follow_location => true,
              :timeout => 10000) # milliseconds

            # Don't shadow the outer |response| in the nested callbacks.
            blog_comment.on_complete do |comment_response|
              check_link = Typhoeus::Request.new(url, :timeout => 10000)
              check_link.on_complete do |check_response|
                links = Nokogiri::HTML.parse(check_response.body).xpath('.//a[@href="' + @cfg.get('harvester_default_website') + '"]/@href')
                f.puts url if links.first
              end
              hydra.queue check_link
            end
            hydra.queue blog_comment
          end
        end
        hydra.queue blog_info
      rescue URI::InvalidURIError, NoMethodError
      end
    end
    hydra.run # problem -> hydra queue too big
  end
end
Ok, the problem is that the hydra queue gets too big. How can I split it up so that the script waits before adding new items to the queue until there is space?

I tried making smaller chunks with 100.times and then hydra.run, but the code keeps executing after the hydra.run instead of waiting for it to finish.

(If you see anything else that could be done better, please comment)
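One plain-Ruby way to keep the queue bounded is to feed the hydra in fixed-size batches with `each_slice`, running each batch to completion before queueing the next. The sketch below fakes the request step so it runs standalone; the Typhoeus calls in the comments show how it would map back (this assumes `hydra.run` blocks until its queue drains, which is its documented behavior):

```ruby
urls = (1..250).map { |i| "http://example.com/page#{i}" }
processed = []

# Bound the in-flight work: take 100 URLs at a time and finish each batch
# before starting the next, so memory tracks the batch size, not the list.
urls.each_slice(100) do |batch|
  # With Typhoeus this would be:
  #   batch.each { |u| hydra.queue(request_for(u)) }
  #   hydra.run   # returns when this batch's queue has drained
  batch.each { |u| processed << u }  # stand-in for the real requests
end

puts processed.size  # => 250
```

If the queue still grows, the other suspect is building the nested requests inside `on_complete` while queueing them onto the same long-lived hydra; batching caps that too, since each batch's callbacks finish before the next slice starts.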
 
I haven't completely read your code, as it looks like a huge mindfuck. I'd suggest splitting it up a little, into smaller functions.

On another note, you might be interested in my ruby-web library. It's not production-ready as such, but it works. If you get stuck, take a look in there; I might have already implemented what you're trying to do (https://github.com/JakeAustwick/ruby-web/blob/master/lib/web.rb).