Need some alpha testers for my hosted browser automation/scraping system



Actually, I do need to do some scraping. Don't need it this minute, but I can shove it in a database for future use, so sign me up as a tester as well if wanted. I still don't quite understand the concept, so will hold my criticism until I get to play. Not sure how this is going to be quicker, better or more efficient than just writing a quick Perl script myself.

BTW... how much is this bad boy going to cost down the road?
 

Hey Matt, ok so... Systemizer is ultimately going to be a sort of universal automation Lego kit for the web, in a sense, but with extra love for internet marketing folks. The first component of the product is what we're testing here: browser automation in a real Chromium-based browser, operated via a REST API, so there's no setup on your end other than scripting up the calls you want and getting back results. You can operate your browser session in real time, responding to command results as they are returned, and you can run multiple sessions in parallel if you want. Each session is completely private and isolated from other sessions, so there's no cookie interference between sessions or anything like that. This first release is an alpha version, so there's plenty of room for responding to feedback and suggestions.

Regarding price, hell I have no idea. Initially I just want to get a sense of how heavily, and in what ways, people will use it. People who helped me test early on won't pay a whole lot, in any case.
 
I just want to emphasise that, as a person who scrapes, this is interesting because it is a real browser.

JavaScript.
CSS.
Images.

Modern scraping is growing beyond a simple GET in a lot of cases.
 
I can try; I have high iTrader and have been around! I also currently automate tasks with UBot and PHP, but would love a more stable way of doing things. Feel free to message me whenever!
 
I've now finished all the documentation I intended to write for the alpha version. Tonight after work I intend to try and finish up the context extraction engine, then I will deploy the docs and updated version and you can all have at it. Sorry for the delay! (When I started this thread, I might have underestimated how much I had left to do to make it testable by people other than myself...)
 

Ohhhh... now that could definitely come in handy. So this scraper will actually execute the JavaScript? I don't care about CSS or images, because those can be easily scraped too.

However, I have gotten stuck a few times when writing bots because JavaScript was dynamically changing form field values, and things of that sort. This will resolve that? If so, kick ass!
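
To make the form-field problem concrete, here is a minimal sketch in plain Python (not Systemizer code). The HTML snippet, the field names and the `abc123` token are all made up for illustration. It shows that a parser working on the raw response of a simple GET only sees the empty value, because the script that fills the field in never runs.

```python
from html.parser import HTMLParser

# Hypothetical page: the hidden "token" field is empty in the raw HTML
# and is only filled in by JavaScript once a browser runs the script.
STATIC_HTML = """
<form action="/login" method="post">
  <input type="hidden" name="token" id="token" value="">
  <input type="text" name="user">
</form>
<script>
  document.getElementById("token").value = "abc123";
</script>
"""


class InputCollector(HTMLParser):
    """Collects name -> value for every <input> tag in the static markup."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attr_map = dict(attrs)
            self.fields[attr_map.get("name")] = attr_map.get("value")


def static_form_fields(html):
    """Return the form fields as a non-JavaScript scraper would see them."""
    parser = InputCollector()
    parser.feed(html)
    return parser.fields


fields = static_form_fields(STATIC_HTML)
# The token is still empty here: nothing executed the <script> block.
print(fields)
```

A scraper that posts this form with the empty token would most likely be rejected, which is the situation a real browser session avoids.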
 

Yep, this is a fully cloud-scalable Chromium-based browser. You create your own private session, then open windows using the REST API and automate away. The pages in the windows you open get everything you'd expect a real Chrome browser to do to them.
 

doyouevenscrapebro?
 
Please consider adding XPath support (and common usages like internal links, title, etc.) and regex (it's useful for some stuff, chuckers)
 

Where we're going, you don't need xpath support...

Regexes are supported though.
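
For the "internal links, title" use case mentioned above, here is a rough sketch of what regex-only extraction can look like. This is ordinary client-side Python run against HTML you already have. How Systemizer itself exposes regexes is not described in this thread, so nothing here is its API, and the sample page and the function names are invented for illustration.

```python
import re
from urllib.parse import urljoin, urlparse

# Deliberately simple patterns; real-world HTML can defeat them.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
HREF_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

# Hypothetical page used only for this example.
SAMPLE_HTML = (
    "<html><head><title> Example Page </title></head><body>"
    '<a href="/about">About</a> '
    '<a class="x" href="http://example.com/contact">Contact</a> '
    '<a href="http://other.example.org/">Elsewhere</a>'
    "</body></html>"
)


def extract_title(html):
    """Return the stripped <title> text, or None if there is no title."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


def extract_internal_links(html, base_url):
    """Return absolute URLs of links that stay on base_url's host, in order."""
    base_host = urlparse(base_url).netloc
    links = []
    for href in HREF_RE.findall(html):
        absolute = urljoin(base_url, href)
        if urlparse(absolute).netloc == base_host and absolute not in links:
            links.append(absolute)
    return links


print(extract_title(SAMPLE_HTML))
print(extract_internal_links(SAMPLE_HTML, "http://example.com/index.html"))
```

Regexes like these are brittle compared with XPath on messy markup, but for titles and simple link lists they often do the job.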
 
We have liftoff!

I have just deployed the updated API, background workers and browser API documentation.

Go here: Api Documentation « Systemizer API

1. Read the front page!
2. Look at the Account API documentation for instructions on how to create an account
3. Look at the Session API documentation for instructions on creating a session
4. Read the Browser API section for instructions on how to automate your browser

Ask me any questions you like, preferably in this thread. Expect it to crash as soon as someone does something I didn't expect - remember, this is an ALPHA test. That's not even beta!
 
I get a 404 error upon clicking the e-mail activation link:

http://systemizer.net/confirm-email/sdgdsgsd

I've emailed you. Your account should work now, though I inadvertently scrambled your password: MongoHQ's interface lets you edit records but then screws up binary data when you hit save. I'm about to fix the bug so I don't have to do anything manually from MongoHQ, but in the meantime your account should be working. On the bright side, the password is reserved for future use anyway, so not having one at this stage is no problem. When it's required, the system will email you and ask you to set one, so it's nothing to worry about right now.
 
Alright, signup issues are solved! Well, at least they appear to be. I've stepped through the whole process, so anyone else who wants to sign up, please feel free. Alpha tester accounts have a refill rate of 50 browser sessions per day, with a starting quota of 250 sessions just to get you started. Once you drop below 50 sessions remaining, the daily refill will top you up to 50 at most, not back to 250. These values can be tweaked per account, so let me know if you need yours adjusted and we can discuss that.
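
One way to read the quota rules is sketched below. The numbers (250 to start, 50 per day, no refill past 50) come from the post above. The class and method names are hypothetical, and the exact refill behaviour on the server is an assumption, not a description of how Systemizer implements it.

```python
from dataclasses import dataclass

STARTING_QUOTA = 250  # sessions granted to a new alpha account
DAILY_REFILL = 50     # sessions added per day
REFILL_CAP = 50       # the refill never raises the balance above this


@dataclass
class SessionQuota:
    """Hypothetical model of an alpha tester's session balance."""

    remaining: int = STARTING_QUOTA

    def use(self, count: int = 1) -> None:
        """Spend sessions; refuse if the balance is too low."""
        if count > self.remaining:
            raise ValueError("not enough sessions remaining")
        self.remaining -= count

    def daily_refill(self) -> None:
        """Apply one day's refill; it only matters once you are below the cap."""
        if self.remaining < REFILL_CAP:
            self.remaining = min(self.remaining + DAILY_REFILL, REFILL_CAP)


quota = SessionQuota()
quota.daily_refill()   # still 250: the starting quota is above the cap
quota.use(220)         # 30 left
quota.daily_refill()   # topped up to 50, not 80
print(quota.remaining)
```

Under this reading, the starting 250 is a one-off allowance, and a steady user settles into at most 50 sessions per day.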
 
Anyone have any interest in node.js support for this? My other project uses node.js so I've been building a client library for it, for my own needs.
 
Almost all of my time is spent building web apps and I get very little practice doing anything else.

So, for fun, I hacked together the start of a Ruby CLI client so that dchuk can finally register.

Code:
$ systemizer account --create
Steps you through a little parameter wizard.

Unorganized source so far: https://gist.github.com/4507671

Cool stuff. Thought it'd be vaporware for sure.

If anyone wants to try it, you have to `gem install ___` the gems at the top of the file, then run it with `ruby filename.rb account --create`.