Yo 1k post.
With nothing to lose, a non-programmer would like to talk about programming.
Background and setup to a tirade of laughable attempts at future coding.
<coolstorybro>
A few weeks ago I got 2013 and a chunk of 2014 taken care of and decided to dive into programming. Other than overseeing a few things, the majority of the day-to-day is taken care of, leaving me with significant free time. I do not have any commercial intent for the new skilzs; I just enjoy it. Call it my crossword puzzle. I’m not a natural. I don’t get a lot of it the first time around, and some of it probably never. I ride the coding short bus but can count to potato in binary and hex, so let’s just call it a hobby. I never plan on earning a living from programming; I just find it fun for some reason.
The only other coding experience worth mentioning was a couple of C flavors a long time ago, with a drop into assembly on occasion. I never did anything with it and definitely never got proficient to any degree. I remember almost nothing, so it doesn’t count. The last few years have all been PHP for the IM projects. I could usually take existing code and hack it together to do what I wanted, but again, I never considered myself particularly good at it.
Deciding on a language this time around was not very efficient. Initially it was between Ruby and Python. Then mat or someone else would start a Go kind of thread and off I would go chasing that possibility for a day or so. In the end it kept coming back to Ruby or Python. After too much time I finally realized what some of the people who didn’t have a religious-type bias toward a particular language were saying. The language itself isn’t all-important; it’s how you think about the challenge and how you solve it. Syntax and libraries may not matter; it is you that matters. A better way to think about the problem is what I’m after. You can spend days reading articles on Ruby vs. Python, and I never saw a clear winner. If anyone else is undecided, look carefully at the dates on some of those comparisons. If they are claiming Ruby is slow, that appears not to be the case in later versions, if it ever was an issue. idk
Finally I decided to go to one of the well-written tutorial sites that offered courses in both languages. I put Ruby on one screen and Python on the other. I would flip between languages on the same subject to see how both handled the same thing at the beginner level. After a couple of hours Python seemed to talk to me a little closer than Ruby. I have full intentions of revisiting Ruby at some point; until then, Python it is.
One of the many mistakes I made with PHP came from having almost instant success with it. I never really took the time to figure out why my Frankenstein code worked. It did what I wanted and I kept going. I was using frameworks before I fully understood classes. I used classes before I properly understood procedural. I don’t think I ever typed more than five lines without a syntax error, but I never got totally stumped. It just worked for what I wanted atm.
This time with Python I was determined not to hack from day one but to get the fundamentals down. I looked at many different books, tutorials, methods and teaching styles. I finally decided to start with Learn Python the Hard Way. I’ve never been in the military, but this is what I picture programming boot camp would be like. I followed the rules and typed every single character as instructed. Not one copy and paste. For me the result was getting up to speed much faster. The extra time spent in the beginning paid off many times over in not having to constantly look up the basics in a reference. Getting the syntax right alone was worth the bruises. In comparison, the hipster style of O'Reilly's Head First Python wouldn’t let me get past the second chapter. To each his own.
I went through many tutorials and books as I started my first project. Again, no commercial intent, just an interest that needed a goal. I started with learning how to scrape and didn’t get very far. Not to trivialize it, but I quickly lost interest in the actual retrieval of the data. What got me was interacting with the website. I liked learning how to click on buttons that didn’t exist until a piece of JS / Ajax dynamically created them. Multiple menus hidden until a mouse hover displayed the link, sometimes 2 or 3 tiers deep. Then I found asynchronousity. (I don’t think that is a word but I like saying it.) I got off on multiple workers carrying out my instructions from a single console.
Since I was on this interacting-like-a-human kick, I started doing it with headless browsers and watched the server resource usage drop dramatically. Then I started creating accounts on different websites and doing the opposite of scraping. Instead of hit-and-runs with rotating proxies, the objective was to create long-term accounts where each had its own profile: login ID, password, IP, user agent. Interacting with these websites like a human, not loading 20 pages a second. Each worker traversing hundreds of pages a day, where the options for where to go next may be created dynamically depending on what choices you made on the current page.
Some of the websites I attempted to create long-term accounts with specifically prohibit multiple accounts per user in their TOS, which is the reason I chose them. This is what interests me atm. The game is that they try to detect me and I try to see how long I can go perceived as a human. In the beginning all workers got banned. I got better at blending in as a human with better control of the actions of each worker, and better use of different IPs and UAs to not link the accounts. With each improvement came greater worker longevity. This is where I’m at now. There is a lot more to learn.
</coolstorybro>
In future posts in this thread I would like to talk about where to go from here. Even if I end up talking to myself, typing this out is helpful. Any advice would be appreciated, and I will be glad to answer anything that I can on a novice level. A few topics off the top of my head, in no particular order:
1) What resources and methods would you recommend to become a better programmer? I eventually get the job done and am aware there may be more than one way to do something right. Any suggested links or guidelines that would help me become more proper? When you read someone else’s code, what irritates you that I can avoid?
2) Refactoring code for more server efficiency. Currently, the way I use servers, I redline the CPU, and RAM is a distant second. Storage and bandwidth never come into play. I know this is too vague, but at some point I would like to have a conversation on threading (aware of the GIL) / event-based best practices for better use of server resources.
3) Beyond the login credentials and proxy / UA, there are many other methods a website can use to find out information about you. Dealing with this is high on my list.
4) Proxies. Jesus Christ, what is wrong with these people? Most that I’ve dealt with are in serious need of a realignment. How hard can creating a proxy server be? I want to try this. Never for resale, internal use only. I’m currently buying a couple hundred private proxies each month.
5) EC2 and other cloud-based solutions. I’ve got a few Linodes and couldn’t be happier, but I would like to experiment and get a baseline on ease of scaling vs. cost.
6) OCR and pre-filtering before it gets to the OCR engine.
7) Automated unit / suite testing, continuous integration, Git kind of system flow.
8) Transitioning from 2.7.x to 3.x
Obviously I’m at the shallow end of the coding pool, and if I ever appear to know much about the subject it was unintentional. Replies, flames, or being ignored are all good.
Cheers,
Up