The WF Python Functions War Chest



New stuff in python-web: it lets you easily run gevent pools in multiple processes. This is useful if you are hitting 100% CPU before running out of bandwidth, and it also simplifies your code, as most of the plumbing is managed for you.

So I'm about to launch a private proxy provider. Part of that requires allocating from the available proxies when a user signs up. Given a list of unfilled proxies, I want to give the user the most diverse set possible. That is, I want to avoid handing out sequential IPs.
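A minimal Python 3 sketch of one way to do that allocation. It assumes proxies are IPv4 host:port strings and that "diverse" means spreading picks across as many /24 subnets as possible; allocate_diverse is a hypothetical name, not part of python-web:

```python
from collections import deque
import ipaddress


def allocate_diverse(proxies, count):
    """Pick up to `count` proxies, spreading picks across /24 subnets.

    Assumes each proxy is an IPv4 'host:port' string.
    """
    # Group proxies by their /24 network, keeping input order (dicts are ordered in 3.7+)
    groups = {}
    for proxy in proxies:
        host = proxy.split(':')[0]
        subnet = ipaddress.ip_network(host + '/24', strict=False)
        groups.setdefault(subnet, deque()).append(proxy)

    # Round-robin over subnets: one proxy from each per pass until we have enough
    picked = []
    queues = list(groups.values())
    while len(picked) < count and any(queues):
        for queue in queues:
            if queue and len(picked) < count:
                picked.append(queue.popleft())
    return picked
```

A second proxy from the same /24 is only handed out once every other subnet has contributed one, so sequential IPs end up at the back of the allocation.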
 
WordAI API

Code:
import urllib
import urllib2

class Api(object):

    def __init__(self):
        self.wai_slider = 50
        self.wai_user = 'XXXXXXXXXXXXXXXXXXXXXXXXX'
        self.wai_pass = 'XXXXXXXXXXXXXXXXXXXXXXXXX'
        self.wai_speed = False
        self.wai_protected = False
        self.wai_nooriginal = False

    def wai_spin(self, body):
        # Python 2: POST the text to the WordAI spinner and return the raw response
        api_url = "http://beta.wordai.com/spinit.php"

        data = {}
        data['s'] = body
        data['slider'] = self.wai_slider
        data['api'] = "true"
        data['email'] = self.wai_user
        data['pass'] = self.wai_pass

        # Optional flags are only sent when enabled
        if self.wai_speed:
            data['speed'] = self.wai_speed

        if self.wai_protected:
            data['protected'] = self.wai_protected

        if self.wai_nooriginal:
            data['nooriginal'] = self.wai_nooriginal

        spin_req = urllib2.Request(api_url, urllib.urlencode(data))
        return urllib2.urlopen(spin_req).read()
 

I think this would be helpful to you:

pyvideo.org - Stop Writing Classes

There's no reason that code couldn't be 3-4 lines and more flexible (how do you pass options to it right now?)
 
how do you pass options to it right now?
I don't, as I just wanted something simple to replace the best spinner code I was using previously.

I will (or won't) add relevant functions for getting and setting variables if I need more flexibility.

I see what you're getting at anyway, and I appreciate the link, which I'll have a look at now.
 

I had another stab at it because my WordAI function was a rush job:

Code:
import urllib
import urllib2

def spin(content, username, password, slider=50, speed=False, protected=False, nooriginal=False):
    # Option keys must be string literals ('speed', not the variable speed)
    data = {'s': content, 'slider': slider, 'api': 'true', 'email': username, 'pass': password,
            'speed': str(speed).lower(), 'protected': str(protected).lower(),
            'nooriginal': str(nooriginal).lower()}
    return urllib2.urlopen(urllib2.Request('http://beta.wordai.com/spinit.php', urllib.urlencode(data))).read()

Simple run:
Code:
>>> wordai.spin('hello world today is a nice day', username, password)
'{hello|hi} world today is {a nice|a good|a great|a wonderful} {day|time}'
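For Python 3, a rough port of the same call using urllib.request. It assumes the beta.wordai.com endpoint and field names shown above still behave as they did (not verified), and, like the class version, it only sends the optional flags when they are switched on. build_payload is a hypothetical helper split out so the form fields can be checked without a network call:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the posts above; it may no longer be live
API_URL = 'http://beta.wordai.com/spinit.php'


def build_payload(content, username, password, slider=50,
                  speed=False, protected=False, nooriginal=False):
    """Return the form fields for a spin request."""
    data = {'s': content, 'slider': slider, 'api': 'true',
            'email': username, 'pass': password}
    # Only send optional flags that are switched on (as 'true', like 'api')
    flags = {'speed': speed, 'protected': protected, 'nooriginal': nooriginal}
    data.update({name: 'true' for name, on in flags.items() if on})
    return data


def spin(content, username, password, **options):
    """POST the text and return the spun response as a string."""
    body = urlencode(build_payload(content, username, password, **options)).encode('ascii')
    with urlopen(Request(API_URL, data=body)) as resp:
        return resp.read().decode('utf-8')
```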

You pretty much never need setters and getters in python (@property FTW in complex cases). Hope that helps.
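To illustrate the point: plain attributes cover the simple case, and @property lets you add validation later without changing how callers read or assign the value. SpinSettings and the 0 to 100 slider range are assumptions made up for this example, not WordAI's documented limits:

```python
class SpinSettings:
    """Hypothetical settings holder: no get_/set_ methods needed."""

    def __init__(self, slider=50, protected=False):
        self.slider = slider        # goes through the property setter below
        self.protected = protected  # plain attribute, no accessor needed

    @property
    def slider(self):
        return self._slider

    @slider.setter
    def slider(self, value):
        # Assumed valid range, for illustration only
        if not 0 <= value <= 100:
            raise ValueError('slider must be between 0 and 100')
        self._slider = value
```

Callers still write settings.slider = 75, exactly as they would with a plain attribute.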
 
C'mon, just google it; it's a programming language.

BTW, many thanks... I hope to post my own contributions to this thread in the near future!
 
I'm in the process of converting a lot of my PHP stuff to python and I figured someone here might find this useful. I use something like this in PHP in basically all of my projects and the same will go for python. Hope this helps someone out.

Code:
####################################
# An iterable proxy handler class. #
# Author: crackp0t                 #
# Filename: proxyhandler.py        #
####################################

from csv import reader

class ProxyHandler(object):
    # proxyFile: The full path to the file containing the proxies (one host:port per line).
    # shouldRewind: Controls the file looping. Set to False to raise StopIteration after the first pass.
    #               Set to True (the default) to go back to the start of the file and loop again.
    def __init__(self, proxyFile, shouldRewind=True):
        self._shouldRewind = shouldRewind
        self._proxyFile = proxyFile

        self.__LoadProxies()

    def next(self):
        try:
            return next(self.proxies)
        except StopIteration:
            # End of file reached: either start over or stop iterating
            if self._shouldRewind:
                self.__LoadProxies()
                return next(self.proxies)
            raise

    # Python 3 name for the same iterator method
    __next__ = next

    def __iter__(self):
        return self

    def __LoadProxies(self):
        # Read the whole file (closing it), then parse the host:port rows
        with open(self._proxyFile) as f:
            fileContents = f.read().splitlines()
        self.proxies = reader(fileContents, delimiter=':')
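For comparison, a Python 3 sketch of the same rotate-through-a-file idea using itertools.cycle; proxy_cycle is a hypothetical name. One difference worth knowing: it reads the file once and cycles over the rows in memory, whereas the class above re-reads the file on each rewind, so only the class picks up edits made to the file while it is running:

```python
import csv
import itertools


def proxy_cycle(path, rewind=True):
    """Iterate over (host, port) rows from a host:port file.

    With rewind=True the rows repeat forever; otherwise it is a single pass.
    """
    with open(path, newline='') as f:
        # Skip blank lines, which csv.reader returns as empty lists
        rows = [tuple(row) for row in csv.reader(f, delimiter=':') if row]
    return itertools.cycle(rows) if rewind else iter(rows)
```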
 
Also, why does the __LoadProxies method exist when it is only called by __init__?

Right now it only supports reading from a file. In the future I'm going to have it read from a file or a URL, group proxies by type (SOCKS or HTTP), randomize them, etc. I'd rather have all of that go into a load method like __LoadProxies than into __init__. It's really just a matter of preference, I guess. I don't think a big constructor is right or wrong; I just don't like looking at them :p.

I really prefer a bunch of small methods over a couple of big ones. If I can break something out into its own method (where it makes sense), chances are I will, because for me it makes testing easier. Testing is important to me, and since I'm new to Python I don't have any existing code written in it, so I'd like to have all of my Python code tested.
 
If it's a delay for proxies, it's probably so they can "sleep", i.e. not be used again within X seconds, to prevent an IP ban (for example with Google or another site that notices when the same IP has hit it within the last 3 seconds).
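A minimal sketch of that kind of per-proxy cooldown, assuming the same fixed cooldown for every proxy. CooldownPool is hypothetical, and the clock parameter exists only so it can be tested without waiting. Because every proxy gets the same cooldown and goes to the back of the queue after use, the front entry is always the one that becomes usable soonest:

```python
import time
from collections import deque


class CooldownPool:
    """Hand out proxies, never reusing one within `cooldown` seconds."""

    def __init__(self, proxies, cooldown=3.0, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock
        # Each entry is (time it becomes usable again, proxy); all usable now
        self.queue = deque((0.0, proxy) for proxy in proxies)

    def get(self):
        """Return the least recently used proxy, sleeping until its cooldown ends."""
        ready_at, proxy = self.queue.popleft()
        wait = ready_at - self.clock()
        if wait > 0:
            time.sleep(wait)
        # Back of the queue, unavailable until `cooldown` seconds from now
        self.queue.append((self.clock() + self.cooldown, proxy))
        return proxy
```

This is single-threaded; sharing one pool between threads or gevent greenlets would need a lock or a queue around get().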
 

Ah, that makes sense. I always put that logic elsewhere, but it makes sense to put it in there. The proxies I use are a little different from what most people use, I guess: they rotate every X minutes, access is IP-based, and I also own a set (two /24s) that I use for various things.
 
Strip a URL down to just the domain

Here's a function I use to strip URLs down to just their domain.

Code:
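try:
    from urlparse import urlparse  # Python 2
except ImportError:
    from urllib.parse import urlparse  # Python 3

def domain(url):
    """Strip a given URL to get just the domain (removing any
    sub-domains, the path, query, and fragments)."""

    # Extract the host name (unlike netloc, this drops any port or credentials)
    urlo = urlparse(url)
    parts = urlo.hostname.split('.')

    # Keep the last two labels, or three when the second-to-last label is
    # short (e.g. "co" in "example.co.uk"). This is a heuristic: it also
    # keeps "www" in "www.abc.com" because "abc" is short.
    if len(parts) > 2:
        parts = parts[-3:] if len(parts[-2]) < 4 else parts[-2:]

    return '%s://%s' % (urlo.scheme, '.'.join(parts))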

 
Code:
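import datetime
import hashlib
import os
import pickle
import re

def function_cache(function, cache_days=5, call_function=True, **kwargs):
    """Return function(**kwargs), reusing a pickled result from the last cache_days days."""
    function_name = function.__name__
    recent_days = [str(datetime.date.today() - datetime.timedelta(days=i)) for i in range(cache_days)]
    # Sort kwargs so the key doesn't depend on dict order, and drop memory references
    kwargs_str = re.sub(' at 0x[0-9a-fA-F]+', '', str(sorted(kwargs.items())))
    kwargs_hash = hashlib.sha1(kwargs_str.encode('utf-8')).hexdigest()
    path = 'function_cache/{day}-{function_name}-{hash}.pickle'
    # Return the newest cached result from today or the previous cache_days - 1 days
    for recent_day in recent_days:
        filename = path.format(day=recent_day, function_name=function_name, hash=kwargs_hash)
        if os.path.isfile(filename):
            with open(filename, 'rb') as f:
                return pickle.load(f)
    if call_function:
        # Cache miss: call the function and store the result under today's date
        result = function(**kwargs)
        if not os.path.isdir('function_cache'):
            os.makedirs('function_cache')
        filename = path.format(day=recent_days[0], function_name=function_name, hash=kwargs_hash)
        with open(filename, 'wb') as f:
            pickle.dump(result, f)
        return result
    return False
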
generic is good.
massive speedups are good.
*args support would be good, maybe.
could be adapted to use redis.
meta is better.
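The notes above mention *args support and a "meta" form; one reading of that is a decorator. A rough Python 3 sketch of both, assuming the same layout of dated pickle files in a cache directory; disk_cache and its cache_dir parameter are hypothetical names:

```python
import datetime
import functools
import hashlib
import os
import pickle
import re


def disk_cache(cache_days=5, cache_dir='function_cache'):
    """Decorator: reuse a pickled result from the last `cache_days` days."""
    def decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            # Key on positional and keyword args; strip memory addresses like the original
            key = re.sub(r' at 0x[0-9a-fA-F]+', '', repr((args, sorted(kwargs.items()))))
            digest = hashlib.sha1(key.encode('utf-8')).hexdigest()
            today = datetime.date.today()

            def cache_path(day):
                return os.path.join(cache_dir, '%s-%s-%s.pickle' % (day, function.__name__, digest))

            # Newest first: today, then each of the previous cache_days - 1 days
            for i in range(cache_days):
                path = cache_path(today - datetime.timedelta(days=i))
                if os.path.isfile(path):
                    with open(path, 'rb') as f:
                        return pickle.load(f)

            # Cache miss: compute and store under today's date
            result = function(*args, **kwargs)
            os.makedirs(cache_dir, exist_ok=True)
            with open(cache_path(today), 'wb') as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator
```

Usage would be @disk_cache(cache_days=3) above any function whose arguments have a stable repr and whose result can be pickled.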

Disclaimer: you may need an SSD if you use it heavily.

Jake232 doesn't understand it, so it must be good ;)
 
Nice to see some Python code in this forum; usually I only find PHP stuff... :)

I want to ask @mattseh about this code:

Code:
import web

What module is that? It's definitely not web.py, is it?