goldb.org home

AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Tuesday, April 29, 2008

Python - RRDTool Utilities (module and scripts for RRDs)

I started a project on Google Code to create a set of Python tools that make dealing with Round Robin Databases (RRDs) less painful.  Setting up RRDs can be tough if you don't know what you are doing.

Anyone interested can check it out here:  rrdpy

I used it to create a simple HTTP monitoring script (included in source) to graph web response latency like this:

#    Comments [4] |
 Thursday, April 24, 2008

Joined Twitter

I just joined Twitter.  I have no idea how much I'll use it, but I guess I need to see what all the [good/bad] hype is about.

Follow me there: twitter.com/cgoldberg



#    Comments [0] |
 Wednesday, April 16, 2008

Python - Slurping CSV Files Into Nested Lists

When working with data sets, a common task I need to do is slurp a csv file into a nested data structure that contains a sequence of lists correlating to the rows and values in the csv file.

For example...

File contents (foo.csv):

10,20,30,40
19,29,39,55
16,21,31,59

Result:

[['10', '20', '30', '40'],
['19', '29', '39', '55'],
['16', '21', '31', '59']]

To accomplish this, you could parse it inside a big honkin' list comprehension and build the structure in one step:

csv_file = 'foo.csv'
value_lists = [line.split(',') for line in
[line.strip() for line in open(csv_file, 'r').readlines()]]

You could also use the csv module from Python's standard library:

import csv
csv_file = 'foo.csv'
value_lists = list(csv.reader(open(csv_file, 'r')))

The csv module has some useful tools for reading/writing csv files.  Check it out.
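
If you need numbers rather than strings, a small helper along these lines might do it. This is only a sketch in Python 3 syntax; the `slurp_csv` name and its `convert` parameter are made up for illustration, and the demo writes its own temporary foo.csv so it can run anywhere.

```python
import csv
import os
import tempfile


def slurp_csv(path, convert=str):
    """Read a csv file into a list of rows, converting each value with `convert`."""
    # newline='' is what the csv docs recommend when opening files for csv.reader
    with open(path, 'r', newline='') as fh:
        return [[convert(value) for value in row] for row in csv.reader(fh)]


# demo: write the sample data to a temporary file, then slurp it back
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'foo.csv')
with open(demo_path, 'w', newline='') as fh:
    fh.write('10,20,30,40\n19,29,39,55\n16,21,31,59\n')

rows = slurp_csv(demo_path, convert=int)
print(rows)
```

Passing `convert=int` will raise a ValueError on a non-numeric cell, which is probably what you want for a data file that is supposed to be all numbers.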

#    Comments [1] |

Developer/Testers Are Hard To Find

Jesse Noller just blogged "Finding Python people is hard":

Here is a good quote regarding the difficulty in finding skilled Test Engineers with Python experience:

"Either you teach QA people automation/test engineering, or you try to find a programmer who wants to learn/do test engineering and teaching them python. It's a hard sell either way. I technically view QA as one discipline, Development as another, but Test Engineering as the Hybrid of the two - and you need a strong background in both."

I have seen lots of QA Engineers and Testers with little to no development/programming experience. This seems to be such a valuable skill; why not learn some? The bar is set really low with today's dynamic languages. Getting into some quick scripting for data manipulation and building test harnesses is not a huge task. If a QA engineer can't learn some simple programming in a week, would you trust his efficiency and technical skills?

I agree with Jesse on this one. We need to see more Test Engineers and Developers In Test. Unfortunately, this hybrid role often falls through the cracks, as many people view quality/testing vs. developing as a binary choice.

#    Comments [5] |
 Monday, April 14, 2008

Pylot 1.1 - New Release With Test Case Recorder

New Pylot 1.1 release is available
Visit: www.pylot.org/download.html

It contains some minor code cleanup and a new test case recorder contributed by David Solomon. The recorder works with Windows and IE only.

It is a script that launches your web browser and records HTTP requests as you navigate. While it records, it prints Pylot's XML test cases. The test cases are printed to STDOUT, so just redirect it to a file and you will have a valid testcases.xml file to use as Pylot input.

The pylot_recorder script is included in the lib directory of Pylot 1.1.

View the recorder's source code from the SVN trunk:
http://code.google.com/p/pylt/source/browse/trunk/lib/pylot_recorder.py

(It can't handle some complex scenarios, but is useful for recording simple GET and POST requests from web applications)

#    Comments [4] |
 Thursday, April 10, 2008

Split A List Into Roughly Equal Sized Pieces

The Python Cookbook has a recipe for splitting a list into roughly equal-sized pieces:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/425397

In the comments, there are several alternate implementations. Sebastian Hempel has an interesting take on it using slicing for the calculation of the list lengths. It basically looks like this:

def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

This version of the function spreads the leftover items, one each, over the first few pieces.

Example Usage:

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for piece in split_seq(seq, 3):
    print piece
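
To see how the leftover item is handled, here is the same generator in Python 3 syntax (`range` instead of `xrange`, `print()` as a function). It is a sketch of the recipe as quoted above, nothing more.

```python
def split_seq(seq, num_pieces):
    """Yield num_pieces slices of seq whose lengths differ by at most one."""
    start = 0
    for i in range(num_pieces):
        # seq[i::num_pieces] takes every num_pieces-th item starting at i,
        # so its length is the size of the i-th piece
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop


pieces = list(split_seq([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))
print(pieces)  # [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
```

Ten items into three pieces leaves one extra, and it lands in the first piece.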
#    Comments [2] |
 Wednesday, April 09, 2008

Python - Host/Device Ping Utility for Windows

This script uses your system's ping utility to send an ICMP ECHO_REQUEST to a list of hosts or devices. It uses a separate thread to ping each host/device. This can be useful for measuring network latency and verifying hosts are alive.

Check out more info here: http://www.goldb.org/python_pinger.html


#!/usr/bin/env python

import re
from subprocess import Popen, PIPE
from threading import Thread


class Pinger(object):
    def __init__(self, hosts):
        for host in hosts:
            pa = PingAgent(host)
            pa.start()
        
class PingAgent(Thread):
    def __init__(self, host):
        Thread.__init__(self)        
        self.host = host

    def run(self):
        p = Popen('ping -n 1 ' + self.host, stdout=PIPE)
        m = re.search('Average = (.*)ms', p.stdout.read())
        if m: print 'Round Trip Time: %s ms -' % m.group(1), self.host
        else: print 'Error: Invalid Response -', self.host
              
                             
if __name__ == '__main__':
    hosts = [
        'www.pylot.org',
        'www.goldb.org',
        'www.google.com',
        'www.this_one_wont_work.com'
       ]
    Pinger(hosts)

Output:

Round Trip Time: 14 ms - www.yahoo.com
Round Trip Time: 17 ms - www.goldb.org
Round Trip Time: 30 ms - www.google.com
Round Trip Time: 82 ms - www.pylot.org
Error: Invalid Response - www.this_one_wont_work.com

Note: I only tested this on Windows. To run on other systems, it would only require a one-line change.
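
A rough idea of what a portable version might need, in Python 3 syntax. The helper names are hypothetical, and the summary-line formats in the comments are what common ping builds tend to print, so check them on your own system. Note that on Linux or macOS the regex probably needs adjusting as well as the command line.

```python
import platform
import re


def build_ping_command(host, system=None):
    """Return an argument list that sends one echo request to host."""
    system = system or platform.system()
    # Windows ping takes -n for the count; Unix-like pings take -c
    count_flag = '-n' if system == 'Windows' else '-c'
    return ['ping', count_flag, '1', host]


def parse_round_trip(output):
    """Pull an average round trip time in ms out of ping output, or None."""
    # Windows style summary: "Average = 14ms"
    m = re.search(r'Average = (\d+)ms', output)
    if m:
        return float(m.group(1))
    # Linux/macOS style summary: "... min/avg/max/... = 13.9/14.2/14.5/0.2 ms"
    m = re.search(r'= [\d.]+/([\d.]+)/', output)
    if m:
        return float(m.group(1))
    return None
```

The argument list can be handed straight to `Popen(..., stdout=PIPE)` in place of the command string.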

#    Comments [3] |
 Thursday, April 03, 2008

Python - Script - Which Webserver Does That Site Run?

You can use this little Python function to see what type of web server a site is running.  All it does is send an HTTP request to the host and read the 'server' header in the response.


import httplib

def get_server_type(host):
    conn = httplib.HTTPConnection(host)
    conn.request('GET', '/')
    resp = conn.getresponse()
    return resp.getheader('server')


print get_server_type('www.pylot.org')
print get_server_type('www.techcrunch.com')

Output:

lighttpd/1.4.19
Apache/2.0.52


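Note: This doesn't work for all sites. A server can omit or alter its 'server' header, in which case you get None or a value that tells you little.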

#    Comments [7] |
 Thursday, March 20, 2008

Transitioning To Python From Java or C#

"compared to Java code, XML is agile and flexible. Compared to Python code, XML is a boat anchor, a ball and chain."
    - Phillip J. Eby

If you are new to Python and coming from Java (or C#, or other similar statically typed OO language), these classic articles from PJE and Ryan Tomayko are necessary reading:

#    Comments [0] |
 Wednesday, March 19, 2008

Pylot Dev Update - Web Performance - Release 1.0

Finally did the version 1.0 release! Visit www.pylot.org to download.

Pylot is still lacking some features I want to add for it to become a serious performance/load testing tool, but the current release delivers very usable functionality.

Current Features:

  • multi-threaded load generator
  • HTTP and HTTPS (SSL) support
  • response verification with regular expressions
  • execution/monitoring console
  • real-time stats
  • results reports with graphs
  • GUI mode
  • shell/console mode
  • cross-platform

Aside from the GUI, there is also a new shell/console interface mode with real-time output for quickly profiling the performance of your application/service under test from the command line. In this mode, Pylot can run cross-platform (tested on Windows XP, Vista, Cygwin, Ubuntu, and MacOS).

Note: Extra special thanks to Vasil Vangelovski for implementing the original console output and C++ extension


Screenshots of the GUI and new shell/console UI output:





#    Comments [0] |
 Tuesday, March 04, 2008

Python - Bytes Received and Transmitted for Windows

This script will output bytes received and transmitted for a local Windows machine since the last reboot:


import re
from subprocess import Popen, PIPE

p = Popen('net statistics workstation', stdout=PIPE)
for line in p.stdout:
    m = re.search('Bytes received\W+(.*)', line)
    if m: print 'Bytes received: %s' % (m.group(1))
    m = re.search('Bytes transmitted\W+(.*)', line)
    if m: print 'Bytes transmitted: %s' % (m.group(1))
#    Comments [0] |

Python - Get Last Windows Reboot Date/Time

This script will output the last reboot date/time for a local Windows machine:


import re
from subprocess import Popen, PIPE

p = Popen('net statistics workstation', stdout=PIPE)
m = re.search('(\d+/\d+/\d{4}.*[AP]M)', p.stdout.read())
if m: print 'Last Reboot: %s' % (m.group(1)) 

Output:

>> Last Reboot: 3/1/2008 1:51:41 PM





* updated the original script thanks to Ian's comment below

#    Comments [2] |
 Monday, March 03, 2008

Python - Padding Single Digits In Dates

Here is how to zero-pad single digit days or months in a date string:


import time

date = '3/2/2008'
padded_date = time.strftime('%m/%d/%Y', time.strptime(date, '%m/%d/%Y'))
print padded_date
>> 03/02/2008
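
The same round trip (parse, then format with the same pattern) can be done with the datetime module. A small sketch in Python 3 syntax; `pad_date` is just an illustrative name.

```python
from datetime import datetime


def pad_date(date_string, fmt='%m/%d/%Y'):
    """Parse a date string and format it again so day and month are zero-padded."""
    return datetime.strptime(date_string, fmt).strftime(fmt)


print(pad_date('3/2/2008'))  # 03/02/2008
```

A side benefit of parsing first is that an impossible date such as '13/45/2008' raises a ValueError instead of being padded blindly.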
#    Comments [2] |
 Sunday, March 02, 2008

Python - Palindrome Checker

A palindrome is a sequence that reads the same in either direction.

Here is a function I wrote to check if a phrase is a palindrome:


import re

def is_palindrome(txt):
    txt = re.sub('\W+', '', txt).lower()
    return txt == txt[::-1]



phrase = "Go hang a salami, I'm a lasagna hog"
print is_palindrome(phrase)

>> True
#    Comments [4] |
 Tuesday, February 12, 2008

Python - 15 Line HTTP Server - Web Interface For Your Tools

I write a lot of command line tools and scripts in Python. Sometimes I need to kick them off remotely. A simple way to do this is to launch a tiny web server that listens for a specific request to start the script.

I add a "WebRequestHandler" class to my script and call it from my main method. There is a "do_something()" method in the class. You call your code from this method.

All you have to do is launch your script and it will sit there and wait for requests. If the request is bad, it spits back a 404 error. If the request path matches what we are looking for (in this case "/foo"), the code is launched.

Now you have an easy way to call your script remotely. Just open a browser and type in the URL: http://your_server/foo, or call it with a tool like 'wget' or 'curl'.


import BaseHTTPServer

class WebRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/foo':
            self.send_response(200)
            self.do_something()
        else: 
            self.send_error(404)
            
    def do_something(self):
        print 'hello world'
        
server = BaseHTTPServer.HTTPServer(('',80), WebRequestHandler)
server.serve_forever()

(this was adapted from a code sample in "Python In A Nutshell" by Alex Martelli)
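
In Python 3 the BaseHTTPServer module became http.server. A rough equivalent might look like the sketch below. The `make_server` helper and the 8000 default port are my own additions for illustration, and I added an `end_headers()` call so the response headers are properly terminated.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class WebRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/foo':
            self.send_response(200)
            self.end_headers()  # finish the header section before doing work
            self.do_something()
        else:
            self.send_error(404)

    def do_something(self):
        print('hello world')


def make_server(port=8000):
    """Build the server; call serve_forever() on the result to start listening."""
    return HTTPServer(('', port), WebRequestHandler)

# to run it:  make_server(8000).serve_forever()
```

Keep in mind that anyone who can reach the port can trigger the script, so this is something for a trusted network only.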

#    Comments [1] |
 Monday, February 11, 2008

Python - Terminating Threads - Boolean Flag and threading.Event()

In many programming languages you can't terminate a thread directly.  Python is no different.  Rather than terminating a thread from the code that spawned it, you just pass a flag to the thread that tells it to terminate itself.  Typically a thread will run in a loop, periodically checking this flag so it knows if it should continue or not.  To terminate the thread from the outside, you just set its flag to die.

I was using this idiom in Python by setting a boolean flag in my spawned thread.

So a simplified thread class would look something like this:


import threading
import time


class MyThread(threading.Thread):
    def __init__(self, num):
        threading.Thread.__init__(self)
        self.running = True
        self.num = num
        
    def stop(self):
        self.running = False
        
    def run(self):
        while self.running:
            print 'hello from thread %d' % self.num
            time.sleep(1)

I just read an old post in comp.lang.python that pointed to a recipe in the Python Cookbook that suggests using threading.Event() rather than a simple boolean flag.

So the thread class would look something like this:


class MyThread(threading.Thread):
    def __init__(self, num):
        threading.Thread.__init__(self)
        self.stop_event = threading.Event()
        self.num = num
        
    def stop(self):
        self.stop_event.set()
        
    def run(self):
        while not self.stop_event.isSet():
            print 'hello from thread %d' % self.num
            time.sleep(1)

They work exactly the same.

I am just wondering what other flexibility threading.Event() gives you, and if there is anything bad about using simple boolean checks to kill threads. I guess I will have to look it up and play around a bit.
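
One difference that is easy to demonstrate: an Event can be waited on with a timeout, so the thread can "sleep" in a way that is cut short the moment stop() is called. With time.sleep(1) the thread always finishes its nap first. A sketch in Python 3 syntax; the `interval` and `iterations` attributes are additions of mine for the demo.

```python
import threading


class MyThread(threading.Thread):
    def __init__(self, num, interval=1.0):
        threading.Thread.__init__(self)
        self.stop_event = threading.Event()
        self.num = num
        self.interval = interval
        self.iterations = 0

    def stop(self):
        self.stop_event.set()

    def run(self):
        while not self.stop_event.is_set():
            self.iterations += 1
            # wait() returns as soon as the event is set, or after the timeout
            self.stop_event.wait(self.interval)
```

With a long interval (say 30 seconds), calling stop() still lets the thread exit almost immediately, which the boolean-plus-sleep version can't do.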

#    Comments [5] |
 Sunday, February 10, 2008

Rockin' Python 3000 Alpha (3.0a2)

I just installed the latest Alpha of Python 3000.

So far so good...

#    Comments [0] |
 Tuesday, February 05, 2008

Python - Convert Secs Into Human Readable Time String (HH:MM:SS)

Convert a number of seconds into a human readable time string HH:MM:SS

7046 seconds is: 1 hour 57 mins 26 secs, or 01:57:26

The Function:

def humanize_time(secs):
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    return '%02d:%02d:%02d' % (hours, mins, secs)

The Output:

print humanize_time(7046)
>> 01:57:26
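
Here is the arithmetic spelled out for the 7046 example, using Python 3 syntax. The second call is only there to show what happens past 24 hours.

```python
def humanize_time(secs):
    mins, secs = divmod(secs, 60)    # 7046 -> 117 mins, 26 secs
    hours, mins = divmod(mins, 60)   # 117  -> 1 hour, 57 mins
    return '%02d:%02d:%02d' % (hours, mins, secs)


print(humanize_time(7046))   # 01:57:26
print(humanize_time(90061))  # 25:01:01 (hours are not wrapped into days)
```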
#    Comments [5] |
 Saturday, February 02, 2008

wxPython Installer on Windows Vista?

Bug Report To LazyWeb:

Has anyone had success installing wxPython on Windows Vista using the binary installer package?  I get a generic Windows error and the install crashes.  I'm running Python 2.5 and trying to install wxPython 2.8 (wxPython2.8-win32-ansi-2.8.7.1-py25.exe)

I have never tried wx on Vista.  Has anyone else encountered this?


Update 03/05/08: the installer now works fine on Vista

#    Comments [6] |
 Tuesday, December 18, 2007

The Python Papers - Screen Scraping Article

The new issue of the Python Papers is out.  It includes a small article I wrote called: Screen Scraping Web Pages

The issue can be downloaded here:  The Python Papers, Volume 2, Issue 4 (pdf)

This tutorial shows how to programmatically retrieve a stock quote from Google Finance.  It uses Python's high level Web API and screen scraping with regular expressions.
#    Comments [2] |
 Monday, December 17, 2007

Python Experts - Why They Do Python

I was recently interviewed for the article:
Python Experts - Why They Do Python

I don't think I am even close to an "expert", but it was nice being asked to participate.

#    Comments [0] |
 Tuesday, November 27, 2007

Python - Extracting Files From Zip Archives

Here is a way to unzip files in Python.  If you have a zip containing multiple files, you can unzip it like this:

import zipfile

fh = open('foo.zip', 'rb')
z = zipfile.ZipFile(fh)
for name in z.namelist():
    outfile = open(name, 'wb')
    outfile.write(z.read(name))
    outfile.close()
fh.close()
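
The loop above writes each member into the current directory and will trip over archives that contain subdirectories. In newer Pythons, `ZipFile.extractall()` handles both. A sketch in Python 3 syntax; the `unzip` helper name is made up, and the demo builds its own tiny archive in a temp directory.

```python
import os
import tempfile
import zipfile


def unzip(archive_path, dest_dir):
    """Extract every member of a zip archive into dest_dir and return their names."""
    with zipfile.ZipFile(archive_path) as z:
        z.extractall(dest_dir)
        return z.namelist()


# demo: build a small archive in a temp directory, then extract it
work_dir = tempfile.mkdtemp()
archive = os.path.join(work_dir, 'foo.zip')
with zipfile.ZipFile(archive, 'w') as z:
    z.writestr('a.txt', 'hello')
    z.writestr('b.txt', 'world')

names = unzip(archive, os.path.join(work_dir, 'out'))
print(names)
```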
#    Comments [6] |
 Monday, November 26, 2007

wxPython - Hello World!

Here is a simple example for those getting started with Python GUI Programming, wxWidgets, and the wxPython Bindings.

This small program will display a Frame and the static text "Hello World!", positioned with a BoxSizer.

Output looks like this:



#!/usr/bin/env python

import wx

class Application(wx.Frame):
    def __init__(self, parent):
        wx.Frame.__init__(self, parent, -1, 'My GUI', size=(300, 200))
        panel = wx.Panel(self)
        sizer = wx.BoxSizer(wx.VERTICAL)
        panel.SetSizer(sizer)
        txt = wx.StaticText(panel, -1, 'Hello World!')
        sizer.Add(txt, 0, wx.TOP|wx.LEFT, 20)
        self.Centre()
        self.Show(True)

app = wx.App(0)
Application(None)
app.MainLoop()
#    Comments [0] |
 Wednesday, November 14, 2007

Regex Capture Groups In Python and Perl

I am a Python programmer and ex-Perl hacker.

Regular Expressions are possibly the quintessential feature of Perl and are directly part of the language syntax.

Rather than being part of the syntax, Python's Regular expressions are available via the 're' module. For some reason, I had some trouble figuring out matching groups when I first started using Python's Regular Expressions.

Here are examples of extracting capture groups in both Perl and Python.

Let's say we have a string containing a date: '11/14/2007', and we want to capture only the year from this string.

A regex to match this format might be something like this:

[0-9]{2}/[0-9]{2}/[0-9]{4}

We can then put parentheses around the piece we want to extract (the 4-digit year) to denote a capture group.

So now our regex would look like this:

[0-9]{2}/[0-9]{2}/([0-9]{4})


Perl Example:

$foo = '11/14/2007';

if ($foo =~ m^[0-9]{2}/[0-9]{2}/([0-9]{4})^) {
    print $1;
}

output:

2007

* Note the string we captured ended up in the special variable $1


Python Example:

import re

foo = '11/14/2007'

match = re.search('[0-9]{2}/[0-9]{2}/([0-9]{4})', foo)
if match:
    print match.group(1)

output:

2007

* Note the string we captured ended up in a match object, which can be accessed with the 'group()' method.
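
If you want more than one piece, the match object can hand back every group at once, and groups can also be given names. A short sketch in Python 3 syntax, using the same date string:

```python
import re

foo = '11/14/2007'

# three capture groups: month, day, year
match = re.search(r'([0-9]{2})/([0-9]{2})/([0-9]{4})', foo)
if match:
    month, day, year = match.groups()   # all groups as a tuple
    print(year)            # 2007
    print(match.group(0))  # the whole match: 11/14/2007

# named groups let you refer to a capture by name instead of position
named = re.search(r'(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})', foo)
year_by_name = named.group('year')
```

`group(0)` plays roughly the role of Perl's `$&`, while `group(1)`, `group(2)`, ... line up with `$1`, `$2`, and so on.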

#    Comments [6] |
 Wednesday, November 07, 2007

Python - Processing Large Text Files One Line At A Time

I want to process some very large text files one line at a time.  Normally when I process text files, I slurp them into a list using the readlines() method.   However, sometimes the files are huge and it isn't feasible or optimal to read the entire content into memory upfront.   In this case, it makes sense to process them one line at a time.

The best solution I can come up with is this:


fh = open('foo.txt', 'r')
line = fh.readline()
while line:
    # do something here
    line = fh.readline()

It doesn't feel very pythonic/idiomatic.  Anyone have a better solution?


Update
Thanks to the comments below, I found a few different ways to do it. The best and most Pythonic way seems to be this:


for line in open('foo.txt', 'r'):
    pass  # do something here

Python file objects support the iterator protocol, so you can just open it and go.   This is the same as using a while loop and calling readline() but more compact.
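
One refinement worth considering is wrapping the open() in a `with` block so the file is closed even if the processing raises. A sketch in Python 3 syntax; the `count_matching_lines` helper and the temporary demo file are made up for illustration.

```python
import os
import tempfile


def count_matching_lines(path, needle):
    """Count lines containing needle, holding only one line in memory at a time."""
    count = 0
    with open(path, 'r') as fh:   # the file is closed when the block exits
        for line in fh:           # file objects yield one line per iteration
            if needle in line:
                count += 1
    return count


# demo with a small temporary file
fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as fh:
    fh.write('foo\nbar\nfoo bar\n')

print(count_matching_lines(demo_path, 'foo'))  # 2
```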

#    Comments [7] |
 Wednesday, October 31, 2007

Which Version Of Python Ships With Mac OS X Leopard?

I am not a Mac user, but in case anyone is interested in knowing which version of Python ships with OS X Leopard, the answer is Python 2.5.

#    Comments [0] |
 Wednesday, October 24, 2007

Python - List Comprehensions Leak Variables

One thing to remember when using List Comprehensions is that they "leak" their temporary iteration variable to the outside.

What does that mean?

In the following example, we still have access to 'x' after we run the list comprehension.

foo = ['a', 'b', 'c']
my_list = [x for x in foo]
print x

output:
>> c

This behaviour is different from how a Generator Expression works. We could have written the List Comprehension using a Generator Expression like this:

my_list = list(x for x in foo)

Now, the temporary variable we used is not accessible from outside the scope of the expression.

foo = ['a', 'b', 'c']
my_list = list(x for x in foo)
print x

output:
>> NameError: name 'x' is not defined

Note: This is fixed in Python 3000
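
A quick way to check which behaviour your interpreter has, written in Python 3 syntax. The `leaked` flag is just for the demo.

```python
foo = ['a', 'b', 'c']
my_list = [x for x in foo]

try:
    x               # only defined here if the comprehension leaked it
    leaked = True
except NameError:
    leaked = False

print(leaked)  # False on Python 3
```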

#    Comments [5] |
 Sunday, October 14, 2007

Python - Simple Multithreaded HTTP Load Generator/Timer

This is a module for generating concurrent requests to an HTTP server.  Each thread makes HTTP GET requests to a single URL at the specified interval.  Threads are added over a given rampup time if you want to generate increasing load.  Response times are printed to STDOUT.  Can be used for cursory performance benchmarking or load testing a web resource.

load_generator.py module

sample usage:


#!/usr/bin/env python

from load_generator import LoadManager

lm = LoadManager()
lm.msg = ('www.example.com', '/')
lm.start(threads=5, interval=2, rampup=2)
#    Comments [3] |
 Wednesday, September 26, 2007

Python - Tk Graph Example

I found a snippet to draw bar graphs in Python using Tk:
http://www.daniweb.com/code/snippet583.html

The output looks like this:


Here is a modified version that creates a bar graph in a Tk panel:

import Tkinter as tk

def graph_points(seq, width=375, height=325):
    root = tk.Tk()
    c = tk.Canvas(root, width=width, height=height, bg='white')
    c.pack()
    y_stretch = 15
    y_gap = 20
    x_stretch = 10
    x_width = 20
    x_gap = 20
    for x, y in enumerate(seq):
        # (x0, y0) is the top-left corner of the bar, (x1, y1) the bottom-right
        x0 = x * x_stretch + x * x_width + x_gap
        y0 = height - (y * y_stretch + y_gap)
        x1 = x * x_stretch + x * x_width + x_width + x_gap
        y1 = height - y_gap
        c.create_rectangle(x0, y0, x1, y1, fill="red")
        c.create_text(x0+2, y0, anchor=tk.SW, text=str(y))
    root.mainloop()

data = (18, 15, 10, 7, 5, 4, 2, 5, 8, 10, 13)
graph_points(data)
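
The coordinate math is the only fiddly part, so here it is pulled out into a plain function you can poke at without opening a window. This is a sketch in Python 3 syntax; `bar_rect` is a hypothetical helper, with defaults copied from the constants above.

```python
def bar_rect(x, y, height=325, y_stretch=15, y_gap=20,
             x_stretch=10, x_width=20, x_gap=20):
    """Return (x0, y0, x1, y1) canvas coordinates for bar number x with value y."""
    x0 = x * x_stretch + x * x_width + x_gap
    y0 = height - (y * y_stretch + y_gap)   # canvas y grows downward
    x1 = x0 + x_width
    y1 = height - y_gap
    return x0, y0, x1, y1


print(bar_rect(0, 18))  # (20, 35, 40, 305)
print(bar_rect(1, 15))  # (50, 80, 70, 305)
```

Each bar is 20 pixels wide with a 10 pixel gap, and every bar shares the same baseline 20 pixels above the bottom of the canvas.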
#    Comments [0] |
 Monday, September 17, 2007

Python - Yahoo Stock Quote Module

Last week I wrote a small Python module for retrieving stock prices.

It used screen scraping to get data from Google Finance.  Yahoo offers stock data in a much more digestible form which allowed me to get values without screen scraping and regular expressions.  So, I wrote a module based around this.

This new module is much more comprehensive and exposes a Python API for retrieving all sorts of stock data from Yahoo Finance.

My ystockquote module provides a Python API for retrieving stock data from Yahoo Finance.  This module contains the following functions:

  • get_all(symbol)
  • get_price(symbol)
  • get_change(symbol)
  • get_volume(symbol)
  • get_avg_daily_volume(symbol)
  • get_stock_exchange(symbol)
  • get_market_cap(symbol)
  • get_book_value(symbol)
  • get_ebitda(symbol)
  • get_dividend_per_share(symbol)
  • get_dividend_yield(symbol)
  • get_earnings_per_share(symbol)
  • get_52_week_high(symbol)
  • get_52_week_low(symbol)
  • get_50day_moving_avg(symbol)
  • get_200day_moving_avg(symbol)
  • get_price_earnings_ratio(symbol)
  • get_price_earnings_growth_ratio(symbol)
  • get_price_sales_ratio(symbol)
  • get_price_book_ratio(symbol)
  • get_short_ratio(symbol)

Sample Usage:


>>> import ystockquote
>>> print ystockquote.get_price('GOOG')
529.46
>>> print ystockquote.get_all('MSFT')
{'stock_exchange': '"NasdaqNM"', 'market_cap': '268.6B', 
'200day_moving_avg': '29.2879', '52_week_high': '31.84', 
'price_earnings_growth_ratio': '1.45', 'price_sales_ratio': '5.33',
'price': '28.65', 'earnings_per_share': '1.423', 
'50day_moving_avg': '28.7981', 'avg_daily_volume': '55579700',
'volume': '25330856', '52_week_low': '26.48', 'short_ratio': '1.60', 
'price_earnings_ratio': '28.65', 'dividend_yield': '1.38', 
'dividend_per_share': '0.40', 'price_book_ratio': '8.76', 
'ebitda': '20.441B', 'change': '-0.39', 'book_value': '3.315'}

The module is available here:  http://www.goldb.org/ystockquote.html

#    Comments [11] |
 Friday, September 14, 2007

Python - Stock Quote Module

I just wrote a tiny Python module for programmatically retrieving stock quotes from Google Finance:

The module:


import urllib
import re

def get_quote(symbol):
    base_url = 'http://finance.google.com/finance?q='
    content = urllib.urlopen(base_url + symbol).read()
    m = re.search('class="pr".*?>(.*?)<', content)
    if m:
        quote = m.group(1)
    else:
        quote = 'no quote available for: ' + symbol
    return quote


Sample usage:


#!/usr/bin/env python

import stockquote

print stockquote.get_quote('goog')


Output:


>> 529.56
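
To see what the regex actually grabs without hitting the network, you can run it against a snippet of markup. The HTML below is made up purely for illustration and is not what Google Finance really serves; the sketch uses Python 3 syntax.

```python
import re


def extract_quote(content):
    """Return the text of the first element marked class="pr", or None."""
    m = re.search('class="pr".*?>(.*?)<', content)
    return m.group(1) if m else None


# made-up markup, only to show what the pattern captures
sample = '<div><span class="pr" id="ref_1">529.56</span></div>'
print(extract_quote(sample))            # 529.56
print(extract_quote('<p>nothing</p>'))  # None
```

Screen scraping like this breaks as soon as the page markup changes, so it pays to handle the no-match case.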
#    Comments [8] |
 Tuesday, September 11, 2007

Python httplib2 - Handling Cookies in HTTP Form Posts

I often need to automate tasks in web based applications.  I like to do this at the protocol level by simulating a real user's interactions via HTTP.  Python comes with two built-in modules for this: urllib (higher level Web interface) and httplib (lower level HTTP interface).

However, I usually don't use either of these.  I prefer to use Joe Gregorio's excellent httplib2 module (btw, I really wish this could make its way into Python's Standard Library).  It is a much richer library and has a lot of nice features for dealing with HTTP.  

When automating something, you often need to "login" to maintain some sort of session/state with the server.  This is usually achieved with form-based authentication. You post a form to the server, and it responds with a cookie in the incoming HTTP header.  You need to pass this cookie back to the server in subsequent requests to maintain state or to keep a session alive.

Here is an example of how to deal with cookies when doing your HTTP Post.


First, let's import the modules we will use:


import urllib
import httplib2


Now, let's define the data we will need: In this case, we are doing a form post with 2 fields representing a username and a password.


url = 'http://www.example.com/login'   
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}


Now we can send the HTTP request:


http = httplib2.Http()
response, content = http.request(url, 'POST', headers=headers, body=urllib.urlencode(body))


At this point, our "response" variable contains a dictionary of HTTP header fields that were returned by the server. If a cookie was returned, you would see a "set-cookie" field containing the cookie value. We want to take this value and put it into the outgoing HTTP header for our subsequent requests:


headers['Cookie'] = response['set-cookie']

Now we can send a request using this header and it will contain the cookie, so the server can recognize us.



So... here is the whole thing in a script. We login to a site and then make another request using the cookie we received:


#!/usr/bin/env python

import urllib
import httplib2

http = httplib2.Http()

url = 'http://www.example.com/login'   
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}
response, content = http.request(url, 'POST', headers=headers, body=urllib.urlencode(body))

headers = {'Cookie': response['set-cookie']}

url = 'http://www.example.com/home'   
response, content = http.request(url, 'GET', headers=headers)
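
The set-cookie value often carries attributes (Path, an expiry, and so on) next to the name=value pair. Sending the whole thing back usually works, but if a server is picky, one option is to keep only the name=value part. A sketch in Python 3 syntax using the standard library's cookie parser; `cookie_header` is a hypothetical helper and the session value is invented.

```python
from http.cookies import SimpleCookie


def cookie_header(set_cookie_value):
    """Reduce a Set-Cookie value to the name=value pairs to send back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return '; '.join('%s=%s' % (name, morsel.value) for name, morsel in jar.items())


print(cookie_header('sessionid=abc123; Path=/'))  # sessionid=abc123
```

The result would go into the outgoing headers in place of the raw value, for example `headers['Cookie'] = cookie_header(response['set-cookie'])`.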
#    Comments [2] |
 Friday, August 31, 2007

Python 3000 alpha 1 Released!

wow... congrats to Guido and everyone else involved.

get it here:

http://python.org/download/releases/3.0/
#    Comments [0] |
 Friday, August 24, 2007

Pylot - Dev Update #6 - Web Performance/Load Test Tool (Results Report and GUI)

(Pylot is the open source web performance/load test tool that I am developing)

When a test run is finished, a report is automatically generated to summarize the test results. It includes various statistics and graphs on response times and throughput from the run. A sample of the results report can be seen here:

http://www.pylot.org/samples/results/results.html

Pylot also writes results to CSV files so you can import them into your favorite spreadsheet to crunch numbers, generate statistics, and create graphs. I have been working


I have also been working on the GUI. Here is the latest:

http://www.pylot.org/samples/ui/pylot_ui_screenshot_2007_08_20.png




#    Comments [4] |
 Wednesday, August 22, 2007

My Text Editor - What SciTE Says About Me

In a recent post: "What does your favorite text editor say about you", the author lists popular text editors and what they say about their users.  Here is the Editor or IDE I use with various programming languages:

Python:  SciTE
Perl:  SciTE
C#:  Visual Studio
Java:  Eclipse

I do all of my writing and a large portion of my programming in a plain old text editor.  Most of the code I write is in Python.  I love using a lightweight text editor instead of a big bloated IDE.  So... I pretty much live inside a text editor.

... and I love SciTE.  It rocks equally on Windows and GNU/Linux.  So what does this say about me?


SciTE:
"Your text editor is lightweight, full featured, extensible and cross platform. In addition, it can work as a stand-alone executable which requires no installation. Fits perfectly with all your other portable tools on your USB thumb drive. You also love how SciTE let’s you write Lua scripts to extend it’s functionality. You take your text editor choice very seriously. You like tinkering, and minimalistic, portable applications."

#    Comments [3] |
 Wednesday, August 08, 2007

Pylot - Dev Update #5 - Web Performance/Load Test Tool (Graphs With MatPlotlib)

We performance practitioners love our graphs!  Visualizing data is helpful in analyzing performance results.  Sometimes a quick glance at a graph provides better understanding than a mound of raw or summarized data.  Pylot's Results Reporting feature creates graphs of response times (latency) and throughput.

For the graphing toolkit, Pylot uses Matplotlib to produce fancy graphs like these:

[Image: python matplotlib line graph]

Matplotlib allows you to graph data from Python. Here is a simple script that gives a glimpse of how a line/marker graph is created as a png image:


#!/usr/bin/env python

from pylab import * # Matplotlib

def main():
    # sequence of data points to graph (x, y coordinates)
    points = [(1, 3), (2, 6), (3, 2), (4, 5)]
    graph(points)


def graph(points):
    fig = figure(figsize=(6, 2)) # image dimensions
    ax = fig.add_subplot(111)
    ax.grid(True, color='#666666')
    xticks(size='x-small')
    yticks(size='x-small')
    x_seq = [item[0] for item in points]
    y_seq = [item[1] for item in points]
    ax.plot(x_seq, y_seq,
        color='blue', linestyle='-', linewidth=1.0, marker='o',
        markeredgecolor='blue', markerfacecolor='yellow', markersize=2.0)
    savefig('graph.png')


if __name__ == '__main__':
    main()

The output looks like this:

[Image: pylot matplotlib latency graph]


#    Comments [0] |
 Friday, July 27, 2007

Recommended Reading For Learning Python

I have the opportunity to spread Python to some junior/newbie programmers. In doing so, I wanted to compile a concise list of recommended learning materials. The intended audience is someone who has a basic familiarity with programming but no specific Python experience.

There are a ton of books and online materials available, but where should you start? Here is my very brief list:

First Book:

Python Tutorials Online:

#    Comments [5] |
 Tuesday, July 24, 2007

Pylot - Dev Update #4 - Web Performance/Load Test Tool (New Name and Defining Test Cases)

PyLT has been renamed to Pylot (some sort of abbreviation for "Python Load Test")

I realized that having a pronounceable name for a piece of software is pretty important :)

So...
As I develop my load test tool, I need a way to define test cases.  Here is my first attempt:


What Is A Pylot Test Case?

You must declare your test cases in an XML file. This is the format that the test engine understands. Editing XML may seem natural to some people, but awkward to others. The nice thing about this structure is that it will be very easy to create a more friendly user interface [sometime in the future] for defining test cases.

A test case is defined using the following syntax. Only the URL element is required.

<case>
  <url>URL</url>
  <method>HTTP METHOD</method>
  <body>REQUEST BODY CONTENT</body>
  <add_header>ADDITIONAL HTTP HEADER</add_header>
  <verify>STRING OR REGULAR EXPRESSION</verify>
  <verify_negative>STRING OR REGULAR EXPRESSION</verify_negative>
</case>

Below is an example of the simplest possible test case file. It contains a single test case which will be executed continuously during the test run. The test case contains a URL for the service under test. Since no method or body is defined, it will default to sending an HTTP GET to this resource. Since no verifications are defined, it will pass/fail the test case based on the HTTP status code it receives (pass if status is < 400).

<testcases>
<case>
<url>http://www.example.com/foo</url>
</case>
</testcases>
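
For illustration, here is a rough sketch (Python 3) of how a test case file like this could be read and how the pass/fail rule could be applied. This is not Pylot's actual code. The names load_cases and passed are made up for the example, and checking the status code in addition to the verify patterns is my assumption; only the element names and the "status < 400" rule come from the description above.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<testcases>
<case>
<url>http://www.example.com/foo</url>
</case>
<case>
<url>http://www.example.com/bar</url>
<method>POST</method>
<body>name=value</body>
<verify>OK</verify>
</case>
</testcases>"""


def load_cases(xml_text):
    """Return a list of dicts, one per <case>, with defaults filled in."""
    cases = []
    for case in ET.fromstring(xml_text).findall('case'):
        url = case.findtext('url')
        if not url:
            raise ValueError('every <case> needs a <url>')
        cases.append({
            'url': url.strip(),
            # no <method> means an HTTP GET
            'method': (case.findtext('method') or 'GET').strip().upper(),
            'body': case.findtext('body'),
            'verify': case.findtext('verify'),
            'verify_negative': case.findtext('verify_negative'),
        })
    return cases


def passed(case, status, response_body=''):
    """Hypothetical pass/fail rule: status < 400 and verifications hold."""
    if status >= 400:
        return False
    if case['verify'] and not re.search(case['verify'], response_body):
        return False
    if case['verify_negative'] and re.search(case['verify_negative'], response_body):
        return False
    return True


cases = load_cases(SAMPLE)
print(cases[0]['method'], cases[1]['method'])
```

The first case has no method, so it falls back to GET; the second one is an explicit POST with a positive verification.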

Related:

#    Comments [0] |
 Friday, June 29, 2007

PyLT - Dev Update #3 - Web Performance/Load Test Tool

(Update: PyLT has been renamed to Pylot)

(PyLT is the open source web performance/load test tool that I am developing)

A quick update on PyLT development...

The load generating engine is looking pretty solid and seems to work really well so far. It uses threading for concurrency and seems to scale well (though I haven't put it through its paces enough yet).
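
To give an idea of the threading approach, here is a minimal sketch (Python 3) of a thread-per-agent load generator. It is not the actual PyLT engine. fake_request is a hypothetical stand-in for a real HTTP call, and run_load and its parameters are names invented for this example.

```python
import threading
import time


def fake_request():
    """Stand-in for a real HTTP call (hypothetical); returns a status code."""
    time.sleep(0.001)
    return 200


def run_load(num_agents, requests_per_agent, send=fake_request):
    """Run `send` from several threads and collect (status, latency) pairs."""
    results = []
    lock = threading.Lock()

    def agent():
        for _ in range(requests_per_agent):
            start = time.time()
            status = send()
            elapsed = time.time() - start
            with lock:  # guard the shared results list explicitly
                results.append((status, elapsed))

    threads = [threading.Thread(target=agent) for _ in range(num_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every agent to finish
    return results


results = run_load(num_agents=5, requests_per_agent=10)
print(len(results), 'requests completed')
```

Each agent is one thread looping over requests, so the number of concurrent virtual users is simply the number of threads.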

The GUI is evolving more and starting to look like a real performance/load testing tool:

This is my first project using wxWidgets and wxPython.  I am finding it to be very powerful and relatively straightforward for designing nice user interfaces.  However, this is a big jump for me.  The past few years I have mostly done web programming and work with distributed systems.  It took a bit to get my head back into traditional GUI application development and event-driven programming.

More to come...

Related:

#    Comments [0] |
 Friday, June 15, 2007

PyLT - Dev Update #2 - Web Performance/Load Test Tool

(Update: PyLT has been renamed to Pylot)

(PyLT is the web performance/load test tool that I am developing)

A quick update on PyLT development...

This week I rewrote the GUI using wxPython.  It still needs a *lot* of work, but here is what it's starting to look like:


Related:
PyLT - Dev Update #1 - Web Performance/Load Test Tool
PyLT - Scratching My Itch - New Web Performance/Load Test Tool (Open Source)

#    Comments [0] |
 Tuesday, June 12, 2007

Does Python Meet The Definitions Of An OO Programming Language?

Who cares.  After all, we are all consenting adults here.  Python is most definitely a multi-paradigm language.  This flexibility is one of Python's great features.

Tim Peters responding to accusations of Python not being a "true OO programming language" (1998):

Jeff:
> So how does Python implement encapsulation? From
> what I have seen it does not, and therefore may contain
> many OO concepts, but cannot be considered a
> true OO programming language.

Tim Peters:
Indeed, and because it doesn't support closures, it's not a true
functional programming language either. And because you have to import
all sorts of modules to do the simplest things (e.g., regular
expressions), neither is it a true scripting language. Indeed, because
it doesn't support labeled break or continue statements, it's not even
a true structured programming language.
#    Comments [0] |
 Monday, June 11, 2007

PyLT - Dev Update #1 - Web Performance/Load Test Tool

(Update: PyLT has been renamed to Pylot)

A quick update on PyLT development...

I have a working version of the guts of my tool (the multi-threaded load generator).  I have now started working on the user interface.  My initial idea was to use Tk for the GUI Toolkit.  I started developing a minimal GUI and quickly realized I need a Toolkit more powerful than Tk.

My original justification for using Tkinter (from blog comments):

"I will probably eventually move to a richer toolkit (like wxPython) if I take this thing far. For right now, Tk works. It comes distributed with core python, it's super fast and light, it's easy to use, and I know it pretty well. Though it looks like crap and is limited in many ways."

As of today I am rewriting the GUI with wxPython, which uses the wxWidgets Toolkit.  This should give me the ability to create a rich cross-platform UI for my tool.

[For posterity] Here is what the original prototype of the Tk UI looked like:


R.I.P. Tk... Hello wxWidgets


Related:
PyLT - Scratching My Itch - New Web Performance/Load Test Tool (Open Source) 

#    Comments [2] |
 Friday, June 01, 2007

PyLT - Scratching My Itch - New Web Performance/Load Test Tool (Open Source)

(Update: PyLT has been renamed to Pylot)

I have started development on a new web performance/load testing tool.  It is targeted at testing Web Services.


Here is some Q&A with myself:


You know you are reinventing the wheel, right?

Yes, I know.  There are already open source web load testing tools available (OpenSTA, JMeter, Grinder, WebLOAD, etc).  I have used all of these as well as proprietary tools for years.  I am a performance engineer and I feel like I need a tool set that I am intimately familiar with.  I need the ability to easily alter and tweak the tool at will.  I don't have the time, budget, or patience enough to wait on vendors when I need something.  I also want a tool that is fun to hack and adapt.  For this, I need to understand the code base deeply.

What language are you using?

Python.  The initial GUI uses Tk, but this may be changed down the road. I use Python's threading module for concurrency. If this doesn't scale well enough, I will be exploring other models of concurrency (perhaps generator based coroutines).

Why do you think you can write a tool like this?

I have worked in performance testing for nearly 10 years.  I have written many tools that work with various protocols to do distributed load generation and testing.  Creating a simple HTTP load generator is sort of my Hello World 2.0 for each language I try (I have written these from scratch in Python, Perl, Java, and C#).  This tool takes that basic concept and organizes it into a robust application.

Will it be Free and Open Source?

Of course!  Licensed under GNU GPL.



For an early look, check out the source repository at:  http://pylt.googlecode.com/svn/trunk

More details to come.

-Corey

#    Comments [6] |
 Tuesday, May 29, 2007

Simple Python Web Server Example

Note to self...
use this:

Roll your own server in 50 lines of Python code (by Muharem Hrnjadovic):

"Just in case you wondered why there are so many frameworks in Python land, here’s a basic server (including a request dispatch mechanism) in only 50 lines of code."

Why *not* add a server interface to every tool I write?  :)

#    Comments [0] |
 Sunday, May 27, 2007

Python Multitask - Generator-based Multitasking and Asynchronous I/O

This looks cool:  http://o2s.csail.mit.edu/o2s-wiki/multitask
"multitask allows Python programs to use generators (aka coroutines) to perform cooperative multitasking and asynchronous I/O. Applications written using multitask consist of a set of cooperating tasks that yield to a shared task manager whenever they perform a (potentially) blocking operation, such as I/O on a socket or getting data from a queue. The task manager temporarily suspends the task (allowing other tasks to run in the meantime) and then restarts it when the blocking operation is complete. Such an approach is suitable for applications that would otherwise have to use select() and/or multiple threads to achieve concurrency."


It is built on some of the new generator features in Python 2.5.  I wrote about this a few months back and actually tried to implement a version of a coroutine scheduler myself.  Glad to see someone packaged up a nice version I can use :)
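
To show the basic idea (not the multitask API itself, which I am not reproducing here), this is a tiny sketch in Python 3 of cooperative multitasking with generators. The names countdown and run are made up for the example, and there is no real I/O: each task just yields to a round-robin scheduler after every step.

```python
from collections import deque


def countdown(name, n, log):
    """A task: record a step, then yield control back to the scheduler."""
    while n > 0:
        log.append((name, n))
        n -= 1
        yield


def run(tasks):
    """Round-robin over generator-based tasks until all are finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # resume the task until its next yield
        except StopIteration:
            continue            # task finished; drop it
        queue.append(task)      # otherwise put it at the back of the line


log = []
run([countdown('a', 2, log), countdown('b', 3, log)])
print(log)
# → [('a', 2), ('b', 3), ('a', 1), ('b', 2), ('b', 1)]
```

A real task manager would suspend a task on a blocking operation and resume it when the operation completes; here the "blocking point" is just a bare yield.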


#    Comments [0] |
 Thursday, May 17, 2007

RESTful Web Services - 10 Years of 'Programmable Web' Books

I just got the RESTful Web Services book (Leonard Richardson & Sam Ruby, O'Reilly, 2007) in the mail today.  I've only read the beginning, but so far it is great.  In fact, it brings me back to when I first started working with the "programmable web".  I got into the programmable web back when the web was only a few years old.  I spent years doing performance/scalability testing and tuning for large Web 1.0 applications and bizarre custom Web API's (think huge financial services rushing to get online).  Building tools to run realistic workloads through a system involves writing custom clients to simulate real user/browser interaction.  This is pretty ugly stuff when you are dealing with an application that was designed with only humans in mind (AKA all).  It involves lots of HTTP protocol level work.. screen scraping.. protocol sniffing and analyzing.. requests.. header mangling.. cookie handling.. redirects.. authentication.. session information parsing.. etc, etc.

Application simulation is pretty messy work.  There is no simple API to hide behind; you had to figure out what the API was for yourself.  See.. *every* web application has an API.  Though it might have been designed by accident.  This allowed me to see first hand how developers and frameworks butchered the use of the "Web" as a platform.  Staring at naked HTTP let me see every little bit of the hairball underneath.  Alas, any standardization around web services (or the concept to be officially named) was far off.

A friend (bearded Perl hacker) let me borrow a book to show me how Perl can do this cool web stuff:  Web Client Programming with Perl (Clinton Wong, O'Reilly, 1997).  This book helped me build my first web clients to do application simulation and testing.  There wasn't a ton of documentation at the time to do this sort of thing, so I relied heavily on this book.

So now.. 10 years later..  the Web has changed..  it has morphed into *the* distributed platform..  it is becoming organized.

As I flip through RESTful Web Services, it all just looks right..  REST looks right..   It is simple..  it is HTTP..  it is all the guts I already know.  It almost feels like a sequel to my old favorite:

I have traded Perl for Python as my preferred scripting language the past few years, but I am still building simulators, web clients, and virtual users. I am excited to work on some new stuff in this area.

#    Comments [0] |
 Wednesday, May 09, 2007

PerfLog - Performance Analysis Tool for Web Server Logs (Python)

I wrote a small tool that I have found useful.  It is a Python script that parses and analyzes web log files (in W3C Extended Log File Format).  It creates an HTML report with data and PNG images showing graphs of things like: request throughput, error rates, HTTP method distribution, content type distribution, time-series, etc.

Many log parsing/analysis tools exist, but I was looking for something more specific to Performance than something a webmaster would want to look at.

The script is pretty basic. It was very useful for my own needs, but others might want to modify it.  If anyone has good suggestions to add to it, I am willing to enhance it at some point (or just grab my code and hack it yourself if you know Python).
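
If you want a feel for the parsing side, here is a small sketch (Python 3) of reading W3C Extended logs and pulling out a few of those metrics. This is not PerfLog's code. The sample log, the chosen field names, and the function name parse_w3c are assumptions for the example; the only thing taken from the format itself is that directive lines start with '#' and that '#Fields:' names the columns.

```python
from collections import Counter

SAMPLE_LOG = """#Version: 1.0
#Fields: date time cs-method cs-uri-stem sc-status
2007-05-09 12:00:01 GET /index.html 200
2007-05-09 12:00:01 GET /missing 404
2007-05-09 12:00:02 POST /login 200
"""


def parse_w3c(lines):
    """Yield one dict per log entry, keyed by the names in #Fields."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            if line.startswith('#Fields:'):
                fields = line[len('#Fields:'):].split()
            continue  # other directives are ignored in this sketch
        yield dict(zip(fields, line.split()))


entries = list(parse_w3c(SAMPLE_LOG.splitlines()))

# HTTP method distribution
methods = Counter(e['cs-method'] for e in entries)
# request throughput (requests per second)
per_second = Counter(e['date'] + ' ' + e['time'] for e in entries)
# error count (status >= 400)
errors = sum(1 for e in entries if int(e['sc-status']) >= 400)

print(methods, per_second, errors)
```

Counters like these are the kind of data you would then hand to Matplotlib for graphing.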


Project Home

Features

  • Produces metrics and graphs from web logs (W3C Extended Log File Format)
  • Useful during performance testing and analysis
  • Output is created in XHTML/CSS with embedded PNG images
  • PerfLog is written in Python and uses Matplotlib for graphs and plotting

License

Project Info

Requirements

  • Python 2.4+
  • Matplotlib (requires Numeric or Numpy)

Platforms

  • Cross-Platform.  PerfLog will run on any system that supports Python and Matplotlib.
#    Comments [1] |
 Monday, April 30, 2007

I Am LISP?

I just took the "Which Programming Language Are You?" quiz. Was hoping to be Python.

Apparently I am LISP?

You are Lisp.  Very few people like you (Probably because you use too many parenthesis (You better stop it (Reallly)))
Which Programming Language are You?

#    Comments [2] |
 Tuesday, April 10, 2007

Python and IEC - Stupid-Simple Windows Browser Automation

I have been using IEC lately for automating repetitive administrative tasks within my company:

IEC.py - Automating Internet Explorer with Python

IEC is a simple library with a nice API for automating an IE browser. I found it simple to work with for basic automation needs. I have also used it as the core of a small UI testing framework.

From Mayukh Bose:

IEC is a python library designed to help you automate and control an Internet Explorer window. You can use this library to navigate to web pages, read the values of various HTML elements, set the values of checkboxes, text boxes, radio buttons etc., click on buttons and submit forms.

Yeah I know.. pretty lame it only works with IE, but in the environment I was working in, the applications ran on *IE Only*.


A personal story:

My company is very analytical and detail oriented when it comes to tracking/planning project resource allocation. We track all sorts of projections, budgets, resources, etc. The workflow is basically: some business guys (no idea what they actually do) take data from some reports and enter them into some arcane hosted tracking software. This is done by entering copious amounts of data into web form after web form. Then they submit the form to run a report. Once that is finished, they cut & paste the data into MS Excel. Then they take the Excel spreadsheet and follow some wild sequence of copying, cutting, pasting, converting, running macros, graphing, etc. At the end of this, a few images are produced so some wizz-bang graphs can go into a monthly Powerpoint... wow.

So... I wrote a Python script that takes their input data, drives a web browser to do the report, screen scrapes the result, processes it, generates some fancy graphs with Matplotlib, and presents a web page with the results.  End result: Converted a multi-hour manual process into the click of an icon and 20 seconds of processing.

I could have done this with HTTP directly, but this UI automation technique made it very quick to develop; and it looked impressive ("whoa it's like.. making my browser move on its own").


To use IEC, you need the Python for Windows Extensions. If you use the ActiveState Python distribution, these are already included.

I used to use ActiveState Python for Windows programming (because I was a big fan of ActiveState Perl, where the installer and PPM package manager rocked). I recently spent close to an hour trying to get SSL (HTTPS) to work with ActiveState.  I couldn't get it to work so I ditched it for the standard Python distro.


--
Happy Hacking.

#    Comments [4] |
 Monday, April 09, 2007

Geo Location Mashup - Python, Yahoo Maps AJAX API

Mapping User Metro Concentration by IP Address

I just posted this: http://www.goldb.org/geo_maps

It is a tutorial/example showing how to create a geolocation mashup by generating HTML/JavaScript code from a Python script.  The resulting code is an HTML page with embedded JavaScript that you can open with your browser.  It works with the Yahoo Maps AJAX API to plot markers at specified locations.  I also explain how this technique can be used to create a [near] real-time map of user concentration based on IP addresses.

... feedback welcome.


It generates cool AJAXy eye-candy like this:

and this:

Since I use the AJAX control, the rendered map has a zooming, panning, dynamic, tiled interface.  Pretty Slick.

#    Comments [1] |
 Sunday, April 01, 2007

Massive Concurrency with PyPy Stackless

(via)
PyPy had its 1.0 release recently.

Now, This looks *really* interesting:

PyPy Stackless

PyPy can expose to its user language features similar to the ones present in Stackless Python: no recursion depth limit, and the ability to write code in a massively concurrent style. It actually exposes three different paradigms to choose from:
  • Tasklets and Channels
  • Greenlets
  • Plain Coroutines
#    Comments [0] |

One Laptop Per Child - More Prototype Pics and Info

I posted some pics of the latest OLPC prototypes a few weeks ago.  Well... I got to see them 2 weeks in a row; so here are some more pics of the machine up close.

... Seems the whole "hand crank" idea is gone.  There is now a pull cord on the external power supply with a 10:1 ratio (1 minute of pulling = 10 mins of computing) for manually recharging power... The keyboard is tiny and soft feeling.  The screen is small but is very viewable in direct light without backlighting (which is probably the #1 power drain on laptops).

OLPC rocks!

Me geeking out:

Old school meets new school...
Gerald J. Sussman (yes, the MIT Scheme guy) playing with the latest OLPC prototype:

Closeups:


.. these machines run a scaled down version of Fedora Linux that is loaded with Python applications.

-Corey

#    Comments [3] |
 Thursday, March 29, 2007

Python - Remove Duplicate Items From a Sequence

Say you have a sequence like:

[1, 1, 2, 2, 2, 3, 4, 4, 4]

... and you want a sequence containing all the unique items (remove duplicates) like:

[1, 2, 3, 4]


Here is a function to do it:

def remove_dups(seq):
    x = {}
    for y in seq:
        x[y] = 1
    u = x.keys()
    return u


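or a one-liner (a hack: it relies on '_[1]', an undocumented name that CPython 2.x uses internally for the list being built, so it may break on other versions or implementations):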

u = [x for x in seq if x not in locals()['_[1]']]



update: in the comments below, some other ways were suggested..

with 'set'.. like this:

u = list(set(seq))

or with a dictionary.. like this:

u = dict.fromkeys(seq).keys()
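
One thing to watch: a set makes no promise about the order of its items, so the result may not come back in the original order. If order matters, something like this sketch (Python 3 syntax, but the same idea works in Python 2) keeps the first occurrence of each item in place. The function name remove_dups_ordered is just for this example, and it assumes the items are hashable.

```python
def remove_dups_ordered(seq):
    """Return the unique items of seq, keeping first-seen order."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:  # set lookup keeps this fast
            seen.add(item)
            result.append(item)
    return result


print(remove_dups_ordered([3, 1, 3, 2, 1]))
# → [3, 1, 2]
```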
#    Comments [4] |
 Friday, March 23, 2007

Python - Creating Bar Graphs with Matplotlib

Matplotlib is an open source 2D plotting library for Python.  It is very impressive and robust, but the API and documentation are maddeningly difficult to follow.

Here I have provided a function that will create a bar graph [as a png image] from a Python dictionary using the Matplotlib API.

It will auto-size the bars and auto-adjust the axis labels for you. All you need to pass into it is a dictionary data structure (and optionally a graph title and output name).


We start with a Python dictionary like this:

{'A': 70, 'B': 290, 'C': 130}


... and the function will use Matplotlib to create a graph like this:


Here is a sample script that uses my function:


#!/usr/bin/env python

from pylab import *

def main():  
    my_dict = {'A': 70, 'B': 290, 'C': 130}
    bar_graph(my_dict, graph_title='ABC')


def bar_graph(name_value_dict, graph_title='', output_name='bargraph.png'):
    figure(figsize=(4, 2)) # image dimensions  
    title(graph_title, size='x-small')
   
    # add bars
    for i, key in zip(range(len(name_value_dict)), name_value_dict.keys()):
        bar(i + 0.25 , name_value_dict[key], color='red')
   
    # axis setup
    xticks(arange(0.65, len(name_value_dict)),
        [('%s: %d' % (name, value)) for name, value in
        zip(name_value_dict.keys(), name_value_dict.values())],
        size='xx-small')
    max_value = max(name_value_dict.values())
    tick_range = arange(0, max_value, (max_value / 7))
    yticks(tick_range, size='xx-small')
    formatter = FixedFormatter([str(x) for x in tick_range])
    gca().yaxis.set_major_formatter(formatter)
    gca().yaxis.grid(which='major')
   
    savefig(output_name)


if __name__ == "__main__":
    main()


enjoy.

-Corey

#    Comments [6] |
 Thursday, March 22, 2007

Python - Convert Date/Time to Epoch

I'm not sure why, but this took me forever to figure out; so I'm posting it here for others...

Let's say you have a string representing a date and a time and you want to convert it to epoch time (# secs since the epoch).

First you will need to create a pattern for your time format, using time format directives.

For example, the pattern for:

'2007-02-05 16:15:18'

Would be:

'%Y-%m-%d %H:%M:%S'

You can then convert it to epoch like this:

int(time.mktime(time.strptime('2007-02-05 16:15:18', '%Y-%m-%d %H:%M:%S')))


Now in a script:

#!/usr/bin/env python

import time

date_time = '2007-02-05 16:15:18'
pattern = '%Y-%m-%d %H:%M:%S'
epoch = int(time.mktime(time.strptime(date_time, pattern)))
print epoch
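
One caveat: time.mktime() interprets the parsed time as local time, so the result depends on the timezone of the machine running the script. If the string is really in UTC, a sketch like this (Python 3 syntax) uses calendar.timegm() instead, and also shows the trip back from epoch to a string. The variable names are just for the example.

```python
import calendar
import time

date_time = '2007-02-05 16:15:18'
pattern = '%Y-%m-%d %H:%M:%S'

# time.mktime() treats the struct_time as local time;
# calendar.timegm() treats it as UTC.
parsed = time.strptime(date_time, pattern)
epoch_utc = calendar.timegm(parsed)

# and back again: epoch seconds -> UTC struct_time -> string
round_trip = time.strftime(pattern, time.gmtime(epoch_utc))

print(epoch_utc, round_trip)
# → 1170692118 2007-02-05 16:15:18
```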
#    Comments [0] |
 Saturday, March 17, 2007

Python3000 vs. Perl6 ... Wanna Bet?

Perl6...
Python3000...

Both are redesigns of very popular dynamic/scripting languages.  Both have very strong, though very different, communities supporting them.

Out of the gate, Perl's plans were much more ambitious, including a new generic virtual machine.  Python's plans were more pragmatic; more of a language cleanup than a drastic redesign.

Perl 6 was officially announced nearly 7 years ago and I don't see a stable production release coming *any* time soon.  On the other hand, the idea of Python3000 was sorta tossed around for a while and swung into gear 2 years ago.

Guido (Python's BDFL) has been spearheading the effort, whereas Perl's leadership structure is much more anarchic (Where is Larry Wall these days?).  Guido has been very transparent and kept the community aware of his worries.

Some people saw this as a slippery slope...

Chromatic:

"Language redesign is difficult, isn’t it?  Once you start challenging base assumptions, you find that a lot of your previous conclusions are shaky, and good luck reigning in blue-sky ideas!

See you in 2007… or 2008… or 2009.

Best wishes,
a Perl 6 hacker"

I disagree..

I'd bet anyone money that I will be hacking on a stable release of Python3000 long before I'm using a stable version of Perl6... any takers?


(disclosure: I have written lots of code in both Perl and Python and am a fan of both)

#    Comments [6] |
 Wednesday, March 14, 2007

Regex "Match" in Python vs. C#

I have been writing a lot of code in both C# and Python lately... flipping back and forth between both languages.  One thing I keep getting tripped up on is the terminology used in regular expression syntax, and what a "match" is.

So for my own disambiguation:

  • Python's re.match() is different than C#'s Regex.IsMatch()
  • Python's re.search() is similar to C#'s Regex.IsMatch()


Better explained in code:


Using Regex.IsMatch() in C# to match a pattern with some text:

if (Regex.IsMatch("foobar", "bar"))
{
    Console.WriteLine("Match");
}
else
{
    Console.WriteLine("No Match");
}

this prints 'Match'


Same thing, using re.match() in Python:

if re.match('bar', 'foobar'):
    print 'Match'
else:
    print 'No Match'

this prints 'No Match'


oops.. didn't get a match. What happened?

match() only checks if the regex matches at the beginning of the string, while search() will scan forward through the string for a match.


If you were expecting the pattern to match anywhere in the string, you need to use re.search() instead:

if re.search('bar', 'foobar'):
    print 'Match'
else:
    print 'No Match'

this prints 'Match'


... or else you must supply a pattern that will match from the beginning of the string:

if re.match('.*bar', 'foobar'):
    print 'Match'
else:
    print 'No Match'

this prints 'Match'
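

Going the other way, you can make re.search() behave like re.match() by anchoring the pattern yourself. Here is a small sketch in Python 3 syntax (so print is a function); it assumes the default flags, since '^' means something different with re.MULTILINE.

```python
import re

text = 'foobar'

anchored = re.match('bar', text)     # None: 'bar' is not at the start
anywhere = re.search('bar', text)    # finds 'bar' at index 3
caret = re.search('^bar', text)      # None: '^' pins the search to the start

print(anchored is None, anywhere.start(), caret is None)
# → True 3 True
```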


#    Comments [0] |
 Saturday, March 10, 2007

Python - Iterating Multiple Sequences

Here are some examples of iterating through multiple sequences simultaneously in Python:


I start with 2 lists of numbers:

foos = [0, 1, 2, 3, 4]
bars = [1, 2, 3, 4, 5]

I want to create a new list that is made up of the sum of the items at each position in the original lists.  So I will end up with this:

>>> print foobars

[1, 3, 5, 7, 9]


Starting with an unpythonic way...
Here I use a counter to iterate through the indexes of each sequence and build a new list:

foobars = []
for i in range(len(foos)):
    foo = foos[i]
    bar = bars[i]
    foobars.append(foo + bar)


Getting more pythonic...
Here I use zip. Zip allows me to iterate each sequence simultaneously, assigning the current sequence values each time through the loop:

foobars = []
for foo, bar in zip(foos, bars):
    foobars.append(foo + bar)


The older pythonic way to do this was with map:

foobars = []
for foo, bar in map(None, foos, bars):
    foobars.append(foo + bar)


Getting even more pythonic and more concise...
I can combine zip with a list comprehension and do it in a one-liner like this:

foobars = [foo + bar for (foo, bar) in zip(foos, bars)]


*note:  in Python 3000, zip will still be there, but it will return an iterator (like itertools.izip does today) instead of building a list.
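
Something to keep in mind when the sequences have different lengths: zip stops at the shortest one, while map(None, ...) pads the shorter one with None. Here is a sketch in Python 3 syntax that uses itertools.zip_longest (called izip_longest in Python 2.6+) to pad with a fill value of your choosing. The shortened bars list is just for this example.

```python
from itertools import zip_longest  # izip_longest in Python 2.6+

foos = [0, 1, 2, 3, 4]
bars = [1, 2, 3]  # deliberately shorter

# zip stops as soon as the shortest sequence runs out
truncated = [foo + bar for foo, bar in zip(foos, bars)]

# zip_longest keeps going and fills the gaps with fillvalue
padded = [foo + bar for foo, bar in zip_longest(foos, bars, fillvalue=0)]

print(truncated)
# → [1, 3, 5]
print(padded)
# → [1, 3, 5, 3, 4]
```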

#    Comments [2] |
 Thursday, March 08, 2007

PLEAC - Programming Language Examples Alike Cookbook

I just stumbled across the PLEAC Project (Programming Language Examples Alike Cookbook).

Project Description:

"Following the great Perl Cookbook (by Tom Christiansen & Nathan Torkington, published by O'Reilly; you can freely browse an excerpt of the book here) which presents a suite of common programming problems solved in the Perl language, this project aims to gather fans of programming, in order to implement the solutions in other programming languages."


There is sample code in many popular languages.  The Python examples are really good.  They would serve as an excellent primer for someone moving from Perl to Python, or as a general Python reference with cookbook-style examples.

It is hosted at SourceForge and licensed under the GNU Free Documentation License (GFDL).

#    Comments [0] |
 Sunday, March 04, 2007

Python Parameters - Pass-By-Value or Pass-By-Reference?

Passing parameters to functions and methods.  Pass-by-value?  Pass-by-reference?  Which does your language use?

You probably learned this in your first CS class... so did I.

Then why did it take me a frakin' month to understand what Python does? :)

Well... if you look online, you will find some very ambiguous answers about Python being pass-by-reference or pass-by-value.  (which ends up boiling down to semantics and how you use certain terminology, but forget that for now)


To review, how do other languages handle this concept?

C is pass-by-value

Straightforward. You can simulate pass-by-reference with pointers.  Not much else to say here.

Java is pass-by-value

Primitive Types (non-object built-in types) are simply passed by value.  Passing Object References feels like pass-by-reference, but it isn't.  What you are really doing is passing references-to-objects by value.

OK, so what about Python?

Python passes references to objects by value (like Java), and everything in Python is an object. This sounds simple, but then you will notice that some data types seem to exhibit pass-by-value characteristics, while others seem to act like pass-by-reference... what's the deal?

It is important to understand mutable and immutable objects. Some objects, like strings, tuples, and numbers, are immutable.  Altering them inside a function/method will create a new instance and the original instance outside the function/method is not changed.  Other objects, like lists and dictionaries are mutable, which means you can change the object in-place.  Therefore, altering an object inside a function/method will also change the original object outside.
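
A quick sketch of the difference (Python 3 syntax; the function names are made up for the example):

```python
def append_item(items):
    items.append(4)      # mutates the list the caller passed in


def rebind(items):
    items = items + [4]  # builds a new list; the caller's list is untouched
    return items


def add_one(number):
    number += 1          # ints are immutable, so this rebinds the local name
    return number


nums = [1, 2, 3]
append_item(nums)
print(nums)
# → [1, 2, 3, 4]

new_nums = rebind(nums)
print(nums, new_nums)
# → [1, 2, 3, 4] [1, 2, 3, 4, 4]

count = 10
result = add_one(count)
print(count, result)
# → 10 11
```

In all three calls the function receives a reference to the same object the caller has. What differs is whether the function changes that object in place or just points its own local name at a new object.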



For entirely too much information about this topic in Python and across many other languages (Java, Scheme, C#, C, C++, Python), read the thread where these quotes come from:

Is Python By Value Or By Reference?

Alex Martelli:

The terminology problem may be due to the fact that, in python, the value of a name is a reference to an object. So, you always pass the value (no implicit copying), and that value is always a reference.
[...]
Now if you want to coin a name for that, such as "by object reference", "by uncopied value", or whatever, be my guest. Trying to reuse terminology that is more generally applied to languages where "variables are boxes" to a language where "variables are post-it tags" is, IMHO, more likely to confuse than to help.

Michael Hoffman:

Alex is right that trying to shoehorn Python into a "pass-by-reference" or "pass-by-value" paradigm is misleading and probably not very helpful. In Python every variable assignment (even an assignment of a small integer) is an assignment of a reference. Every function call involves passing the values of those references.


word.

#    Comments [0] |
 Wednesday, February 28, 2007

One Laptop Per Child - It's All About the Python!

Wow. I just read something interesting about the One Laptop Per Child (OLPC) project in Guido's PyCon writeup:

"The software is far from finished.  An early version of the GUI and window manager are available, and a few small demo applications: chat, video, two games, and a web browser, and that's about it!  The plan is to write all applications in Python (except for the web browser), and a "view source" button should show the Python source for the currently running application.  In the tradition of Smalltalk (Alan Kay is on the OLPC board, and has endorsed the project's use of Python) the user should be able to edit any part of a "live" application and see the effects of the change immediately in the application's behavior."


So... they are going to be running a GNU/Linux OS (a stripped down version of Fedora), with essentially all applications in Python.

This is very cool on many levels. It is the ultimate endorsement of Python.  It also makes me think about the future...  If OLPC is successful, a few years down the road we might be looking at several million young new Open Source/Python hackers.  Nice!


#    Comments [0] |

Reading Outlook/Exchange Email Programmatically with Python

With Python's Windows Extensions, you can talk via COM to an Exchange Server and read/process your email.  You must have the Outlook Client installed on the box you are running this from.

Here is a sample script that will:

  • connect to your mailbox
  • print the inbox name
  • print the message count
  • print the subjects for all your email messages


#!/usr/bin/env python

from win32com.client import Dispatch

session = Dispatch("MAPI.session")
session.Logon('OUTLOOK')  # MAPI profile name
inbox = session.Inbox

print "Inbox name is:", inbox.Name
print "Number of messages:", inbox.Messages.Count

for i in range(inbox.Messages.Count):
    message = inbox.Messages.Item(i + 1)
    print message.Subject


#    Comments [0] |
 Monday, February 26, 2007

Python 3000 Video and Slides

Guido van Rossum just published his slides from the PyCon 2007 Keynote where he discusses Python 3000.

His talk is also available on Google Video.

I'm psyched to watch this!

#    Comments [0] |
 Sunday, February 25, 2007

4-Space Indents - 4Eva!

I personally use 4-space indents when I write code (in any language, period).


Oddly... in Python, where whitespace matters, there is no single common practice (which would fit in nicely with Python's TOOWTDI ideology).

Some observations of source code indentations in Python:

  • I mostly see 4-space indents in code I read (colleagues, libraries, 3rd party code)
  • Python's own source (the Python parts) uses 4-space indents
  • Google uses 2-space indents in their Python
  • I hate tabs (if I randomly sucker-punch you, this is why)


Guido van Rossum (creator, lead developer, and BDFL of Python) writes

"If it uses two-space indents, it's corporate code; if it uses four-space indents, it's open source. (If it uses tabs, I didn't write it! :)"
#    Comments [0] |

IronPython Community Edition - Free IronPython

IronPython is the Python implementation that runs on the .NET platform... originally created by Jim Hugunin, but later backed (overtaken?) by Microsoft.

Microsoft's ambivalence towards Free software is a bit hard to follow sometimes and it really makes me question their entire approach to the software community  (wait.. have I ever *not* questioned that? :).

As a Python advocate and *nix geek trying to make my way working in a .NET shop, I am really excited about IronPython.  I was also initially impressed with Microsoft's embrace of the Python community and toe dipping into Open Source.  But then I hear Microsoft will not take patches from non-Microsoft developers and will not bundle IronPython with other applications which have certain Free licenses (LGPL, BSD). To me, this is really a shame. That is not how you approach a community.

Well... at least somebody has stepped up and is maintaining IronPython Community Edition (IPCE).

props.


So check out IPCE and the FePy Project!

#    Comments [0] |
 Friday, February 23, 2007

Developer + Tester == Develester ?

I saw this posted in the toolsmith-guild group (Danny Faught):

"The developers were very surprised to find a whole room full of testers who didn't cringe at the thought of reading and writing code. The rather odd terms Developer-Tester/Tester-Developer emerged from AWTA." (Austin Workshop on Test Automation)

ahh, the elusive "develester"


... and.. the develester algorithm in Python :)

#!/usr/bin/env python

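def what_am_i(skills):
    role = 'unknown'  # default, so we never return an unassigned name
    if 'developer' in skills:
        role = 'developer'
    if 'tester' in skills:
        role = 'tester'
    # ('developer' and 'tester') evaluates to just 'tester', so test each one
    if 'developer' in skills and 'tester' in skills:
        role = 'develester'
    return role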
        

skills = 'developer-tester'
print 'you are a %s' % what_am_i(skills)
#    Comments [0] |
 Tuesday, February 20, 2007

Compiling Python Scripts to Windows Executables

I often write quick Python scripts that I need to run on other machines. It is sometimes easier to just drop a Windows .EXE onto a machine (with a Python interpreter compiled into it), rather than doing a full Python installation. To do this, I use py2exe.

py2exe is a Python Distutils extension which converts Python scripts into executable Windows programs. This enables your Python scripts to be run on Windows platforms without a Python installation.

You can run py2exe directly from the command line, or you can script it. I wrote a small convenience script that I use for general compilation.

Let's call the compilation script: compile.py
Let's say we have a script we want to compile named: foo.py

You would then invoke it from the command line like this:

>python compile.py foo.py

This will create a 'dist' subdirectory containing the newly created executable along with some necessary DLLs.


Here is the code I use for my compile.py:


#!/usr/bin/env python
# Corey Goldberg

from distutils.core import setup
import py2exe
import sys

if len(sys.argv) == 2:
    entry_point = sys.argv[1]
    sys.argv.pop()
    sys.argv.append('py2exe')
    sys.argv.append('-q')
else:
    print 'usage: compile.py <python_script>\n'
    raw_input("press ENTER to exit...")
    sys.exit(1)

opts = {
    'py2exe': {
        'compressed': 1,
        'optimize': 2,
        'bundle_files': 1
    }
}

setup(console=[entry_point], options=opts, zipfile=None)

(note: you need to have Python and py2exe installed on a Windows box to run this)

#    Comments [0] |
 Tuesday, February 13, 2007

Trampolining With Generators - Roll Your Own Scheduler?

Even the subject sounds confusing huh?

I was reading Neil Mix's: Threading in JavaScript 1.7 post and was really fascinated by the concept he discusses: trampolining

Basically, trampolining is a way to achieve concurrency by using Generators to create a coroutine scheduler.

In JavaScript 1.7 (which Firefox 2 supports), you can already do concurrent programming with this technique.


Neil Mix:

"The way trampolining works is that a scheduler object (written in JavaScript) manages the execution of a series of generators, cobbling together a stack-like execution. Here’s how it works: The scheduler sets the starting generator as the base “frame” in the call stack. The scheduler then calls next() on the generator to obtain a yield value. If the yielded value is itself a generator, the scheduler pushes this new generator on the stack and calls next() on it, again obtaining a yield value. This continues until the top generator yields a non-generator value. This value could be a special directive to the scheduler (for example, a SUSPEND value that tells the scheduler to freeze execution of the “stack” of generators we’ve piled up). If not, the scheduler treats it as a return value. The scheduler then pops and closes the now complete generator and sends the return value back into the next generator in the stack."

pretty sick, huh?  ... definitely a twisted idea :)

The interesting takeaway is that this technique could be used to implement concurrency in any language that supports Generators.  It looks like Python has a similar capability.  This is described in detail in: PEP 342 - Coroutines via Enhanced Generators.
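
For the curious, here is a rough sketch of the scheduler Neil describes, written in Python 3 syntax. The names trampoline, add, and total are made up for illustration, and the SUSPEND directive is left out, so treat it as a toy rather than a real scheduler:

```python
import types


def trampoline(start):
    """Run a generator, treating yielded generators as nested calls."""
    stack = [start]   # simulated call stack of generators
    value = None      # value to send into the generator on top of the stack
    while stack:
        top = stack[-1]
        try:
            yielded = top.send(value)   # send(None) starts a fresh generator
        except StopIteration:
            # the generator ended without yielding a result
            stack.pop()
            value = None
            continue
        if isinstance(yielded, types.GeneratorType):
            # "call": push the new generator and start it on the next pass
            stack.append(yielded)
            value = None
        else:
            # "return": pop and close the finished generator, then hand
            # the value to whichever generator is now on top
            stack.pop().close()
            value = yielded
    return value


def add(a, b):
    yield a + b   # a non-generator value is treated as the return value


def total(numbers):
    result = 0
    for n in numbers:
        result = yield add(result, n)   # "call" add() through the scheduler
    yield result


print(trampoline(total([1, 2, 3])))   # prints 6
```

Nothing runs concurrently here; the point is only that the scheduler, not the interpreter, owns the call stack, which is what would let it suspend or interleave several stacks.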

Generator-based state machines sound really interesting.  Hopefully I'll find some time to play with them [in python] as an alternative to threading.

#    Comments [0] |

Screen Scraping in Python

Mads Kristensen just posted an article: Screen scraping in C#, where he shows several ways to make HTTP requests in C# that can be used for screen scraping.

from Mads:

"Some say that screen scraping is a lost art because it is no longer an advanced discipline. That may be right, but there are different ways of doing it. Here are some different ways that all are perfectly acceptable, but can be used for various different purposes."


Not to be outdone... here are 2 examples of how to do the same thing in Python:

using httplib:

import httplib

conn = httplib.HTTPConnection("www.python.org")
conn.request("GET", '/')
print conn.getresponse().read()


using urllib:

import urllib

f = urllib.urlopen('http://www.python.org/')
print f.read()
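
The two snippets above are Python 2. In Python 3 the same modules live on as http.client and urllib.request, so a rough equivalent might look like this. The function names fetch and fetch_with_http_client are my own, not part of either library:

```python
import http.client
import urllib.request


def fetch(url):
    """Return the raw bytes served at url (urllib.request version)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_with_http_client(host, path='/'):
    """Return the raw response body for a plain-HTTP GET (http.client version)."""
    conn = http.client.HTTPConnection(host)
    try:
        conn.request('GET', path)
        return conn.getresponse().read()
    finally:
        conn.close()   # release the socket even if the request fails
```

Calling fetch('http://www.python.org/') should hand back the page as bytes. Note that http.client does not follow redirects, so the second function returns whatever body the server sends first.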


#    Comments [2] |
 Friday, February 09, 2007

Python - use Psyco (x86 JIT-like compiler) for a speed boost

Psyco is a Python extension module which can speed up the execution of any Python code.

from the Psyco site:

"Think of Psyco as a kind of just-in-time (JIT) compiler, a little bit like what exists for other languages, that emit machine code on the fly instead of interpreting your Python program step by step. The difference with the traditional approach to JIT compilers is that Psyco writes several version of the same blocks (a block is a bit of a function), which are optimized by being specialized to some kinds of variables (a "kind" can mean a type, but it is more general). The result is that your unmodified Python programs run faster"


I have been working on some Python projects recently where Psyco has given me a really substantial performance increase. The type of work I am doing mostly involves statistical analysis of large numerical data sets.. array math.. percentiles.. time-series.. etc, etc.

To use it, all I do is copy Psyco to my system (to Python's Lib/site-packages), and add the following to the top of my python source file:

import psyco
psyco.full()


That's all ...

... or even better; wrap it in a try/except so your program still runs on systems without Psyco installed.

try:
    import psyco
    psyco.full()
except ImportError:
    # Psyco is not installed; run unaccelerated
    pass
#    Comments [0] |

Live Brain-Surgery With Python

Another example of improved productivity with dynamic languages...

Gojko Adzic on prototyping in Python:

"writing the prototype in Python allowed us to start a web server and open an interactive console to re-wire it and perform live brain-surgery while the server is running. I cannot imagine doing that in Java or C#. In a month, we wrote the functional equivalent of at least 4-5 months of C# code."

#    Comments [0] |
 Friday, January 26, 2007

Python - Sort A Nested Sequence With DSU

The DSU (Decorate, Sort, Undecorate) idiom originates from Lisp.  I first learned it in Perl, where it is called the Schwartzian Transform (coolest name ever?), named after longtime Perl hacker Randal L. Schwartz.

I find myself using this same DSU idiom in Python when I need to sort a nested sequence (single level sequence of sequences).

Let's say I have the following list of lists:

seq = [
    ['a', 1, 5],
    ['b', 3, 4],
    ['c', 2, 2],
    ['d', 4, 3],
    ['e', 5, 1],
]

... and I want the outer list to contain the inner lists sorted by their last column (in this case, index 2).

How would I do this?

Here is an implementation of the DSU (Decorate, Sort, Undecorate) idiom in a Python function:

def dsu_sort(idx, seq):
    for i, e in enumerate(seq):
        seq[i] = (e[idx], e)
    seq.sort()
    for i, e in enumerate(seq):
        seq[i] = e[1]
    return seq
   
(Keep in mind that lists in Python are mutable and this will transform your original sequence.)


So applying this to the sequence above like this:

dsu_sort(2, seq)

gives us:

[['e', 5, 1], ['c', 2, 2], ['d', 4, 3], ['b', 3, 4], ['a', 1, 5]]

which is the original sequence, transformed so it is sorted by the last column (index 2).
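
Since Python 2.4, sort() and sorted() accept a key argument that does the decorate and undecorate steps for you. A small sketch of the same sort, in Python 3 syntax, that leaves the original list untouched (the name by_last is just for illustration):

```python
from operator import itemgetter

seq = [
    ['a', 1, 5],
    ['b', 3, 4],
    ['c', 2, 2],
    ['d', 4, 3],
    ['e', 5, 1],
]

# sorted() returns a new list; itemgetter(2) pulls out the sort column
by_last = sorted(seq, key=itemgetter(2))
print(by_last)
```

Unlike the hand-rolled version, the key approach never compares the inner lists themselves, so rows with equal keys simply keep their original order.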



Randal's original implementation in Perl from 1994:
#!/usr/bin/perl
 print
     map { $_->[0] }
     sort { $a->[1] cmp $b->[1] }
     map { [$_, /(\S+)$/] }
     <>;

#    Comments [3] |
 Tuesday, January 16, 2007

Python - Merge a Sequence of Lists Into a Single List

the function:

def merge(seq):
    merged = []
    for s in seq:
        for x in s:
            merged.append(x)
    return merged


sample usage:

foo = [['a', 'b'],['c'],['d', 'e', 'f']]
print merge(foo)

>>>['a', 'b', 'c', 'd', 'e', 'f']

Update:
Here is another implementation that uses a Python dictionary. This version merges the lists and only keeps unique entries.

def merge(seq):
    d = {}
    for s in seq:
        for x in s:
            d[x] = 1
    return d.keys()
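
Another way to write both versions is with itertools from the standard library. This is only a sketch in Python 3 syntax; merge_unique is a name I made up, and unlike the dictionary version above it keeps the items in first-seen order:

```python
from itertools import chain


def merge(seq):
    """Flatten one level of nesting into a single list."""
    return list(chain.from_iterable(seq))


def merge_unique(seq):
    """Flatten one level and drop repeats, keeping first-seen order."""
    seen = set()
    unique = []
    for x in chain.from_iterable(seq):
        if x not in seen:
            seen.add(x)
            unique.append(x)
    return unique


foo = [['a', 'b'], ['c'], ['d', 'e', 'f']]
print(merge(foo))
print(merge_unique([['a', 'b'], ['b'], ['a', 'c']]))
```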
#    Comments [0] |
 Tuesday, January 09, 2007

Python - Formatted Dates and Times

I am not sure why, but every time I need to use some formatted dates or times in Python, I end up spending about 20 minutes going through the docs and reading up on the datetime module, which leads to more confusion.

So for my own clarity, here is how we do it using only the time module:

>>> import time
>>> print time.strftime("%m/%d/%y %H:%M:%S", time.localtime())
01/09/07 12:17:25

All of the formatting options for strftime() can be found here: http://docs.python.org/lib/module-time.html
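
For comparison, the datetime module takes the same format codes through its own strftime() method. A short sketch in Python 3 syntax, using a fixed moment (the variable name stamp is arbitrary) so the result is repeatable:

```python
from datetime import datetime

# a fixed moment, matching the example above
stamp = datetime(2007, 1, 9, 12, 17, 25)
print(stamp.strftime("%m/%d/%y %H:%M:%S"))   # prints 01/09/07 12:17:25

# same format, current local time
print(datetime.now().strftime("%m/%d/%y %H:%M:%S"))
```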

#    Comments [0] |
 Wednesday, January 03, 2007

Python - Find And Replace A String In Every File In A Directory

The Python Cookbook has a recipe to find and replace a string in every file in a directory.

I needed to do something like this today, so I cleaned up the script a little to make it [hopefully] a little more pythonic:


#!/usr/bin/env python
# replace a string in multiple files

import fileinput
import glob
import sys
import os


if len(sys.argv) < 3:
    print 'usage: %s search_text replace_text [directory]' \
        % os.path.basename(sys.argv[0])
    sys.exit(1)


stext = sys.argv[1]
rtext = sys.argv[2]
if len(sys.argv) == 4:
    path = os.path.join(sys.argv[3], '*')
else:
    path = '*'


print 'finding: %s and replacing with: %s' % (stext, rtext)


files = glob.glob(path)
for line in fileinput.input(files, inplace=1):
    if stext in line:
        line = line.replace(stext, rtext)
    sys.stdout.write(line)
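

Two things can trip up the loop above: glob('*') also matches subdirectories, and fileinput falls back to reading stdin when it is handed an empty file list. Here is a hedged sketch of the same idea as a reusable function, in Python 3 syntax, that guards against both. The name replace_in_dir is mine, and it assumes every file in the directory is readable as text:

```python
import fileinput
import glob
import os
import sys


def replace_in_dir(stext, rtext, directory='.'):
    """Replace stext with rtext in every regular file directly in directory."""
    # keep regular files only; glob('*') also matches subdirectories
    files = sorted(f for f in glob.glob(os.path.join(directory, '*'))
                   if os.path.isfile(f))
    if not files:
        return []   # an empty list would make fileinput read from stdin
    with fileinput.input(files, inplace=True) as lines:
        for line in lines:
            # while inplace is active, sys.stdout points at the file being rewritten
            sys.stdout.write(line.replace(stext, rtext))
    return files
```

It returns the list of files it rewrote, which makes it easy to log or double check what was touched.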


#    Comments [0] |
 Friday, November 24, 2006

Python, IDEs, and Drones

Python is a very popular programming language with adoption and advocacy from many corporations, and large factions of open source programmers using it extensively.  However, in the world of "corporate drone programming", it is still pretty niche. 

Have a look at this indication of popularity among programming languages:
TIOBE Programming Community Index

One thing I like about Python is the simplicity it strives for.  I find myself writing all my code in SciTE, a simple text editor, rather than a full blown IDE.

I always looked at this as a strong point for dynamic languages.

Over in the cult of corporate drone programmers, static languages (C++, C#, Java) are the norm, and life is spent inside an IDE.


from Robert on comp.lang.python:

"Flat Web/DB programming is one major field where programmer masses are born.  The other big one is RAD-GUI/DB programming. This field is probably still wide open. Best tooled Borland RAD systems are going down meanwhile because of the stiff compiler language. Programmers look around for the next language & toolset. Python is the language - but with Python there is again a similar confusion around IDE's and GUI-libs. There is no really good IDE (but fat ones). And the major gui libs there are not Python, but are fat sickening layers upon layers upon other OO-langs."

Not that I necessarily want Python to become the next default language for drones, but it makes me think about further adoption and mainstreamability of Python and other dynamic languages (which typically aren't as well suited to the features of many IDEs).

#    Comments [0] |
 Friday, November 17, 2006

Python - The New Choice For Computer Science Academia?

I have seen a few articles in the past couple days talking about how MIT is revamping its introductory computer science course from using Scheme/Lisp to using Python.  Apparently, other CS programs are using Python as well.

As an undergrad in 1993, I took CS classes in a program that was somewhat modeled after the MIT curriculum.  We used the first edition of the [in]famous wizard book.  Diving head first into the weird ways of functional programming was a bit of a shock for me and Scheme nearly scarred me for life.  I think the move to using Python is certainly a good one.

I just took a look around the net and was surprised by how many people are pushing for Python as an introductory language that is well suited to be taught in an academic setting.


Some links to related articles:

Teaching with Python
Using Python in a High School Computer Science Program
EDU-SIG: Python in Education
Teaching Introductory Computer Science with Python

#    Comments [0] |
 Sunday, November 12, 2006

CPU Monitor With Python And WMI

Tim Golden's WMI module for Python is a lightweight wrapper around the WMI classes available for all Win32 platforms.

Windows Management Instrumentation (WMI) is Microsoft's implementation of Web-Based Enterprise Management (WBEM), an industry initiative to provide a Common Information Model (CIM) for pretty much any information about a computer system.

I will give a simple example of monitoring your local CPU using the WMI module from a Python program.


First, we can explore the WMI Win32_Processor class:

import wmi
c = wmi.WMI()
for s in c.Win32_Processor():
    print s


Output looks like this:

instance of Win32_Processor
{
    AddressWidth = 32;
    Architecture = 0;
    Availability = 3;
    Caption = "x86 Family 6 Model 13 Stepping 6";
    CpuStatus = 1;
    CreationClassName = "Win32_Processor";
    CurrentClockSpeed = 1794;
    CurrentVoltage = 33;
    DataWidth = 32;
    Description = "x86 Family 6 Model 13 Stepping 6";
    DeviceID = "CPU0";
    ExtClock = 133;
    Family = 2;
    L2CacheSize = 2048;
    Level = 6;
    LoadPercentage = 6;
    Manufacturer = "GenuineIntel";
    MaxClockSpeed = 1794;
    Name = "        Intel(R) Pentium(R) M processor 1.80GHz";
    PowerManagementSupported = FALSE;
    ProcessorId = "AFE9F9BF000006D6";
    ProcessorType = 3;
    Revision = 3334;
    Role = "CPU";
    SocketDesignation = "Microprocessor";
    Status = "OK";
    StatusInfo = 3;
    Stepping = "6";
    SystemCreationClassName = "Win32_ComputerSystem";
    SystemName = "GOLDB";
    UpgradeMethod = 6;
    Version = "Model 13, Stepping 6";
    VoltageCaps = 2;
};



Here I use it in a script that prints CPU utilization every 5 seconds:

import wmi
import time

c = wmi.WMI()
while True:
    for cpu in c.Win32_Processor():
        timestamp = time.strftime('%a, %d %b %Y %H:%M:%S', time.localtime())
        print '%s | Utilization: %s: %d %%' % (timestamp, cpu.DeviceID, cpu.LoadPercentage)
    time.sleep(5)


      
Output looks like this:

Sun, 12 Nov 2006 19:26:25 | Utilization: CPU0: 4 %
Sun, 12 Nov 2006 19:26:31 | Utilization: CPU0: 8 %
Sun, 12 Nov 2006 19:26:37 | Utilization: CPU0: 1 %
Sun, 12 Nov 2006 19:26:43 | Utilization: CPU0: 6 %
Sun, 12 Nov 2006 19:26:49 | Utilization: CPU0: 13 %

#    Comments [0] |
 Tuesday, November 07, 2006

Python - Removing Duplicates From A Sequence

Sequences (lists and tuples) are common data structures used in Python programming.

Here is a simple function that will remove duplicates from a sequence and return a sorted sequence of the unique items:


def remove_dups(seq):
    x = {}
    for y in seq:
        x[y] = 1
    u = x.keys()
    u.sort()
    return u


(Caveat:  It requires that all the sequence elements be hashable, and support equality comparison)


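And another implementation (it leans on '_[1]', an undocumented name that older CPython 2 releases give the list being built inside a list comprehension, so the first version is the safer bet):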

def remove_dups(seq):
    u = [x for x in seq if x not in locals()['_[1]']]
    u.sort()
    return u



Example using them from the Python Interpreter:

>>> my_seq = [1, 1, 3, 1, 2, 2, 7.75, 'foo', 7.75, 'foo']
>>> print remove_dups(my_seq)
[1, 2, 3, 7.75, 'foo']
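
With the built-in set type the first function shrinks to one line. A sketch in Python 3 syntax follows; remove_dups_keep_order is a name I made up, and it relies on dictionaries preserving insertion order (Python 3.7+). Note that Python 3 refuses to sort numbers against strings, so the sample data here is numeric only:

```python
def remove_dups(seq):
    """Return the unique items of seq as a sorted list."""
    return sorted(set(seq))


def remove_dups_keep_order(seq):
    """Return the unique items of seq in first-seen order."""
    return list(dict.fromkeys(seq))


my_seq = [1, 1, 3, 1, 2, 2, 7.75, 7.75]
print(remove_dups(my_seq))
print(remove_dups_keep_order([3, 1, 3, 2, 1]))
```

The same caveat applies: every element has to be hashable.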




Tim Peters has an excellent recipe in the Python Cookbook that dives into this much further:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560
#    Comments [0] |
 Sunday, November 05, 2006

Dynamic Languages On The CLR - IronPython For ASP.NET

OK... Dynamic Languages on the .NET CLR are getting serious:
(especially Python)

IronPython

... and infiltrating the .NET stack further:

IronPython for ASP.NET

The New Dynamic Language Extensibility Model for ASP.NET

#    Comments [0] |