goldb.org home

AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Wednesday, April 16, 2008

Python - Slurping CSV Files Into Nested Lists

When working with data sets, a common task I need to do is slurp a csv file into a nested data structure that contains a sequence of lists correlating to the rows and values in the csv file.

For example...

File contents (foo.csv):

10,20,30,40
19,29,39,55
16,21,31,59

Result:

[['10', '20', '30', '40'],
['19', '29', '39', '55'],
['16', '21', '31', '59']]

To accomplish this. you could parse it inside a big honkin' list comprehension and build our structure in one step:

csv_file = 'foo.csv'
value_lists = [line.split(',') for line in
[line.strip() for line in open(csv_file, 'r').readlines()]]

You could also use the csv module from Python's standard library:

import csv
csv_file = 'foo.csv'
value_lists = list(csv.reader(open(csv_file, 'r')))

The csv module has some useful tools for reading/writing csv files.  Check it out.

#    Comments [1] |

Developer/Testers Are Hard To Find

Jesse Noller just blogged "Finding Python people is hard":

Here is a good quote regarding the difficulty in finding skilled Test Engineers with Python experience:

"Either you teach QA people automation/test engineering, or you try to find a programmer who wants to learn/do test engineering and teaching them python. It's a hard sell either way. I technically view QA as one discipline, Development as another, but Test Engineering as the Hybrid of the two - and you need a strong background in both."

I have seen lots of QA Engineers and Testers with little to no development/programming experience. This seems to be such a valuable skill; why not learn some? The bar is set really low with today's dynamic languages. Getting into some quick scripting for data manipulation and building test harnesses is not a huge task. If a QA engineer can't learn some simple programming in a week, would you trust his efficiency and technical skills?

I agree with Jesse on this one. We need to see more Test Engineers and Developers In Test. Unfortunately, this hybrid roll often falls through cracks as many people view quality/testing vs. developing as a binary choice.

#    Comments [5] |
 Monday, April 14, 2008

Pylot 1.1 - New Release With Test Case Recorder

New Pylot 1.1 release is available
Visit: www.pylot.org/download.html

It contains some minor code cleanup and a new test case recorder contributed by David Solomon. The recorder works with Windows and IE only.

It is a script that launches your web browser and records HTTP requests as you navigate. While it records, it prints Pylot's XML test cases. The test cases are printed to STDOUT, so just redirect it to a file and you will have a valid testcases.xml file to use as Pylot input.

The pylot_recorder script is included in the lib directory of Pylot 1.1.

View the recorder's source code from the SVN trunk:
http://code.google.com/p/pylt/source/browse/trunk/lib/pylot_recorder.py

(It can't handle some complex scenarios, but is useful for recording simple GET and POST requests from web applications)

#    Comments [4] |
 Thursday, April 10, 2008

Split A List Into Roughly Equal Sized Pieces

The Python Cookbook has a recipe for splitting a list into roughly equal-sized pieces:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/425397

In the comments, there are several alterate implementations. Sebastian Hempel has an interesting take on it using slicing for the calculation of the list lengths. It basically looks like this:

def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

This version of the function distributes the remaindered items evenly over the first few splits.

Example Usage:

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for seq in split_seq(seq, 3):
    print seq
#    Comments [2] |
 Wednesday, April 09, 2008

Python - Host/Device Ping Utility for Windows

This script uses your system's ping utility to send an ICMP ECHO_REQUEST to a list of hosts or devices. It uses a separate thread to ping each host/device. This can be useful for measuring network latency and verifying hosts are alive.

Check out more info here: http://www.goldb.org/python_pinger.html


#!/usr/bin/env python

import re
from subprocess import Popen, PIPE
from threading import Thread


class Pinger(object):
    def __init__(self, hosts):
        for host in hosts:
            pa = PingAgent(host)
            pa.start()
        
class PingAgent(Thread):
    def __init__(self, host):
        Thread.__init__(self)        
        self.host = host

    def run(self):
        p = Popen('ping -n 1 ' + self.host, stdout=PIPE)
        m = re.search('Average = (.*)ms', p.stdout.read())
        if m: print 'Round Trip Time: %s ms -' % m.group(1), self.host
        else: print 'Error: Invalid Response -', self.host
              
                             
if __name__ == '__main__':
    hosts = [
        'www.pylot.org',
        'www.goldb.org',
        'www.google.com',
        'www.this_one_wont_work.com'
       ]
    Pinger(hosts)

Output:

Round Trip Time: 14 ms - www.yahoo.com
Round Trip Time: 17 ms - www.goldb.org
Round Trip Time: 30 ms - www.google.com
Round Trip Time: 82 ms - www.pylot.org
Error: Invalid Response - www.this_one_wont_work.com

Note: I only tested this on Windows. To run on other systems, it would only require a one-line change.

#    Comments [3] |
 Thursday, March 20, 2008

Transitioning To Python From Java or C#

"compared to Java code, XML is agile and flexible. Compared to Python code, XML is a boat anchor, a ball and chain."
    - Phillip J. Eby

If you are new to Python and coming from Java (or C#, or other similar statically typed OO language), these classic articles from PJE and Ryan Tomayko are necessary reading:

#    Comments [0] |
 Sunday, March 16, 2008

Dynamic Languages Don't Live In The Scripting Language Ghetto

I love this sarcastic rant by Ryan Tomayko from March 2006.  In the article, he rubutts some of James Gosling's comments regarding Java vs. dynamic languages.  Ryan pokes some fun at comments about dynamic languages and why they should be taken seriously; rather than thrown into the "scripting language ghetto".

My favorite part:

"Dealing with questions on dynamic languages:

First, call anything not statically compiled a “scripting language”. Attempt to insinuate that all languages without an explicit compilation step are not to be taken seriously and that they are all equivalently shitty. Best results are achieved when you provide no further explanation of the pros and cons of static and dynamic compilation and/or typing and instead allow the reader to simply assume that there are a wealth of benefits and very few, if any, disadvantages to static compilation. While the benefits of dynamic languages–first realized millions of years ago in LISP and Smalltalk–are well understood in academia, IT managers and Sun certified developers are perfectly accepting of our static = professional / dynamic = amateurish labeling scheme.

This technique is also known to result in dynamic language advocates going absolute bat-shit crazy and making complete fools of themselves. There have been no fewer than three head explosions recorded as a result of this technique.

Also, avoid the term “dynamic language” at all cost. It’s important that the reader not be exposed to the concepts separating scripting languages like bash, MS-DOS batch, and perl-in-the-eighties from general purpose dynamic languages like Ruby, Python, Smalltalk, and Perl present day."

#    Comments [0] |
 Tuesday, March 04, 2008

Python - Bytes Received and Transmitted for Windows

This script will output bytes received and transmitted for a local Windows machine since the last reboot:


import re
from subprocess import Popen, PIPE

p = Popen('net statistics workstation', stdout=PIPE)
for line in p.stdout:
    m = re.search('Bytes received\W+(.*)', line)
    if m: print 'Bytes received: %s' % (m.group(1))
    m = re.search('Bytes transmitted\W+(.*)', line)
    if m: print 'Bytes transmitted: %s' % (m.group(1))
#    Comments [0] |

Python - Get Last Windows Reboot Date/Time

This script will output the last reboot date/time for a local Windows machine:


import re
from subprocess import Popen, PIPE

p = Popen('net statistics workstation', stdout=PIPE)
m = re.search('(\d+/\d+/\d{4}.*[A|P]M)', p.stdout.read())
if m: print 'Last Reboot: %s' % (m.group(1)) 

Output:

>> Last Reboot: 3/1/2008 1:51:41 PM





* updated the original script thanks to Ian's comment below

#    Comments [2] |
 Monday, March 03, 2008

Python - Padding Single Digits In Dates

Here is how to zero-pad single digit days or months in a date string:


date = '3/2/2008'
padded_date = time.strftime('%m/%d/%Y', time.strptime(date,'%m/%d/%Y'))
print padded_date
>> 03/02/2008
#    Comments [2] |
 Sunday, March 02, 2008

Python - Palindrome Checker

A palindrome is a sequence that reads the same in either direction.

Here is function I wrote to check if a phrase is a palindrome:


import re

def is_palindrome(txt):
    txt = re.sub('\W+', '', txt).lower()
    return txt == txt[::-1]



phrase = "Go hang a salami, I'm a lasagna hog"
print is_palindrome(phrase)

>> True
#    Comments [4] |
 Friday, February 15, 2008
 Tuesday, February 12, 2008

Python - 15 Line HTTP Server - Web Interface For Your Tools

I write a lot of command line tools and scripts in Python. Sometimes I need to kick them off remotely. A simple way to do this is to launch a tiny web server that listens for a specific request to start the script.

I add a "WebRequestHandler" class to my script and call it from my main method. There is a "do_something()" method in the class. You call your code from this method.

All you have to do is launch your script and it will sit there and wait for requests. If the request is bad, it spits back a 404 error. If the request path matches what we are looking for (in this case "/foo"), the code is launched.

Now you have an easy way to call your script remotely. Just open a browser and type in the URL: http://your_server/foo, or call it with a tool like 'wget' or 'curl'.


import BaseHTTPServer

class WebRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/foo':
            self.send_response(200)
            self.do_something()
        else: 
            self.send_error(404)
            
    def do_something(self):
        print 'hello world'
        
server = BaseHTTPServer.HTTPServer(('',80), WebRequestHandler)
server.serve_forever()

(this was adapted from a code sample in "Python In A Nutshell" by Alex Martelli)

#    Comments [1] |
 Monday, February 11, 2008

Python - Terminating Threads - Boolean Flag and threading.Event()

In many programming languages you can't terminate a thread directly.  Python is no different.  Rather than termintaing a thread from the code that spawned it, you just a pass a flag to the thread that tells it to terminate itself.  Typically a thread will run in a loop, periodically checking this flag so it knows if it should continue or not.  To terminate the thread from the outside, you just set its flag to die.

I was using this idiom in Python by setting a boolean flag in my spawned thread.

So a simplified thread class would look something like this:


class MyThread(threading.Thread):
    def __init__(self, num):
        threading.Thread.__init__(self)
        self.running = True
        self.num = num
        
    def stop(self):
        self.running = False
        
    def run(self):
        while self.running:
            print 'hello from thread %d' % self.num
            time.sleep(1)

I just read an old post in comp.lang.python that pointed to a recipe in the Python Cookbook that suggests using threading.Event() rather than a simple boolean flag.

So the thread class would look something like this:


class MyThread(threading.Thread):
    def __init__(self, num):
        threading.Thread.__init__(self)
        self.stop_event = threading.Event()
        self.num = num
        
    def stop(self):
        self.stop_event.set()
        
    def run(self):
        while not self.stop_event.isSet():
            print 'hello from thread %d' % self.num
            time.sleep(1)

They work exactly the same.

I am just wondering what other flexibility threading.Event() gives you, and if there is anything bad about using simple boolean checks to kill threads. I guess I will have to look it up and play around a bit.

#    Comments [5] |
 Sunday, February 10, 2008

Rockin' Python 3000 Alpha (3.0a2)

I just installed the latest Alpha of Python 3000.

So far so good...

#    Comments [0] |
 Wednesday, February 06, 2008

C# .NET 2.0 HTTP GET Class

Sending HTTP Requests from a C# program seems unnecessarily hard.  I wrote a small helper class to deal with sending and timing GET requests:
http://www.goldb.org/httpgetcsharp.html

You use it like this:


public class Program
{
    static void Main(string[] args)
    {
        HTTPGet req = new HTTPGet();
        req.Request("http://www.google.com");
        Console.WriteLine(req.StatusLine);
        Console.WriteLine(req.ResponseTime);
    }
}
#    Comments [2] |
 Tuesday, February 05, 2008

Python - Convert Secs Into Human Readable Time String (HH:MM:SS)

Convert a number of seconds into a human readable time string HH:MM:SS

7046 seconds is: 1 hour 57 mins 26 secs, or 01:57:26

The Function:

def humanize_time(secs):
mins, secs = divmod(secs, 60)
hours, mins = divmod(mins, 60)
return '%02d:%02d:%02d' % (hours, mins, secs)

The Output:

print humanize_time(7046)
>> 01:57:26
#    Comments [5] |
 Tuesday, December 18, 2007

The Python Papers - Screen Scraping Article

The new issue of the Python Papers is out.  It includes a small article I wrote called: Screen Scraping Web Pages

The issue can be downloaded here:  The Python Papers, Volume 2, Issue 4 (pdf)

This tutorial shows how to programmatically retrieve a stock quote from Google Finance.  It uses Python's high level Web API and screen scraping with regular expressions.
#    Comments [2] |
 Monday, December 17, 2007

Python Experts - Why They Do Python

I was recently interviewed for the article:
Python Experts - Why They Do Python

I don't think I am even close to an "expert", but it was nice being asked to participate.

#    Comments [0] |
 Tuesday, November 27, 2007

Python - Extracting Files From Zip Archives

Here is a way to unzip files in Python.  If you have a zip containing multiple files, you can unzip it like this:

import zipfile

fh = open('foo.zip', 'rb')
z = zipfile.ZipFile(fh)
for name in z.namelist():
outfile = open(name, 'wb')
outfile.write(z.read(name))
outfile.close()
fh.close()
#    Comments [6] |
 Monday, November 26, 2007

wxPython - Hello World!

Here is a simple example for those getting started with Python GUI Programming, wxWidgets, and the wxPython Bindings.

This small program will display a Frame and the static text "Hello World!", positioned with a BoxSixer.

Output looks like this:



#!/usr/bin/env python

import wx

class Application(wx.Frame):
    def __init__(self, parent):
        wx.Frame.__init__(self, parent, -1, 'My GUI', size=(300, 200))
        panel = wx.Panel(self)
        sizer = wx.BoxSizer(wx.VERTICAL)
        panel.SetSizer(sizer)
        txt = wx.StaticText(panel, -1, 'Hello World!')
        sizer.Add(txt, 0, wx.TOP|wx.LEFT, 20)
        self.Centre()
        self.Show(True)

app = wx.App(0)
Application(None)
app.MainLoop()
#    Comments [0] |
 Wednesday, November 14, 2007

Regex Capture Groups In Python and Perl

I am a Python programmer and ex-Perl hacker.

Regular Expressions are possibly the quintessential feature of Perl and are directly part of the language syntax.

Rather than being part of the syntax, Python's Regular expressions are available via the 're' module. For some reason, I had some trouble figuring out matching groups when I first started using Python's Regular Expressions.

He are examples of extracting capture groups in both Perl and Python.

Lets say we have a string containing a date: '11/14/2007', and we want to capture only the year from this string.

A regex to match this format might be something like this:

[0-9]{2}/[0-9]{2}/[0-9]{4}

We can then put parenthesis around the piece we want to extract (the 4-digit year) to denote a capture group.

So now our regex would look like this:

[0-9]{2}/[0-9]{2}/([0-9]{4})


Perl Example:

$foo = '11/14/2007';

if ($foo =~ m^[0-9]{2}/[0-9]{2}/([0-9]{4})^) {
    print $1;
}

output:

2007

* Note the string we captured ended up in the special variable $1


Python Example:

import re

foo = '11/14/2007'

match = re.search('[0-9]{2}/[0-9]{2}/([0-9]{4})', foo)
if match:
    print match.group(1)

output:

2007

* Note the string we captured ended up in a match object, which can be accessed with the 'group()' method.

#    Comments [6] |
 Wednesday, November 07, 2007

Python - Processing Large Text Files One Line At A Time

I want to process some very large text files one line at a time.  Normally when I process text files, I slurp them into a list using the readlines() method.   However, sometimes the files are huge and it isn't feasible or optimal to read the entire content into memory upfront.   In this case, it makes sense to process them one line at a time.

The best solution I can come up with is this:


fh = open('foo.txt', 'r')
line = fh.readline()
while line:
    # do something here
    line = fh.readline()

It doesn't feel very pythonic/idiomatic.  Anyone have a better solution?


Update
Thanks to the comments below, I found a few different ways to do it. The best and most Pythonic way seems to be this:


for line in open('foo.txt', 'r'):
    # do something here

Python file objects support the iterator protocol, so you can just open it and go.   This is the same as using a while loop and calling readline() but more compact.

#    Comments [7] |
 Wednesday, October 31, 2007

Which Version Of Python Ships With Mac OS X Leopard?

I am not a Mac user, but in case anyone is interested in knowing which version of Python ships with OS X Leopard, the answer is Python 2.5.

#    Comments [0] |
 Wednesday, October 24, 2007

Python - List Comprehensions Leak Variables

One thing to remember when using List Comprehensions is that they "leak" their temporary iteration variable to the outside.

what does that mean?

In the following example, we still have access to 'x' after we run the list comprehension.

foo = ['a', 'b', 'c']
my_list = [x for x in foo]
print x

output:
>> c

This behaviour is different from how a Generator Expression works. We could have wrote the List Comprehension using a Generator Expression like this:

my_list = list(x for x in foo)

Now, the temporary variable we used is not accessible from outside the scope of the expression.

foo = ['a', 'b', 'c']
my_list = list(x for x in foo)
print x

output:
>> NameError: name 'x' is not defined

Note: This is fixed in Python 3000

#    Comments [5] |
 Thursday, October 18, 2007

Charts And Graphs - Modern Solutions

To all the chart/graph/plot/visualization weenies out there...
Here is a great overview of some modern charting and graphing technologies.

Some options I will be exploring:

#    Comments [0] |
 Sunday, October 14, 2007

Python - Simple Multithreaded HTTP Load Generator/Timer

This is a module for generating concurrent requests to an HTTP server.  Each thread makes HTTP GET requests to a single URL at the specified interval.  Threads are added over a given rampup time if you want to generate increasing load.  Response times are printed to STDOUT.  Can be used for cursory performance benchmarking or load testing a web resource.

load_generator.py module

sample usage:


#!/usr/bin/env python

from load_generator import LoadManager

lm = LoadManager()
lm.msg = ('www.example.com', '/')
lm.start(threads=5, interval=2, rampup=2)
#    Comments [3] |
 Wednesday, September 26, 2007

Python - Tk Graph Example

I found a snippet to draw bar graphs in Python using Tk:
http://www.daniweb.com/code/snippet583.html

The output looks like this:


Here is a modified version that creates a bar graph in a Tk panel:

import Tkinter as tk

def graph_points(seq, width=375, height=325):
root = tk.Tk()
c = tk.Canvas(root, width=width, height=height, bg='white')
c.pack()
y_stretch = 15
y_gap = 20
x_stretch = 10
x_width = 20
x_gap = 20
for x, y in enumerate(data):
x0 = x * x_stretch + x * x_width + x_gap
y0 = height - (y * y_stretch + y_gap)
x1 = x * x_stretch + x * x_width + x_width + x_gap
y1 = height - y_gap
c.create_rectangle(x0, y0, x1, y1, fill="red")
c.create_text(x0+2, y0, anchor=tk.SW, text=str(y))
root.mainloop()

data = (18, 15, 10, 7, 5, 4, 2, 5, 8, 10, 13)
graph_points(data)
#    Comments [0] |
 Monday, September 17, 2007

Old-School Pair Programming And My Inclination To Become A Tester

My first programming course was as an undergrad freshman in 1993. It was the basic introductory programming class for CS majors. The course was pretty difficult and was a great filter to separate the real CS students from the wannabes. About one-third of the students dropped the class, and out of the remaining two-thirds, many changed majors after this course was complete.

The course was taught using Scheme, with SICP as the text book. We programmed on a VAX cluster with Ultrix (DEC's Unix flavor) as the Operating System. We had to learn the Unix shells, VI, and all sorts of fun stuff to get us up and running.

The computer lab ("the cluster") was a large sterile room with rows of green-screen dumb terminals. I remember our professor told us that the VAX had 128 MB of memory and I was blown away by how huge that was (my rippin' fast PC had 4 MB at the time).

Spending hours in the computer lab was no fun at all. But I was one of the lucky ones. I owned a brand new 486-DX33 PC running Windows 3.1. I had a blazing fast 14.4 bps Zoom modem and could use Procomm Plus to dial into the VAX and program from the comfort of my own dorm room. I also found a Scheme interpreter that ran on DOS, giving me further options to do my work offline.

The programming assignments were brutal. All-nighters were the norm. Collaboration on the assignments was encouraged, but we were all expected to turn in our own original work. I found a fellow student that I got along with well and we decided to work together (unfortunately, I don't even remember his name... all I remember is that he was a lot smarter than me).

So the basic workflow was that we would get together, work out the basics of the assignment, get most of the algorithms working, then each take the code and finish it on our own. Since I had the bad-ass PC, we would work in my room. Two things quickly became apparent: He was a much better programmer than me, but I had a better eye for subtle details and debugging. Eventually we settled into a pattern where he would do the programming and I would look over his shoulder to give advice and input. Every few minutes, he would shoot a copy of the code to my dot-matrix printer. I would grab it, go through it line by line, and mark errors with my red pen and hand-write parts of the code that weren't correct. I would then hand the printout back to him and let him enter the changes. We iterated like this until we had all of the core code working.

For some reason, that instinct for attention to detail and debugging has always stuck with me. Because of that, my career has always been influenced by testing. I am a develop/tester, rather than just a developer. Most of the impact I have had in all of my jobs is from creating test tools and bringing in new ways to test software.

Just an interesting observation. I wonder how many others were naturally drawn to testing as soon as they started writing code?

#    Comments [2] |

Python - Yahoo Stock Quote Module

Last week I wrote a small Python module for retrieving stock prices.

It used screen scraping to get data from Google Finance.  Yahoo offers stock data in a much more digestible form which allowed me to get values without screen scraping and regular expressions.  So, I wrote a module based around this.

This new module is much more comprehensive and exposes a Python API for retrieving all sorts of stock data from Yahoo Finance.

My ystockquote module provides a Python API for retrieving stock data from Yahoo Finance.  This module contains the following functions:

  • get_all(symbol)
  • get_price(symbol)
  • get_change(symbol)
  • get_volume(symbol)
  • get_avg_daily_volume(symbol)
  • get_stock_exchange(symbol)
  • get_market_cap(symbol)
  • get_book_value(symbol)
  • get_ebitda(symbol)
  • get_dividend_per_share(symbol)
  • get_dividend_yield(symbol)
  • get_earnings_per_share(symbol)
  • get_52_week_high(symbol)
  • get_52_week_low(symbol)
  • get_50day_moving_avg(symbol)
  • get_200day_moving_avg(symbol)
  • get_price_earnings_ratio(symbol)
  • get_price_earnings_growth_ratio(symbol)
  • get_price_sales_ratio(symbol)
  • get_price_book_ratio(symbol)
  • get_short_ratio(symbol)

Sample Usage:


>>> import ystockquote
>>> print ystockquote.get_price('GOOG')
529.46
>>> print ystockquote.get_all('MSFT')
{'stock_exchange': '"NasdaqNM"', 'market_cap': '268.6B', 
'200day_moving_avg': '29.2879', '52_week_high': '31.84', 
'price_earnings_growth_ratio': '1.45', 'price_sales_ratio': '5.33',
'price': '28.65', 'earnings_per_share': '1.423', 
'50day_moving_avg': '28.7981', 'avg_daily_volume': '55579700',
'volume': '25330856', '52_week_low': '26.48', 'short_ratio': '1.60', 
'price_earnings_ratio': '28.65', 'dividend_yield': '1.38', 
'dividend_per_share': '0.40', 'price_book_ratio': '8.76', 
'ebitda': '20.441B', 'change': '-0.39', 'book_value': '3.315'}

The module is available here:  http://www.goldb.org/ystockquote.html

#    Comments [11] |
 Friday, September 14, 2007

Python - Stock Quote Module

I just wrote a tiny Python module for programmatically retrieving stock quotes from Google Finance:

The module:


import urllib
import re

def get_quote(symbol):
    base_url = 'http://finance.google.com/finance?q='
    content = urllib.urlopen(base_url + symbol).read()
    m = re.search('class="pr".*?>(.*?)<', content)
    if m:
        quote = m.group(1)
    else:
        quote = 'no quote available for: ' + symbol
    return quote


Sample usage:


#!/usr/bin/env python

import stockquote

print stockquote.get_quote('goog')


Output:


>> 529.56
#    Comments [8] |
 Tuesday, September 11, 2007

Python httplib2 - Handling Cookies in HTTP Form Posts

I often need to automate tasks in web based applications.  I like to do this at the protocol level by simulating a real user's interactions via HTTP.  Python comes with two built-in modules for this: urllib (higher level Web interface) and httplib (lower level HTTP interface).

However, I usually don't use either of these.  I prefer to use Joe Gregario's excellent httplib2 module (btw, I really wish this could make its way into Python's Standard Library).  It is a much richer library and has a lot of nice features for dealing with HTTP.  

When automating something, you often need to "login" to maintain some sort of session/state with the server.  This is usually achieved with form-based authentication. You post a form to the server, and it responds with a cookie in the incoming HTTP header.  You need to pass this cookie back to the server in subsequent requests to maintain state or to keep a session alive.

Here is an example of how to deal with cookies when doing your HTTP Post.


First, lets import the modules we will use:


import urllib
import httplib2


Now, lets define the data we will need: In this case, we are doing a form post with 2 fields representing a username and a password.


url = 'http://www.example.com/login'   
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}


Now we can send the HTTP request:


http = httplib2.Http()
response, content = http.request(url, 'POST', headers=headers, body=urllib.urlencode(body))


At this point, our "response" variable contains a dictionary of HTTP header fields that were returned by the server. If a cookie was returned, you would see a "set-cookie" field containing the cookie value. We want to take this value and put it into the outgoing HTTP header for our subsequent requests:


headers['Cookie'] = response['set-cookie']

Now we can send a request using this header and it will contain the cookie, so the server can recognize us.



So... here is the whole thing in a script. We login to a site and then make another request using the cookie we received:


#!/usr/bin/env python

import urllib
import httplib2

http = httplib2.Http()

url = 'http://www.example.com/login'   
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}
response, content = http.request(url, 'POST', headers=headers, body=urllib.urlencode(body))

headers = {'Cookie': response['set-cookie']}

url = 'http://www.example.com/home'   
response, content = http.request(url, 'GET', headers=headers)
#    Comments [2] |
 Friday, August 31, 2007

Python 3000 alpha 1 Released!

wow... congrats to Guido and everyone else involved.

get it here:

http://python.org/download/releases/3.0/
#    Comments [0] |
 Wednesday, August 22, 2007

My Text Editor - What SciTE Says About Me

In a recent post: "What does your favorite text editor say about you", the author lists popular text editors and what they say about their users.  Here is the Editor or IDE I use with various programming languages:

Python:  SciTE
Perl:  SciTE
C#:  Visual Studio
Java:  Eclipse

I do all of my writing and a large portion of my programming in a plain old text editor.  Most of the code I write is in Python.  I love using a lightweight text editor instead of a big bloated IDE.  So... I pretty much live inside a text editor.

... and I love SciTE.  It rocks equally on Windows and GNU/Linux.  So what does this say about me?


SciTE:
"Your text editor is lightweight, full featured, extensible and cross platform. In addition, it can work as a stand-alone executable which requires no installation. Fits perfectly with all your other portable tools on your USB thumb drive. You also love how SciTE let’s you write Lua scripts to extend it’s functionality. You take your text editor choice very seriously. You like tinkering, and minimalistic, portable applications."

#    Comments [3] |
 Friday, August 17, 2007

Lines Of Code In Popular Open Source Code Bases

(via Matt Asay)

I found this pretty interesting.

Lines Of Code (LOC) in some popular Open Source code bases:

  • Linux Kernel: 6 million
  • Sun Java Development Kit: 6.5 million
  • Sun StarOffice: 9 million
  • Eclipse: 17 million
#    Comments [1] |
 Friday, July 27, 2007

Recommended Reading For Learning Python

I have the opportunity to spread Python to some junior/newbie programmers. In doing so, I wanted to compile a concise list of reccomended learning materials. The intended audience is someone who has a basic familiarity with programming but no specific Python experience.

There are a ton of books and online materials available, but where should you start? Here is my very brief list:

First Book:

Python Tutorials Online:

#    Comments [5] |

C# - Simple TCP Server

Most of network programming I do is Web/HTTP oriented. So it has been a while since I had to work with TCP and Socket programming directly. Yesterday I needed to write a quick TCP Server. C# and .NET made this really easy to do:


using System;
using System.Text;
using System.Net;
using System.Net.Sockets;

public class TCPServer
{
    private static int port = 8001;

    public static void Main()
    {
        IPAddress ipAddress = IPAddress.Any;
        TcpListener listener = new TcpListener(ipAddress, port);
        listener.Start();
        Console.WriteLine("Server is running");
        Console.WriteLine("Listening on port " + port);
        Console.WriteLine("Waiting for connections...");
        while (true)
        {
            Socket s = listener.AcceptSocket();
            Console.WriteLine("Connection accepted from " + s.RemoteEndPoint);
            byte[] b = new byte[65535];
            int k = s.Receive(b);
            Console.WriteLine("Received:");
            for (int i = 0; i < k; i++)
                Console.Write(Convert.ToChar(b[i]));
            ASCIIEncoding enc = new ASCIIEncoding();
            s.Send(enc.GetBytes("Server responded"));
            Console.WriteLine("\nSent Response");
            s.Close();
    }
}

#    Comments [0] |
 Friday, June 29, 2007

C# - Convert ASCII String To Hex

C# method to convert an ascii string to hex:

public string ConvertToHex(string asciiString)
{
    string hex = "";
    foreach (char c in asciiString)
    {
        int tmp = c;
        hex += String.Format("{0:x2}", (uint)System.Convert.ToUInt32(tmp.ToString()));
    }
    return hex;
}

#    Comments [0] |
 Tuesday, May 29, 2007

Simple Python Web Server Example

Note to self...
use this:

Roll your own server in 50 lines of Python code (by Muharem Hrnjadovic):

"Just in case you wondered why there are so many frameworks in Python land, here’s a basic server (including a request dispatch mechanism) in only 50 lines of code."

Why *not* add a server interface to every tool I write?  :)

#    Comments [0] |
 Wednesday, May 23, 2007

StockQuote Google Gadget - Usage Stats

A few months ago I deployed my StockQuote Google Gadget, which is used for retrieving stock quotes and daily price graphs.

Behind the gadget is a remote .NET/C# service I created which scrapes stock quotes and charts from Google Finance.

You can see it and play with the demo: cgoldberg.googlepages.com

- Add my gadget to your Google Personalized Homepage
- Add my gadget to your own web page

I have been logging usage stats; just to see how many people are using it and how many transactions it is doing. Stats have been collected for about 4 months:

12000 transactions per day and growing fast.. yikes.


Update: My StockQuote gadget is no longer in service.  I Received a takedown notice from Google Finance on 05/23/2007.   umm...  sorta saw that comin' :)

#    Comments [0] |
 Thursday, May 17, 2007

RESTful Web Services - 10 Years of 'Programmable Web' Books

I just got the RESTful Web Services book (Leonard Richardson & Sam Ruby, O'Reilly, 2007) in the mail today.  I've only read the beginning, but so far it is great.  In fact, it brings me back to when I first started working with the "programmable web".  I got into the programmable web back when the web was only a few years old.  I spent years doing performance/scalability testing and tuning for large Web 1.0 applications and bizarre custom Web API's (think huge financial services rushing to get online).  Building tools to run realistic workloads through a system involves writing custom clients to simulate real user/browser interaction.  This is pretty ugly stuff when you are dealing with an application that was designed with only humans in mind (AKA all).  It involves lots of HTTP protocol level work.. screen scraping.. protocol sniffing and analyzing.. requests.. header mangling.. cookie handling.. redirects.. authentication.. session information parsing.. etc, etc.

Application simulation is pretty messy work.  There is no simple API to hide behind; you had to figure out what the API was for yourself.  See.. *every* web application has an API.  Though it might have been designed by accident.  This allowed me to see first hand how developers and frameworks butchered the use of the "Web" as a platform.  Staring at naked HTTP let me see every little bit of the hairball underneath.  Alas, any standardization around web services (or the concept to be officially named) was far off.

A friend (bearded Perl hacker) let me borrow a book to show me how Perl can do this cool web stuff:  Web Client Programming with Perl (Clinton Wong, O'Reilly, 1997).  This book helped me build my first web clients to do application simulation and testing.  There wasn't a ton of documentation at the time to do this sort of thing, so i relied heavily on this book.

So now.. 10 years later..  the Web has changed..  it has morphed into *the* distributed platform..  it is becoming organized.

As I flip through Restful Web Services, it all just looks right..  REST looks right..   It is simple..  it is HTTP..  it is all the guts I already know.  It almost feels like a sequel to my old favorite:

I have traded Perl for Python as my preferred scripting language the past few years, but I am still building simulators, web clients, and virtual users. I am excited to work on some new stuff in this area.

#    Comments [0] |
 Friday, May 11, 2007

Mnesia - Scalable Data Persistence in Erlang

SlideAway - There is a world outside of Ruby on Rails:

"Who needs Oracle/Mysql when you have Mnesia, a free, distributed, in memory database ? The ability to store native Erlang structures out of the box is so liberating: suddenly the need for your object-database mapping layer almost vanishes (well, not 100% to be fairly honest, but a big chunk of it: no need to create a 1-to-n relationship or a n-to-n relationship and a mapping table in many simple cases)

Not to mention that Mnesia supports table replication and is fully distributed, with the ability to add new 'nodes' on the fly. All of this out of the box ! (did I mention it was free too ?) This makes scaling up almost a joke. Compare this to the usual nightmares (and cost) of trying to implement a distributed Mysql/Oracle."


Awesome.