
AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Wednesday, April 16, 2008

Python - Slurping CSV Files Into Nested Lists

When working with data sets, a common task I need to do is slurp a csv file into a nested data structure that contains a sequence of lists correlating to the rows and values in the csv file.

For example...

File contents (foo.csv):

10,20,30,40
19,29,39,55
16,21,31,59

Result:

[['10', '20', '30', '40'],
['19', '29', '39', '55'],
['16', '21', '31', '59']]

To accomplish this, you could parse it inside a big honkin' list comprehension and build the structure in one step:

csv_file = 'foo.csv'
value_lists = [line.split(',') for line in
[line.strip() for line in open(csv_file, 'r').readlines()]]

You could also use the csv module from Python's standard library:

import csv
csv_file = 'foo.csv'
value_lists = list(csv.reader(open(csv_file, 'r')))

The csv module has some useful tools for reading/writing csv files.  Check it out.
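One reason to reach for the csv module is quoting: splitting on commas breaks as soon as a field contains a quoted comma. Here is a minimal sketch of the difference, assuming Python 3 and using made-up sample data held in a string instead of foo.csv:

```python
import csv
import io

# Hypothetical sample data: the second field of the data row contains a comma.
sample = 'id,name\n1,"Smith, Jane"\n'

# Naive approach: splitting on commas also splits inside the quoted field.
naive = [line.split(',') for line in sample.splitlines()]

# csv.reader understands the quoting and keeps the field in one piece.
parsed = list(csv.reader(io.StringIO(sample)))

print(naive[1])   # ['1', '"Smith', ' Jane"']
print(parsed[1])  # ['1', 'Smith, Jane']
```

For simple numeric files like foo.csv, both approaches give the same nested lists.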

#    Comments [1] |

Developer/Testers Are Hard To Find

Jesse Noller just blogged "Finding Python people is hard":

Here is a good quote regarding the difficulty in finding skilled Test Engineers with Python experience:

"Either you teach QA people automation/test engineering, or you try to find a programmer who wants to learn/do test engineering and teaching them python. It's a hard sell either way. I technically view QA as one discipline, Development as another, but Test Engineering as the Hybrid of the two - and you need a strong background in both."

I have seen lots of QA Engineers and Testers with little to no development/programming experience. This seems to be such a valuable skill; why not learn some? The bar is set really low with today's dynamic languages. Getting into some quick scripting for data manipulation and building test harnesses is not a huge task. If a QA engineer can't learn some simple programming in a week, would you trust his efficiency and technical skills?

I agree with Jesse on this one. We need to see more Test Engineers and Developers In Test. Unfortunately, this hybrid role often falls through the cracks, as many people view quality/testing vs. developing as a binary choice.

#    Comments [5] |
 Monday, April 14, 2008

Pylot 1.1 - New Release With Test Case Recorder

New Pylot 1.1 release is available
Visit: www.pylot.org/download.html

It contains some minor code cleanup and a new test case recorder contributed by David Solomon. The recorder works with Windows and IE only.

It is a script that launches your web browser and records HTTP requests as you navigate. While it records, it prints Pylot's XML test cases. The test cases are printed to STDOUT, so just redirect it to a file and you will have a valid testcases.xml file to use as Pylot input.

The pylot_recorder script is included in the lib directory of Pylot 1.1.

View the recorder's source code from the SVN trunk:
http://code.google.com/p/pylt/source/browse/trunk/lib/pylot_recorder.py

(It can't handle some complex scenarios, but is useful for recording simple GET and POST requests from web applications)

#    Comments [4] |
 Thursday, April 10, 2008

Split A List Into Roughly Equal Sized Pieces

The Python Cookbook has a recipe for splitting a list into roughly equal-sized pieces:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/425397

In the comments, there are several alternate implementations. Sebastian Hempel has an interesting take on it, using slicing to calculate the list lengths. It basically looks like this:

def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

This version of the function distributes the leftover items evenly over the first few pieces.

Example Usage:

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for piece in split_seq(seq, 3):
    print piece
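
For reference, here is the same generator as a Python 3 sketch (range instead of xrange), together with the pieces it produces for a ten-item list split three ways. The comment on the slice length is my own reading of why the trick works:

```python
def split_seq(seq, num_pieces):
    """Yield num_pieces consecutive slices of seq with near-equal lengths."""
    start = 0
    for i in range(num_pieces):
        # len(seq[i::num_pieces]) counts every num_pieces-th item starting at
        # offset i, so the first (len(seq) % num_pieces) pieces get one extra.
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

pieces = list(split_seq([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))
print(pieces)  # [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
```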
#    Comments [2] |
 Wednesday, April 09, 2008

Python - Host/Device Ping Utility for Windows

This script uses your system's ping utility to send an ICMP ECHO_REQUEST to a list of hosts or devices. It uses a separate thread to ping each host/device. This can be useful for measuring network latency and verifying hosts are alive.

Check out more info here: http://www.goldb.org/python_pinger.html


#!/usr/bin/env python

import re
from subprocess import Popen, PIPE
from threading import Thread


class Pinger(object):
    def __init__(self, hosts):
        for host in hosts:
            pa = PingAgent(host)
            pa.start()
        
class PingAgent(Thread):
    def __init__(self, host):
        Thread.__init__(self)        
        self.host = host

    def run(self):
        p = Popen('ping -n 1 ' + self.host, stdout=PIPE)
        m = re.search('Average = (.*)ms', p.stdout.read())
        if m: print 'Round Trip Time: %s ms -' % m.group(1), self.host
        else: print 'Error: Invalid Response -', self.host
              
                             
if __name__ == '__main__':
    hosts = [
        'www.pylot.org',
        'www.goldb.org',
        'www.google.com',
        'www.this_one_wont_work.com'
       ]
    Pinger(hosts)

Output:

Round Trip Time: 14 ms - www.yahoo.com
Round Trip Time: 17 ms - www.goldb.org
Round Trip Time: 30 ms - www.google.com
Round Trip Time: 82 ms - www.pylot.org
Error: Invalid Response - www.this_one_wont_work.com

Note: I only tested this on Windows. Other systems use a different ping flag (-c instead of -n) and print a different summary line, so the command and the regex would both need adjusting.
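
A rough sketch of what a more portable version might need, written for Python 3. The helper names ping_command and parse_avg_ms are my own, and the Unix-style summary format is an assumption based on common ping implementations, so treat this as a starting point and not as tested behavior on every platform:

```python
import platform
import re

def ping_command(host, system=None):
    """Build an argument list that sends a single echo request to host."""
    system = system or platform.system()
    # Windows' ping uses -n for the count; Unix-like pings use -c.
    count_flag = '-n' if system == 'Windows' else '-c'
    return ['ping', count_flag, '1', host]

def parse_avg_ms(output):
    """Return the average round trip time in ms as a string, or None."""
    # Windows summary line, as matched by the script above.
    m = re.search(r'Average = (\d+)ms', output)
    if m:
        return m.group(1)
    # Assumed Unix-style summary: "... min/avg/max/... = 13.9/14.2/14.8/0.3 ms"
    m = re.search(r'min/avg/max/\S+ = [\d.]+/([\d.]+)/', output)
    return m.group(1) if m else None

print(ping_command('www.goldb.org', 'Windows'))  # ['ping', '-n', '1', 'www.goldb.org']
print(ping_command('www.goldb.org', 'Linux'))    # ['ping', '-c', '1', 'www.goldb.org']
```

The argument list can be passed straight to Popen in place of the command string.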

#    Comments [3] |
 Thursday, March 20, 2008

Transitioning To Python From Java or C#

"compared to Java code, XML is agile and flexible. Compared to Python code, XML is a boat anchor, a ball and chain."
    - Phillip J. Eby

If you are new to Python and coming from Java (or C#, or other similar statically typed OO language), these classic articles from PJE and Ryan Tomayko are necessary reading:

#    Comments [0] |
 Sunday, March 16, 2008

Dynamic Languages Don't Live In The Scripting Language Ghetto

I love this sarcastic rant by Ryan Tomayko from March 2006.  In the article, he rebuts some of James Gosling's comments regarding Java vs. dynamic languages.  Ryan pokes fun at those comments and argues that dynamic languages should be taken seriously rather than thrown into the "scripting language ghetto".

My favorite part:

"Dealing with questions on dynamic languages:

First, call anything not statically compiled a “scripting language”. Attempt to insinuate that all languages without an explicit compilation step are not to be taken seriously and that they are all equivalently shitty. Best results are achieved when you provide no further explanation of the pros and cons of static and dynamic compilation and/or typing and instead allow the reader to simply assume that there are a wealth of benefits and very few, if any, disadvantages to static compilation. While the benefits of dynamic languages–first realized millions of years ago in LISP and Smalltalk–are well understood in academia, IT managers and Sun certified developers are perfectly accepting of our static = professional / dynamic = amateurish labeling scheme.

This technique is also known to result in dynamic language advocates going absolute bat-shit crazy and making complete fools of themselves. There have been no fewer than three head explosions recorded as a result of this technique.

Also, avoid the term “dynamic language” at all cost. It’s important that the reader not be exposed to the concepts separating scripting languages like bash, MS-DOS batch, and perl-in-the-eighties from general purpose dynamic languages like Ruby, Python, Smalltalk, and Perl present day."

#    Comments [0] |
 Tuesday, March 04, 2008

Python - Bytes Received and Transmitted for Windows

This script will output bytes received and transmitted for a local Windows machine since the last reboot:


import re
from subprocess import Popen, PIPE

p = Popen('net statistics workstation', stdout=PIPE)
for line in p.stdout:
    m = re.search('Bytes received\W+(.*)', line)
    if m: print 'Bytes received: %s' % (m.group(1))
    m = re.search('Bytes transmitted\W+(.*)', line)
    if m: print 'Bytes transmitted: %s' % (m.group(1))
#    Comments [0] |

Python - Get Last Windows Reboot Date/Time

This script will output the last reboot date/time for a local Windows machine:


import re
from subprocess import Popen, PIPE

p = Popen('net statistics workstation', stdout=PIPE)
m = re.search(r'(\d+/\d+/\d{4}.*[AP]M)', p.stdout.read())
if m: print 'Last Reboot: %s' % (m.group(1)) 

Output:

>> Last Reboot: 3/1/2008 1:51:41 PM





* updated the original script thanks to Ian's comment below

#    Comments [2] |
 Monday, March 03, 2008

Python - Padding Single Digits In Dates

Here is how to zero-pad single digit days or months in a date string:


import time

date = '3/2/2008'
padded_date = time.strftime('%m/%d/%Y', time.strptime(date, '%m/%d/%Y'))
print padded_date
>> 03/02/2008
#    Comments [2] |
 Sunday, March 02, 2008

Python - Palindrome Checker

A palindrome is a sequence that reads the same in either direction.

Here is a function I wrote to check if a phrase is a palindrome:


import re

def is_palindrome(txt):
    txt = re.sub('\W+', '', txt).lower()
    return txt == txt[::-1]



phrase = "Go hang a salami, I'm a lasagna hog"
print is_palindrome(phrase)

>> True
#    Comments [4] |
 Friday, February 15, 2008
 Tuesday, February 12, 2008

Python - 15 Line HTTP Server - Web Interface For Your Tools

I write a lot of command line tools and scripts in Python. Sometimes I need to kick them off remotely. A simple way to do this is to launch a tiny web server that listens for a specific request to start the script.

I add a "WebRequestHandler" class to my script and call it from my main method. There is a "do_something()" method in the class. You call your code from this method.

All you have to do is launch your script and it will sit there and wait for requests. If the request is bad, it spits back a 404 error. If the request path matches what we are looking for (in this case "/foo"), the code is launched.

Now you have an easy way to call your script remotely. Just open a browser and type in the URL: http://your_server/foo, or call it with a tool like 'wget' or 'curl'.


import BaseHTTPServer

class WebRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/foo':
            self.send_response(200)
            self.end_headers()
            self.do_something()
        else: 
            self.send_error(404)
            
    def do_something(self):
        print 'hello world'
        
server = BaseHTTPServer.HTTPServer(('',80), WebRequestHandler)
server.serve_forever()

(this was adapted from a code sample in "Python In A Nutshell" by Alex Martelli)
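
In Python 3, BaseHTTPServer lives in http.server. Below is a hedged sketch of the same idea for Python 3. It binds to an OS-chosen port on localhost and requests itself once, purely so the example can run unattended; the fetch helper and the choice to return the text from do_something() are my own additions for the demo:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/foo':
            self.send_response(200)
            self.send_header('Content-Type', 'text/plain')
            self.end_headers()  # finish the header block before the body
            self.wfile.write(self.do_something().encode('utf-8'))
        else:
            self.send_error(404)

    def do_something(self):
        # Call your own code here; the return value becomes the response body.
        return 'hello world'

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; remove to get the default request log

# Port 0 asks the OS for any free port (port 80 usually needs privileges).
server = HTTPServer(('127.0.0.1', 0), WebRequestHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

def fetch(path):
    """Request path from the demo server and return (status, body)."""
    conn = http.client.HTTPConnection('127.0.0.1', port, timeout=5)
    conn.request('GET', path)
    resp = conn.getresponse()
    result = (resp.status, resp.read().decode('utf-8'))
    conn.close()
    return result

ok = fetch('/foo')
missing = fetch('/bar')
server.shutdown()
server.server_close()
print(ok)          # (200, 'hello world')
print(missing[0])  # 404
```

In a real tool you would drop the self-request and just leave serve_forever() running.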

#    Comments [1] |
 Monday, February 11, 2008

Python - Terminating Threads - Boolean Flag and threading.Event()

In many programming languages you can't terminate a thread directly, and Python is no different.  Rather than terminating a thread from the code that spawned it, you pass a flag to the thread that tells it to terminate itself.  Typically a thread will run in a loop, periodically checking this flag so it knows whether it should continue.  To terminate the thread from the outside, you just set its flag.

I was using this idiom in Python by setting a boolean flag in my spawned thread.

So a simplified thread class would look something like this:


import threading
import time


class MyThread(threading.Thread):
    def __init__(self, num):
        threading.Thread.__init__(self)
        self.running = True
        self.num = num
        
    def stop(self):
        self.running = False
        
    def run(self):
        while self.running:
            print 'hello from thread %d' % self.num
            time.sleep(1)

I just read an old post in comp.lang.python that pointed to a recipe in the Python Cookbook that suggests using threading.Event() rather than a simple boolean flag.

So the thread class would look something like this:


class MyThread(threading.Thread):
    def __init__(self, num):
        threading.Thread.__init__(self)
        self.stop_event = threading.Event()
        self.num = num
        
    def stop(self):
        self.stop_event.set()
        
    def run(self):
        while not self.stop_event.isSet():
            print 'hello from thread %d' % self.num
            time.sleep(1)

They work exactly the same.

I am just wondering what other flexibility threading.Event() gives you, and if there is anything bad about using simple boolean checks to kill threads. I guess I will have to look it up and play around a bit.
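
One thing an Event offers that a plain boolean does not is wait(timeout): the thread can pause on the event itself, so a stop request wakes it up right away instead of after the sleep finishes. A small sketch, assuming Python 3 (is_set() is the newer spelling of isSet()); the 10 second timeout is exaggerated on purpose to make the difference visible:

```python
import threading
import time

class MyThread(threading.Thread):
    def __init__(self, num):
        threading.Thread.__init__(self)
        self.stop_event = threading.Event()
        self.num = num

    def stop(self):
        self.stop_event.set()

    def run(self):
        while not self.stop_event.is_set():
            print('hello from thread %d' % self.num)
            # wait() returns as soon as the event is set, or after the
            # timeout; time.sleep(10) would always block the full 10 seconds.
            self.stop_event.wait(10)

t = MyThread(1)
t.start()
started = time.time()
t.stop()
t.join()
elapsed = time.time() - started
print('stopped after %.2f seconds' % elapsed)
```

With the boolean flag and time.sleep(10), the join() would typically have to wait out the remainder of the sleep.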

#    Comments [5] |
 Sunday, February 10, 2008

Rockin' Python 3000 Alpha (3.0a2)

I just installed the latest Alpha of Python 3000.

So far so good...

#    Comments [0] |
 Wednesday, February 06, 2008

C# .NET 2.0 HTTP GET Class

Sending HTTP Requests from a C# program seems unnecessarily hard.  I wrote a small helper class to deal with sending and timing GET requests:
http://www.goldb.org/httpgetcsharp.html

You use it like this:


public class Program
{
    static void Main(string[] args)
    {
        HTTPGet req = new HTTPGet();
        req.Request("http://www.google.com");
        Console.WriteLine(req.StatusLine);
        Console.WriteLine(req.ResponseTime);
    }
}
#    Comments [2] |
 Tuesday, February 05, 2008

Python - Convert Secs Into Human Readable Time String (HH:MM:SS)

Convert a number of seconds into a human readable time string HH:MM:SS

7046 seconds is: 1 hour 57 mins 26 secs, or 01:57:26

The Function:

def humanize_time(secs):
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    return '%02d:%02d:%02d' % (hours, mins, secs)

The Output:

print humanize_time(7046)
>> 01:57:26
#    Comments [5] |
 Tuesday, December 18, 2007

The Python Papers - Screen Scraping Article

The new issue of the Python Papers is out.  It includes a small article I wrote called: Screen Scraping Web Pages

The issue can be downloaded here:  The Python Papers, Volume 2, Issue 4 (pdf)

This tutorial shows how to programmatically retrieve a stock quote from Google Finance.  It uses Python's high level Web API and screen scraping with regular expressions.
#    Comments [2] |
 Monday, December 17, 2007

Python Experts - Why They Do Python

I was recently interviewed for the article:
Python Experts - Why They Do Python

I don't think I am even close to an "expert", but it was nice being asked to participate.

#    Comments [0] |
 Tuesday, November 27, 2007

Python - Extracting Files From Zip Archives

Here is a way to unzip files in Python.  If you have a zip containing multiple files, you can unzip it like this:

import zipfile

fh = open('foo.zip', 'rb')
z = zipfile.ZipFile(fh)
for name in z.namelist():
    outfile = open(name, 'wb')
    outfile.write(z.read(name))
    outfile.close()
fh.close()
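
An alternative sketch, assuming Python 3: ZipFile.extractall() does the loop for you and also recreates any sub-directories stored in the archive. The archive contents and the temporary paths here are made up so the example can run on its own:

```python
import os
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, 'foo.zip')

# Build a small archive so the example is self-contained.
with zipfile.ZipFile(zip_path, 'w') as z:
    z.writestr('a.txt', 'first file')
    z.writestr('sub/b.txt', 'second file')

# extractall() creates directories as needed, which a bare
# open(name, 'wb') would not do for names like 'sub/b.txt'.
out_dir = os.path.join(workdir, 'out')
with zipfile.ZipFile(zip_path) as z:
    z.extractall(out_dir)

extracted = sorted(
    os.path.relpath(os.path.join(root, name), out_dir).replace(os.sep, '/')
    for root, dirs, files in os.walk(out_dir)
    for name in files
)
print(extracted)  # ['a.txt', 'sub/b.txt']
```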
#    Comments [6] |
 Monday, November 26, 2007

wxPython - Hello World!

Here is a simple example for those getting started with Python GUI Programming, wxWidgets, and the wxPython Bindings.

This small program will display a Frame and the static text "Hello World!", positioned with a BoxSizer.

Output looks like this:



#!/usr/bin/env python

import wx

class Application(wx.Frame):
    def __init__(self, parent):
        wx.Frame.__init__(self, parent, -1, 'My GUI', size=(300, 200))
        panel = wx.Panel(self)
        sizer = wx.BoxSizer(wx.VERTICAL)
        panel.SetSizer(sizer)
        txt = wx.StaticText(panel, -1, 'Hello World!')
        sizer.Add(txt, 0, wx.TOP|wx.LEFT, 20)
        self.Centre()
        self.Show(True)

app = wx.App(0)
Application(None)
app.MainLoop()
#    Comments [0] |
 Wednesday, November 14, 2007

Regex Capture Groups In Python and Perl

I am a Python programmer and ex-Perl hacker.

Regular Expressions are possibly the quintessential feature of Perl and are directly part of the language syntax.

Rather than being part of the syntax, Python's Regular expressions are available via the 're' module. For some reason, I had some trouble figuring out matching groups when I first started using Python's Regular Expressions.

Here are examples of extracting capture groups in both Perl and Python.

Let's say we have a string containing a date: '11/14/2007', and we want to capture only the year from this string.

A regex to match this format might be something like this:

[0-9]{2}/[0-9]{2}/[0-9]{4}

We can then put parentheses around the piece we want to extract (the 4-digit year) to denote a capture group.

So now our regex would look like this:

[0-9]{2}/[0-9]{2}/([0-9]{4})


Perl Example:

$foo = '11/14/2007';

if ($foo =~ m^[0-9]{2}/[0-9]{2}/([0-9]{4})^) {
    print $1;
}

output:

2007

* Note the string we captured ended up in the special variable $1


Python Example:

import re

foo = '11/14/2007'

match = re.search('[0-9]{2}/[0-9]{2}/([0-9]{4})', foo)
if match:
    print match.group(1)

output:

2007

* Note the string we captured ended up in a match object, which can be accessed with the 'group()' method.
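
The same match object handles several groups at once. Here is a short sketch, assuming Python 3, that captures month, day, and year together; the group names in the second pattern are my own choice:

```python
import re

foo = '11/14/2007'

# Three numbered groups: month, day, year.
match = re.search(r'([0-9]{2})/([0-9]{2})/([0-9]{4})', foo)
month, day, year = match.groups()

# The same pattern with named groups.
named = re.search(
    r'(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<year>[0-9]{4})', foo)

print(match.group(0))       # 11/14/2007  (group 0 is the whole match)
print(month, day, year)     # 11 14 2007
print(named.group('year'))  # 2007
```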

#    Comments [6] |
 Wednesday, November 07, 2007

Python - Processing Large Text Files One Line At A Time

I want to process some very large text files one line at a time.  Normally when I process text files, I slurp them into a list using the readlines() method.   However, sometimes the files are huge and it isn't feasible or optimal to read the entire content into memory upfront.   In this case, it makes sense to process them one line at a time.

The best solution I can come up with is this:


fh = open('foo.txt', 'r')
line = fh.readline()
while line:
    # do something here
    line = fh.readline()

It doesn't feel very pythonic/idiomatic.  Anyone have a better solution?


Update
Thanks to the comments below, I found a few different ways to do it. The best and most Pythonic way seems to be this:


for line in open('foo.txt', 'r'):
    # do something here

Python file objects support the iterator protocol, so you can just open it and go.   This is the same as using a while loop and calling readline() but more compact.
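
A slightly fuller sketch, assuming Python 3, that combines the loop with a with statement so the file is closed when the block ends. The throwaway file and the line-counting logic are only there to make the example runnable:

```python
import os
import tempfile

# Create a small throwaway file so the example runs on its own.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as fh:
    fh.write('alpha\nbeta\ngamma\n')

line_count = 0
longest = ''
with open(path, 'r') as fh:   # closed automatically when the block ends
    for line in fh:           # lines are produced lazily, never as one big list
        line = line.rstrip('\n')
        line_count += 1
        if len(line) > len(longest):
            longest = line

os.remove(path)
print(line_count, longest)  # 3 alpha
```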

#    Comments [7] |
 Wednesday, October 31, 2007

Which Version Of Python Ships With Mac OS X Leopard?

I am not a Mac user, but in case anyone is interested in knowing which version of Python ships with OS X Leopard, the answer is Python 2.5.

#    Comments [0] |
 Wednesday, October 24, 2007

Python - List Comprehensions Leak Variables

One thing to remember when using List Comprehensions is that they "leak" their temporary iteration variable to the outside.

What does that mean?

In the following example, we still have access to 'x' after we run the list comprehension.

foo = ['a', 'b', 'c']
my_list = [x for x in foo]
print x

output:
>> c

This behaviour is different from how a Generator Expression works. We could have written the List Comprehension using a Generator Expression like this:

my_list = list(x for x in foo)

Now, the temporary variable we used is not accessible from outside the scope of the expression.

foo = ['a', 'b', 'c']
my_list = list(x for x in foo)
print x

output:
>> NameError: name 'x' is not defined

Note: This is fixed in Python 3000
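
A quick way to check this on Python 3, where the comprehension gets its own scope. The variable name item is arbitrary:

```python
foo = ['a', 'b', 'c']
my_list = [item for item in foo]

try:
    item            # would succeed on Python 2, where the name leaks
    leaked = True
except NameError:
    leaked = False

print(leaked)  # False on Python 3
```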

#    Comments [5] |
 Thursday, October 18, 2007

Charts And Graphs - Modern Solutions

To all the chart/graph/plot/visualization weenies out there...
Here is a great overview of some modern charting and graphing technologies.

Some options I will be exploring:

#    Comments [0] |
 Sunday, October 14, 2007

Python - Simple Multithreaded HTTP Load Generator/Timer

This is a module for generating concurrent requests to an HTTP server.  Each thread makes HTTP GET requests to a single URL at the specified interval.  Threads are added over a given rampup time if you want to generate increasing load.  Response times are printed to STDOUT.  It can be used for cursory performance benchmarking or load testing a web resource.

load_generator.py module

sample usage:


#!/usr/bin/env python

from load_generator import LoadManager

lm = LoadManager()
lm.msg = ('www.example.com', '/')
lm.start(threads=5, interval=2, rampup=2)
#    Comments [3] |
 Wednesday, September 26, 2007

Python - Tk Graph Example

I found a snippet to draw bar graphs in Python using Tk:
http://www.daniweb.com/code/snippet583.html

The output looks like this:


Here is a modified version that creates a bar graph in a Tk panel:

import Tkinter as tk

def graph_points(seq, width=375, height=325):
    root = tk.Tk()
    c = tk.Canvas(root, width=width, height=height, bg='white')
    c.pack()
    y_stretch = 15   # pixels per unit of bar height
    y_gap = 20       # margin below the bars
    x_stretch = 10   # space between bars
    x_width = 20     # width of each bar
    x_gap = 20       # margin left of the first bar
    for x, y in enumerate(seq):
        # corner coordinates of the bar for item number x with value y
        x0 = x * x_stretch + x * x_width + x_gap
        y0 = height - (y * y_stretch + y_gap)
        x1 = x * x_stretch + x * x_width + x_width + x_gap
        y1 = height - y_gap
        c.create_rectangle(x0, y0, x1, y1, fill="red")
        c.create_text(x0+2, y0, anchor=tk.SW, text=str(y))
    root.mainloop()

data = (18, 15, 10, 7, 5, 4, 2, 5, 8, 10, 13)
graph_points(data)
#    Comments [0] |
 Monday, September 17, 2007

Old-School Pair Programming And My Inclination To Become A Tester

My first programming course was as an undergrad freshman in 1993. It was the basic introductory programming class for CS majors. The course was pretty difficult and was a great filter to separate the real CS students from the wannabes. About one-third of the students dropped the class, and out of the remaining two-thirds, many changed majors after this course was complete.

The course was taught using Scheme, with SICP as the text book. We programmed on a VAX cluster with Ultrix (DEC's Unix flavor) as the Operating System. We had to learn the Unix shells, VI, and all sorts of fun stuff to get us up and running.

The computer lab ("the cluster") was a large sterile room with rows of green-screen dumb terminals. I remember our professor told us that the VAX had 128 MB of memory and I was blown away by how huge that was (my rippin' fast PC had 4 MB at the time).

Spending hours in the computer lab was no fun at all. But I was one of the lucky ones. I owned a brand new 486-DX33 PC running Windows 3.1. I had a blazing fast 14.4 kbps Zoom modem and could use Procomm Plus to dial into the VAX and program from the comfort of my own dorm room. I also found a Scheme interpreter that ran on DOS, giving me further options to do my work offline.

The programming assignments were brutal. All-nighters were the norm. Collaboration on the assignments was encouraged, but we were all expected to turn in our own original work. I found a fellow student that I got along with well and we decided to work together (unfortunately, I don't even remember his name... all I remember is that he was a lot smarter than me).

So the basic workflow was that we would get together, work out the basics of the assignment, get most of the algorithms working, then each take the code and finish it on our own. Since I had the bad-ass PC, we would work in my room. Two things quickly became apparent: He was a much better programmer than me, but I had a better eye for subtle details and debugging. Eventually we settled into a pattern where he would do the programming and I would look over his shoulder to give advice and input. Every few minutes, he would shoot a copy of the code to my dot-matrix printer. I would grab it, go through it line by line, and mark errors with my red pen and hand-write parts of the code that weren't correct. I would then hand the printout back to him and let him enter the changes. We iterated like this until we had all of the core code working.

For some reason, that instinct for attention to detail and debugging has always stuck with me. Because of that, my career has always been influenced by testing. I am a developer/tester, rather than just a developer. Most of the impact I have had in all of my jobs is from creating test tools and bringing in new ways to test software.

Just an interesting observation. I wonder how many others were naturally drawn to testing as soon as they started writing code?

#    Comments [2] |

Python - Yahoo Stock Quote Module

Last week I wrote a small Python module for retrieving stock prices.

It used screen scraping to get data from Google Finance.  Yahoo offers stock data in a much more digestible form which allowed me to get values without screen scraping and regular expressions.  So, I wrote a module based around this.

This new module is much more comprehensive and exposes a Python API for retrieving all sorts of stock data from Yahoo Finance.

The ystockquote module contains the following functions:

  • get_all(symbol)
  • get_price(symbol)
  • get_change(symbol)
  • get_volume(symbol)
  • get_avg_daily_volume(symbol)
  • get_stock_exchange(symbol)
  • get_market_cap(symbol)
  • get_book_value(symbol)
  • get_ebitda(symbol)
  • get_dividend_per_share(symbol)
  • get_dividend_yield(symbol)
  • get_earnings_per_share(symbol)
  • get_52_week_high(symbol)
  • get_52_week_low(symbol)
  • get_50day_moving_avg(symbol)
  • get_200day_moving_avg(symbol)
  • get_price_earnings_ratio(symbol)
  • get_price_earnings_growth_ratio(symbol)
  • get_price_sales_ratio(symbol)
  • get_price_book_ratio(symbol)
  • get_short_ratio(symbol)

Sample Usage:


>>> import ystockquote
>>> print ystockquote.get_price('GOOG')
529.46
>>> print ystockquote.get_all('MSFT')
{'stock_exchange': '"NasdaqNM"', 'market_cap': '268.6B', 
'200day_moving_avg': '29.2879', '52_week_high': '31.84', 
'price_earnings_growth_ratio': '1.45', 'price_sales_ratio': '5.33',
'price': '28.65', 'earnings_per_share': '1.423', 
'50day_moving_avg': '28.7981', 'avg_daily_volume': '55579700',
'volume': '25330856', '52_week_low': '26.48', 'short_ratio': '1.60', 
'price_earnings_ratio': '28.65', 'dividend_yield': '1.38', 
'dividend_per_share': '0.40', 'price_book_ratio': '8.76', 
'ebitda': '20.441B', 'change': '-0.39', 'book_value': '3.315'}

The module is available here:  http://www.goldb.org/ystockquote.html

#    Comments [11] |
 Friday, September 14, 2007

Python - Stock Quote Module

I just wrote a tiny Python module for programmatically retrieving stock quotes from Google Finance:

The module:


import urllib
import re

def get_quote(symbol):
    base_url = 'http://finance.google.com/finance?q='
    content = urllib.urlopen(base_url + symbol).read()
    m = re.search('class="pr".*?>(.*?)<', content)
    if m:
        quote = m.group(1)
    else:
        quote = 'no quote available for: ' + symbol
    return quote


Sample usage:


#!/usr/bin/env python

import stockquote

print stockquote.get_quote('goog')


Output:


>> 529.56
#    Comments [8] |
 Tuesday, September 11, 2007

Python httplib2 - Handling Cookies in HTTP Form Posts

I often need to automate tasks in web based applications.  I like to do this at the protocol level by simulating a real user's interactions via HTTP.  Python comes with two built-in modules for this: urllib (higher level Web interface) and httplib (lower level HTTP interface).

However, I usually don't use either of these.  I prefer to use Joe Gregorio's excellent httplib2 module (btw, I really wish this could make its way into Python's Standard Library).  It is a much richer library and has a lot of nice features for dealing with HTTP.  

When automating something, you often need to "login" to maintain some sort of session/state with the server.  This is usually achieved with form-based authentication. You post a form to the server, and it responds with a cookie in the incoming HTTP header.  You need to pass this cookie back to the server in subsequent requests to maintain state or to keep a session alive.

Here is an example of how to deal with cookies when doing your HTTP Post.


First, let's import the modules we will use:


import urllib
import httplib2


Now, let's define the data we will need. In this case, we are doing a form post with two fields representing a username and a password.


url = 'http://www.example.com/login'   
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}


Now we can send the HTTP request:


http = httplib2.Http()
response, content = http.request(url, 'POST', headers=headers, body=urllib.urlencode(body))


At this point, our "response" variable contains a dictionary of HTTP header fields that were returned by the server. If a cookie was returned, you would see a "set-cookie" field containing the cookie value. We want to take this value and put it into the outgoing HTTP header for our subsequent requests:


headers['Cookie'] = response['set-cookie']

Now we can send a request using this header and it will contain the cookie, so the server can recognize us.



So... here is the whole thing in a script. We login to a site and then make another request using the cookie we received:


#!/usr/bin/env python

import urllib
import httplib2

http = httplib2.Http()

url = 'http://www.example.com/login'   
body = {'USERNAME': 'foo', 'PASSWORD': 'bar'}
headers = {'Content-type': 'application/x-www-form-urlencoded'}
response, content = http.request(url, 'POST', headers=headers, body=urllib.urlencode(body))

headers = {'Cookie': response['set-cookie']}

url = 'http://www.example.com/home'   
response, content = http.request(url, 'GET', headers=headers)
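
The script above copies the whole set-cookie value into the Cookie header, attributes such as Path included. Many servers tolerate that, but if you want to send only the name=value part, something like the following sketch might help. It assumes Python 3's http.cookies module, a single cookie in the header, and a hypothetical cookie value; cookie_header is my own helper name, not part of httplib2:

```python
from http.cookies import SimpleCookie

def cookie_header(set_cookie_value):
    """Reduce a Set-Cookie value to the name=value pairs for a Cookie header."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Each morsel keeps attributes such as Path separately from its value.
    return '; '.join('%s=%s' % (m.key, m.value) for m in jar.values())

# Hypothetical value, in the shape a server might send after login.
raw = 'sessionid=abc123; Path=/'
print(cookie_header(raw))  # sessionid=abc123
```

The result could then be assigned to headers['Cookie'] in place of the raw value.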
#    Comments [2] |
 Friday, August 31, 2007

Python 3000 alpha 1 Released!

wow... congrats to Guido and everyone else involved.

get it here:

http://python.org/download/releases/3.0/
#    Comments [0] |
 Wednesday, August 22, 2007

My Text Editor - What SciTE Says About Me

In a recent post: "What does your favorite text editor say about you", the author lists popular text editors and what they say about their users.  Here is the Editor or IDE I use with various programming languages:

Python:  SciTE
Perl:  SciTE
C#:  Visual Studio
Java:  Eclipse

I do all of my writing and a large portion of my programming in a plain old text editor.  Most of the code I write is in Python.  I love using a lightweight text editor instead of a big bloated IDE.  So... I pretty much live inside a text editor.

... and I love SciTE.  It rocks equally on Windows and GNU/Linux.  So what does this say about me?


SciTE:
"Your text editor is lightweight, full featured, extensible and cross platform. In addition, it can work as a stand-alone executable which requires no installation. Fits perfectly with all your other portable tools on your USB thumb drive. You also love how SciTE let’s you write Lua scripts to extend it’s functionality. You take your text editor choice very seriously. You like tinkering, and minimalistic, portable applications."

#    Comments [3] |
 Friday, August 17, 2007

Lines Of Code In Popular Open Source Code Bases

(via Matt Asay)

I found this pretty interesting.

Lines Of Code (LOC) in some popular Open Source code bases:

  • Linux Kernel: 6 million
  • Sun Java Development Kit: 6.5 million
  • Sun StarOffice: 9 million
  • Eclipse: 17 million
#    Comments [1] |
 Friday, July 27, 2007

Recommended Reading For Learning Python

I have the opportunity to spread Python to some junior/newbie programmers. In doing so, I wanted to compile a concise list of recommended learning materials. The intended audience is someone who has a basic familiarity with programming but no specific Python experience.

There are a ton of books and online materials available, but where should you start? Here is my very brief list:

First Book:

Python Tutorials Online:

#    Comments [5] |

C# - Simple TCP Server

Most of the network programming I do is Web/HTTP oriented. So it has been a while since I had to work with TCP and Socket programming directly. Yesterday I needed to write a quick TCP Server. C# and .NET made this really easy to do:


using System;
using System.Text;
using System.Net;
using System.Net.Sockets;

public class TCPServer
{
    private static int port = 8001;

    public static void Main()
    {
        IPAddress ipAddress = IPAddress.Any;
        TcpListener listener = new TcpListener(ipAddress, port);
        listener.Start();
        Console.WriteLine("Server is running");
        Console.WriteLine("Listening on port " + port);
        Console.WriteLine("Waiting for connections...");
        while (true)
        {
            Socket s = listener.AcceptSocket();
            Console.WriteLine("Connection accepted from " + s.RemoteEndPoint);
            byte[] b = new byte[65535];
            int k = s.Receive(b);
            Console.WriteLine("Received:");
            for (int i = 0; i < k; i++)
                Console.Write(Convert.ToChar(b[i]));
            ASCIIEncoding enc = new ASCIIEncoding();
            s.Send(enc.GetBytes("Server responded"));
            Console.WriteLine("\nSent Response");
            s.Close();
        }
    }
}

#    Comments [0] |
 Friday, June 29, 2007

C# - Convert ASCII String To Hex

C# method to convert an ascii string to hex:

public string ConvertToHex(string asciiString)
{
    string hex = "";
    foreach (char c in asciiString)
    {
        hex += String.Format("{0:x2}", (int)c);
    }
    return hex;
}

#    Comments [0] |
 Tuesday, May 29, 2007

Simple Python Web Server Example

Note to self...
use this:

Roll your own server in 50 lines of Python code (by Muharem Hrnjadovic):

"Just in case you wondered why there are so many frameworks in Python land, here’s a basic server (including a request dispatch mechanism) in only 50 lines of code."

Why *not* add a server interface to every tool I write?  :)

#    Comments [0] |
 Wednesday, May 23, 2007

StockQuote Google Gadget - Usage Stats

A few months ago I deployed my StockQuote Google Gadget, which is used for retrieving stock quotes and daily price graphs.

Behind the gadget is a remote .NET/C# service I created which scrapes stock quotes and charts from Google Finance.

You can see it and play with the demo: cgoldberg.googlepages.com

- Add my gadget to your Google Personalized Homepage
- Add my gadget to your own web page

I have been logging usage stats; just to see how many people are using it and how many transactions it is doing. Stats have been collected for about 4 months:

12000 transactions per day and growing fast.. yikes.


Update: My StockQuote gadget is no longer in service.  I received a takedown notice from Google Finance on 05/23/2007.   umm...  sorta saw that comin' :)

#    Comments [0] |
 Thursday, May 17, 2007

RESTful Web Services - 10 Years of 'Programmable Web' Books

I just got the RESTful Web Services book (Leonard Richardson & Sam Ruby, O'Reilly, 2007) in the mail today.  I've only read the beginning, but so far it is great.  In fact, it brings me back to when I first started working with the "programmable web".  I got into the programmable web back when the web was only a few years old.  I spent years doing performance/scalability testing and tuning for large Web 1.0 applications and bizarre custom Web API's (think huge financial services rushing to get online).  Building tools to run realistic workloads through a system involves writing custom clients to simulate real user/browser interaction.  This is pretty ugly stuff when you are dealing with an application that was designed with only humans in mind (AKA all).  It involves lots of HTTP protocol level work.. screen scraping.. protocol sniffing and analyzing.. requests.. header mangling.. cookie handling.. redirects.. authentication.. session information parsing.. etc, etc.

Application simulation is pretty messy work.  There was no simple API to hide behind; you had to figure out what the API was for yourself.  See.. *every* web application has an API, though it might have been designed by accident.  This allowed me to see first hand how developers and frameworks butchered the use of the "Web" as a platform.  Staring at naked HTTP let me see every little bit of the hairball underneath.  Alas, any standardization around web services (or even an official name for the concept) was far off.

A friend (bearded Perl hacker) let me borrow a book to show me how Perl can do this cool web stuff:  Web Client Programming with Perl (Clinton Wong, O'Reilly, 1997).  This book helped me build my first web clients to do application simulation and testing.  There wasn't a ton of documentation at the time to do this sort of thing, so I relied heavily on this book.

So now.. 10 years later..  the Web has changed..  it has morphed into *the* distributed platform..  it is becoming organized.

As I flip through RESTful Web Services, it all just looks right..  REST looks right..   It is simple..  it is HTTP..  it is all the guts I already know.  It almost feels like a sequel to my old favorite:

I have traded Perl for Python as my preferred scripting language the past few years, but I am still building simulators, web clients, and virtual users. I am excited to work on some new stuff in this area.

#    Comments [0] |
 Friday, May 11, 2007

Mnesia - Scalable Data Persistence in Erlang

SlideAway - There is a world outside of Ruby on Rails:

"Who needs Oracle/Mysql when you have Mnesia, a free, distributed, in memory database ? The ability to store native Erlang structures out of the box is so liberating: suddenly the need for your object-database mapping layer almost vanishes (well, not 100% to be fairly honest, but a big chunk of it: no need to create a 1-to-n relationship or a n-to-n relationship and a mapping table in many simple cases)

Not to mention that Mnesia supports table replication and is fully distributed, with the ability to add new 'nodes' on the fly. All of this out of the box ! (did I mention it was free too ?) This makes scaling up almost a joke. Compare this to the usual nightmares (and cost) of trying to implement a distributed Mysql/Oracle."


Awesome.

#    Comments [0] |
 Monday, April 30, 2007

I Am LISP?

I just took the "Which Programming Language Are You?" quiz. Was hoping to be Python.

Apparently I am LISP?

You are Lisp.  Very few people like you (Probably because you use too many parenthesis (You better stop it (Reallly)))
Which Programming Language are You?

#    Comments [2] |
 Thursday, March 29, 2007

Python - Remove Duplicate Items From a Sequence

Say you have a sequence like:

[1, 1, 2, 2, 2, 3, 4, 4, 4]

... and you want a sequence containing all the unique items (remove duplicates) like:

[1, 2, 3, 4]


Here is a function to do it:

def remove_dups(seq):
    x = {}
    for y in seq:
        x[y] = 1
    u = x.keys()
    return u


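or a one-liner (a hack: it relies on the hidden name '_[1]' that older CPython 2.x releases give to a list comprehension while it is being built, so it is not portable):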

u = [x for x in seq if x not in locals()['_[1]']]



update: in the comments below, some other ways were suggested..

with 'set'.. like this:

u = list(set(seq))

or with a dictionary.. like this:

u = dict.fromkeys(seq).keys()
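
One caveat with the set and dictionary approaches: they don't promise to keep the items in their original order. If order matters, a minimal sketch like this one keeps the first occurrence of each item (the name remove_dups_ordered is just mine, and it assumes the items are hashable):

```python
def remove_dups_ordered(seq):
    """Return a list of the unique items in seq, in first-seen order."""
    seen = set()      # items we have already emitted
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


u = remove_dups_ordered([1, 1, 2, 2, 2, 3, 4, 4, 4])
```

The set gives constant-time membership checks, so this stays linear in the length of the sequence.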
#    Comments [4] |
 Friday, March 23, 2007

Python - Creating Bar Graphs with Matplotlib

Matplotlib is an open source 2D plotting library for Python.  It is very impressive and robust, but the API and documentation are maddeningly difficult to follow.

Here I have provided a function that will create a bar graph [as a png image] from a Python dictionary using the Matplotlib API.

It will auto-size the bars and auto-adjust the axis labels for you. All you need to pass into it is a dictionary data structure (and optionally a graph title and output name).


We start with a Python dictionary like this:

{'A': 70, 'B': 290, 'C': 130}


... and the function will use Matplotlib to create a graph like this:


Here is a sample script that uses my function:


#!/usr/bin/env python

from pylab import *

def main():  
    my_dict = {'A': 70, 'B': 290, 'C': 130}
    bar_graph(my_dict, graph_title='ABC')


def bar_graph(name_value_dict, graph_title='', output_name='bargraph.png'):
    figure(figsize=(4, 2)) # image dimensions  
    title(graph_title, size='x-small')
   
    # add bars
    for i, key in zip(range(len(name_value_dict)), name_value_dict.keys()):
        bar(i + 0.25 , name_value_dict[key], color='red')
   
    # axis setup
    xticks(arange(0.65, len(name_value_dict)),
        [('%s: %d' % (name, value)) for name, value in
        zip(name_value_dict.keys(), name_value_dict.values())],
        size='xx-small')
    max_value = max(name_value_dict.values())
    tick_range = arange(0, max_value, (max_value / 7))
    yticks(tick_range, size='xx-small')
    formatter = FixedFormatter([str(x) for x in tick_range])
    gca().yaxis.set_major_formatter(formatter)
    gca().yaxis.grid(which='major')
   
    savefig(output_name)


if __name__ == "__main__":
    main()


enjoy.

-Corey

#    Comments [6] |
 Thursday, March 22, 2007

Python - Convert Date/Time to Epoch

I'm not sure why, but this took me forever to figure out; so I'm posting it here for others...

Let's say you have a string representing a date and a time and you want to convert it to epoch time (# secs since the epoch).

First you will need to create a pattern for your time format, using time format directives.

For example, the pattern for:

'2007-02-05 16:15:18'

Would be:

'%Y-%m-%d %H:%M:%S'

You can then convert it to epoch like this:

int(time.mktime(time.strptime('2007-02-05 16:15:18', '%Y-%m-%d %H:%M:%S')))


Now in a script:

#!/usr/bin/env python

import time

date_time = '2007-02-05 16:15:18'
pattern = '%Y-%m-%d %H:%M:%S'
epoch = int(time.mktime(time.strptime(date_time, pattern)))
print epoch
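
One thing to watch with the script above: time.mktime() interprets the parsed time as local time, so the number you get depends on the timezone of the machine running it. If your string is really a UTC timestamp, a sketch along these lines should give the same answer everywhere. It uses calendar.timegm() from the standard library; the names epoch_utc and round_trip are just mine.

```python
import calendar
import time

date_time = '2007-02-05 16:15:18'
pattern = '%Y-%m-%d %H:%M:%S'

parsed = time.strptime(date_time, pattern)

# timegm() treats the parsed fields as UTC (mktime() would treat them as local time)
epoch_utc = calendar.timegm(parsed)

# going the other way: epoch seconds back to a formatted UTC string
round_trip = time.strftime(pattern, time.gmtime(epoch_utc))
```

Converting back with time.gmtime() is a quick sanity check that nothing shifted by a timezone offset along the way.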
#    Comments [0] |
 Saturday, March 17, 2007

Python3000 vs. Perl6 ... Wanna Bet?

Perl6...
Python3000...

Both are redesigns of very popular dynamic/scripting languages.  Both have very strong, though very different, communities supporting them.

Out of the gate, Perl's plans were much more ambitious, including a new generic virtual machine.  Python's plans were more pragmatic; more of a language cleanup than a drastic redesign.

Perl 6 was officially announced nearly 7 years ago and I don't see a stable production release coming *any* time soon.  On the other hand, the idea of Python3000 was sorta tossed around for a while and swung into gear 2 years ago.

Guido (Python's BDFL) has been spearheading the effort, whereas Perl's leadership structure is much more anarchic (Where is Larry Wall these days?).  Guido has been very transparent and kept the community aware of his worries.

Some people saw this as a slippery slope...

Chromatic:

"Language redesign is difficult, isn’t it?  Once you start challenging base assumptions, you find that a lot of your previous conclusions are shaky, and good luck reigning in blue-sky ideas!

See you in 2007… or 2008… or 2009.

Best wishes,
a Perl 6 hacker"

I disagree..

I'd bet anyone money that I will be hacking on a stable release of Python3000 long before I'm using a stable version of Perl6... any takers?


(disclosure: I have written lots of code in both Perl and Python and am a fan of both)

#    Comments [6] |
 Wednesday, March 14, 2007

Regex "Match" in Python vs. C#

I have been writing a lot of code in both C# and Python lately... flipping back and forth between both languages.  One thing I keep getting tripped up on is the terminology used in regular expression syntax, and what a "match" is.

So for my own disambiguation:

  • Python's re.match() is different than C#'s Regex.IsMatch()
  • Python's re.search() is similar to C#'s Regex.IsMatch()


Better explained in code:


Using Regex.IsMatch() in C# to match a pattern with some text:

if (Regex.IsMatch("foobar", "bar"))
{
    Console.WriteLine("Match");
}
else
{
    Console.WriteLine("No Match");
}

this prints 'Match'


Same thing, using re.match() in Python:

if re.match('bar', 'foobar'):
    print 'Match'
else:
    print 'No Match'

this prints 'No Match'


oops.. didn't get a match. What happened?

match() only checks if the regex matches at the beginning of the string, while search() will scan forward through the string for a match.


If you were expecting the pattern to match anywhere in the string, you need to use re.search() instead:

if re.search('bar', 'foobar'):
    print 'Match'
else:
    print 'No Match'

this prints 'Match'


... or else you must supply a pattern that will match from the beginning of the string:

if re.match('.*bar', 'foobar'):
    print 'Match'
else:
    print 'No Match'

this prints 'Match'
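
For my own reference, here is the same comparison boiled down to a few lines. This is only a sketch (the variable names are mine), but it also shows that anchoring a search() pattern with '^' makes it behave like match():

```python
import re

text = 'foobar'

# match() only succeeds when the pattern is found at position 0
match_at_start = re.match('foo', text) is not None
match_in_middle = re.match('bar', text) is not None

# search() scans forward through the whole string
search_in_middle = re.search('bar', text) is not None

# an anchored search() is restricted to the start, like match()
anchored_search = re.search('^bar', text) is not None
```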


#    Comments [0] |
 Saturday, March 10, 2007

Python - Iterating Multiple Sequences

Here are some examples of iterating through multiple sequences simultaneously in Python:


I start with 2 lists of numbers:

foos = [0, 1, 2, 3, 4]
bars = [1, 2, 3, 4, 5]

I want to create a new list that is made up of the sum of the items at each position in the original lists.  So I will end up with this:

>>> print foobars

[1, 3, 5, 7, 9]


Starting with an unpythonic way...
Here I use a counter to iterate through the indexes of each sequence and build a new list:

foobars = []
for i in range(len(foos)):
    foo = foos[i]
    bar = bars[i]
    foobars.append(foo + bar)


Getting more pythonic...
Here I use zip. Zip allows me to iterate each sequence simultaneously, assigning the current sequence values each time through the loop:

foobars = []
for foo, bar in zip(foos, bars):
    foobars.append(foo + bar)


The older pythonic way to do this was with map:

foobars = []
for foo, bar in map(None, foos, bars):
    foobars.append(foo + bar)


Getting even more pythonic and more concise...
I can combine zip with a list comprehension and do it in a one-liner like this:

foobars = [foo + bar for (foo, bar) in zip(foos, bars)]
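
One more variation, sketched here as an aside: map can take several sequences at once and will call the function with one item from each. Using operator.add from the standard library avoids writing a lambda:

```python
import operator

foos = [0, 1, 2, 3, 4]
bars = [1, 2, 3, 4, 5]

# map() calls operator.add(foo, bar) for each pair of items;
# list() makes sure we end up with a real list either way
foobars = list(map(operator.add, foos, bars))
```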


*note:  zip is not going away in Python 3000.  The plan is for it to return an iterator instead of a list, the way itertools.izip does today.

#    Comments [2] |
 Thursday, March 08, 2007

PLEAC - Programming Language Examples Alike Cookbook

I just stumbled across the PLEAC Project (Programming Language Examples Alike Cookbook).

Project Description:

"Following the great Perl Cookbook (by Tom Christiansen & Nathan Torkington, published by O'Reilly; you can freely browse an excerpt of the book here) which presents a suite of common programming problems solved in the Perl language, this project aims to gather fans of programming, in order to implement the solutions in other programming languages."


There is sample code in many popular languages.  The Python examples are really good.  They would serve as an excellent primer for someone moving from Perl to Python, or as a general Python reference with cookbook-style examples.

It is hosted at SourceForge and licensed under the GNU Free Documentation License (GFDL).

#    Comments [0] |
 Monday, March 05, 2007

.NET CLR - Covertly Throttling Thread Creation

Joe Duffy (from Microsoft) talking about the .NET 2.0 CLR:

"It's also worth noting that the threadpool throttles its creation of threads to 2/second once the count has exceeded the # of CPUs."


Yuck...  I don't like throttling like that behind the scenes.  It can make performance problems *very* hard to diagnose.

#    Comments [2] |
 Sunday, March 04, 2007

Python Parameters - Pass-By-Value or Pass-By-Reference?

Passing parameters to functions and methods.  Pass-by-value?  Pass-by-reference?  Which does your language use?

You probably learned this in your first CS class... so did I.

Then why did it take me a frakin' month to understand what Python does? :)

Well... if you look online, you will find some very ambiguous answers about Python being pass-by-reference or pass-by-value.  (which ends up boiling down to semantics and how you use certain terminology, but forget that for now)


To review, how do other languages handle this concept?

C is pass-by-value

Straightforward. You can simulate pass-by-reference with pointers.  Not much else to say here.

Java is pass-by-value

Primitive Types (non-object built-in types) are simply passed by value.  Passing Object References feels like pass-by-reference, but it isn't.  What you are really doing is passing references-to-objects by value.

OK, so what about Python?

Python passes references to objects by value (like Java), and everything in Python is an object. This sounds simple, but then you will notice that some data types seem to exhibit pass-by-value characteristics, while others seem to act like pass-by-reference... what's the deal?

It is important to understand mutable and immutable objects. Some objects, like strings, tuples, and numbers, are immutable.  Altering them inside a function/method will create a new instance and the original instance outside the function/method is not changed.  Other objects, like lists and dictionaries are mutable, which means you can change the object in-place.  Therefore, altering an object inside a function/method will also change the original object outside.
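
A small sketch makes the difference concrete. The function names here are made up for illustration; the point is that rebinding a name inside a function never affects the caller, while mutating the object the name refers to does:

```python
def rebind_number(n):
    n = n + 1            # builds a new int and rebinds the local name only
    return n


def mutate_list(items):
    items.append(4)      # changes the caller's list in place


def rebind_list(items):
    items = items + [99]  # builds a new list; the caller's list is untouched
    return items


x = 1
rebind_number(x)         # x is still 1 afterwards

numbers = [1, 2, 3]
mutate_list(numbers)     # numbers now has 4 appended
rebind_list(numbers)     # numbers is unchanged by this call
```

Note that rebind_list shows the same "pass-by-value" feel with a mutable type: what matters is whether the function rebinds the name or changes the object.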



For entirely too much information about this topic in Python and across many other languages (Java, Scheme, C#, C, C++, Python), read the thread where these quotes come from:

Is Python By Value Or By Reference?

Alex Martelli:

The terminology problem may be due to the fact that, in python, the value of a name is a reference to an object. So, you always pass the value (no implicit copying), and that value is always a reference.
[...]
Now if you want to coin a name for that, such as "by object reference", "by uncopied value", or whatever, be my guest. Trying to reuse terminology that is more generally applied to languages where "variables are boxes" to a language where "variables are post-it tags" is, IMHO, more likely to confuse than to help.

Michael Hoffman:

Alex is right that trying to shoehorn Python into a "pass-by-reference" or "pass-by-value" paradigm is misleading and probably not very helpful. In Python every variable assignment (even an assignment of a small integer) is an assignment of a reference. Every function call involves passing the values of those references.


word.

#    Comments [0] |
 Wednesday, February 28, 2007

Reading Outlook/Exchange Email Programmatically with Python

With Python's Windows Extensions, you can talk via COM to an Exchange Server and read/process your email.  You must have the Outlook Client installed on the box you are running this from.

Here is a sample script that will:

  • connect to your mailbox
  • print the inbox name
  • print the message count
  • print the subjects for all your email messages


#!/usr/bin/env python

from win32com.client import Dispatch

session = Dispatch("MAPI.session")
session.Logon('OUTLOOK')  # MAPI profile name
inbox = session.Inbox

print "Inbox name is:", inbox.Name
print "Number of messages:", inbox.Messages.Count

for i in range(inbox.Messages.Count):
    message = inbox.Messages.Item(i + 1)
    print message.Subject


#    Comments [0] |
 Monday, February 26, 2007

Python 3000 Video and Slides

Guido van Rossum just published his slides from the PyCon 2007 Keynote where he discusses Python 3000.

His talk is also available on Google Video.

I'm psyched to watch this!

#    Comments [0] |
 Sunday, February 25, 2007

Wes Dyer on Type Systems

Awesome post by Wes Dyer about Type Systems:
Types Of Confusion


He tackles a lot of misconceptions and myths about Typing.

To frame his points, he starts with some fantastic definitions.  I am all for clarifying vocabulary and I really like this:


Based on the apparent confusion, I think it is best to clarify what I mean by each of the following terms:

    Type Checking - Verifying that code respects type constraints.

    Statically Typed - Type checking occurs at compile time.

    Dynamically Typed - Type checking occurs at run time.

    Type Safe Language - A language which protects its own abstractions.

    Type Unsafe Language - A language which is not type safe.

    Strongly Typed and Weakly Typed - Depends on the author; The definitions are so many and so varied that the terms are practically useless. It seems that anyone can claim that language X is either strongly typed or weakly typed based on sound reasoning derived from one of the various definitions.

    Dynamic Language - A language which enables runtime inspection or modification of a program; most languages can do this but dynamic languages make it easy. It is common for people to refer to "dynamic languages" and mean "dynamically typed languages" as the term is defined here.

good read.

#    Comments [0] |

How To See Your Swallowed Exceptions In .NET (Visual Studio debugger)

Good to know..


See all the exceptions you are swallowing in .NET (Visual Studio debugger):

Turn on 'all exceptions' and watch the fireworks fly


One of the most glaring differences for me between Java and .NET is the difference between Checked/Unchecked Exceptions, so stuff like this has been helpful to me in figuring out how exception handling in C#/.NET really works.

#    Comments [0] |

4-Space Indents - 4Eva!

I personally use 4-space indents when I write code (in any language, period).


Oddly... in Python, where whitespace matters, there is no single common practice (which would fit in nicely with Python's TOOWTDI ideology).

Some observations of source code indentations in Python:

  • I mostly see 4-space indents in code I read (colleagues, libraries, 3rd party code)
  • Python's own source (the Python parts) uses 4-space indents
  • Google uses 2-space indents in their Python
  • I hate tabs (if I randomly sucker-punch you, this is why)


Guido van Rossum (creator, lead developer, and BDFL of Python) writes:

"If it uses two-space indents, it's corporate code; if it uses four-space indents, it's open source. (If it uses tabs, I didn't write it! :)"
#    Comments [0] |
 Thursday, February 22, 2007

Joe Gregorio on MOM vs. RPC

RPC: Remote Procedure Call
MOM: Message Oriented Middleware

Both RPC and MOM are communication models for distributed systems.  Each has strengths and weaknesses. However, when you get into large heterogeneous distributed systems, message passing is the way to achieve scalability.

I like this quote:

"In a large system you may be faced with either a multitude of clients or a menagerie of them; in either case you have to stop serializing objects and start exchanging documents."
- Joe Gregorio, 2007
#    Comments [0] |

Not Using ASP.NET Session State? Then Turn It Off

I am developing some small ASP.NET 2.0 web applications.  They are stateless and I am not doing anything with Session State.  However, I noticed that ASP.NET enables Session State by default (In-process mode is the default setting).  Therefore, if you have a truly stateless site or application, session state does nothing more than slow down performance.

In-process session state is still relatively fast, as the memory used to handle session is allocated by the same process on the local machine (no cross-process calls or data marshaling).  But this is needless overhead if you are not using your Session State.

So... to turn it off for the whole application, add the following line to your web.config, inside the system.web section:


<sessionState mode="Off" />


#    Comments [0] |
 Tuesday, February 20, 2007

Compiling Python Scripts to Windows Executables

I often write quick Python scripts that I need to run on other machines. It is sometimes easier to just drop a Windows .EXE onto a machine (with a Python Interpreter compiled into it), rather than doing a full Python installation. To do this, I use py2exe.

py2exe is a Python Distutils extension which converts Python scripts into executable Windows programs. This enables your Python scripts to be run on Windows platforms without a Python installation.

You can run py2exe directly from the command line, or you can script it. I wrote a small convenience script that I use for general compilation.

Let's call the compilation script: compile.py
Let's say we have a script we want to compile named: foo.py

You would then invoke it from the command line like this:

>python compile.py foo.py

This will create a 'dist' subdirectory containing the newly created executable along with some necessary DLL's.


Here is the code I use for my compile.py:


#!/usr/bin/env python
# Corey Goldberg

from distutils.core import setup
import py2exe
import sys

if len(sys.argv) == 2:
    entry_point = sys.argv[1]
    sys.argv.pop()
    sys.argv.append('py2exe')
    sys.argv.append('-q')
else:
    print 'usage: compile.py <python_script>\n'
    raw_input("press ENTER to exit...")
    sys.exit(1)

opts = {
    'py2exe': {
        'compressed': 1,
        'optimize': 2,
        'bundle_files': 1
    }
}

setup(console=[entry_point], options=opts, zipfile=None)

(note: you need to have Python and py2exe installed on a Windows box to run this)

#    Comments [0] |
 Thursday, February 15, 2007

Perl - File Slurping

A common idiom in Perl 5 is "slurping".  Slurping is the process of reading a file into an array, split by line breaks.  You can then iterate over the array and perform an operation on each line.  This is the basic input mechanism I use to process all sorts of data/text files.


The basic slurp goes like this...

Open a file in read mode and assign it a file handle:

open(FILE, 'foo.txt') or die $!;

Read (slurp) the file into an array of lines (splitting the file on newlines):

@file = <FILE>;


You can then process the array in a foreach loop and "Un-slurp" (De-slurp?) it back to the file system like this...

Now we have an array which we can iterate through and do whatever we want with each line:

foreach (@file) {
    # do something here
}

Re-open the file in overwrite mode:

open(FILE, '>foo.txt') or die $!;

Print the contents of the array back to the file:

print FILE @file;


The following script shows some slurping in action. This script will read a file named "foo.txt" and replace all instances of "foo" with "bar":

#!/usr/bin/perl

replace('foo.txt', 'foo', 'bar');

sub replace {
    ($filename, $original, $substituted) = @_;
    open(FILE, $filename) or die $!;
    @file = <FILE>;
    foreach (@file) {
        s/$original/$substituted/g;
    }
    open(FILE, ">$filename") or die $!;
    print FILE @file;
    close(FILE);
}
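
For comparison, here is a rough Python sketch of the same slurp / modify / write-back cycle. The function name replace_in_file is just mine, and unlike the Perl s///g it does a literal string replacement rather than a regex substitution:

```python
def replace_in_file(filename, original, substituted):
    """Slurp a file into a list of lines, edit each line, and write it back."""
    # slurp: read the whole file into a list of lines
    with open(filename, 'r') as f:
        lines = f.readlines()

    # process each line (literal replacement, not a regex)
    lines = [line.replace(original, substituted) for line in lines]

    # un-slurp: overwrite the file with the modified lines
    with open(filename, 'w') as f:
        f.writelines(lines)


# usage, assuming a file named foo.txt exists in the current directory:
# replace_in_file('foo.txt', 'foo', 'bar')
```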
#    Comments [0] |
 Tuesday, February 13, 2007

Trampolining With Generators - Roll Your Own Scheduler?

Even the subject sounds confusing huh?

I was reading Neil Mix's: Threading in JavaScript 1.7 post and was really fascinated by the concept he discusses: trampolining

Basically, trampolining is a way to achieve concurrency by using Generators to create a coroutine scheduler.

In JavaScript 1.7 (which Firefox 2 supports), you can already do concurrent programming with this technique.


Neil Mix:

"The way trampolining works is that a scheduler object (written in JavaScript) manages the execution of a series of generators, cobbling together a stack-like execution. Here’s how it works: The scheduler sets the starting generator as the base “frame” in the call stack. The scheduler then calls next() on the generator to obtain a yield value. If the yielded value is itself a generator, the scheduler pushes this new generator on the stack and calls next() on it, again obtaining a yield value. This continues until the top generator yields a non-generator value. This value could be a special directive to the scheduler (for example, a SUSPEND value that tells the scheduler to freeze execution of the “stack” of generators we’ve piled up). If not, the scheduler treats it as a return value. The scheduler then pops and closes the now complete generator and sends the return value back into the next generator in the stack."

pretty sick, huh?  ... definitely a twisted idea :)

The interesting takeaway is that this technique could be used to implement concurrency in any language that supports Generators.  It looks like Python has a similar capability.  This is described in detail in: PEP 342 - Coroutines via Enhanced Generators.
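
To convince myself, here is a minimal sketch of the stack-handling part of that description in Python, using the send() method from PEP 342. It is not Neil's scheduler: there is no SUSPEND directive and it runs only one "thread", and the names trampoline, add, and compute are made up for illustration.

```python
import types


def trampoline(start):
    """Run a generator, treating any yielded generator as a nested call."""
    stack = [start]
    value = None                      # what to send into the top generator
    while stack:
        try:
            yielded = stack[-1].send(value)
        except StopIteration:
            stack.pop()               # generator ended without a result
            value = None
            continue
        if isinstance(yielded, types.GeneratorType):
            stack.append(yielded)     # "call": push the new generator
            value = None              # a fresh generator must be sent None
        else:
            stack.pop().close()       # "return": pop and close the frame
            value = yielded           # hand the result to the caller below
    return value


def add(a, b):
    yield a + b                       # a non-generator yield acts as a return


def compute():
    x = yield add(1, 2)               # looks like a call that returns 3
    y = yield add(x, 10)
    yield y * 2


result = trampoline(compute())
```

A real scheduler would keep several of these stacks around and switch between them whenever one yields a "suspend" value.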

Generator-based state machines sound really interesting.  Hopefully I'll find some time to play with them [in Python] as an alternative to threading.

#    Comments [0] |

Screen Scraping in Python

Mads Kristensen just posted an article: Screen scraping in C#, where he shows several ways to make HTTP requests in C# that can be used for screen scraping.

from Mads:

"Some say that screen scraping is a lost art because it is no longer an advanced discipline. That may be right, but there are different ways of doing it. Here are some different ways that all are perfectly acceptable, but can be used for various different purposes."


Not to be outdone... here are 2 examples of how to do the same thing in Python:

using httplib:

import httplib

conn = httplib.HTTPConnection("www.python.org")
conn.request("GET", '/')
print conn.getresponse().read()


using urllib:

import urllib

f = urllib.urlopen('http://www.python.org/')
print f.read()


#    Comments [2] |
 Friday, February 09, 2007

Python - use Psyco (x86 JIT-like compiler) for a speed boost

Psyco is a Python extension module which can speed up the execution of any Python code.

from the Psyco site:

"Think of Psyco as a kind of just-in-time (JIT) compiler, a little bit like what exists for other languages, that emit machine code on the fly instead of interpreting your Python program step by step. The difference with the traditional approach to JIT compilers is that Psyco writes several version of the same blocks (a block is a bit of a function), which are optimized by being specialized to some kinds of variables (a "kind" can mean a type, but it is more general). The result is that your unmodified Python programs run faster"


I have been working on some Python projects recently where Psyco has given me a really substantial performance increase. The type of work I am doing mostly involves statistical analysis of large numerical data sets.. array math.. percentiles.. time-series.. etc, etc.

To use it, all I do is copy Psyco to my system (to Python's Lib/site-packages), and add the following to the top of my python source file:

import psyco
psyco.full()


That's all ...

... or even better; wrap it in a try/except so your program still runs on systems without Psyco installed.

try:
    import psyco
    psyco.full()
except ImportError:
    pass
#    Comments [0] |

Live Brain-Surgery With Python

Another example of improved productivity with dynamic languages...

Gojko Adzic on prototyping in Python:

"writing the prototype in Python allowed us to start a web server and open an interactive console to re-wire it and perform live brain-surgery while the server is running. I cannot imagine doing that in Java or C#. In a month, we wrote the functional equivalent of at least 4-5 months of C# code."

#    Comments [0] |
 Tuesday, February 06, 2007

Anders Hejlsberg on LINQ and Functional Programming

This is a good video of an interview with Anders Hejlsberg on LINQ and Functional Programming.  He is the designer of C# and talks about some of the upcoming features in Orcas (the next Visual Studio with C# 3.0).

I think it is very interesting (and good) that Microsoft (and many other modern language designers) are adding functional programming features.  If functional programming, lambda expressions, list comprehensions, and set processing, are your bag.. watch this.

#    Comments [0] |

Improving Regular Expression Performance

Alex from Dojo just linked to a fascinating article by Russ Cox about Regular Expressions:  Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)

from the article:

"This article reviews the good theory: regular expressions, finite automata, and a regular expression search algorithm invented by Ken Thompson in the mid-1960s. It also puts the theory into practice, describing a simple implementation of Thompson's algorithm. That implementation, less than 400 lines of C, is the one that went head to head with Perl above. It outperforms the more complex real-world implementations used by Perl, Python, PCRE, and others. The article concludes with a discussion of how theory might yet be converted into practice in the real-world implementations."

so.. there is a 40 year old technique that improves performance of regexes dramatically?

The graph in the article plots the time required to check whether a?^n a^n matches a^n (that is, n optional a's followed by n required a's, matched against a string of n a's).



wow... so awk and grep use the Thompson NFA implementation of regexes, while most programming languages don't.

... and here I thought Perl was the regex king.
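
To get a feel for the difference, here is a minimal Python 3 sketch (mine, not code from the article). It handles only the tiny pattern language needed for a?^n a^n, where each token is a character that is either required or optional, and it tracks the set of all live states instead of backtracking. The names nfa_match, pathological_tokens, and backtracking_match are hypothetical helpers made up for this illustration.

```python
import re


def backtracking_match(n):
    """Match a?^n a^n against a^n with Python's backtracking re engine."""
    pattern = "a?" * n + "a" * n
    return re.fullmatch(pattern, "a" * n) is not None


def nfa_match(tokens, text):
    """Simulate every NFA state at once (Thompson-style state sets).

    tokens is a list of (char, optional) pairs, so ('a', True) means "a?".
    State i means "the first i tokens have been matched or skipped".
    """
    def closure(states):
        # An optional token can be skipped without consuming input.
        result = set()
        for state in states:
            result.add(state)
            while state < len(tokens) and tokens[state][1]:
                state += 1
                result.add(state)
        return result

    states = closure({0})
    for ch in text:
        # Advance every live state that can consume this character.
        stepped = {s + 1 for s in states
                   if s < len(tokens) and tokens[s][0] == ch}
        states = closure(stepped)
    return len(tokens) in states


def pathological_tokens(n):
    """Token list for a?^n a^n."""
    return [("a", True)] * n + [("a", False)] * n
```

The state-set version does a bounded amount of work per input character, so nfa_match(pathological_tokens(50), "a" * 50) returns quickly. The backtracking version explores more and more alternatives as n grows, so I would only try backtracking_match with small values of n.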

#    Comments [0] |
 Sunday, February 04, 2007

Perl - Building Web Clients

The following is a short tutorial on web programming in Perl I wrote several years ago.  This type of programming was my first foray into the guts of the web.  Writing tools at the protocol level forced me to gain a deep understanding of HTTP and Web Architecture, which has been extremely helpful to me since.


These examples show how to use Perl's 'LWP' (libwww-perl) modules to make requests to a web server. The libwww-perl collection is a set of Perl modules which provides a simple and consistent application programming interface to the web.


Using 'LWP' to do an HTTP GET Request:

This will request the main Google page and store the entire contents of the response in the '$response' object.
#!/usr/bin/perl

use LWP;

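# create the user agent (the client that sends requests)
$useragent = LWP::UserAgent->new;
# build a GET request for the main Google page
$request = HTTP::Request->new('GET', "http://www.google.com");
# send the request without following redirects
$response = $useragent->simple_request($request);

# print the status line, headers, and body
print $response->as_string();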

(*use "$useragent->request" instead of "$useragent->simple_request" to follow server redirects)


Working With Cookies:

Here is the http header returned by the initial http request to Google:
(first part of 'print $response->as_string();' output in the previous example)
Date: Mon, 14 Apr 2003 18:38:28 GMT
Server: GWS/2.0
Content-Length: 2691
Content-Type: text/html
Content-Type: text/html; charset=ISO-8859-1
Client-Date: Mon, 14 Apr 2003 18:38:29 GMT
Client-Peer: 216.239.57.99:80
Client-Response-Num: 1
Connection: Close
Set-Cookie: PREF=ID=48fd767576ebd920:TM=1050345508:LM=1050345508:S=qLA8i5XyvLX37lG6;
expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Title: Google

Notice the "Set-Cookie:" line in the header. This is what tells your web browser that a cookie needs to be set and returned as part of the HTTP header in subsequent HTTP requests to this server. In this case the cookie doesn't do much, but for a site that requires a login, this is how the server knows who you are and maintains a session.

In Perl, cookies can be handled for you by using the HTTP::Cookies module.

You first need to construct the object to contain your cookies:
$cookie_jar = HTTP::Cookies->new;

After an http request is sent, you can then extract the cookie from the response header:
$cookie_jar->extract_cookies($response);

Once you have the cookie stored in your cookie_jar, it needs to be sent back to the server in the header of every subsequent http request. This is done by adding the following command after you format each request:
$cookie_jar->add_cookie_header($request);
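
The same three steps (construct a jar, extract the cookie, send it back) can be sketched offline in Python 3 with the standard library's http.cookies module. This is only an illustration of the mechanics, not a port of HTTP::Cookies. The helper names extract_cookies and cookie_header are hypothetical, and the Set-Cookie value is the one from the header dump above joined onto one line.

```python
from http.cookies import SimpleCookie

# Value of the Set-Cookie header shown above, joined onto one line.
SET_COOKIE = (
    "PREF=ID=48fd767576ebd920:TM=1050345508:LM=1050345508:S=qLA8i5XyvLX37lG6; "
    "expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com"
)


def extract_cookies(jar, set_cookie_value):
    """Parse a Set-Cookie header value and store the cookie in the jar."""
    jar.load(set_cookie_value)


def cookie_header(jar):
    """Build the value of the Cookie header to send with later requests."""
    return "; ".join("%s=%s" % (name, morsel.value)
                     for name, morsel in jar.items())


jar = SimpleCookie()                # construct the jar
extract_cookies(jar, SET_COOKIE)    # extract the cookie from the response
header = cookie_header(jar)         # value to send back as "Cookie: ..."
```

Only the name=value pair goes back to the server. Attributes such as expires, path, and domain are instructions for the client about when the cookie applies.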


Now for the whole thing in a script:


The following script will make a request to the main Google page and store the cookie it receives. It will then make a request to Google to change the default language (user preference) to Spanish. A new cookie will be returned, which we will store and use to make another request to the main Google page. Google will recognize the information stored in our cookie and return the page in Spanish.
#!/usr/bin/perl

use LWP;
use HTTP::Cookies;

# construct objects
$useragent = LWP::UserAgent->new;
$cookie_jar = HTTP::Cookies->new;

# send request for main Google page
$request = new HTTP::Request('GET',"http://www.google.com");
$response = $useragent->simple_request($request);

# extract cookie from response header
$cookie_jar->extract_cookies($response);

# set user preference on Google to Spanish language
$request = new HTTP::Request('GET', "http://www.google.com/setprefs?"
               . "submit2=Save+Preferences+&hl=es&lt=all&safe=images&num=10"
               . "&q=&prev=http%3A%2F%2Fwww.google.com%2F&ie=UTF-8&oe=UTF-8");
$cookie_jar->add_cookie_header($request);
$response = $useragent->simple_request($request);

# extract new cookie from response header
$cookie_jar->extract_cookies($response);

# send request for main Google page (will return Spanish Google page)    
$request = new HTTP::Request('GET',"http://www.google.com");
$cookie_jar->add_cookie_header($request);
$response = $useragent->simple_request($request);

print $response->as_string; # print the full response (headers and body) to verify the cookies work (some text is now in Spanish)

#    Comments [0] |
 Friday, January 26, 2007

Python - Sort A Nested Sequence With DSU

The DSU (Decorate, Sort, Undecorate) idiom originates from Lisp.  I first learned it in Perl, where it is called the Schwartzian Transform (coolest name ever?), named after longtime Perl hacker Randal L. Schwartz.

I find myself using this same DSU idiom in Python when I need to sort a nested sequence (single level sequence of sequences).

Let's say I have the following list of lists:

seq = [
    ['a', 1, 5],
    ['b', 3, 4],
    ['c', 2, 2],
    ['d', 4, 3],
    ['e', 5, 1],
]

... and I want the outer list to contain the inner lists sorted by their last column (in this case, index 2).

How would I do this?

Here is an implementation of the DSU (Decorate, Sort, Undecorate) idiom in a Python function:

def dsu_sort(idx, seq):
    for i, e in enumerate(seq):
        seq[i] = (e[idx], e)
    seq.sort()
    for i, e in enumerate(seq):
        seq[i] = e[1]
    return seq
   
(Keep in mind that lists in Python are mutable and this will transform your original sequence.)


So applying this to the sequence above like this:

dsu_sort(2, seq)

gives us:

[['e', 5, 1], ['c', 2, 2], ['d', 4, 3], ['b', 3, 4], ['a', 1, 5]]

which is the original sequence, transformed so it is sorted by the last column (index 2).
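
If you would rather not modify the original list, here is a minimal sketch of two non-mutating variants, written for Python 3. The names dsu_sorted and key_sorted are hypothetical. The first still spells out decorate/sort/undecorate, and the second leans on the built-in sorted() with a key function, which does the decorating for you.

```python
from operator import itemgetter


def dsu_sorted(idx, seq):
    """Return a new list sorted by column idx, leaving seq untouched."""
    # Decorate: pair each row with its sort key and original position.
    # The position breaks ties, so rows themselves are never compared.
    decorated = [(row[idx], pos, row) for pos, row in enumerate(seq)]
    # Sort: tuples compare element by element, key first.
    decorated.sort()
    # Undecorate: keep only the original rows.
    return [row for _, _, row in decorated]


def key_sorted(idx, seq):
    """Same result using the built-in key= support."""
    return sorted(seq, key=itemgetter(idx))


seq = [
    ['a', 1, 5],
    ['b', 3, 4],
    ['c', 2, 2],
    ['d', 4, 3],
    ['e', 5, 1],
]
by_last_column = key_sorted(2, seq)
```

Both functions give the same ordering as dsu_sort(2, seq) above, but seq itself stays in its original order.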



Randal's original implementation in Perl from 1994:
#!/usr/bin/perl
 print
     map { $_->[0] }
     sort { $a->[1] cmp $b->[1] }
     map { [$_, /(\S+)$/] }
     <>;

#    Comments [3] |
 Monday, January 22, 2007

C# Simple Multithreading Example

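Here is a simple example of multithreading in C#.  The main thread and a spawned thread each print a counter and sleep between iterations, so their output interleaves: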

using System;
using System.Threading;

public class Test
{
    static void Main()
    {
        ThreadStart job = new ThreadStart(ThreadJob);
        Thread thread = new Thread(job);
        thread.Start();

        for (int i=0; i < 5; i++)
        {
            Console.WriteLine ("Main thread: {0}", i);
            Thread.Sleep(1000);
        }
    }

    static void ThreadJob()
    {
        for (int i=0; i < 10; i++)
        {
            Console.WriteLine ("Spawned thread: {0}", i);
            Thread.Sleep(500);
        }
    }
}
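
For comparison, here is roughly the same shape in Python 3 using the standard threading module. It is a sketch, not a line-for-line port. The names thread_job and run_demo are hypothetical, the sleeps are much shorter, and the output goes into a shared list instead of the console so the interleaving can be inspected afterwards.

```python
import threading
import time


def thread_job(log, count=10, delay=0.005):
    """Work done by the spawned thread."""
    for i in range(count):
        log.append(("spawned", i))
        time.sleep(delay)


def run_demo(main_count=5, delay=0.01):
    """Start a worker thread, do some work on the main thread, then join."""
    log = []
    worker = threading.Thread(target=thread_job, args=(log,))
    worker.start()
    for i in range(main_count):
        log.append(("main", i))
        time.sleep(delay)
    # Wait for the worker so the log is complete before returning.
    worker.join()
    return log
```

Calling run_demo() returns 15 entries. The order within each thread is fixed, but how the two threads interleave can vary from run to run.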

#    Comments [0] |

Calling A Command Line Program From C#

I often need to call external command line programs from within my C# code.  To do this, I use a Process object.  Here is some example code I use for calling a Python program:

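// requires: using System.Diagnostics;
private void Execute()
{
    Process proc = new Process();
    
    proc.StartInfo.WorkingDirectory = @"C:\scripts";
    proc.StartInfo.FileName = "python.exe";
    proc.StartInfo.Arguments = "foo.py";
    proc.StartInfo.UseShellExecute = false;  // required when redirecting streams
    proc.StartInfo.RedirectStandardOutput = false;
    proc.StartInfo.RedirectStandardError = true;
    proc.Start();
    // drain stderr before waiting so a full pipe buffer cannot block the child
    string errors = proc.StandardError.ReadToEnd();
    proc.WaitForExit();
    proc.Close();
}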
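
Going the other direction (calling an external program from Python) looks similar. Here is a minimal Python 3 sketch using subprocess.run. The helper name run_script is hypothetical, and it launches the current Python interpreter rather than assuming python.exe is on the path.

```python
import subprocess
import sys


def run_script(args, cwd=None):
    """Run the current Python interpreter with args and wait for it to exit.

    Returns (exit code, captured stderr text). Standard output is not
    redirected, so the child writes to the parent's stdout.
    """
    result = subprocess.run(
        [sys.executable] + list(args),
        cwd=cwd,                  # working directory for the child process
        stderr=subprocess.PIPE,   # capture stderr
        text=True,                # decode captured output as text
    )
    return result.returncode, result.stderr
```

For example, run_script(["foo.py"], cwd=r"C:\scripts") would mirror the C# snippet, assuming that directory and script exist.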


#    Comments [0] |
 Tuesday, January 16, 2007

Python - Merge a Sequence of Lists Into a Single List

the function:

def merge(seq):
    merged = []
    for s in seq:
        for x in s:
            merged.append(x)
    return merged


sample usage:

foo = [['a', 'b'],['c'],['d', 'e', 'f']]
print merge(foo)

>>>['a', 'b', 'c', 'd', 'e', 'f']

Update:
Here is another implementation that uses a Python dictionary. This version merges the lists and only keeps unique entries (the order of the result is not guaranteed, since it comes from the dictionary's keys).

def merge(seq):
    d = {}
    for s in seq:
        for x in s:
            d[x] = 1
    return d.keys()

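
For what it's worth, here is a minimal sketch of both versions in Python 3 using the standard library. The name merge_unique is hypothetical. It relies on dictionaries preserving insertion order (Python 3.7 and later), so unlike the dictionary version above, the unique entries come back in first-seen order.

```python
from itertools import chain


def merge(seq):
    """Flatten one level of nesting into a single list."""
    return list(chain.from_iterable(seq))


def merge_unique(seq):
    """Flatten and keep only the first occurrence of each entry."""
    # dict.fromkeys keeps first-seen order on Python 3.7+.
    return list(dict.fromkeys(chain.from_iterable(seq)))


foo = [['a', 'b'], ['c'], ['d', 'e', 'f']]
merged = merge(foo)
```

Like the dictionary version, merge_unique requires the entries to be hashable.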
#    Comments [0] |
 Wednesday, January 10, 2007

VBScript - Creating a Microsoft Web Archive (*.mht) File Programmatically

Here is a little VBScript for generating a Microsoft Web Archive (*.mht) file.  Web archives are a convenient way to pack a bunch of web files (HTML/CSS/JavaScript) into a single file that is viewable in your browser.  The downside is MHT files are only viewable in MS Internet Explorer (lame).

Normally you would create an MHT by using the "Save As..." option in IE.  This script allows you to create one programmatically.

Sample Usage:

for a remote html file:

>cscript mht_converter.vbs http://www.example.com/temp/foo.html foo.mht


for a local html file:

>cscript mht_converter.vbs file:/temp/foo.html foo.mht



... And now the code:




'mht_converter.vbs

Const adSaveCreateNotExist = 1
Const adSaveCreateOverWrite = 2
Const adTypeBinary = 1
Const adTypeText = 2

Set args = WScript.Arguments

if args.Count < 2 then
    WScript.Echo "Usage: [CScript | WScript] mht_converter.vbs <html file> <mht filename>"
    WScript.Quit 1
end if

Set objMessage = CreateObject("CDO.Message")
objMessage.CreateMHTMLBody args.Item(0)
SaveToFile objMessage, args.Item(1)


Sub SaveToFile(Msg, Fn)
    Dim Strm, Dsk
    Set Strm = CreateObject("ADODB.Stream")
    Strm.Type = adTypeText
    Strm.Charset = "US-ASCII"
    Strm.Open
    Set Dsk = Msg.DataSource
    Dsk.SaveToObject Strm, "_Stream"
    Strm.SaveToFile Fn, adSaveCreateOverWrite
End Sub




Caveat:  I am not a VB programmer... don't pretend to be... and never wanna be.  This was just something I needed to do and this was the only way I could quickly figure out how to do it.

#    Comments [2] |
 Tuesday, January 09, 2007

Python - Formatted Dates and Times

I am not sure why, but every time I need to use some formatted dates or times in Python, I end up spending about 20 minutes going through the docs and reading up on the datetime module, which leads to more confusion.

So for my own clarity, here is how we do it using only the time module:

>>> import time
>>> print time.strftime("%m/%d/%y %H:%M:%S", time.localtime())
01/09/07 12:17:25

All of the formatting options for strftime() can be found here: http://docs.python.org/lib/module-time.html
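
The same format string also works with the datetime module. Here is a minimal Python 3 sketch. The helper names format_now, format_datetime, and parse are hypothetical, and it uses a fixed timestamp so the result is reproducible.

```python
import time
from datetime import datetime

FORMAT = "%m/%d/%y %H:%M:%S"


def format_now():
    """Current local time, using only the time module."""
    return time.strftime(FORMAT, time.localtime())


def format_datetime(dt):
    """Format a datetime object with the same directives."""
    return dt.strftime(FORMAT)


def parse(text):
    """Turn a formatted string back into a datetime object."""
    return datetime.strptime(text, FORMAT)


stamp = format_datetime(datetime(2007, 1, 9, 12, 17, 25))
```

Here stamp is '01/09/07 12:17:25', and parse(stamp) gives back the original datetime.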

#    Comments [0] |
 Wednesday, January 03, 2007

Dojo JavaScript Toolkit with ASP.NET

Dojo is a free/open source JavaScript toolkit.  I wanted to add some of its eye candy to one of my ASP.NET 2.0 applications, so I integrated the Fisheye menu (a menu that balloons out, similar to the launcher on OS X).






Here is how I did it:


First I downloaded Dojo and created a 'dojo' directory under my main project directory.  I dropped dojo.js and the entire Dojo 'src' directory here.

Then in my C# codebehind (.aspx.cs), I added this to the Page_Load event:

protected void Page_Load(object sender, EventArgs e)
{
    HtmlGenericControl Include = new HtmlGenericControl("script");
    Include.Attributes.Add("type", "text/javascript");
    Include.Attributes.Add("src", "dojo/dojo.js");
    Page.Header.Controls.Add(Include);

    HtmlGenericControl Include2 = new HtmlGenericControl("script");
    Include2.Attributes.Add("type", "text/javascript");
    Include2.InnerHtml = "dojo.require('dojo.widget.FisheyeList');";
    Page.Header.Controls.Add(Include2);
}

(I do this in the codebehind because I am using a Master Page and I need to access the HTML header from an individual content page.)


Then inside my ASP.NET page (.aspx), I added this div:

<div class="fisheyelist">
    <div dojoType="FisheyeList"
        itemWidth="80" itemHeight="80"
        itemMaxWidth="200" itemMaxHeight="200"
        orientation="horizontal"
        effectUnits="2"
        itemPadding="10"
        attachEdge="center"
        labelEdge="bottom"
        conservativeTrigger="false"
    >
        <div dojoType="FisheyeListItem"
             onclick="window.location = 'item1.aspx';"
             caption="Item 1"
             iconsrc="img/item1.png">
        </div>
        <div dojoType="FisheyeListItem"
             onclick="window.location = 'item2.aspx';"
             caption="Item 2"
             iconsrc="img/item2.png">
        </div>
        <div dojoType="FisheyeListItem"
             onclick="window.location = 'item3.aspx';"
             caption="Item 3"
             iconsrc="img/item3.png">
        </div>
    </div>
</div>



.. and it works.

#    Comments [0] |

Python - Find And Replace A String In Every File In A Directory

The Python Cookbook has a recipe to find and replace a string in every file in a directory.

I needed to do something like this today, so I cleaned up the script a little to make it [hopefully] a little more pythonic:


#!/usr/bin/env python
# replace a string in multiple files

import fileinput
import glob
import sys
import os


if len(sys.argv) < 3:
    print 'usage: %s search_text replace_text [directory]' \
        % os.path.basename(sys.argv[0])
    sys.exit(1)


stext = sys.argv[1]
rtext = sys.argv[2]
if len(sys.argv) == 4:
    path = os.path.join(sys.argv[3], '*')
else:
    path = '*'


print 'finding: %s and replacing with: %s' % (stext, rtext)


files = glob.glob(path)
for line in fileinput.input(files, inplace=1):
    if stext in line:
        line = line.replace(stext, rtext)
    sys.stdout.write(line)
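
If you want the same thing as a reusable function, here is a minimal Python 3 sketch. It is not the Cookbook recipe: the name replace_in_dir is hypothetical, it assumes every file in the directory is UTF-8 text, and it skips subdirectories instead of passing them to fileinput.

```python
import os


def replace_in_dir(search_text, replace_text, directory="."):
    """Replace search_text in every regular file directly inside directory.

    Returns the names of the files that were changed.
    Assumes the files are UTF-8 text (an assumption of this sketch).
    """
    changed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        # newline="" keeps the file's original line endings intact
        with open(path, "r", encoding="utf-8", newline="") as f:
            content = f.read()
        if search_text in content:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(content.replace(search_text, replace_text))
            changed.append(name)
    return changed
```

Files that do not contain the search text are never rewritten, so their timestamps are left alone.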


#    Comments [0] |
 Friday, November 24, 2006

Python, IDEs, and Drones

Python is a very popular programming language with adoption and advocacy from many corporations, and large factions of open source programmers using it extensively.  However, in the world of "corporate drone programming", it is still pretty niche. 

Have a look at this indication of popularity among programming languages:
TIOBE Programming Community Index

One thing I like about Python is the simplicity it strives for.  I find myself writing all my code in SciTE, a simple text editor, rather than a full-blown IDE.

I always looked at this as a strong point for dynamic languages.

Over in the cult of corporate drone programmers, static languages (C++, C#, Java) are the norm, and life is spent inside an IDE.


from Robert on comp.lang.python:

"Flat Web/DB programming is one major field where programmer masses are born.  The other big one is RAD-GUI/DB programming. This field is probably still wide open. Best tooled Borland RAD systems are going down meanwhile because of the stiff compiler language. Programmers look around for the next language & toolset. Python is the language - but with Python there is again a similar confusion around IDE's and GUI-libs. There is no really good IDE (but fat ones). And the major gui libs there are not Python, but are fat sickening layers upon layers upon other OO-langs."

Not that I necessarily want Python to become the next default language for drones, but it makes me think about further adoption and mainstreamability of Python and other dynamic languages (which typically aren't as well suited to the features of many IDEs).

#    Comments [0] |
 Thursday, November 23, 2006

SOAP and REST - Conceptually

After reading thousands of articles about SOAP vs. REST, I was more confused about everything than convinced of anything.

Finally, this quote from Stefan Tilkov made the conceptual difference between SOA(P) and REST very clear to me:
"In REST, you have lots and lots of resources all supporting the same interface; in SOA(P) (at least the wide-spread paradigm), you have few endpoints all supporting different interfaces."
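
To make that concrete for myself, here is a toy Python 3 sketch of the two shapes. It is purely illustrative: the class names, URLs, and method names are all made up, and no real REST or SOAP library is involved.

```python
class Resource:
    """REST style: every resource exposes the same small interface."""

    def __init__(self, state):
        self.state = state

    def get(self):
        return self.state

    def put(self, new_state):
        self.state = new_state


# Lots of resources, one interface: a client that knows get/put
# can work with any of them.
resources = {
    "/customers/42": Resource({"name": "Ann"}),
    "/orders/7": Resource({"total": 19.99}),
}


class OrderService:
    """SOA(P) style: one endpoint with its own operation-specific interface."""

    def __init__(self):
        self._orders = {7: {"total": 19.99}}

    def get_order_total(self, order_id):
        return self._orders[order_id]["total"]

    def cancel_order(self, order_id):
        return self._orders.pop(order_id)
```

A client of the first style only has to learn get and put once. A client of the second style has to learn each service's operations separately.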

#    Comments [0] |
 Friday, November 17, 2006

Python - The New Choice For Computer Science Academia?

I have seen a few articles in the past couple of days talking about how MIT is revamping its introductory computer science course from using Scheme/Lisp to using Python.  Apparently, other CS programs are using Python as well.

As an undergrad in 1993, I took CS classes in a program that was somewhat modeled after the MIT curriculum.  We used the first edition of the [in]famous wizard book.  Diving head first into the weird ways of functional programming was a bit of a shock for me, and Scheme nearly scarred me for life.  I think the move to using Python is certainly a good one.

I just took a look around the net and was surprised by how many people are pushing for Python as an introductory language that is well suited to be taught in an academic setting.


Some links to related articles:

Teaching with Python
Using Python in a High School Computer Science Program
EDU-SIG: Python in Education
Teaching Introductory Computer Science with Python

#    Comments [0] |
 Tuesday, November 14, 2006

MQSeries and .NET - Interacting With Message Queues from C#

Below is a C# Class that I use for interacting with MQSeries (reading/writing from/to queues). 

It contains 2 methods:
PutMessageOnQueue
GetMessageOffQueue

To use it, you must have IBM WebSphere MQ installed, and you must add an assembly reference to amqmdnet.dll (the .NET bindings that come with WebSphere MQ).

I am using:
.NET 2.0
VS 2005
WebSphere MQ 5.3.0




using System;
using IBM.WMQ;

public class MQSeries
{
    string queueName;
    string queueManagerName;
   
    MQQueue queue;
    MQMessage queueMessage;
    MQQueueManager queueManager;
   

    public MQSeries()
    {
        queueName = "TESTQ";
        queueManagerName = "TESTQM";
        queueManager = new MQQueueManager(queueManagerName);
    }  


    public void PutMessageOnQueue(string message)
    {
        try
        {
            queue = queueManager.AccessQueue(queueName,
                    MQC.MQOO_OUTPUT + MQC.MQOO_FAIL_IF_QUIESCING);
            queueMessage = new MQMessage();
            queueMessage.WriteString(message);
            queueMessage.Format = MQC.MQFMT_STRING;

            queue.Put(queueMessage);
        }
        catch (MQException mqexp)
        {
            Console.WriteLine("MQSeries Exception: " + mqexp.Message);
        }
    }


    public string GetMessageOffQueue()
    {
        string message = "";
       
        queue = queueManager.AccessQueue(queueName,
                MQC.MQOO_INPUT_AS_Q_DEF + MQC.MQOO_FAIL_IF_QUIESCING);
        queueMessage = new MQMessage();
        queueMessage.Format = MQC.MQFMT_STRING;

        try
        {
            queue.Get(queueMessage);
            message = queueMessage.ReadString(queueMessage.MessageLength);
        }
        catch (MQException MQExp)
        {
            Console.WriteLine("MQQueue::Get ended with " + MQExp.Message);
        }

        return message;
    }

}
#    Comments [0] |
 Tuesday, November 07, 2006

Python - Removing Duplicates From A Sequence

Sequences (lists and tuples) are common data structures used in Python programming.

Here is a simple function that will remove duplicates from a sequence and return a sorted sequence of the unique items:


def remove_dups(seq):
    x = {}
    for y in seq:
        x[y] = 1
    u = x.keys()
    u.sort()
    return u


(Caveat:  It requires that all the sequence elements be hashable, and support equality comparison)


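And another implementation (not sure which is better, but note that this one relies on an undocumented detail of older CPython 2.x releases, the hidden '_[1]' name for the list being built, and it rescans that list for every element):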

def remove_dups(seq):
    u = [x for x in seq if x not in locals()['_[1]']]
    u.sort()
    return u



Example using them from the Python Interpreter:

>>> my_seq = [1, 1, 3, 1, 2, 2, 7.75, 'foo', 7.75, 'foo']
>>> print remove_dups(my_seq)
[1, 2, 3, 7.75, 'foo']




Tim Peters has an excellent recipe in the Python Cookbook that dives into this much further:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560
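
For reference, here is a minimal Python 3 sketch of the same idea using a set, plus a variant that keeps the original order instead of sorting. The name remove_dups_keep_order is hypothetical. One thing to watch for: Python 3 will not sort numbers and strings together, so the sorted version only works when all the items are comparable with each other.

```python
def remove_dups(seq):
    """Return a sorted list of the unique items (items must be comparable)."""
    return sorted(set(seq))


def remove_dups_keep_order(seq):
    """Return the unique items in first-seen order (no sorting needed)."""
    # dict.fromkeys keeps insertion order on Python 3.7+.
    return list(dict.fromkeys(seq))


my_seq = [1, 1, 3, 1, 2, 2, 7.75, 'foo', 7.75, 'foo']
unique_in_order = remove_dups_keep_order(my_seq)
```

Both versions still require the items to be hashable, same as the dictionary version above.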
#    Comments [0] |
 Wednesday, November 01, 2006

Google Code Search - Indexing Source Code Inside Zip Files

I was playing with Google Code Search (search engine for public source code) and I noticed it had indexed some code I released a while back.

I knew the google bot was indexing public CVS and SVN repositories...
But the interesting thing is that I never checked this code into any public repository.  All I did was place a zip file on my webserver and link to it from my homepage.

I searched around a little and found this explaining what it does:

"The two ways that source code lives on the Internet is in archives, things like Zip files, gzip, etc. And then in software-control repositories like SourceForge.net, Google's code hosting, and other places," Google product manager Tom Stocky told internetnews.com.

"We'll be crawling all of that."

Google isn't just going to index the Zip archive files. They're actually going to open up the files and index all the individual files within them.

This is pretty cool.  By doing a Google Code Search you can see the full contents of the zipped source files, as indexed by Google.
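
I have no idea how Google actually does it, but the basic idea of opening an archive and indexing each file inside it is easy to sketch in Python 3 with the standard zipfile module. The function name index_zip and the word-splitting approach are my own assumptions, just for illustration.

```python
import io
import zipfile


def index_zip(data):
    """Map each whitespace-separated word to the archive members containing it.

    data is the raw bytes of a zip file.
    """
    index = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # directory entry, nothing to read
            # Decode leniently; a real indexer would detect the encoding.
            text = archive.read(name).decode("utf-8", errors="replace")
            for word in text.split():
                index.setdefault(word, set()).add(name)
    return index
```

Looking up a word in the returned dictionary gives the set of files inside the archive that contain it.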

#    Comments [0] |