goldb.org home

AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Wednesday, November 14, 2007

Regex Capture Groups In Python and Perl

I am a Python programmer and ex-Perl hacker.

Regular Expressions are possibly the quintessential feature of Perl and are directly part of the language syntax.

Rather than being part of the syntax, Python's regular expressions are available via the 're' module. For some reason, I had trouble figuring out capture groups when I first started using Python's regular expressions.

Here are examples of extracting capture groups in both Perl and Python.

Let's say we have a string containing a date: '11/14/2007', and we want to capture only the year from this string.

A regex to match this format might be something like this:

[0-9]{2}/[0-9]{2}/[0-9]{4}

We can then put parentheses around the piece we want to extract (the 4-digit year) to denote a capture group.

So now our regex would look like this:

[0-9]{2}/[0-9]{2}/([0-9]{4})


Perl Example:

$foo = '11/14/2007';

if ($foo =~ m^[0-9]{2}/[0-9]{2}/([0-9]{4})^) {
    print $1;
}

output:

2007

* Note the string we captured ended up in the special variable $1


Python Example:

import re

foo = '11/14/2007'

match = re.search(r'[0-9]{2}/[0-9]{2}/([0-9]{4})', foo)
if match:
    print(match.group(1))

output:

2007

* Note the string we captured ended up in a match object, which can be accessed with the 'group()' method.
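
If you want more than one piece of the date, the same match object can hand back every group at once. Here is a small sketch using named groups; the group names (month, day, year) are my own choice for illustration and are not part of the example above.

```python
import re

foo = '11/14/2007'

# Same date regex, but every part is captured and given a name.
# The names (month, day, year) are illustrative choices.
pattern = re.compile(r'(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<year>[0-9]{4})')

match = pattern.search(foo)
if match:
    year = match.group('year')      # named access; same string as match.group(3)
    all_groups = match.groups()     # every capture, in order, as a tuple
    by_name = match.groupdict()     # captures keyed by group name
    print(year)
    print(all_groups)
    print(by_name)
```

Numbered access still works alongside the names, so match.group(3) and match.group('year') return the same string.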

 Wednesday, August 22, 2007

My Text Editor - What SciTE Says About Me

In a recent post: "What does your favorite text editor say about you", the author lists popular text editors and what they say about their users.  Here is the Editor or IDE I use with various programming languages:

Python:  SciTE
Perl:  SciTE
C#:  Visual Studio
Java:  Eclipse

I do all of my writing and a large portion of my programming in a plain old text editor.  Most of the code I write is in Python.  I love using a lightweight text editor instead of a big bloated IDE.  So... I pretty much live inside a text editor.

... and I love SciTE.  It rocks equally on Windows and GNU/Linux.  So what does this say about me?


SciTE:
"Your text editor is lightweight, full featured, extensible and cross platform. In addition, it can work as a stand-alone executable which requires no installation. Fits perfectly with all your other portable tools on your USB thumb drive. You also love how SciTE let’s you write Lua scripts to extend it’s functionality. You take your text editor choice very seriously. You like tinkering, and minimalistic, portable applications."

 Thursday, May 17, 2007

RESTful Web Services - 10 Years of 'Programmable Web' Books

I just got the RESTful Web Services book (Leonard Richardson & Sam Ruby, O'Reilly, 2007) in the mail today.  I've only read the beginning, but so far it is great.  In fact, it brings me back to when I first started working with the "programmable web".  I got into the programmable web back when the web was only a few years old.  I spent years doing performance/scalability testing and tuning for large Web 1.0 applications and bizarre custom Web APIs (think huge financial services rushing to get online).  Building tools to run realistic workloads through a system involves writing custom clients to simulate real user/browser interaction.  This is pretty ugly stuff when you are dealing with an application that was designed with only humans in mind (AKA all of them).  It involves lots of HTTP protocol level work.. screen scraping.. protocol sniffing and analyzing.. requests.. header mangling.. cookie handling.. redirects.. authentication.. session information parsing.. etc, etc.

Application simulation is pretty messy work.  There was no simple API to hide behind; you had to figure out what the API was for yourself.  See.. *every* web application has an API, though it might have been designed by accident.  This allowed me to see first hand how developers and frameworks butchered the use of the "Web" as a platform.  Staring at naked HTTP let me see every little bit of the hairball underneath.  Alas, any standardization around web services (or even an official name for the concept) was far off.

A friend (bearded Perl hacker) let me borrow a book to show me how Perl can do this cool web stuff:  Web Client Programming with Perl (Clinton Wong, O'Reilly, 1997).  This book helped me build my first web clients to do application simulation and testing.  There wasn't a ton of documentation at the time to do this sort of thing, so I relied heavily on this book.

So now.. 10 years later..  the Web has changed..  it has morphed into *the* distributed platform..  it is becoming organized.

As I flip through RESTful Web Services, it all just looks right..  REST looks right..   It is simple..  it is HTTP..  it is all the guts I already know.  It almost feels like a sequel to my old favorite, Web Client Programming with Perl.

I have traded Perl for Python as my preferred scripting language the past few years, but I am still building simulators, web clients, and virtual users. I am excited to work on some new stuff in this area.

 Wednesday, March 21, 2007

Google Summer of Code 2007 - No Perl for You

The Perl Foundation won't be involved in Google Summer of Code 2007.

Bill Odom:

"The short version: We submitted an application to be a mentoring organization, but we weren't accepted."

However, even without the Perl community represented, the list of mentoring organizations and projects is really good!

 Saturday, March 17, 2007

Python3000 vs. Perl6 ... Wanna Bet?

Perl6...
Python3000...

Both are redesigns of very popular dynamic/scripting languages.  Both have very strong, though very different, communities supporting them.

Out of the gate, Perl's plans were much more ambitious, including a new generic virtual machine.  Python's plans were more pragmatic; more of a language cleanup than a drastic redesign.

Perl 6 was officially announced nearly 7 years ago and I don't see a stable production release coming *any* time soon.  On the other hand, the idea of Python3000 was sorta tossed around for a while and swung into gear 2 years ago.

Guido (Python's BDFL) has been spearheading the effort, whereas Perl's leadership structure is much more anarchic (Where is Larry Wall these days?).  Guido has been very transparent and kept the community aware of his worries.

Some people saw this as a slippery slope...

Chromatic:

"Language redesign is difficult, isn’t it?  Once you start challenging base assumptions, you find that a lot of your previous conclusions are shaky, and good luck reigning in blue-sky ideas!

See you in 2007… or 2008… or 2009.

Best wishes,
a Perl 6 hacker"

I disagree..

I'd bet anyone money that I will be hacking on a stable release of Python3000 long before I'm using a stable version of Perl6... any takers?


(disclosure: I have written lots of code in both Perl and Python and am a fan of both)

 Thursday, February 15, 2007

Perl - File Slurping

A common idiom in Perl 5 is "slurping".  Slurping is the process of reading a file into an array, split by line breaks.  You can then iterate over the array and perform an operation on each line.  This is the basic input mechanism I use to process all sorts of data/text files.


The basic slurp goes like this...

Open a file in read mode and assign it a file handle:

open(FILE, 'foo.txt') or die $!;

Read (slurp) the file into an array of lines (splitting the file on newlines):

@file = <FILE>;


You can then process the array in a foreach loop and "Un-slurp" (De-slurp?) it back to the file system like this...

Now we have an array which we can iterate through and do whatever we want with each line:

foreach (@file) {
    # do something here
}

Re-open the file in overwrite mode:

open(FILE, '>foo.txt') or die $!;

Print the contents of the array back to the file:

print FILE @file;


The following script shows some slurping in action. This script will read a file named "foo.txt" and replace all instances of "foo" with "bar":

#!/usr/bin/perl

replace('foo.txt', 'foo', 'bar');

sub replace {
    ($filename, $original, $substituted) = @_;

    # slurp the file into an array of lines
    open(FILE, $filename) or die $!;
    @file = <FILE>;

    foreach (@file) {
        s/$original/$substituted/g;
    }

    # write the modified lines back to the same file
    open(FILE, ">$filename") or die $!;
    print FILE @file;
    close(FILE);
}

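
For comparison, here is roughly how I would write the same slurp/replace round trip in Python. Treat it as a sketch under a couple of assumptions: it uses plain str.replace (a literal substitution, not a regex like Perl's s///), and the demo runs against a throwaway temp file rather than a real "foo.txt".

```python
import os
import tempfile


def replace(filename, original, substituted):
    """Slurp a file into a list of lines, substitute text, and write it back."""
    # readlines() is the closest analogue to Perl's @file = <FILE>;
    with open(filename) as f:
        lines = f.readlines()

    # Literal substitution on every line (not a regex, unlike s///g).
    lines = [line.replace(original, substituted) for line in lines]

    # Re-open in write mode, which truncates the file, and "un-slurp".
    with open(filename, 'w') as f:
        f.writelines(lines)
    return lines


# Demo on a temporary file so nothing real gets overwritten.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('foo one\nfoo two foo\n')

result = replace(tmp.name, 'foo', 'bar')
with open(tmp.name) as f:
    contents = f.read()
os.remove(tmp.name)
```

Like the Perl version, this holds the whole file in memory, which is fine for ordinary text files but not something to do with very large ones.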
 Sunday, February 11, 2007

Squeezebox - Hackable Device For Streaming My Tunes

(* I am not affiliated with Slim Devices.. this is just a fanboy post.)

I wanted to hook up something that would integrate my home PC (jacked full of glorious DRM-free MP3's) and my home audio system. I didn't want to go the full HTPC route; I am more of an audio guy, and my immediate need is for an audio-only solution.

After browsing the various Media Servers, Sound Cards, External Components, and other music playing gizmos, I finally figured out the type of unit I was looking for and what it should do...

Here are my requirements:

  • Must be able to play the MP3's stored on my computer
  • Must have a cross platform client application (I use both Linux and Windows at home)
  • Must have high quality analog and digital output that can connect to my receiver/amp
  • Must have a remote with basic controls (volume, tuning, select, etc)
  • Must be able to control it from my computer (change songs, playlists, etc)
  • Must be able to stream Internet Radio (of some sort)
  • No wires connected to my PC

Obviously some of these requirements became apparent when I read about the Squeezebox from Slim Devices.  It seems to do everything I need (and more), and isn't outrageously expensive ($299 retail).



So I ended up ordering one of these bad boys to play around with.  (of course I got the all black version)

OK.. but now the real reason I chose the Squeezebox... It is Open Source and has a developer community!

Well... it is sort of Open Source.  Its SlimServer software (the audio server software) is GPL licensed, so the Source Code is available to modify, hack, and contribute to.  But.. the device's firmware is proprietary and closed (boo).

The real kicker came when I realized that the SlimServer is written in Perl.  It is web based, and uses templated HTML/CSS.  Hmm.. I've written a Perl program or 2 [thousand] in my day... this could get interesting.  I already have SlimServer running from Source and I am making little tweaks to the interface to customize it for myself.  Hopefully someday I'll do something worth contributing back...  I love that I have the option.

... More to come once I actually get the Squeezebox and hook it up..  I just ordered it last night.

 Sunday, February 04, 2007

Perl - Building Web Clients

The following is a short tutorial on web programming in Perl I wrote several years ago.  This type of programming was my first foray into the guts of the web.  Writing tools at the protocol level forced me to gain a deep understanding of HTTP and Web Architecture, which has been extremely helpful to me since.


These examples show how to use Perl's 'LWP' (libwww-perl) modules to make requests to a web server. The libwww-perl collection is a set of Perl modules which provides a simple and consistent application programming interface to the web.


Using 'LWP' to do an HTTP GET Request:

This will request the main Google page and store the entire contents of the response in the '$response' object.
#!/usr/bin/perl

use LWP;

$useragent = LWP::UserAgent->new;
$request = new HTTP::Request('GET',"http://www.google.com");
$response = $useragent->simple_request($request);

print $response->as_string();

(* use "$useragent->request" instead of "$useragent->simple_request" to follow server redirects)
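
These days I would do the same thing in Python. Here is a rough equivalent using only the standard library's urllib.request; the helper names (build_request, fetch) are mine, made up for this sketch. Nothing is sent over the network until fetch() is actually called.

```python
import urllib.request


def build_request(url):
    """Describe a GET request without sending it (like HTTP::Request->new)."""
    return urllib.request.Request(url, method='GET')


def fetch(url):
    """Send the request and return (status, headers, body).

    urlopen follows server redirects by default, so this behaves more
    like $useragent->request than $useragent->simple_request.
    """
    with urllib.request.urlopen(build_request(url)) as response:
        return response.status, list(response.headers.items()), response.read()


# Building the request object needs no network access.
request = build_request('http://www.example.com')
print(request.get_method(), request.full_url)
```

Calling fetch('http://www.example.com') performs the real HTTP round trip and hands back the status code, the header list, and the raw body bytes.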


Working With Cookies:

Here is the HTTP header returned by the initial HTTP request to Google:
(first part of 'print $response->as_string();' output in the previous example)
Date: Mon, 14 Apr 2003 18:38:28 GMT
Server: GWS/2.0
Content-Length: 2691
Content-Type: text/html
Content-Type: text/html; charset=ISO-8859-1
Client-Date: Mon, 14 Apr 2003 18:38:29 GMT
Client-Peer: 216.239.57.99:80
Client-Response-Num: 1
Connection: Close
Set-Cookie: PREF=ID=48fd767576ebd920:TM=1050345508:LM=1050345508:S=qLA8i5XyvLX37lG6;
expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Title: Google

Notice the "Set-Cookie:" line in the header. This is what tells your web browser that a cookie needs to be set and returned as part of the HTTP header in subsequent HTTP requests to this server. In this case the cookie doesn't do much, but for a site that requires a login, this is how the server knows who you are in order to maintain a session.
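
To see what is actually inside that "Set-Cookie:" line, here is a small Python sketch that parses the header value shown above with the standard library's http.cookies module. The variable names are mine; the cookie string is copied from the response header in this post.

```python
from http.cookies import SimpleCookie

# The Set-Cookie value from the response header shown above.
set_cookie = (
    "PREF=ID=48fd767576ebd920:TM=1050345508:LM=1050345508:S=qLA8i5XyvLX37lG6; "
    "expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com"
)

cookie = SimpleCookie()
cookie.load(set_cookie)

# Each cookie name maps to a "morsel" holding its value and attributes.
morsel = cookie['PREF']
print(morsel.value)
print(morsel['expires'], morsel['path'], morsel['domain'])

# Only name=value goes back to the server in the "Cookie:" request header;
# the attributes (expires, path, domain) stay on the client side.
cookie_header = '{}={}'.format(morsel.key, morsel.value)
```

The attributes are what a client uses to decide when, and to which hosts and paths, the cookie should be returned.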

In Perl, cookies can be handled for you by using the HTTP::Cookies module.

You first need to construct the object to contain your cookies:
$cookie_jar = HTTP::Cookies->new;

After an HTTP request is sent, you can then extract the cookie from the response header:
$cookie_jar->extract_cookies($response);

Once you have the cookie stored in your cookie_jar, it needs to be sent back to the server in the header of every subsequent HTTP request. This is done by adding the following command after you format each request:
$cookie_jar->add_cookie_header($request);


Now for the whole thing in a script:


The following script will make a request to the main Google page and store the cookie it receives. It will then make a request to Google to change the default language (user preference) to Spanish. A new cookie will be returned that we will store and use it to make another request to the main Google page. Google will recognize the information stored in our cookie and return the page in Spanish.
#!/usr/bin/perl

use LWP;
use HTTP::Cookies;

# construct objects
$useragent = LWP::UserAgent->new;
$cookie_jar = HTTP::Cookies->new;

# send request for main Google page
$request = new HTTP::Request('GET',"http://www.google.com");
$response = $useragent->simple_request($request);

# extract cookie from response header
$cookie_jar->extract_cookies($response);

# set user preference on Google to Spanish language
$request = new HTTP::Request('GET',"http://www.google.com/setprefs?"
               . "submit2=Save+Preferences+&hl=es&lt=all&safe=images&num=10"
               . "&q=&prev=http%3A%2F%2Fwww.google.com%2F&ie=UTF-8&oe=UTF-8");
$cookie_jar->add_cookie_header($request);
$response = $useragent->simple_request($request);

# extract new cookie from response header
$cookie_jar->extract_cookies($response);

# send request for main Google page (will return Spanish Google page)    
$request = new HTTP::Request('GET',"http://www.google.com");
$cookie_jar->add_cookie_header($request);
$response = $useragent->simple_request($request);

print $response->as_string; # print full response to verify cookies work (some text now in Spanish)
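
For what it's worth, the manual extract_cookies/add_cookie_header dance can be automated. Here is a sketch of the same idea in Python, using the standard library's http.cookiejar and urllib.request. The helper names (make_client, fetch_with_cookies) are hypothetical, and I make no promise that the old Google preference URL above still behaves the same way.

```python
import http.cookiejar
import urllib.request


def make_client():
    """Build an opener that stores and resends cookies automatically."""
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor extracts cookies from every response and adds
    # the matching Cookie header to every later request.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_with_cookies(opener, url):
    """GET a URL through the cookie-aware opener and return the body bytes."""
    with opener.open(url) as response:
        return response.read()


# Creating the client needs no network access; the jar starts out empty.
opener, jar = make_client()
print(len(jar))
```

Each call to fetch_with_cookies(opener, url) then reuses whatever cookies earlier responses stored in the jar, which is the whole job of the Perl script above.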

 Friday, January 26, 2007

Python - Sort A Nested Sequence With DSU

The DSU (Decorate, Sort, Undecorate) idiom originates from Lisp.  I first learned it in Perl, where it is called the Schwartzian Transform (coolest name ever?), named after longtime Perl hacker Randal L. Schwartz.

I find myself using this same DSU idiom in Python when I need to sort a nested sequence (single level sequence of sequences).

Let's say I have the following list of lists:

seq = [
    ['a', 1, 5],
    ['b', 3, 4],
    ['c', 2, 2],
    ['d', 4, 3],
    ['e', 5, 1],
]

... and I want the outer list to contain the inner lists sorted by their last column (in this case, index 2).

How would I do this?

Here is an implementation of the DSU (Decorate, Sort, Undecorate) idiom in a Python function:

def dsu_sort(idx, seq):
    for i, e in enumerate(seq):
        seq[i] = (e[idx], e)
    seq.sort()
    for i, e in enumerate(seq):
        seq[i] = e[1]
    return seq
   
(Keep in mind that lists in Python are mutable and this will transform your original sequence.)


So applying this to the sequence above like this:

dsu_sort(2, seq)

gives us:

[['e', 5, 1], ['c', 2, 2], ['d', 4, 3], ['b', 3, 4], ['a', 1, 5]]

which is the original sequence, transformed so it is sorted by the last column (index 2).
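
If you would rather not modify the original list, here is a sketch of two non-mutating alternatives. The function name dsu_sorted is my own; the second approach relies on the key argument to the built-in sorted(), which does the decorating for you.

```python
from operator import itemgetter

seq = [
    ['a', 1, 5],
    ['b', 3, 4],
    ['c', 2, 2],
    ['d', 4, 3],
    ['e', 5, 1],
]


def dsu_sorted(idx, seq):
    """Return a new list sorted by column idx; the input is left untouched."""
    # Decorate with (key, original position, item). The position breaks
    # ties, so the items themselves never get compared.
    decorated = [(e[idx], i, e) for i, e in enumerate(seq)]
    decorated.sort()
    # Undecorate: keep only the original items.
    return [e for _, _, e in decorated]


by_dsu = dsu_sorted(2, seq)

# The built-in equivalent: sorted() with a key function.
by_key = sorted(seq, key=itemgetter(2))

print(by_dsu)
```

Both give the same ordering as dsu_sort(2, seq) above, while seq itself stays in its original order.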



Randal's original implementation in Perl from 1994:
#!/usr/bin/perl
 print
     map { $_->[0] }
     sort { $a->[1] cmp $b->[1] }
     map { [$_, /(\S+)$/] }
     <>;
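
As I read it, Randal's one-liner sorts its input lines by the last whitespace-delimited word on each line. Here is my attempt at a line-by-line Python translation; the function name sort_by_last_word is made up, and treating lines with no word as having an empty key is my assumption.

```python
import re


def sort_by_last_word(lines):
    """Sort lines by their last whitespace-delimited word (DSU style)."""
    decorated = []
    for line in lines:
        # Decorate: pair each line with its last word, like [$_, /(\S+)$/]
        m = re.search(r'(\S+)$', line)
        decorated.append((m.group(1) if m else '', line))

    # Sort on the key only, like sort { $a->[1] cmp $b->[1] }
    decorated.sort(key=lambda pair: pair[0])

    # Undecorate, like map { $_->[0] }
    return [line for _, line in decorated]


lines = ['x banana\n', 'y apple\n', 'z cherry\n']
result = sort_by_last_word(lines)
print(''.join(result), end='')
```

Python's sort is stable, so lines that share the same last word keep their original relative order.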

 Sunday, January 21, 2007

Perl Bottles

Guess what this is?

    ''=~(        '(?{'        .('`'        |'%')        .('['        ^'-')
    .('`'        |'!')        .('`'        |',')        .'"'.        '\\$'
    .'=='        .('['        ^'+')        .('`'        |'/')        .('['
    ^'+')        .'||'        .(';'        &'=')        .(';'        &'=')
    .';-'        .'-'.        '\\$'        .'=;'        .('['        ^'(')
    .('['        ^'.')        .('`'        |'"')        .('!'        ^'+')
   .'_\\{'      .'(\\$'      .';=('.      '\\$=|'      ."\|".(      '`'^'.'
  ).(('`')|    '/').').'    .'\\"'.+(    '{'^'[').    ('`'|'"')    .('`'|'/'
 ).('['^'/')  .('['^'/').  ('`'|',').(  '`'|('%')).  '\\".\\"'.(  '['^('(')).
 '\\"'.('['^  '#').'!!--'  .'\\$=.\\"'  .('{'^'[').  ('`'|'/').(  '`'|"\&").(
 '{'^"\[").(  '`'|"\"").(  '`'|"\%").(  '`'|"\%").(  '['^(')')).  '\\").\\"'.
 ('{'^'[').(  '`'|"\/").(  '`'|"\.").(  '{'^"\[").(  '['^"\/").(  '`'|"\(").(
 '`'|"\%").(  '{'^"\[").(  '['^"\,").(  '`'|"\!").(  '`'|"\,").(  '`'|(',')).
 '\\"\\}'.+(  '['^"\+").(  '['^"\)").(  '`'|"\)").(  '`'|"\.").(  '['^('/')).
 '+_,\\",'.(  '{'^('[')).  ('\\$;!').(  '!'^"\+").(  '{'^"\/").(  '`'|"\!").(
 '`'|"\+").(  '`'|"\%").(  '{'^"\[").(  '`'|"\/").(  '`'|"\.").(  '`'|"\%").(
 '{'^"\[").(  '`'|"\$").(  '`'|"\/").(  '['^"\,").(  '`'|('.')).  ','.(('{')^
 '[').("\["^  '+').("\`"|  '!').("\["^  '(').("\["^  '(').("\{"^  '[').("\`"|
 ')').("\["^  '/').("\{"^  '[').("\`"|  '!').("\["^  ')').("\`"|  '/').("\["^
 '.').("\`"|  '.').("\`"|  '$')."\,".(  '!'^('+')).  '\\",_,\\"'  .'!'.("\!"^
 '+').("\!"^  '+').'\\"'.  ('['^',').(  '`'|"\(").(  '`'|"\)").(  '`'|"\,").(
 '`'|('%')).  '++\\$="})'  );$:=('.')^  '~';$~='@'|  '(';$^=')'^  '[';$/='`';


It is Perl 5 source code.  When executed, it prints the "99 Bottles of Beer" song.  Like this:

99 bottles of beer on the wall, 99 bottles of beer!
Take one down, pass it around,
98 bottles of beer on the wall!

98 bottles of beer on the wall, 98 bottles of beer!
Take one down, pass it around,
97 bottles of beer on the wall!

97 bottles of beer on the wall, 97 bottles of beer!
Take one down, pass it around,
96 bottles of beer on the wall!

etc...


Pretty insane.
Who said Perl can be hard to read?

(Lots of implementations of the song generator in various languages are available; but none as cool as this one.)
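
For contrast, here is a deliberately boring, readable sketch in Python that produces the same verse layout shown above. It is not a translation of the obfuscated Perl, and I have not bothered with pluralization ("1 bottles") or a final verse, since I don't know how the original handles them.

```python
def verses(start=99):
    """Yield each verse of the song, counting down from start to 1."""
    for n in range(start, 0, -1):
        yield (
            "{0} bottles of beer on the wall, {0} bottles of beer!\n"
            "Take one down, pass it around,\n"
            "{1} bottles of beer on the wall!\n"
        ).format(n, n - 1)


song = list(verses())
first_verse = song[0]

# Verses are separated by a blank line, as in the output above.
print("\n".join(song[:3]))
```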

 Sunday, November 05, 2006

Perl 6? The Long Wait

I've been waiting for Perl 6 for quite a few years now...

I initially started hacking Perl in 1998 in my days as a software tester.  Perl is a programming language that tends to be popular with testers... usually because of its powerful text processing features (built-in regex's, dynamic/weak typing, simple/powerful data structures, etc).  It is perfect for munging large data sets and slinging text into whatever test configurations you need.  I have written plenty of useful software in Perl.

But.. the Perl community has sort of stagnated and other languages are taking over what was once its niche.  (By stagnated I mean in terms of getting a finished version out, certainly not in terms of work being done.. which there is lots of.)  We have been waiting on Perl 6 for many years now and a working implementation is yet to be generally released.  Every year or so, we get a new term or acronym to chew on (Parrot, PONIE, PUGS), but day to day I write less and less Perl as other languages seem to be moving faster.  I know there are lots of very bright and talented people working on Perl 6... and I appreciate that.  But I get the feeling that it is concentrated in a few individuals.  I wonder why Perl 6 development hasn't scaled like some other languages have?  Other dynamic language communities (e.g. Python/Ruby) seem to be constantly embraced and pushed forward... and with that comes a healthy community with diverse and active contributors.

