
AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Wednesday, November 14, 2007

Regex Capture Groups In Python and Perl

I am a Python programmer and ex-Perl hacker.

Regular Expressions are possibly the quintessential feature of Perl and are directly part of the language syntax.

Rather than being part of the syntax, Python's regular expressions are available via the 're' module. When I first started using them, I had some trouble figuring out how to extract matching groups.

Here are examples of extracting capture groups in both Perl and Python.

Let's say we have a string containing a date, '11/14/2007', and we want to capture only the year from this string.

A regex to match this format might be something like this:

[0-9]{2}/[0-9]{2}/[0-9]{4}

We can then put parentheses around the piece we want to extract (the 4-digit year) to denote a capture group.

So now our regex would look like this:

[0-9]{2}/[0-9]{2}/([0-9]{4})


Perl Example:

$foo = '11/14/2007';

if ($foo =~ m#[0-9]{2}/[0-9]{2}/([0-9]{4})#) {
    print $1;
}

output:

2007

* Note the string we captured ended up in the special variable $1


Python Example:

import re

foo = '11/14/2007'

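# search() returns a match object, or None if the pattern is not found
match = re.search(r'[0-9]{2}/[0-9]{2}/([0-9]{4})', foo)
if match:
    print(match.group(1))  # group(1) is the first capture group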

output:

2007

* Note the string we captured is stored in a match object and can be retrieved with its 'group()' method.
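
The match object holds more than one group when the pattern has several sets of parentheses. Here is a minimal sketch, assuming we also want the month and day. The group names 'month', 'day', and 'year' are my own choices for illustration and are not part of the example above:

```python
import re

foo = '11/14/2007'

# Three capture groups; the names month/day/year are illustrative only.
pattern = r'(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<year>[0-9]{4})'

match = re.search(pattern, foo)
if match:
    whole = match.group(0)       # the entire matched text
    parts = match.groups()       # tuple of every captured group, in order
    year = match.group('year')   # a named group, looked up by name
    print(whole)
    print(parts)
    print(year)
```

Named groups can still be read by position, so match.group(3) returns the same string as match.group('year') here.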

Comments [6]
Wednesday, November 14, 2007 6:23:09 PM (Eastern Standard Time, UTC-05:00)
As I keep reading these things from you on PYTHON, it makes me think I might have to start looking into it more and loosen the ties I have to PERL.....but not just yet ;)
Thursday, November 15, 2007 9:06:59 AM (Eastern Standard Time, UTC-05:00)
What is PERL? No such language. ; )
Thursday, November 15, 2007 9:29:30 AM (Eastern Standard Time, UTC-05:00)
After programming in Python for a number of years, I've just now started using regular expressions and feel like kicking myself for neglecting them so long. I cringe looking back at my old, string-based web form validation code.
Jim Storch
Saturday, November 17, 2007 6:39:33 PM (Eastern Standard Time, UTC-05:00)
I prefer using \d

\d{2}/\d{2}/(\d{4}) is much easier to read than

[0-9]{2}/[0-9]{2}/([0-9]{4})

A slightly less readable variant:

(?:\d{2}/){2}(\d{4})
K
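
A quick way to check the three variants in the comment above is to run them against the same sample string. This is only a sketch; the list name 'patterns' is made up for the illustration:

```python
import re

foo = '11/14/2007'

# The original pattern and the two \d variants from the comment above.
patterns = [
    r'[0-9]{2}/[0-9]{2}/([0-9]{4})',
    r'\d{2}/\d{2}/(\d{4})',
    r'(?:\d{2}/){2}(\d{4})',   # (?:...) groups without capturing
]

# In every pattern the year is the only capturing group, so it is group(1).
years = [re.search(p, foo).group(1) for p in patterns]
print(years)
```

One difference worth knowing: with Python 3 string patterns, \d also matches non-ASCII digits unless the re.ASCII flag is given, while [0-9] matches only the ten ASCII digits.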
Sunday, November 18, 2007 12:19:34 AM (Eastern Standard Time, UTC-05:00)
I know the above is about regex, but in Python a better solution to this particular problem might be:

import time
foo = "11/14/2007"
fmt = "%m/%d/%Y"
print(time.strptime(foo, fmt).tm_year)

Dealing with dates using strings and regex can be a bit hairy. Python's time, datetime, and especially dateutil modules make it dead simple.

(See e.g. datetime.timedelta and dateutil.relativedelta for stuff you'd never want to deal with using pure strings!)

Andrew
Monday, November 19, 2007 10:29:41 AM (Eastern Standard Time, UTC-05:00)
@Andrew

yeah, the point of the post was about regexes, not date handling. Using a date for parsing was just the first example that popped into my head.