goldb.org home

AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Tuesday, February 06, 2007

Improving Regular Expression Performance

Alex from Dojo just linked to a fascinating article about by Russ Cox about Regular Expressions:  Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)

from the article:  
"This article reviews the good theory: regular expressions, finite automata, and a regular expression search algorithm invented by Ken Thompson in the mid-1960s. It also puts the theory into practice, describing a simple implementation of Thompson's algorithm. That implementation, less than 400 lines of C, is the one that went head to head with Perl above. It outperforms the more complex real-world implementations used by Perl, Python, PCRE, and others. The article concludes with a discussion of how theory might yet be converted into practice in the real-world implementations."
so.. there is a 40 year old technique that improves performance of regexes dramatically?

The following graph plots time required to check whether a?^na^n matches a^n:



wow... so awk and grep use the Thomson NFA implementation of regexes, while most programming languages don't.  

... and here I thought Perl was the regex king.

#    Comments [0] |
Comments are closed.