goldb.org home

AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Friday, January 26, 2007

Python - Sort A Nested Sequence With DSU

The DSU (Decorate, Sort, Undecorate) idiom originates from Lisp.  I first learned it in Perl, where it is called the Schwartzian Transform (coolest name ever?), named after longtime Perl hacker Randal L. Schwartz.

I find myself using this same DSU idiom in Python when I need to sort a nested sequence (single level sequence of sequences).

Lets say I have the following list of lists:

seq = [
    ['a', 1, 5],
    ['b', 3, 4],
    ['c', 2, 2],
    ['d', 4, 3],
    ['e', 5, 1],
]

... and I want the outer list to contain the inner lists sorted by their last column (in this case, index 2).

How would I do this?

Here is an implementations of the DSU (Decorate, Sort, Undecorate) idiom in a Python function:

def dsu_sort(idx, seq):
    for i, e in enumerate(seq):
        seq[i] = (e[idx], e)
    seq.sort()
    for i, e in enumerate(seq):
        seq[i] = e[1]
    return seq
   
(Keep in mind that lists in Python are mutable and this will transform your original sequence.)


So applying this to the sequence above like this:

dsu_sort(2, seq)

gives us:

[['e', 5, 1], ['c', 2, 2], ['d', 4, 3], ['b', 3, 4], ['a', 1, 5]]

which is the original sequence, transformed so it is sorted by the last column (index 2).



Randal's original implementation in Perl from 1994:
#!/usr/bin/perl
 print
     map { $_->[0] }
     sort { $a->[1] cmp $b->[1] }
     map { [$_, /(\S+)$/] }
     <>;

#    Comments [3] |
Tuesday, January 30, 2007 7:33:51 AM (Eastern Standard Time, UTC-05:00)
Why not simply using the abilities of sort() with a lambda? I like the DSU pattern, but most of the time Python get it done on its own.

<pre>
>>> seq.sort(cmp=lambda x, y: cmp(x[2], y[2]))
>>> seq
[['e', 5, 1], ['c', 2, 2], ['d', 4, 3], ['b', 3, 4], ['a', 1, 5]]
</pre>

my 2 cents

r.
Tuesday, January 30, 2007 1:35:08 PM (Eastern Standard Time, UTC-05:00)
cool.. that should work also.
I've just always liked the DSU pattern since it can be implemented in any language.
Wednesday, January 31, 2007 7:20:36 PM (Eastern Standard Time, UTC-05:00)
Thanks for referencing the algorithm named after me, but not BY me. I'm not that vain. :)
Comments are closed.