goldb.org home

AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Wednesday, February 28, 2007

One Laptop Per Child - It's All About the Python!

Wow. I just read something interesting about the One Laptop Per Child (OLPC) project in Guido's PyCon writeup:

"The software is far from finished.  An early version of the GUI and window manager are available, and a few small demo applications: chat, video, two games, and a web browser, and that's about it!  The plan is to write all applications in Python (except for the web browser), and a "view source" button should show the Python source for the currently running application.  In the tradition of Smalltalk (Alan Kay is on the OLPC board, and has endorsed the project's use of Python) the user should be able to edit any part of a "live" application and see the effects of the change immediately in the application's behavior."


So... they are going to be running a GNU/Linux OS (a stripped down version of Fedora), with essentially all applications in Python.

This is very cool on many levels. It is the ultimate endorsement of Python.  It also makes me think about the future...  If OLPC is successful, a few years down the road we might be looking at several million young new Open Source/Python hackers.  Nice!



Reading Outlook/Exchange Email Programmatically with Python

With the Python for Windows Extensions (pywin32), you can talk to an Exchange Server via COM and read/process your email. You must have the Outlook client installed on the machine you run this from.

Here is a sample script that will:

  • connect to your mailbox
  • print the inbox name
  • print the message count
  • print the subjects for all your email messages


#!/usr/bin/env python

from win32com.client import Dispatch

session = Dispatch("MAPI.session")
session.Logon('OUTLOOK')  # MAPI profile name
inbox = session.Inbox

print "Inbox name is:", inbox.Name
print "Number of messages:", inbox.Messages.Count

for i in range(inbox.Messages.Count):
    message = inbox.Messages.Item(i + 1)
    print message.Subject


 Monday, February 26, 2007

Python 3000 Video and Slides

Guido van Rossum just published his slides from the PyCon 2007 Keynote where he discusses Python 3000.

His talk is also available on Google Video.

I'm psyched to watch this!


Boston - Back Bay Snow Pics

New England weather is crazy. In Boston, we get pounded with snow some winters.  Other winters we hardly get any of the white stuff.  This year has had really minimal snowfall, but it was coming down this morning.

I snapped a few pics on my way to work (I live in Back Bay).

Looking out my apartment window:

Looking out from my front stoop:

The view down Comm. Ave:

Copley Square, Old Trinity Church, John Hancock Building, Boston Public Library:


 Sunday, February 25, 2007

Wes Dyer on Type Systems

Awesome post by Wes Dyer about Type Systems:
Types Of Confusion


He tackles a lot of misconceptions and myths about Typing.

To frame his points, he starts with some fantastic definitions.  I am all for clarifying vocabulary and I really like this:


Based on the apparent confusion, I think it is best to clarify what I mean by each of the following terms:

    Type Checking - Verifying that code respects type constraints.

    Statically Typed - Type checking occurs at compile time.

    Dynamically Typed - Type checking occurs at run time.

    Type Safe Language - A language which protects its own abstractions.

    Type Unsafe Language - A language which is not type safe.

    Strongly Typed and Weakly Typed - Depends on the author; the definitions are so many and so varied that the terms are practically useless. It seems that anyone can claim that language X is either strongly typed or weakly typed based on sound reasoning derived from one of the various definitions.

    Dynamic Language - A language which enables runtime inspection or modification of a program; most languages can do this but dynamic languages make it easy. It is common for people to refer to "dynamic languages" and mean "dynamically typed languages" as the term is defined here.

good read.
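To make a couple of these definitions concrete, here is a small Python 3 sketch. Python is dynamically typed in Wes's sense: a bad `int + str` is only caught when the line actually runs. It is also a "dynamic language": you can inspect and modify classes while the program runs. The names `add`, `Greeter`, and `hello` are made up for illustration.

```python
def add(a, b):
    # No type declarations: nothing is checked until this line runs.
    return a + b


def type_error_at_run_time():
    """Return True if mixing int and str fails only when the call executes."""
    try:
        add(1, "two")  # defining add() raised nothing; calling it does
    except TypeError:
        return True
    return False


class Greeter(object):
    pass


def hello(self):
    return "hello from " + type(self).__name__


# "Dynamic language": modify and inspect the program while it is running.
Greeter.hello = hello                # attach a new method at run time
print(hasattr(Greeter(), "hello"))   # runtime inspection: True
print(Greeter().hello())             # hello from Greeter
print(type_error_at_run_time())      # True
```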


How To See Your Swallowed Exceptions In .NET (Visual Studio debugger)

Good to know..


See all the exceptions you are swallowing in .NET (Visual Studio debugger):

Turn on 'all exceptions' and watch the fireworks fly


One of the most glaring differences for me between Java and .NET is that Java has checked exceptions while C#/.NET has only unchecked ones, so tricks like this have helped me figure out how exception handling in C#/.NET really works.


4-Space Indents - 4Eva!

I personally use 4-space indents when I write code (in any language, period).


Oddly... in Python, where whitespace matters, there is no single common indentation practice, even though one would fit in nicely with Python's TOOWTDI ideology.

Some observations of source code indentations in Python:

  • I mostly see 4-space indents in code I read (colleagues, libraries, 3rd party code)
  • Python's own source (the Python parts) uses 4-space indents
  • Google uses 2-space indents in their Python
  • I hate tabs (if I randomly sucker-punch you, this is why)


Guido van Rossum (creator, lead developer, and BDFL of Python) writes:

"If it uses two-space indents, it's corporate code; if it uses four-space indents, it's open source. (If it uses tabs, I didn't write it! :)"
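If you want to tell which camp a file is in, here is a rough Python 3 heuristic sketch. `guess_indent()` is my own hypothetical helper, not a standard library function (the standard library's `tabnanny` module is the real tool for catching ambiguous tab/space mixes). It assumes the smallest indentation width in the file is the indent unit.

```python
def guess_indent(source):
    """Guess a file's indent unit: 'tab', or the smallest space width used."""
    widths = set()
    for line in source.splitlines():
        stripped = line.lstrip(" \t")
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comment-only lines
        lead = line[:len(line) - len(stripped)]
        if "\t" in lead:
            return "tab"  # (if it uses tabs, Guido didn't write it)
        if lead:
            widths.add(len(lead))
    # The smallest indentation seen is a reasonable guess at the unit.
    return min(widths) if widths else None


four = "def f():\n    if x:\n        return 1\n"
two = "def f():\n  if x:\n    return 1\n"
tabs = "def f():\n\treturn 1\n"
print(guess_indent(four), guess_indent(two), guess_indent(tabs))  # 4 2 tab
```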

IronPython Community Edition - Free IronPython

IronPython is the Python implementation that runs on the .NET platform... originally created by Jim Hugunin, but later backed (overtaken?) by Microsoft.

Microsoft's ambivalence towards Free software is a bit hard to follow sometimes and it really makes me question their entire approach to the software community  (wait.. have I ever *not* questioned that? :).

As a Python advocate and *nix geek trying to make my way working in a .NET shop, I am really excited about IronPython. I was also initially impressed with Microsoft's embrace of the Python community and its toe-dipping into Open Source. But then I heard that Microsoft will not take patches from non-Microsoft developers and will not bundle IronPython with other applications that have certain Free licenses (LGPL, BSD). To me, this is a real shame. That is not how you approach a community.

Well... at least somebody has stepped up and is maintaining IronPython Community Edition (IPCE).

props.


So check out IPCE and the FePy Project!

 Saturday, February 24, 2007

The Web as a Data Integration and Machine-Oriented Publishing Layer

This is what I was talking about when I wrote:

"one of the most attractive things about the Web is the ability to use HTTP as a simple transport protocol abstraction.  [...]  with this additional transport abstraction in place, you can build another application layer protocol on top of this and use that as your API for distributed operations.  That is where the rubber meets the road in modern large scale systems, and that is where the action is taking place in the current debate about SOA, REST, Web Services, and distributed architectures.  Furthermore, the foundation for this style is built directly into HTTP 1.1."

Bill de hÓra states it better in his "Confederacy" post:

"the Web is not just the presentation tier anymore; it's becoming a data integration and machine-oriented publishing layer.  The presentation layer is being pushed down to the client machine in the form of AJAX, XUL and Flex."

A new web/middleware layer has been forming, and this is the engine that is driving Web 2.0 and creating a new level of integration and interoperability.
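Here is a minimal Python 3 sketch of "HTTP as a transport, with another application protocol on top." HTTP just carries the bytes; the meaning lives in a tiny made-up protocol of JSON documents with an `op` and `args`. The `/rpc` path, the `add`/`echo` operations, and the message format are hypothetical, invented only for illustration.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

# Hypothetical operations exposed by our tiny application-level protocol.
OPERATIONS = {
    "add": lambda args: sum(args),
    "echo": lambda args: args,
}


class ProtocolHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # HTTP only transports the request; the JSON document carries meaning.
        length = int(self.headers["Content-Length"])
        message = json.loads(self.rfile.read(length).decode("utf-8"))
        result = OPERATIONS[message["op"]](message["args"])
        body = json.dumps({"result": result}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def call(url, op, args):
    """Client side: wrap an operation in a JSON document and POST it."""
    data = json.dumps({"op": op, "args": args}).encode("utf-8")
    req = Request(url, data=data, headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["result"]


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), ProtocolHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/rpc" % server.server_address[1]
        return call(url, "add", [1, 2, 3]), call(url, "echo", "hi")
    finally:
        server.shutdown()
        server.server_close()


print(run_demo())  # (6, 'hi')
```

Note that this style uses HTTP purely as a pipe (every call is a POST to one URL), which is exactly the "simple transport" use the quotes describe, as opposed to leaning on HTTP's own application semantics.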

 Friday, February 23, 2007

“Humanity Lobotomy” - Net Neutrality Open Source Documentary

Spread Awareness.

Awesome new video (via Lessig): 

“Humanity Lobotomy” - Net Neutrality Open Source Documentary


What a great month for videos!  I feel inspired to be working in technology again.  We control the future.

Check out more here


New Degree - Psyched About A Piece Of Paper

A little while back I wrote about finishing my Master's Degree at Boston University.

Well... this bad boy arrived in the mail today:



I guess I am now officially educated... or something.



O'Toole's Corollary of Finagle's Law: “The perversity of the Universe tends towards a maximum”

Open Source Feedback and Participation (The WebInject Story)

I posted a lengthy entry to the "getting feedback on tools" thread going on at the toolsmith-guild group

The thread is discussing how to get feedback and participation going in test tools projects.  This experience report sheds some light on this, but more generally can be applied to getting your open source project noticed and used.  I thought other open source developers might find this useful.


Here is my post:



Hi.. new to the list.

I have some experience in this area I want to comment on:

I had a specific need for a tool to test web services a few years ago ('02-'03). I had been rolling my own tools in various languages for quite a while and had some stuff that I thought others might be interested in, so I set up a SourceForge project and released it in Jan '04. The tool is called WebInject. Not many people used it at first. I just kept adding features and releasing. I was scratching my own itches and using my tool internally at a company I was working for. Eventually I got a few bites and some people started to download it and try it out.

As of today, the project site has served over a quarter million pageviews and the tool has been downloaded over 30,000 times. I've had feedback from all corners of the world.

See the Download Stats

In the early days, *any* bug that was reported, I would dive into immediately.. I was turning around fixes and new releases in crazy time. This personal attention sparked a lot of lengthy email conversations where I was starting to get real feedback. It wasn't the case where I could just sit back and expect quality feedback. I really engaged the users and asked questions. Pretty soon I was getting more feedback than I could reasonably handle. Around this time I started getting some patches sent to me. This accelerated development as you start to get various ideas from other people. A few silly bugs were also cleaned up.. stuff I never would have caught on my own. Encouraging patches is huge.. once someone gets a piece of their own code into your code base, they have an intimate connection.

I was also contacted by someone writing a book that wanted to use my tool as an example. I was really excited about this and put in some serious time getting the tool cleaned up and more useful. I never saw a published copy, but apparently it exists (I saw the PDF drafts :) It was also mentioned in a few testing magazines and articles which was cool for me to see.

I don't maintain WebInject much any more. I think it is cool and useful, but it's a rat's nest of Perl code that needs some serious love. There is some stuff that bothers me so much about it that I barely want to touch it sometimes :) I spend all my time on newer tools these days, but I still keep on top of it enough to facilitate others using it and posting patches/updates to it (though I haven't released in over a year, I think). Oddly, it has become somewhat entrenched in monitoring systems. Most of the users lately seem to be people running Nagios (an open source monitoring system) who need an intelligent web plugin/agent.


Some basic thoughts on open source tool adoption and getting feedback:

  • A tool without documentation doesn't exist.. without quality documentation, don't even bother.. nobody will touch it.
  • If it takes someone more than 10 minutes to install, configure, run, and see the results from a simple test case.. nobody will use it. Once someone sees something work, it is tangible and they become engaged. I achieved this with my tool.. if you download and run it, you are presented with a GUI. Pressing "run" executes a preconfigured sample test case that hits my site. I wanted the most clueless of people to be able to immediately see a green "PASS", and say "hey, cool it actually did something". This satisfaction must come immediately or it will be abandoned. (unless of course you already have a user base that has made it past this hurdle and can encourage others to put in the effort to get it working)
  • Make your tool easy to get. The online presence of your tool should allow someone to visit the website and immediately download it. I have seen tools released that take 5 levels of navigation before you get to the download options, or don't even have a website or project page somewhere.
  • Announce your tools.. release early, release often.. who cares.. blog it, forum it, email it, digg it, usenet it, comment about it.. see if anyone has interest. Don't annoyingly spam it, but let people know it exists. If you announce your tool only once, buried deep in a technical mailing list, don't expect people to come rushing.
  • Encourage participation. Go out of your way to let people know that you encourage and appreciate feedback, bug reports, offers for collaboration, patches, etc.
  • You need a forum or a mailing list with a web accessible archive. It has to be super simple to talk to you in public and let others see answers to questions and begin to participate on their own and to offer advice. Using a many-to-one scheme where you just tell people to email you isn't gonna work. It is too private.. people want to know what is going on and hear from each other and offer tips. I have had several thousand posts to my forums for this little tool.


- Corey Goldberg



Developer + Tester == Develester ?

I saw this posted in the toolsmith-guild group (Danny Faught):

"The developers were very surprised to find a whole room full of testers who didn't cringe at the thought of reading and writing code. The rather odd terms Developer-Tester/Tester-Developer emerged from AWTA." (Austin Workshop on Test Automation)

ahh, the elusive "develester"


... and.. the develester algorithm in Python :)

#!/usr/bin/env python

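def what_am_i(skills):
    role = 'unknown'  # fallback when neither skill is present
    if 'developer' in skills:
        role = 'developer'
    if 'tester' in skills:
        role = 'tester'
    # ('developer' and 'tester') evaluates to just 'tester', so test each one
    if 'developer' in skills and 'tester' in skills:
        role = 'develester'
    return role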

skills = 'developer-tester'
print 'you are a %s' % what_am_i(skills)

Google Apps Premier - The Office Battle Is On

Last year, I made a prediction/bet that Google was gonna make a huge push into office applications and we were gonna see the MS Office monopoly start to erode. Well, it's on! We actually have quite a cool phenomenon brewing, with Open Office striking from one side and online office apps striking from the other.

It is a classic disruptive play.. still far from a tipping point, but serious shots were fired over Microsoft's bow.   Google is going at it pretty aggressively too.  I just got an invite for a seminar in Boston that explains the new enterprise office tools (Google@Work Seminar).

It will be interesting to watch this unfold.

 Thursday, February 22, 2007

64-bit WEP Is Better Than 128-bit WEP

I am setting up a new wireless network tonight and enabling WEP encryption on my cheapo router.

When using WEP, you have a choice of key strengths: 64-bit and 128-bit.

In the past I always chose 128-bit encryption because it is "stronger".  This time I went with 64-bit, and here is my reasoning:

All I care about is keeping casual snoopers and freeloaders away from my wireless connection. I live in a dense part of Boston and can pick up 10+ wireless signals from my apartment at any given time, so casual snooping is a real possibility. Anybody with the skills who seriously wanted to penetrate my network could do so with either key strength. Cracking 128-bit WEP isn't much more of a hurdle than cracking 64-bit WEP, so why bother?

Why slow down all your packets, waste your batteries (decryption is processor hungry and therefore laptop/battery hungry), and have a longer key to keep track of, when the added security it gives you doesn't exist?

(note: If I wanted to block anything more than casual usage, I wouldn't use WEP in the first place)
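For reference, the advertised WEP "bit strengths" include a 24-bit initialization vector (IV) that WEP combines with your secret key, so the part you actually type in is 40 bits (10 hex digits) for 64-bit WEP and 104 bits (26 hex digits) for 128-bit WEP. A quick Python check of that arithmetic (`wep_secret` is just an illustrative helper name):

```python
IV_BITS = 24  # WEP combines a 24-bit initialization vector with the secret key


def wep_secret(advertised_bits):
    """Return (secret key bits, hex digits to type) for a WEP 'strength'."""
    secret_bits = advertised_bits - IV_BITS
    return secret_bits, secret_bits // 4  # each hex digit encodes 4 bits


for strength in (64, 128):
    bits, digits = wep_secret(strength)
    print("%d-bit WEP: %d-bit secret, %d hex digits" % (strength, bits, digits))
# 64-bit WEP: 40-bit secret, 10 hex digits
# 128-bit WEP: 104-bit secret, 26 hex digits
```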


Joe Gregorio on MOM vs. RPC

RPC: Remote Procedure Call
MOM: Message Oriented Middleware

Both RPC and MOM are communication models for distributed systems, and each has its strengths. However, when you get into large heterogeneous distributed systems, message passing is the way to achieve scalability.

I like this quote:

"In a large system you may be faced with either a multitude of clients or a menagerie of them; in either case you have to stop serializing objects and start exchanging documents."
- Joe Gregorio, 2007
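A minimal in-process Python 3 sketch of "exchanging documents" instead of serialized objects: the producer and consumer share no classes, only self-describing JSON text dropped onto a queue. Here `queue.Queue` is a stand-in for real message-oriented middleware, and the `order` document and its fields are hypothetical.

```python
import json
import queue
import threading

inbox = queue.Queue()  # stand-in for a real message broker


def producer(orders):
    for order_id, qty in orders:
        # Send a document, not a serialized object: any consumer that can
        # parse JSON can read it, whatever language or platform it runs on.
        inbox.put(json.dumps({"type": "order", "id": order_id, "qty": qty}))
    inbox.put(None)  # sentinel: no more messages


def consumer(results):
    while True:
        message = inbox.get()
        if message is None:
            break
        doc = json.loads(message)
        # Read only the fields this consumer cares about; extras are ignored.
        if doc.get("type") == "order":
            results.append((doc["id"], doc["qty"]))


results = []
worker = threading.Thread(target=consumer, args=(results,))
worker.start()
producer([("A1", 2), ("B7", 5)])
worker.join()
print(results)  # [('A1', 2), ('B7', 5)]
```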

Not Using ASP.NET Session State? Then Turn It Off

I am developing some small ASP.NET 2.0 web applications.  They are stateless and I am not doing anything with Session State.  However, I noticed that ASP.NET enables Session State by default (In-process mode is the default setting).  Therefore, if you have a truly stateless site or application, session state does nothing more than slow down performance.

In-process session state is still relatively fast, as the memory used to handle session is allocated by the same process on the local machine (no cross-process calls or data marshaling).  But this is needless overhead if you are not using your Session State.

So... to turn it off for the whole application, add the following line to your web.config, inside the system.web section:


<sessionState mode="Off" />


 Tuesday, February 20, 2007

Compiling Python Scripts to Windows Executables

I often write quick Python scripts that I need to run on other machines. It is sometimes easier to just drop a Windows .EXE onto a machine (with a Python interpreter bundled into it) than to do a full Python installation. To do this, I use py2exe.

py2exe is a Python Distutils extension which converts Python scripts into executable Windows programs. This enables your Python scripts to be run on Windows platforms without a Python installation.

You can run py2exe directly from the command line, or you can script it. I wrote a small convenience script that I use for general compilation.

Let's call the compilation script: compile.py
Let's say we have a script we want to compile named: foo.py

You would then invoke it from the command line like this:

>python compile.py foo.py

This will create a 'dist' subdirectory containing the newly created executable along with any necessary DLLs.


Here is the code I use for my compile.py:


#!/usr/bin/env python
# Corey Goldberg

from distutils.core import setup
import py2exe
import sys

if len(sys.argv) == 2:
    entry_point = sys.argv[1]
    sys.argv.pop()
    sys.argv.append('py2exe')
    sys.argv.append('-q')
else:
    print 'usage: compile.py <python_script>\n'
    raw_input("press ENTER to exit...")
    sys.exit(1)

opts = {
    'py2exe': {
        'compressed': 1,
        'optimize': 2,
        'bundle_files': 1
    }
}

setup(console=[entry_point], options=opts, zipfile=None)

(note: you need to have Python and py2exe installed on a Windows box to run this)

 Saturday, February 17, 2007

The Summer That Mr. Gates Retired and Mr. Stallman Didn't

Eben Moglen [2006]:

"This will be remembered in history as the summer that Mr. Gates retired and Mr. Stallman didn't"

Another excellent and inspiring speech by Moglen


.. where he reminds us of the inevitable, unavoidable, endgame:

"Software that can do everything, runs everywhere, needs no additional testing, just works, can do anything human beings want, and costs $0.00 per unit."


It eludes me why people don't seem to grasp this:

"Excluding people from ideas works well only if you think of ideas as something only one person at a time has.

When you live in our world and you know that ideas are created by people cooperating, sharing, yelling, screaming, waving their arms and typing code at 2 in the morning; the 20 year old monopoly on the ownership of an idea is an abomination."


(but Eben, tell us how you *really* feel :)



Clarifying Architectural Styles for the Web

In his latest finely crafted post, REST and WS, Joe Gregorio gives the quick definitive overview of web services and modern distributed architecture, while clarifying much confusion.


First of all, what REST really is:

"REST is not a specific piece of technology but an Architectural Style that was abstracted from HTTP during the transition from HTTP 1.0 to HTTP 1.1."


OK.. I get it. From a network perspective, going up the OSI Model/TCP Stack... starting from Layer 4, TCP is the transport layer protocol. HTTP is the [Layer 7] application layer protocol that rides on top of it. However, one of the most attractive things about the Web is the ability to use HTTP as a simple transport protocol abstraction, rather than interfacing with TCP directly. So with this additional transport abstraction in place, you can build another application layer protocol on top of this and use that as your API for distributed operations. That is where the rubber meets the road in modern large scale systems, and that is where the action is taking place in the current debate about SOA, REST, Web Services, and distributed architectures. Furthermore, the foundation for this style is built directly into HTTP 1.1.

The problem with the whole debate going on is that we are talking apples and oranges. Different architectural styles offer certain advantages, and these become apparent as your system grows in scale:


"REST and WS-* are two different tools whose strengths shine at different scales. The easiest way to think about this is an example from nature: at the scale of the atom the forces responsible for most of the action are different from the forces at the scale of a cell. Quantum effects and the strong nuclear force determine the structure and operation of an atom, while the operation of a cell is dominated by molecular reactions and Van der Waals' forces.

Another example closer to home; when programming and making calls into other functions and libraries, you pass along classes and types in the function call parameters. You expect those classes and types to be perfectly understood on the other side of that function call. Those are the rules at that scale; that type information can be counted on to survive and be useful over the function call boundary. As your scale grows, as you move outside the single executable, the same machine, or the same platform, that assumption begins to weaken, to the point that when you get to Internet scale services that assumption is actually harmful.

When working at the smaller scale the assumption that types can move across a boundary is powerful and allows many optimizations. Working in a homogeneous environment such as Java, WS-* has real advantages; you can very quickly create interfaces in your target programming language and expose those interfaces via WSDL and have them consumed just as easily on the calling side using the same WSDL.

As you move to larger systems, either many more clients connecting, or a non-homogeneous pool of clients, this paradigm starts to break down. If there are many clients then the demands for caching semantics will begin to dominate. In that case you need to abandon HTTP as just a simple transport and start using the application level semantics of HTTP to start leveraging the caching architecture already built into the Internet."


Well.. that pretty much cements the whole idea in my head.  When you move towards larger distributed systems and/or less-homogeneous environments, scalability and interoperability become a concern.  There have been some clever approaches to solving these issues. Systems continue to become larger, more loosely coupled, and more interoperable... this is good... but as you approach this space, there are some tradeoffs you must make.

The real question is: should you think in those larger and better organized terms right from the start, or do you want to quickly exploit some of the advantages and optimizations available in another approach?  And of course the answer is context...  "It depends on the system".
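To make "using the application level semantics of HTTP" for caching a bit more concrete, here is a small Python 3 sketch of one HTTP 1.1 caching feature: entity tags. The server labels a representation with an `ETag`; a client (or any cache in between) revalidates with `If-None-Match` and gets a bodyless `304 Not Modified` if nothing changed. The resource and its content are made up, and a real deployment would also send headers such as `Cache-Control`.

```python
import hashlib
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import Request, urlopen

DOCUMENT = b"<status>all systems go</status>"  # hypothetical resource
ETAG = '"%s"' % hashlib.sha1(DOCUMENT).hexdigest()


class CachingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("If-None-Match") == ETAG:
            # The client's copy is still current: answer without the body.
            self.send_response(304)
            self.send_header("ETag", ETAG)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("ETag", ETAG)
        self.send_header("Content-Length", str(len(DOCUMENT)))
        self.end_headers()
        self.wfile.write(DOCUMENT)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(url, etag=None):
    """GET url, revalidating with If-None-Match when we hold an ETag."""
    req = Request(url, headers={"If-None-Match": etag} if etag else {})
    try:
        with urlopen(req) as resp:
            return resp.status, resp.headers["ETag"], resp.read()
    except HTTPError as err:  # urllib reports a 304 as an HTTPError
        return err.code, err.headers["ETag"], b""


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), CachingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/status" % server.server_address[1]
        first = fetch(url)             # full 200 response plus an ETag
        second = fetch(url, first[1])  # revalidation: expect 304, empty body
        return first[0], second[0], len(second[2])
    finally:
        server.shutdown()
        server.server_close()


print(run_demo())  # (200, 304, 0)
```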


 Friday, February 16, 2007

Google - All Your Search Traffic Are Belong To Us

(Yes, the title is intentionally ungrammatical.)

I run a personal website: www.goldb.org. This is where I host my blog as well as content pages mostly dealing with computer programming. I was just looking over my traffic/visitors stats for the past month and noticed something interesting.

Basically, all of my search traffic comes from Google (I am indexed in every major search engine). I keep reading about search volume comparisons and how Google is slightly leading, and how more parity in the search market now exists.

Obviously my website visitors are skewed towards technical types, and the search terms they use to find my site are all technical/programming/software terms. The takeaway, at least from my small sample, is that technical users overwhelmingly search with Google rather than the other popular search engines.


Here is a breakdown of some stats from the last 30 days:

Where did my traffic come from?

  • 14.8% came directly
  • 70.4% from searches
  • 14.8% from other sites


Search Engine - # Visitors

  • Google - 1729
  • Yahoo - 20
  • Microsoft Live - 17
  • Technorati - 4
  • Del.icio.us - 2
  • AOL Search - 1



97.52% of visitors who reached my site via search in the past 30 days came from Google.
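The 97.52% figure is Google's visitors divided by the total across all the search sources listed; a quick Python check of the arithmetic:

```python
# Search visitors over the last 30 days, copied from the list above.
search_visitors = {
    "Google": 1729,
    "Yahoo": 20,
    "Microsoft Live": 17,
    "Technorati": 4,
    "Del.icio.us": 2,
    "AOL Search": 1,
}

total = sum(search_visitors.values())
google_share = 100.0 * search_visitors["Google"] / total
print("%d search visitors, %.2f%% from Google" % (total, google_share))
# 1773 search visitors, 97.52% from Google
```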

 Thursday, February 15, 2007

Perl - File Slurping

A common idiom in Perl 5 is "slurping".  Slurping is the process of reading a file into an array, split by line breaks.  You can then iterate over the array and perform an operation on each line.  This is the basic input mechanism I use to process all sorts of data/text files.


The basic slurp goes like this...

Open a file in read mode and assign it a file handle:

open(FILE, 'foo.txt') or die $!;

Read (slurp) the file into an array of lines (splitting the file on newlines):

@file = <FILE>;


You can then process the array in a foreach loop and "Un-slurp" (De-slurp?) it back to the file system like this...

Now we have an array which we can iterate through and do whatever we want with each line:

foreach (@file) {  # do something here with each line ($_)
}

Re-open the file in overwrite mode:

open(FILE, '>foo.txt') or die $!;

Print the contents of the array back to the file:

print FILE @file;


The following script shows some slurping in action. It reads a file named "foo.txt" and replaces all instances of "foo" with "bar".

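#!/usr/bin/perl

replace('foo.txt', 'foo', 'bar');

sub replace {
    my ($filename, $original, $substituted) = @_;

    # slurp the file into an array of lines
    open(FILE, $filename) or die $!;
    my @file = <FILE>;
    close(FILE);

    # \Q...\E treats $original as literal text, not as a regex
    foreach (@file) {
        s/\Q$original\E/$substituted/g;
    }

    # re-open the same file in overwrite mode and write the lines back
    open(FILE, ">$filename") or die $!;
    print FILE @file;
    close(FILE);
}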
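For comparison, here is a rough Python 3 equivalent of the same slurp, edit, and rewrite cycle (not a line-by-line port; `replace_in_file` is my own name). `readlines()` plays the role of `@file = <FILE>`, and `str.replace` does a literal, non-regex substitution. Calling `replace_in_file('foo.txt', 'foo', 'bar')` would mirror the Perl script.

```python
def replace_in_file(filename, original, substituted):
    """Slurp a file into a list of lines, replace text, write it back."""
    with open(filename) as f:
        lines = f.readlines()  # like @file = <FILE>: one element per line
    lines = [line.replace(original, substituted) for line in lines]
    with open(filename, "w") as f:  # like open(FILE, ">$filename")
        f.writelines(lines)
```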
 Tuesday, February 13, 2007

Trampolining With Generators - Roll Your Own Scheduler?

Even the subject sounds confusing huh?

I was reading Neil Mix's post, Threading in JavaScript 1.7, and was really fascinated by the concept he discusses: trampolining.

Basically, trampolining is a way to achieve concurrency by using generators to build a coroutine scheduler.

In JavaScript 1.7 (which Firefox 2 supports), you can already do concurrent programming with this technique.


Neil Mix:

"The way trampolining works is that a scheduler object (written in JavaScript) manages the execution of a series of generators, cobbling together a stack-like execution. Here’s how it works: The scheduler sets the starting generator as the base “frame” in the call stack. The scheduler then calls next() on the generator to obtain a yield value. If the yielded value is itself a generator, the scheduler pushes this new generator on the stack and calls next() on it, again obtaining a yield value. This continues until the top generator yields a non-generator value. This value could be a special directive to the scheduler (for example, a SUSPEND value that tells the scheduler to freeze execution of the “stack” of generators we’ve piled up). If not, the scheduler treats it as a return value. The scheduler then pops and closes the now complete generator and sends the return value back into the next generator in the stack."

pretty sick, huh?  ... definitely a twisted idea :)

The interesting takeaway is that this technique could be used to implement concurrency in any language that supports Generators.  It looks like Python has a similar capability.  This is described in detail in: PEP 342 - Coroutines via Enhanced Generators.

Generator-based state machines sound really interesting. Hopefully I'll find some time to play with them [in Python] as an alternative to threading.
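Here is a small Python 3 sketch of the trampoline Neil describes, using the `send()` method that PEP 342 added to generators. Following his description: each task is a stack of generators; a yielded generator is a "call" (pushed onto the stack), a special `SUSPEND` value hands control to the next task, and any other yielded value is a "return" (the top generator is popped and closed, and the value is sent into its caller). `Scheduler`, `SUSPEND`, and the example tasks are my own names; this is an illustration, not a production scheduler.

```python
import types
from collections import deque

SUSPEND = object()  # directive: pause this task and let another one run


class Scheduler(object):
    def __init__(self):
        self.ready = deque()  # each entry is one task's stack of generators

    def spawn(self, gen):
        self.ready.append([gen])

    def run(self):
        while self.ready:
            stack = self.ready.popleft()
            value = None  # value to send into the top generator
            while stack:
                try:
                    result = stack[-1].send(value)
                except StopIteration:
                    # Generator ran off the end: return None to its caller.
                    stack.pop()
                    value = None
                    continue
                if isinstance(result, types.GeneratorType):
                    stack.append(result)  # a "call": push the new frame
                    value = None
                elif result is SUSPEND:
                    self.ready.append(stack)  # resume this task later
                    break
                else:
                    stack.pop().close()  # a "return": pop and close the frame
                    value = result       # and send the value to the caller


log = []


def add(a, b):
    # A "subroutine": it yields its result instead of returning it.
    yield a + b


def worker(name, n):
    total = 0
    for i in range(n):
        total = yield add(total, i)  # "call" add() through the scheduler
        log.append((name, total))
        yield SUSPEND                # give the other tasks a turn


sched = Scheduler()
sched.spawn(worker("a", 3))
sched.spawn(worker("b", 2))
sched.run()
print(log)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('a', 3)]
```

The interleaved log shows the two tasks taking turns without any threads: all the "concurrency" comes from the scheduler deciding which generator stack to resume next.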


Screen Scraping in Python

Mads Kristensen just posted an article: Screen scraping in C#, where he shows several ways to make HTTP requests in C# that can be used for screen scraping.

from Mads:

"Some say that screen scraping is a lost art because it is no longer an advanced discipline. That may be right, but there are different ways of doing it. Here are some different ways that all are perfectly acceptable, but can be used for various different purposes."


Not to be outdone... here are 2 examples of how to do the same thing in Python:

using httplib:

import httplib

conn = httplib.HTTPConnection("www.python.org")
conn.request("GET", '/')
print conn.getresponse().read()


using urllib:

import urllib

f = urllib.urlopen('http://www.python.org/')
print f.read()



Solaris Zero-Day Exploit - TELNET Insanity

A few days ago a zero-day exploit was announced for Solaris that allowed people to use TELNET to gain root access to your machine. This seems pretty bad, and a lot of people on the inter-web are freaking out over it.

However, the question is not:  "How did this bug go undiscovered?"

The question should be:  "What were you smoking when you enabled the TELNET daemon facing the public Internet??"  (and can I have some?)

 Monday, February 12, 2007

Microsoft Performance Testing Guidance

Scott Barber just posted some info about: Patterns & Practices: Performance Testing Guidance, a new guide to Performance Testing over at Microsoft's Codeplex.

from Scott:

"I am involved in Microsoft's Patterns & Practices Performance Testing Guidance project. We have reached a critical mass with regards to our "mostly final" content and have made that content publicly available"

"We're tackling various flavors of performance testing (stress, load, capacity) as well as how to bake performance testing into your life cycle."

from the Patterns & Practices site:

"The purpose of this project is to build some insightful and practical guidance around doing performance testing and using Visual Studio 2005. It's a collaborative effort between industry experts, Microsoft ACE, patterns & practices, Premier, and VSTS team members."

I have always had ambivalent feelings towards MS, but it is great to see all of the effort they are putting into the Performance field. Performance has always been a sort of niche domain. It straddles the disciplines of development, testing, and operations, while also involving the integration of software, hardware, and networks. The past few years have provided much more thought leadership that is pushing the state of Performance (load, stress, scalability, capacity, availability) into more mature territory.

For those that don't know Scott Barber, he certainly gets my vote for the most prolific writer in Performance over the past several years. His body of writing is second to none: http://www.perftestplus.com/pubs.htm

 Sunday, February 11, 2007

Squeezebox - Hackable Device For Streaming My Tunes

(* I am not affiliated with Slim Devices.. this is just a fanboy post.)

I wanted to hook up something that would integrate my home PC (jacked full of glorious DRM-free MP3s) and my home audio system. I didn't want to go the full HTPC route; I am more of an audio guy, and my immediate need is for an audio-only solution.

After browsing the various Media Servers, Sound Cards, External Components, and other music playing gizmos; I finally figured out the type of unit I was looking for and what it should do...

Here are my requirements:

  • Must be able to play the MP3s stored on my computer
  • Must have a cross platform client application (I use both Linux and Windows at home)
  • Must have high quality analog and digital output that can connect to my receiver/amp
  • Must have a remote with basic controls (volume, tuning, select, etc)
  • Must be able to control it from my computer (change songs, playlists, etc)
  • Must be able to stream Internet Radio (of some sort)
  • No wires connected to my PC

Obviously some of these requirements became apparent when I read about the Squeezebox from Slim Devices.  It seems to do everything I need (and more), and isn't outrageously expensive ($299 retail).



So I ended up ordering one of these bad boys to play around with.  (Of course I got the all-black version.)

OK.. but now the real reason I chose the Squeezebox... It is Open Source and has a developer community!

Well... it is sort of Open Source.  Its SlimServer software (the audio server software) is GPL licensed, so the Source Code is available to modify, hack, and contribute to.  But.. the device's firmware is proprietary and closed (boo).

The real kicker came when I realized that SlimServer is written in Perl.  It is web-based and uses templated HTML/CSS.  Hmm.. I've written a Perl program or 2 [thousand] in my day... this could get interesting.  I already have SlimServer running from source and I am making little tweaks to the interface to customize it for myself.  Hopefully someday I'll do something worth contributing back...  I love that I have the option.

... More to come once I actually get the Squeezebox and hook it up..  I just ordered it last night.

 Friday, February 09, 2007

Python - use Psyco (x86 JIT-like compiler) for a speed boost

Psyco is a Python extension module which can speed up the execution of any Python code.

from the Psyco site:

"Think of Psyco as a kind of just-in-time (JIT) compiler, a little bit like what exists for other languages, that emit machine code on the fly instead of interpreting your Python program step by step. The difference with the traditional approach to JIT compilers is that Psyco writes several version of the same blocks (a block is a bit of a function), which are optimized by being specialized to some kinds of variables (a "kind" can mean a type, but it is more general). The result is that your unmodified Python programs run faster"


I have been working on some Python projects recently where Psyco has given me a really substantial performance increase. The type of work I am doing mostly involves statistical analysis of large numerical data sets.. array math.. percentiles.. time-series.. etc, etc.

To use it, all I do is copy Psyco to my system (to Python's Lib/site-packages) and add the following to the top of my Python source file:

import psyco
psyco.full()


That's all...

... or, even better, wrap it in a try/except so your program still runs on systems without Psyco installed:

try:
    import psyco
    psyco.full()
except ImportError:
    pass    # no Psyco on this system; just run at normal speed
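Psyco tends to pay off most on plain Python loops like the number crunching described above. Besides psyco.full(), Psyco also provides psyco.bind(), which compiles only the functions you hand it. Below is a minimal sketch that assumes that API. moving_average is a hypothetical helper, not code from my projects. Psyco only runs on 32-bit x86 builds of Python 2, so anywhere else the import fails and the function simply runs at normal interpreter speed.

```python
def moving_average(series, window):
    """Simple moving average written as a plain Python loop (hypothetical example)."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running_total = 0.0  # float start keeps the division below exact-float in Python 2 too
    for i, value in enumerate(series):
        running_total += value
        if i >= window:
            running_total -= series[i - window]  # drop the value that left the window
        if i >= window - 1:
            averages.append(running_total / window)
    return averages


try:
    import psyco  # optional: only available for 32-bit x86 Python 2
    psyco.bind(moving_average)  # compile just this hot function, not the whole program
except ImportError:
    pass  # no Psyco: same results, normal speed
```

Binding selected functions keeps Psyco's extra memory use limited to the code that actually needs the speed.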

Live Brain-Surgery With Python

Another example of improved productivity with dynamic languages...

Gojko Adzic on prototyping in Python:

"writing the prototype in Python allowed us to start a web server and open an interactive console to re-wire it and perform live brain-surgery while the server is running. I cannot imagine doing that in Java or C#. In a month, we wrote the functional equivalent of at least 4-5 months of C# code."

 Thursday, February 08, 2007

Are We In The Matrix?

When I first saw the Matrix in '99, it instantly became one of my favorite movies.  To this day, the concept this movie delves into is pretty disturbing.  Ever since seeing it, I have realized that there is no proof we are not in a Matrix-like simulated reality... which I can't even get my head around.

The 2002 Edge Question was: "Is the universe a quantum computer?"

Seth Lloyd validated my thoughts in his answer:

"The universe is quantum mechanical, and its dynamics can be simulated precisely and efficiently using quantum information processing. The amount of quantum computation required to perform this simulation is finite and has been calculated. Consequently, there is no obvious way to distinguish the universe from a very large quantum logic circuit."

.. that really freaks me out

[insert Keanu Reeves joke here]

 Tuesday, February 06, 2007

Anders Hejlsberg on LINQ and Functional Programming

This is a good video of an interview with Anders Hejlsberg on LINQ and Functional Programming.  He is the designer of C#, and he talks about some of the upcoming features in Orcas (the next version of Visual Studio, which brings C# 3.0).

I think it is very interesting (and good) that Microsoft (and many other modern language designers) are adding functional programming features.  If functional programming, lambda expressions, list comprehensions, and set processing are your bag.. watch this.


Improving Regular Expression Performance

Alex from Dojo just linked to a fascinating article by Russ Cox about regular expressions:  Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)

from the article:  
"This article reviews the good theory: regular expressions, finite automata, and a regular expression search algorithm invented by Ken Thompson in the mid-1960s. It also puts the theory into practice, describing a simple implementation of Thompson's algorithm. That implementation, less than 400 lines of C, is the one that went head to head with Perl above. It outperforms the more complex real-world implementations used by Perl, Python, PCRE, and others. The article concludes with a discussion of how theory might yet be converted into practice in the real-world implementations."

So... there is a 40-year-old technique that improves regex performance dramatically, at least on pathological patterns?

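The following graph plots the time required to check whether a?ⁿaⁿ matches aⁿ, i.e. whether n optional a's followed by n required a's match a string of exactly n a's (for n = 3, the pattern a?a?a?aaa against the string aaa):

[Graph: matching time vs. n; image not reproduced.]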
Wow... so awk and grep use Thompson's NFA approach to regexes, while most programming languages' regex engines use backtracking instead.

... and here I thought Perl was the regex king.
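To see the difference without a stopwatch, here is a small Python sketch. It is a toy of my own, not Cox's C implementation. The pattern syntax is deliberately tiny: single characters, '.' for any character, and an optional '?' or '*' suffix, with whole-string matches only. The same pattern is matched two ways. backtrack_match tries one alternative at a time, the way backtracking engines do. nfa_match follows Thompson's idea of tracking every state the NFA could be in at once. All function names are hypothetical, and both matchers count their work.

```python
def parse(pattern):
    """Turn 'a?aa' into [('a', '?'), ('a', ''), ('a', '')]."""
    items = []
    i = 0
    while i < len(pattern):
        op = ''
        if i + 1 < len(pattern) and pattern[i + 1] in '?*':
            op = pattern[i + 1]
        items.append((pattern[i], op))
        i += 2 if op else 1
    return items


def char_matches(item_char, text_char):
    return item_char == '.' or item_char == text_char


def backtrack_match(items, text):
    """Naive backtracking: follow one branch, back up on failure. Returns (matched, calls)."""
    steps = 0

    def match_here(i, j):
        nonlocal steps
        steps += 1
        if i == len(items):
            return j == len(text)
        ch, op = items[i]
        ok = j < len(text) and char_matches(ch, text[j])
        if op == '?':
            # Greedy: try consuming the character first, skip it only if that fails.
            return (ok and match_here(i + 1, j + 1)) or match_here(i + 1, j)
        if op == '*':
            return (ok and match_here(i, j + 1)) or match_here(i + 1, j)
        return ok and match_here(i + 1, j + 1)

    return match_here(0, 0), steps


def epsilon_closure(items, states):
    """Add states reachable without consuming input ('?' and '*' items may be skipped)."""
    result = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        if s < len(items) and items[s][1] in ('?', '*') and s + 1 not in result:
            result.add(s + 1)
            stack.append(s + 1)
    return result


def nfa_match(items, text):
    """Thompson-style simulation: advance all possible states in lockstep. Returns (matched, steps)."""
    steps = 0
    states = epsilon_closure(items, {0})
    for c in text:
        next_states = set()
        for s in states:
            steps += 1
            if s < len(items) and char_matches(items[s][0], c):
                # '*' stays on the same item to allow repeats; '' and '?' move past it.
                next_states.add(s if items[s][1] == '*' else s + 1)
        states = epsilon_closure(items, next_states)
    return len(items) in states, steps


for n in (5, 10, 15):
    items = parse('a?' * n + 'a' * n)
    text = 'a' * n
    print(n, backtrack_match(items, text), nfa_match(items, text))
# 5 (True, 143) (True, 30)
# 10 (True, 7167) (True, 110)
# 15 (True, 311295) (True, 240)
```

The backtracking count more than doubles with each extra n, because the greedy a? choices force it through every combination before it finds the one where all of them are skipped. The state-set simulation does at most (n + 1) state steps per input character.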

 Sunday, February 04, 2007

Perl - Building Web Clients

The following is a short tutorial on web programming in Perl that I wrote several years ago.  This type of programming was my first foray into the guts of the web.  Writing tools at the protocol level forced me to gain a deep understanding of HTTP and Web Architecture, which has been extremely helpful to me since.


These examples show how to use Perl's 'LWP' (libwww-perl) modules to make requests to a web server. The libwww-perl collection is a set of Perl modules which provides a simple and consistent application programming interface to the web.


Using 'LWP' to do an HTTP GET Request:

This will request the main Google page and store the entire contents of the response in the '$response' object.
#!/usr/bin/perl

use strict;
use warnings;
use LWP;    # loads LWP::UserAgent and HTTP::Request

my $useragent = LWP::UserAgent->new;
my $request   = HTTP::Request->new('GET', 'http://www.google.com');
my $response  = $useragent->simple_request($request);

print $response->as_string();

(* Use $useragent->request instead of $useragent->simple_request if you want LWP to follow server redirects automatically.)
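To see what is underneath LWP, here is a rough Python sketch of a minimal GET written straight onto a socket. It illustrates the protocol. It is not what LWP sends byte for byte, since LWP adds its own headers such as its User-Agent. build_get_request, send_raw, and the example-client/0.1 user agent are hypothetical names.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(url, user_agent='example-client/0.1'):
    """Return the text of a minimal HTTP/1.1 GET request for url (hypothetical helper)."""
    parts = urlsplit(url)
    target = parts.path or '/'
    if parts.query:
        target += '?' + parts.query
    header_lines = [
        f'GET {target} HTTP/1.1',
        f'Host: {parts.netloc}',        # required in HTTP/1.1
        f'User-Agent: {user_agent}',    # made-up client name
        'Connection: close',            # ask the server to close the socket when done
    ]
    # Every line ends in CRLF, and one extra CRLF (an empty line) ends the headers.
    return '\r\n'.join(header_lines) + '\r\n\r\n'


def send_raw(url, timeout=10):
    """Send the request over a plain TCP socket and return the raw reply bytes."""
    parts = urlsplit(url)
    request = build_get_request(url)
    with socket.create_connection((parts.hostname, parts.port or 80), timeout=timeout) as sock:
        sock.sendall(request.encode('ascii'))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection, so the reply is complete
                break
            chunks.append(data)
    return b''.join(chunks)
```

For example, print(send_raw('http://www.example.com/').decode('latin-1')) shows the status line, headers, and body as they came off the wire. This sketch does not follow redirects, speak HTTPS, or decode chunked bodies.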


Working With Cookies:

Here are the HTTP headers returned by the initial request to Google
(the first part of the 'print $response->as_string();' output in the previous example):
Date: Mon, 14 Apr 2003 18:38:28 GMT
Server: GWS/2.0
Content-Length: 2691
Content-Type: text/html
Content-Type: text/html; charset=ISO-8859-1
Client-Date: Mon, 14 Apr 2003 18:38:29 GMT
Client-Peer: 216.239.57.99:80
Client-Response-Num: 1
Connection: Close
Set-Cookie: PREF=ID=48fd767576ebd920:TM=1050345508:LM=1050345508:S=qLA8i5XyvLX37lG6; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Title: Google

Notice the "Set-Cookie:" line in the header. This tells your web browser to store a cookie and send it back in a "Cookie:" header on subsequent requests to this server (here, any host under .google.com, as the domain attribute says). In this case the cookie doesn't do much, but for a site that requires a login, this is how the server knows who you are and maintains your session.

In Perl, cookies can be handled for you by using the HTTP::Cookies module.

You first need to construct the object to contain your cookies:
$cookie_jar = HTTP::Cookies->new;

After an http request is sent, you can then extract the cookie from the response header:
$cookie_jar->extract_cookies($response);

Once you have the cookie stored in your cookie_jar, it needs to be sent back to the server in the header of every subsequent HTTP request. This is done by calling the following method after you construct each request and before you send it:
$cookie_jar->add_cookie_header($request);
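Conceptually, the cookie jar is not doing anything magic. Here is a rough Python sketch of the idea. TinyCookieJar and parse_set_cookie are hypothetical names and do not reflect how HTTP::Cookies works internally. The jar pulls name=value out of each Set-Cookie header, remembers which domain and path it belongs to, and sends matching cookies back in a Cookie: header. It ignores expiry, Secure, and most of the real domain and path matching rules.

```python
from urllib.parse import urlsplit


def parse_set_cookie(header_value):
    """Split one Set-Cookie value into (name, value, {attribute: value})."""
    parts = [p.strip() for p in header_value.split(';') if p.strip()]
    name, _, value = parts[0].partition('=')  # split on the FIRST '=' only
    attributes = {}
    for part in parts[1:]:
        key, _, attr_value = part.partition('=')
        attributes[key.strip().lower()] = attr_value.strip()
    return name.strip(), value.strip(), attributes


class TinyCookieJar:
    """Much-simplified, hypothetical cookie jar (no expiry, no Secure, loose matching)."""

    def __init__(self):
        self.cookies = {}  # (domain, path, name) -> value

    def extract_cookies(self, request_url, set_cookie_values):
        """Remember cookies from a response's Set-Cookie header values."""
        host = urlsplit(request_url).hostname
        for header_value in set_cookie_values:
            name, value, attrs = parse_set_cookie(header_value)
            domain = attrs.get('domain', host).lstrip('.').lower()
            path = attrs.get('path', '/')
            self.cookies[(domain, path, name)] = value  # a newer cookie replaces an older one

    def cookie_header(self, request_url):
        """Build the Cookie header value to send with a request to request_url."""
        parts = urlsplit(request_url)
        host = parts.hostname
        path = parts.path or '/'
        pairs = []
        for (domain, cookie_path, name), value in sorted(self.cookies.items()):
            host_ok = host == domain or host.endswith('.' + domain)
            if host_ok and path.startswith(cookie_path):
                pairs.append(f'{name}={value}')
        return '; '.join(pairs)


set_cookie = ('PREF=ID=48fd767576ebd920:TM=1050345508:LM=1050345508:S=qLA8i5XyvLX37lG6; '
              'expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com')
jar = TinyCookieJar()
jar.extract_cookies('http://www.google.com/', [set_cookie])
print(jar.cookie_header('http://www.google.com/'))
# PREF=ID=48fd767576ebd920:TM=1050345508:LM=1050345508:S=qLA8i5XyvLX37lG6
```

Because the domain attribute is .google.com, the same cookie would also go to any other host under google.com, but not to an unrelated site.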


Now for the whole thing in a script:


The following script makes a request to the main Google page and stores the cookie it receives. It then makes a request to Google to change the default language (user preference) to Spanish. Google returns a new cookie, which we store and send with another request for the main Google page. Google recognizes the information stored in our cookie and returns the page in Spanish.
#!/usr/bin/perl

use strict;
use warnings;
use LWP;
use HTTP::Cookies;

# construct objects
my $useragent  = LWP::UserAgent->new;
my $cookie_jar = HTTP::Cookies->new;

# send request for main Google page
my $request  = HTTP::Request->new('GET', 'http://www.google.com');
my $response = $useragent->simple_request($request);

# extract cookie from response header
$cookie_jar->extract_cookies($response);

# set user preference on Google to Spanish language
# (URL built from concatenated pieces so it contains no stray spaces or newlines)
my $prefs_url = 'http://www.google.com/setprefs?'
              . 'submit2=Save+Preferences+&hl=es&lr=all&safe=images&num=10'
              . '&q=&prev=http%3A%2F%2Fwww.google.com%2F&ie=UTF-8&oe=UTF-8';
$request = HTTP::Request->new('GET', $prefs_url);
$cookie_jar->add_cookie_header($request);
$response = $useragent->simple_request($request);

# extract new cookie from response header
$cookie_jar->extract_cookies($response);

# send request for main Google page (will return Spanish Google page)
$request = HTTP::Request->new('GET', 'http://www.google.com');
$cookie_jar->add_cookie_header($request);
$response = $useragent->simple_request($request);

# print the full response (headers and body) to verify the cookie works (some text now in Spanish)
print $response->as_string;
