goldb.org home

AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Tuesday, December 18, 2007

The Python Papers - Screen Scraping Article

The new issue of the Python Papers is out.  It includes a small article I wrote called: Screen Scraping Web Pages

The issue can be downloaded here:  The Python Papers, Volume 2, Issue 4 (pdf)

This tutorial shows how to programmatically retrieve a stock quote from Google Finance.  It uses Python's high level Web API and screen scraping with regular expressions.
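
The regex half of that idea can be sketched in a few lines. This is not the code from the article: the HTML fragment, the `price` id, and the `extract_quote` name are made up for illustration, and a real page would need its own pattern. Fetching the page (for example with `urllib.request.urlopen` on a current Python) is left out so the sketch runs without a network connection.

```python
import re

# Hypothetical markup; a real finance page will look different.
SAMPLE_HTML = '<div><span id="price">702.53</span></div>'

# Capture the number inside the (assumed) price span.
PRICE_PATTERN = re.compile(r'<span id="price">([0-9]+\.[0-9]+)</span>')


def extract_quote(html):
    """Return the quoted price as a float, or None if the pattern is absent."""
    match = PRICE_PATTERN.search(html)
    if match is None:
        return None
    return float(match.group(1))


quote = extract_quote(SAMPLE_HTML)
```

Scraping like this is brittle: if the page markup changes, the pattern stops matching, which is why the function returns None rather than raising.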
#    Comments [2] |
 Monday, December 17, 2007

Python Experts - Why They Do Python

I was recently interviewed for the article:
Python Experts - Why They Do Python

I don't think I am even close to an "expert", but it was nice being asked to participate.

#    Comments [0] |
 Friday, December 14, 2007

Technical Skills For Performance Testers

A performance engineer is a little bit of a jack-of-all-trades.  Rather than focusing on a small technological niche, performance testers must have a very wide range of technical skills to understand the inner workings of a complex system under test.

As far as skills go, Scott Barber has said that you need to be a "mid-level everything":

"Become a "Mid-Level Everything" – Developer, DBA, Network Admin, Systems Admin, Architect, Business Analyst, etc."

If you want to become proficient in analyzing system performance and scalability, there are many technical areas you should study.  Here are some skills I have found to be invaluable in my success as a performance engineer:


Performance Concepts:

  • Methodology
  • Load Generation Tools
  • User/Workload Modeling
  • Results Analysis (Latency, Throughput, Metrics)
  • Bottleneck Detection
  • Code Profiling
  • Scalability
  • Concurrency
  • Charting/Graphing
  • Statistics

Operating Systems and Servers:

  • Monitoring (CPU/Network/Mem/Disk/etc)
  • System Tuning
  • Web/Application/Middleware Server Tuning
  • System Administration
  • Virtualization
  • OS Concepts (CPU Scheduling, Memory Management, etc)

Database:

  • SQL
  • Stored Procedures
  • Monitoring
  • Tuning

Network:

  • Topology
  • Monitoring
  • Load Balancing
  • TCP/IP
  • HTTP
  • Packet Sniffing and Protocol Analysis
  • Caching
  • OSI Model

Programming:

  • Proficiency in at least one general programming language. Preferably a dynamic scripting language (Python/Perl/Ruby/etc)
  • Code/Algorithm Analysis


(Note: There are lots of "soft skills" a performance tester would need to be successful. This post focuses only on technical skills)

#    Comments [2] |
 Tuesday, December 11, 2007

Copyright - Music For Music's Sake - Grateful Dead and Woody Guthrie

Sometimes music is treated as art, not as a vehicle for hoarding money and restricting consumers.

This is exemplified by the copyright policies of certain performers.  Here are my favorite policies from some influential acts:

Grateful Dead's Mp3 Policy:

"The Grateful Dead and our managing organizations have long encouraged the purely non-commercial exchange of music taped at our concerts and those of our individual members. That a new medium of distribution has arisen - digital audio files being traded over the Internet - does not change our policy in this regard."

Woody Guthrie's Standard Copyright Notice:

"This song is Copyrighted in U.S., under Seal of Copyright # 154085, for a period of 28 years, and anybody caught singin’ it without our permission, will be mighty good friends of ourn, cause we don’t give a dern. Publish it. Write it. Sing it. Swing to it. Yodel it. We wrote it, that’s all we wanted to do."

#    Comments [1] |
 Friday, December 07, 2007

Lies, Damned Lies, And Statistics

The 3 most important caveats to be aware of when dealing with statistics:

  • Correlation does not imply causation.
  • You can make statistics tell you nearly anything you want.
  • Statistics without proper context are meaningless.
#    Comments [2] |
 Wednesday, December 05, 2007

Open Source Testing - Community Donations Program

Open Source Testing is a great resource that lists most of the open source tools available to testers.  The site is run by Mark Aberdour and has been around since 2003.

Mark's contributions to the testing and open source communities have been very valuable.

Well... he just stepped it up a notch by posting details of his new Community Donations Program:

"during 2007 Open Source Testing has begun to generate fairly regular revenue. It has always been my aim, should the site become commercially viable, to put some profits back into the open source community. I will be aiming to make bi-monthly donations (funds providing) to open source testing projects and open source organisations of my own choosing. The donations will not be earth shattering, but whether they cover hosting and hardware costs, contractor costs, publicity, trips to events or just some extra motivation, they will certainly make a difference."

Great work Mark!

#    Comments [0] |
 Tuesday, November 27, 2007

Python - Extracting Files From Zip Archives

Here is a way to unzip files in Python.  If you have a zip containing multiple files, you can unzip it like this:

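import zipfile

# open the archive in binary mode
fh = open('foo.zip', 'rb')
z = zipfile.ZipFile(fh)
# write each member into the current directory
# (assumes the archive contains only plain files, no directory entries)
for name in z.namelist():
    outfile = open(name, 'wb')
    outfile.write(z.read(name))
    outfile.close()
fh.close()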
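
For comparison, here is a shorter sketch that leans on `ZipFile.extractall()`, which also creates any subdirectories stored in the archive. It assumes a newer interpreter than the loop above needs (Python 2.7 or 3.x, where `ZipFile` works as a context manager); the `unzip` name and its parameters are just for illustration. Only use it on archives you trust.

```python
import zipfile


def unzip(archive_path, dest_dir='.'):
    """Extract every member of archive_path into dest_dir; return the member names."""
    # the with-block closes the archive even if extraction fails
    with zipfile.ZipFile(archive_path) as z:
        z.extractall(dest_dir)
        return z.namelist()
```

Calling `unzip('foo.zip')` would unpack into the current directory, much like the loop above.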
#    Comments [6] |
 Monday, November 26, 2007

wxPython - Hello World!

Here is a simple example for those getting started with Python GUI Programming, wxWidgets, and the wxPython Bindings.

This small program will display a Frame and the static text "Hello World!", positioned with a BoxSizer.

Output looks like this:



#!/usr/bin/env python

import wx

class Application(wx.Frame):
    def __init__(self, parent):
        wx.Frame.__init__(self, parent, -1, 'My GUI', size=(300, 200))
        panel = wx.Panel(self)
        sizer = wx.BoxSizer(wx.VERTICAL)
        panel.SetSizer(sizer)
        txt = wx.StaticText(panel, -1, 'Hello World!')
        sizer.Add(txt, 0, wx.TOP|wx.LEFT, 20)
        self.Centre()
        self.Show(True)

app = wx.App(0)
Application(None)
app.MainLoop()
#    Comments [0] |
 Monday, November 19, 2007

FSF Releases GNU Affero General Public License

The Free Software Foundation just released the final version of the GNU Affero General Public License (GNU AGPL).  This license covers software that is hosted on a computer network (SaaS - Software as a Service).  The regular GNU GPL only covers software distribution, so you are able to run modified GPL code on a network server without releasing your modified source code.  The GNU AGPL closes this loophole and ensures source code for hosted software is made available.

from FSF:

"The Free Software Foundation (FSF) today published the GNU Affero General Public License version 3 (GNU AGPLv3). This is a new license; it is based on version 3 of the GNU General Public License (GNU GPLv3), but has an additional term to allow users who interact with the licensed software over a network to receive the source for that program."

It will be interesting to see which projects adopt this license and what its effects will be.  I can imagine that commercial companies would be very hesitant to use AGPL code.

#    Comments [0] |
 Thursday, November 15, 2007

Mike Kelly On Vendor Hype

Mike Kelly wrote an excellent piece about dealing with product/service vendors.  Go read the full post:

I think I’m ready to close the dialog… 

It is a rant, but the points he makes are right on the mark. If you want to cut through the crap that vendors like to spew, Mike has some great points and observations to think about. He defines some of the common sales terminology and what it actually means. Vendors should take a close look at this.

I have been in the meetings he described where slicked-up sales guys try to hawk some product that will not only solve all of your problems, but will also cure world hunger and save baby seals. This is usually followed by some fancy PowerPoint and the same talking points that you have already read in their website's FAQ. Half of the time it takes everything I have to not stand up and just yell: "just give me the damn white paper and tell me the price already!".

My favorite observations he makes about terminology:

“Additional value add”:

"It’s a superfluous phrase. I’m tempted to ask, “What services or products do you provide that don’t add value? I just want to know so I can be sure I’m not paying for any of those.” I hope all your services add value. If they don’t, don’t offer them. When you use terms like “additional value add,” in my mind you become the used car salesman of your industry."

“Opening a dialog”:

"The last thing I want to do is “open a dialog” about my problems. If I put a specific problem in front of you, I’m interested in specific solutions. Put the person in front of me who can help me understand what we need to do, and what you can do to help. If you’re not that person, I’ll find another vendor who is. I don’t have time to dialog, I have problems."

Nice post Mike!

#    Comments [0] |
 Wednesday, November 14, 2007

Regex Capture Groups In Python and Perl

I am a Python programmer and ex-Perl hacker.

Regular Expressions are possibly the quintessential feature of Perl and are directly part of the language syntax.

Rather than being part of the syntax, Python's Regular expressions are available via the 're' module. For some reason, I had some trouble figuring out matching groups when I first started using Python's Regular Expressions.

Here are examples of extracting capture groups in both Perl and Python.

Let's say we have a string containing a date: '11/14/2007', and we want to capture only the year from this string.

A regex to match this format might be something like this:

[0-9]{2}/[0-9]{2}/[0-9]{4}

We can then put parentheses around the piece we want to extract (the 4-digit year) to denote a capture group.

So now our regex would look like this:

[0-9]{2}/[0-9]{2}/([0-9]{4})


Perl Example:

$foo = '11/14/2007';

if ($foo =~ m^[0-9]{2}/[0-9]{2}/([0-9]{4})^) {
    print $1;
}

output:

2007

* Note the string we captured ended up in the special variable $1


Python Example:

import re

foo = '11/14/2007'

match = re.search('[0-9]{2}/[0-9]{2}/([0-9]{4})', foo)
if match:
    print match.group(1)

output:

2007

* Note the string we captured ended up in a match object, which can be accessed with the 'group()' method.
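
If you want more than one piece of the date, Python's named groups can make the match object easier to read. Here is a small sketch using the same pattern, written for a current Python 3 interpreter; the group names and the `parse_date` helper are my own choices for illustration.

```python
import re

# Same date pattern as above, with each piece given a name.
DATE_PATTERN = re.compile(
    r'(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<year>[0-9]{4})'
)


def parse_date(text):
    """Return a dict of the month, day and year strings, or None if no date is found."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    return match.groupdict()


parts = parse_date('11/14/2007')
```

Each piece is then available by name (for example `parts['year']`), and `match.group('year')` works on the match object as well as `match.group(3)`.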

#    Comments [6] |
 Tuesday, November 13, 2007

Lintel (Linux/Intel) Dominates Supercomputers

Pretty interesting...

via BetaNews article:

"Twice each year, the rankings of 500 of the world's supercomputers are assessed by the University of Mannheim in association with Berkeley National Laboratory and the University of Tennessee, Knoxville. Their figures are then sorted by tested clusters' maximal observed peak performance, in gigaflops."
"Intel-based processors walked away with one, if not two, lions' shares worth of the Top 500 list, with a staggering 354 total systems."
"460 of the Top 500 systems were running one flavor of Linux or another, including all of the Top 10."
#    Comments [0] |
 Thursday, November 08, 2007

A Quick Guide To GPLv3

The FSF just posted this:

A Quick Guide to GPLv3

A very nice high level overview of the current GPL and what it means.

#    Comments [1] |
 Wednesday, November 07, 2007

Python - Processing Large Text Files One Line At A Time

I want to process some very large text files one line at a time.  Normally when I process text files, I slurp them into a list using the readlines() method.   However, sometimes the files are huge and it isn't feasible or optimal to read the entire content into memory upfront.   In this case, it makes sense to process them one line at a time.

The best solution I can come up with is this:


fh = open('foo.txt', 'r')
line = fh.readline()
while line:
    # do something here
    line = fh.readline()

It doesn't feel very pythonic/idiomatic.  Anyone have a better solution?


Update
Thanks to the comments below, I found a few different ways to do it. The best and most Pythonic way seems to be this:


for line in open('foo.txt', 'r'):
    # do something here

Python file objects support the iterator protocol, so you can just open it and go.   This is the same as using a while loop and calling readline() but more compact.
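
One thing the one-liner leaves open is when the file gets closed. A sketch that combines the same iteration with a `with` block, assuming Python 2.6 or later, is shown here. The `count_matching_lines` function and its arguments are only an example of "do something here".

```python
def count_matching_lines(path, needle):
    """Count the lines containing needle, holding only one line in memory at a time."""
    count = 0
    # the with-block closes the file when the loop finishes or raises
    with open(path, 'r') as fh:
        for line in fh:
            if needle in line:
                count += 1
    return count
```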

#    Comments [7] |

Done With Bloglines (So Long, And Thanks For All The Fish)

I was a Bloglines user for several years.  I liked the old-school frame interface and it generally met all of my needs to keep up to date with the hundreds of feeds I read regularly.

Recently, Bloglines released a Beta of their new feed reader.  It uses lots of AJAX and is significantly different than the classic version.  Since this will entail learning a new web-based feed reader, I thought it would be a good time to check out Google Reader.  So I exported my OPML and gave it a try.

First impressions are very good. I like the interface and keyboard shortcuts a lot. So.. now I'm hooked and it is the feed reader I will use going forward.

The only gripe I have is that Google Reader doesn't display the Favicon (little graphic icon) next to each feed.  All you get is a generic blue icon.  What's up with that?

#    Comments [6] |
 Tuesday, November 06, 2007

Extreme Linux Performance Monitoring And Tuning

I just came across a great site with lots of papers related to performance monitoring and tuning for Linux:

http://www.ufsdump.org

One paper I especially liked:

Extreme Linux Performance Monitoring And Tuning

"The purpose of this document is to describe how to monitor Linux operating systems for performance.  This paper examines how to interpret common Linux performance tool output.  After collecting this output, the paper describes how to make conclusions about performance bottlenecks."

Lots of great info!

#    Comments [2] |
 Friday, November 02, 2007

Is Wal-Mart's $200 Linux-based PC "Unacceptably Low End"?

Wal-Mart unveiled its $200 Linux-based PC.

from the Wired blog:

"It has a 1.5 Ghz VIA C7 CPU embedded in a Mini-ITX motherboard, 512MB of RAM and an 80GB hard drive. Normally, this would simply mark it as unacceptably low-end for use with modern software."

I'm not so sure about "unacceptably low-end".  The specs on this PC are substantially better than my home machine.  I have a box at home that I primarily use for web surfing.  It was an old castaway Windows NT machine from an old job.  I run Ubuntu (with Gnome) on it, and it works like a charm.  It's a 933MHz P3 with 256MB of RAM and a dog-slow hard drive.

So.. with superior specs, I think the Wal-Mart machine would be a great PC for basic home use.

.. though the "VIA C7" chip scares me a bit.  Any idea how it stacks up against a similarly spec'ed Intel or AMD?

#    Comments [0] |
 Wednesday, October 31, 2007

Which Version Of Python Ships With Mac OS X Leopard?

I am not a Mac user, but in case anyone is interested in knowing which version of Python ships with OS X Leopard, the answer is Python 2.5.

#    Comments [0] |

Learn The Ideals And History Of Free And Open Source Software

There are lots of resources available online to learn about Free and Open Source Software.

If you want to understand the essence and ideals of this movement, a great start would be to read the following 4 books. After reading these, you will have a good grasp of the history and philosophy of freedom in the technology world.

#    Comments [0] |
 Wednesday, October 24, 2007

Python - List Comprehensions Leak Variables

One thing to remember when using List Comprehensions is that they "leak" their temporary iteration variable to the outside.

What does that mean?

In the following example, we still have access to 'x' after we run the list comprehension.

foo = ['a', 'b', 'c']
my_list = [x for x in foo]
print x

output:
>> c

This behaviour is different from how a Generator Expression works. We could have written the List Comprehension using a Generator Expression like this:

my_list = list(x for x in foo)

Now, the temporary variable we used is not accessible from outside the scope of the expression.

foo = ['a', 'b', 'c']
my_list = list(x for x in foo)
print x

output:
>> NameError: name 'x' is not defined

Note: This is fixed in Python 3000 (Python 3), where a list comprehension gets its own scope just like a generator expression.
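
To check which behaviour a given interpreter has, a small probe like the following sketch can help. The function name is made up for illustration. It should return True on Python 2 (the variable leaks into the function's scope) and False on Python 3.

```python
def comprehension_leaks_variable():
    """Return True if a list comprehension leaves its loop variable behind."""
    foo = ['a', 'b', 'c']
    my_list = [x for x in foo]
    try:
        x  # only resolves if the comprehension leaked 'x' into this scope
    except NameError:
        return False
    return True
```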

#    Comments [5] |
 Monday, October 22, 2007

OpenSTA 1.4.4 Release (Open Source HTTP Performance Test Tool)

The OpenSTA team has announced the release of version 1.4.4

OpenSTA is a distributed software testing architecture designed around CORBA.  The applications that make up the current OpenSTA toolset were designed to be used by performance testing practitioners for web load testing.

Info:
http://portal.opensta.org/index.php?name=News&file=article&sid=51

Download:
http://opensta.org/download.html

Congrats and thanks to Bernie Velivis, Daniel Sutcliffe, and Jerome Delemarche for making this release possible.




#    Comments [1] |
 Thursday, October 18, 2007

Charts And Graphs - Modern Solutions

To all the chart/graph/plot/visualization weenies out there...
Here is a great overview of some modern charting and graphing technologies.

Some options I will be exploring:

#    Comments [0] |
 Sunday, October 14, 2007

Python - Simple Multithreaded HTTP Load Generator/Timer

This is a module for generating concurrent requests to an HTTP server.  Each thread makes HTTP GET requests to a single URL at the specified interval.  Threads are added over a given rampup time if you want to generate increasing load.  Response times are printed to STDOUT.  Can be used for cursory performance benchmarking or load testing a web resource.

load_generator.py module

sample usage:


#!/usr/bin/env python

from load_generator import LoadManager

lm = LoadManager()
lm.msg = ('www.example.com', '/')
lm.start(threads=5, interval=2, rampup=2)
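
For readers who want a feel for how threads, interval and rampup fit together, here is a rough sketch written for Python 3. It is not the load_generator.py module itself: `run_load`, `timed_call`, `make_http_get` and the `requests_per_thread` parameter are hypothetical names, and unlike the module it collects the timings in a list instead of printing them.

```python
import http.client
import threading
import time


def make_http_get(host, path):
    """Build a function that performs one HTTP GET and discards the body."""
    def request():
        conn = http.client.HTTPConnection(host, timeout=10)
        try:
            conn.request('GET', path)
            conn.getresponse().read()
        finally:
            conn.close()
    return request


def timed_call(request_fn):
    """Run request_fn once and return the elapsed time in seconds."""
    start = time.perf_counter()
    request_fn()
    return time.perf_counter() - start


def run_load(request_fn, threads=5, interval=2.0, rampup=2.0, requests_per_thread=3):
    """Start `threads` workers spread evenly over `rampup` seconds.

    Each worker calls request_fn `requests_per_thread` times, sleeping
    `interval` seconds between calls. Returns the list of response times.
    """
    timings = []
    lock = threading.Lock()

    def worker():
        for i in range(requests_per_thread):
            elapsed = timed_call(request_fn)
            with lock:  # the list is shared between threads
                timings.append(elapsed)
            if i < requests_per_thread - 1:
                time.sleep(interval)

    spacing = rampup / threads if threads else 0
    workers = []
    for _ in range(threads):
        t = threading.Thread(target=worker)
        t.start()
        workers.append(t)
        time.sleep(spacing)  # stagger thread starts across the rampup period
    for t in workers:
        t.join()
    return timings
```

Something like `run_load(make_http_get('www.example.com', '/'), threads=5, interval=2, rampup=2)` would roughly mirror the sample usage above. Passing the request in as a function also lets you try the timing logic against a stub before pointing it at a real server.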
#    Comments [3] |
 Wednesday, October 10, 2007

Twelve Networking Truths - Good, Fast, Cheap: Pick Any Two

I love reading old RFCs.

One of my favorites is RFC 1925 - The Twelve Networking Truths:

The Fundamental Truths

  1. It Has To Work.
  2. No matter how hard you push and no matter what the priority, you can't increase the speed of light.
    (corollary) No matter how hard you try, you can't make a baby in much less than 9 months. Trying to speed this up *might* make it slower, but it won't make it happen any quicker.
  3. With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead.
  4. Some things in life can never be fully appreciated nor understood unless experienced firsthand. Some things in networking can never be fully understood by someone who neither builds commercial networking equipment nor runs an operational network.
  5. It is always possible to agglutinate multiple separate problems into a single complex interdependent solution. In most cases this is a bad idea.
  6. It is easier to move a problem around (for example, by moving the problem to a different part of the overall network architecture) than it is to solve it.
    (corollary) It is always possible to add another level of indirection.
  7. It is always something.
    (corollary) Good, Fast, Cheap: Pick any two (you can't have all three).
  8. It is more complicated than you think.
  9. For all resources, whatever it is, you need more.
    (corollary) Every networking problem always takes longer to solve than it seems like it should.
  10. One size never fits all.
  11. Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works.
  12. In protocol design, perfection has been reached not when there is nothing left to add, but when there is nothing left to take away.
#    Comments [1] |