goldb.org home

AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Monday, November 19, 2007

FSF Releases GNU Affero General Public License

The Free Software Foundation just released the final version of the GNU Affero General Public License (GNU AGPL).  This license covers software that is hosted on a computer network (SaaS - Software as a Service).  The regular GNU GPL only covers software distribution, so you are able to run modified GPL code on a network server without releasing your modified source code.  The GNU AGPL prohibits this and ensures source code for hosted software is made available.

from FSF:

"The Free Software Foundation (FSF) today published the GNU Affero General Public License version 3 (GNU AGPLv3). This is a new license; it is based on version 3 of the GNU General Public License (GNU GPLv3), but has an additional term to allow users who interact with the licensed software over a network to receive the source for that program."

It will be interesting to see which projects adopt this license and what its effects will be.  I can imagine that commercial companies would be very hesitant to use AGPL code.

#    Comments [0] |
 Thursday, November 15, 2007

Mike Kelly On Vendor Hype

Mike Kelly wrote an excellent piece about dealing with product/service vendors.  Go read the full post:

I think I’m ready to close the dialog… 

It is a rant, but the points he makes are right on the mark. If you want to cut through the crap that vendors like to spew, Mike has some great points and observations to think about. He defines some of the common sales terminology and explains what the terms actually mean. Vendors should take a close look at this.

I have been in the meetings he described where slicked-up sales guys try to hawk some product that will not only solve all of your problems, but will also cure world hunger and save baby seals. This is usually followed by some fancy PowerPoint and the same talking points that you have already read in their website's FAQ. Half of the time it takes everything I have to not stand up and just yell: "just give me the damn white paper and tell me the price already!".

My favorite observations he makes about terminology:

“Additional value add”:

"It’s a superfluous phrase. I’m tempted to ask, “What services or products do you provide that don’t add value? I just want to know so I can be sure I’m not paying for any of those.” I hope all your services add value. If they don’t, don’t offer them. When you use terms like “additional value add,” in my mind you become the used car salesman of your industry."

“Opening a dialog”:

"The last thing I want to do is “open a dialog” about my problems. If I put a specific problem in front of you, I’m interested in specific solutions. Put the person in front of me who can help me understand what we need to do, and what you can do to help. If you’re not that person, I’ll find another vendor who is. I don’t have time to dialog, I have problems."

Nice post Mike!

#    Comments [0] |
 Wednesday, November 14, 2007

Regex Capture Groups In Python and Perl

I am a Python programmer and ex-Perl hacker.

Regular Expressions are possibly the quintessential feature of Perl and are directly part of the language syntax.

Rather than being part of the syntax, Python's regular expressions are available via the 're' module. For some reason, I had some trouble figuring out matching groups when I first started using Python's regular expressions.

Here are examples of extracting capture groups in both Perl and Python.

Let's say we have a string containing a date: '11/14/2007', and we want to capture only the year from this string.

A regex to match this format might be something like this:

[0-9]{2}/[0-9]{2}/[0-9]{4}

We can then put parentheses around the piece we want to extract (the 4-digit year) to denote a capture group.

So now our regex would look like this:

[0-9]{2}/[0-9]{2}/([0-9]{4})


Perl Example:

$foo = '11/14/2007';

if ($foo =~ m^[0-9]{2}/[0-9]{2}/([0-9]{4})^) {
    print $1;
}

output:

2007

* Note the string we captured ended up in the special variable $1


Python Example:

import re

foo = '11/14/2007'

match = re.search('[0-9]{2}/[0-9]{2}/([0-9]{4})', foo)
if match:
    print match.group(1)

output:

2007

* Note the string we captured ended up in a match object, which can be accessed with the 'group()' method.
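
When there is more than one capture group, the same match object exposes all of them. The following is a minimal sketch, not part of the original examples, written for Python 3 (print is a function there). The group names 'month', 'day' and 'year' are my own choice for illustration.

```python
import re

# Same date pattern as above, but every part is captured and given a name.
date_pattern = re.compile(
    r'(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/(?P<year>[0-9]{4})'
)

match = date_pattern.search('11/14/2007')
if match:
    print(match.group('year'))  # access a group by name
    print(match.group(3))       # the same group, accessed by position
    print(match.groups())       # all captured groups as a tuple
```

Named groups are optional, but they save counting parentheses once a pattern grows past two or three groups.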

#    Comments [6] |
 Tuesday, November 13, 2007

Lintel (Linux/Intel) Dominates Supercomputers

Pretty interesting...

via BetaNews article:

"Twice each year, the rankings of 500 of the world's supercomputers are assessed by the University of Mannheim in association with Berkeley National Laboratory and the University of Tennessee, Knoxville. Their figures are then sorted by tested clusters' maximal observed peak performance, in gigaflops."

"Intel-based processors walked away with one, if not two, lions' shares worth of the Top 500 list, with a staggering 354 total systems."

"460 of the Top 500 systems were running one flavor of Linux or another, including all of the Top 10."
#    Comments [0] |
 Thursday, November 08, 2007

A Quick Guide To GPLv3

The FSF just posted this:

A Quick Guide to GPLv3

A very nice high-level overview of the current GPL and what it means.

#    Comments [1] |
 Wednesday, November 07, 2007

Python - Processing Large Text Files One Line At A Time

I want to process some very large text files one line at a time.  Normally when I process text files, I slurp them into a list using the readlines() method.   However, sometimes the files are huge and it isn't feasible or optimal to read the entire content into memory upfront.   In this case, it makes sense to process them one line at a time.

The best solution I can come up with is this:


fh = open('foo.txt', 'r')
line = fh.readline()
while line:
    # do something here
    line = fh.readline()

It doesn't feel very pythonic/idiomatic.  Anyone have a better solution?


Update
Thanks to the comments below, I found a few different ways to do it. The best and most Pythonic way seems to be this:


for line in open('foo.txt', 'r'):
    # do something here

Python file objects support the iterator protocol, so you can just open it and go.   This is the same as using a while loop and calling readline() but more compact.
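
One thing the one-liner leaves to the interpreter is closing the file. Here is a minimal sketch of the same iteration with explicit cleanup, assuming Python 3 (or 2.6+) where the 'with' statement is available. The function name count_lines_containing and the throwaway temp file are made up for the demonstration.

```python
import os
import tempfile

def count_lines_containing(path, needle):
    """Count the lines that contain `needle`, reading one line at a time."""
    count = 0
    with open(path, 'r') as fh:  # the file is closed when this block exits
        for line in fh:          # the file object yields one line per iteration
            if needle in line:
                count += 1
    return count

# Small demonstration using a throwaway file instead of a huge one.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('alpha\nbeta\nalpha beta\n')

print(count_lines_containing(tmp.name, 'alpha'))
os.remove(tmp.name)
```

Only one line is held in memory at a time, so the approach works the same for a three-line file and a multi-gigabyte one.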

#    Comments [7] |

Done With Bloglines (So Long, And Thanks For All The Fish)

I was a Bloglines user for several years.  I liked the old-school frame interface and it generally met all of my needs to keep up to date with the hundreds of feeds I read regularly.

Recently, Bloglines released a Beta of their new feed reader.  It uses lots of AJAX and is significantly different than the classic version.  Since this will entail learning a new web-based feed reader, I thought it would be a good time to check out Google Reader.  So I exported my OPML and gave it a try.

First impressions are very good. I like the interface and keyboard shortcuts a lot. So.. now I'm hooked and it is the feed reader I will use going forward.

The only gripe I have is that Google Reader doesn't display the Favicon (little graphic icon) next to each feed.  All you get is a generic blue icon.  What's up with that?

#    Comments [6] |
 Tuesday, November 06, 2007

Extreme Linux Performance Monitoring And Tuning

I just came across a great site with lots of papers related to performance monitoring and tuning for Linux:

http://www.ufsdump.org

One paper I especially liked:

Extreme Linux Performance Monitoring And Tuning

"The purpose of this document is to describe how to monitor Linux operating systems for performance.  This paper examines how to interpret common Linux performance tool output.  After collecting this output, the paper describes how to make conclusions about performance bottlenecks."

Lots of great info!

#    Comments [2] |
 Friday, November 02, 2007

Is Wal-Mart's $200 Linux-based PC "Unacceptably Low End"?

Wal-Mart unveiled its $200 Linux-based PC.

from the Wired blog:

"It has a 1.5 Ghz VIA C7 CPU embedded in a Mini-ITX motherboard, 512MB of RAM and an 80GB hard drive. Normally, this would simply mark it as unacceptably low-end for use with modern software."

I'm not so sure about "unacceptably low-end".  The specs on this PC are substantially better than my home machine.  I have a box at home that I primarily use for web surfing.  It was an old castaway Windows NT machine from an old job.  I run Ubuntu (with Gnome) on it, and it works like a charm.  It's a 933MHz P3 with 256MB of RAM and a dog slow hard drive.

So.. with superior specs, I think the Wal-Mart machine would be a great PC for basic home use.

.. though the "VIA C7" chip scares me a bit.  Any idea how it stacks up against a similarly spec'ed Intel or AMD?

#    Comments [0] |
 Wednesday, October 31, 2007

Which Version Of Python Ships With Mac OS X Leopard?

I am not a Mac user, but in case anyone is interested in knowing which version of Python ships with OS X Leopard, the answer is Python 2.5.

#    Comments [0] |

Learn The Ideals And History Of Free And Open Source Software

There are lots of resources available online to learn about Free and Open Source Software.

If you want to understand the essence and ideals of this movement, a great start would be to read the following 4 books. After reading these, you will have a good grasp of the history and philosophy of freedom in the technology world.

#    Comments [0] |
 Wednesday, October 24, 2007

Python - List Comprehensions Leak Variables

One thing to remember when using List Comprehensions is that they "leak" their temporary iteration variable to the outside.

What does that mean?

In the following example, we still have access to 'x' after we run the list comprehension.

foo = ['a', 'b', 'c']
my_list = [x for x in foo]
print x

output:
>> c

This behaviour is different from how a Generator Expression works. We could have written the List Comprehension using a Generator Expression like this:

my_list = list(x for x in foo)

Now, the temporary variable we used is not accessible from outside the scope of the expression.

foo = ['a', 'b', 'c']
my_list = list(x for x in foo)
print x

output:
>> NameError: name 'x' is not defined

Note: This is fixed in Python 3000
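
If you are not sure which behaviour your interpreter has, a small check like the following can tell you. This is a sketch of my own, not something from the Python docs, and the function name leaked_value is made up. It assumes there is no global variable named 'x' in the module where it runs.

```python
def leaked_value():
    """Return the leaked loop variable, or None if nothing leaked."""
    foo = ['a', 'b', 'c']
    my_list = [x for x in foo]
    try:
        return x  # Python 2 finds the leaked 'x'; Python 3 raises NameError
    except NameError:
        return None

print(leaked_value())
```

On Python 2 the function returns the last item the comprehension iterated over, and on Python 3 it returns None.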

#    Comments [5] |
 Monday, October 22, 2007

OpenSTA 1.4.4 Release (Open Source HTTP Performance Test Tool)

The OpenSTA team has announced the release of version 1.4.4.

OpenSTA is a distributed software testing architecture designed around CORBA.  The applications that make up the current OpenSTA toolset were designed to be used by performance testing practitioners for web load testing.

Info:
http://portal.opensta.org/index.php?name=News&file=article&sid=51

Download:
http://opensta.org/download.html

Congrats and thanks to Bernie Velivis, Daniel Sutcliffe, and Jerome Delemarche for making this release possible.




#    Comments [1] |
 Thursday, October 18, 2007

Charts And Graphs - Modern Solutions

To all the chart/graph/plot/visualization weenies out there...
Here is a great overview of some modern charting and graphing technologies.

Some options I will be exploring:

#    Comments [0] |
 Sunday, October 14, 2007

Python - Simple Multithreaded HTTP Load Generator/Timer

This is a module for generating concurrent requests to an HTTP server.  Each thread makes HTTP GET requests to a single URL at the specified interval.  Threads are added over a given rampup time if you want to generate increasing load.  Response times are printed to STDOUT.  It can be used for cursory performance benchmarking or load testing a web resource.

load_generator.py module

sample usage:


#!/usr/bin/env python

from load_generator import LoadManager

lm = LoadManager()
lm.msg = ('www.example.com', '/')
lm.start(threads=5, interval=2, rampup=2)
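
To give a rough idea of the technique without reproducing load_generator.py itself, here is a minimal sketch of the thread-per-user pattern with a rampup, written for Python 3. Everything in it is hypothetical: run_load, request_fn and the duration parameter are names I made up for illustration and are not the module's API. The request is passed in as a callable so the sketch can run without touching the network.

```python
import threading
import time

def run_load(request_fn, threads=5, interval=2.0, rampup=2.0, duration=10.0):
    """Call request_fn repeatedly from several threads; return the latencies."""
    latencies = []
    lock = threading.Lock()
    stop_at = time.monotonic() + duration

    def worker():
        # Each worker issues one request, records its time, then waits.
        while time.monotonic() < stop_at:
            start = time.monotonic()
            request_fn()
            elapsed = time.monotonic() - start
            with lock:
                latencies.append(elapsed)
            time.sleep(interval)

    # Start the threads spread evenly across the rampup period.
    spacing = rampup / threads if threads > 0 else 0
    workers = []
    for _ in range(threads):
        t = threading.Thread(target=worker)
        t.start()
        workers.append(t)
        time.sleep(spacing)

    for t in workers:
        t.join()
    return latencies

# Quick demonstration with a do-nothing stand-in for an HTTP request.
results = run_load(lambda: None, threads=2, interval=0.01,
                   rampup=0.02, duration=0.05)
print(len(results))
```

For real use, request_fn would be something that performs an HTTP GET, for example a small function wrapping urllib.request.urlopen for the target URL.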
#    Comments [3] |
 Wednesday, October 10, 2007

Twelve Networking Truths - Good, Fast, Cheap: Pick Any Two

I love reading old RFCs.

One of my favorites is RFC 1925 - The Twelve Networking Truths:

The Fundamental Truths

  1. It Has To Work.
  2. No matter how hard you push and no matter what the priority, you can't increase the speed of light.
    (corollary) No matter how hard you try, you can't make a baby in much less than 9 months. Trying to speed this up *might* make it slower, but it won't make it happen any quicker.
  3. With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead.
  4. Some things in life can never be fully appreciated nor understood unless experienced firsthand. Some things in networking can never be fully understood by someone who neither builds commercial networking equipment nor runs an operational network.
  5. It is always possible to agglutinate multiple separate problems into a single complex interdependent solution. In most cases this is a bad idea.
  6. It is easier to move a problem around (for example, by moving the problem to a different part of the overall network architecture) than it is to solve it.
    (corollary) It is always possible to add another level of indirection.
  7. It is always something.
    (corollary) Good, Fast, Cheap: Pick any two (you can't have all three).
  8. It is more complicated than you think.
  9. For all resources, whatever it is, you need more.
    (corollary) Every networking problem always takes longer to solve than it seems like it should.
  10. One size never fits all.
  11. Every old idea will be proposed again with a different name and a different presentation, regardless of whether it works.
  12. In protocol design, perfection has been reached not when there is nothing left to add, but when there is nothing left to take away.
#    Comments [1] |
 Tuesday, October 09, 2007

Forrester Report - Evaluating Functional Testing Solutions

I was tipped off about this report in a recent forum discussion.

Report from Forrester Research:
    Evaluating Functional Testing Solutions (Powerpoint)

It is a nice overview of automated test execution and popular functional testing tools. It also gives an overview of the top tool vendors.

#    Comments [0] |

Google StockQuote Gadget - Back By Popular Demand!

A while back, I created a Google Gadget for displaying stock quotes and daily charts.  It was very popular and I was getting more than 12,000 page views per day.  I was scraping data from Google Finance and then received a takedown notice from Google.  So.. I took down the service that I was running.

Well... it seems the gadget was really popular and I have received lots of emails asking to revive the service.

And now... it's baaaack (using data from Yahoo)

- Add my gadget to your Google Personalized Homepage
- Add my gadget to your own web page


Have at it!

#    Comments [0] |
 Monday, October 08, 2007

Asleep On The Job

This is embarrassing.

After a long day of hacking, I fell asleep at my desk.  This has *never* happened to me at work.

Of course, somebody got a picture of it on their camera phone:

#    Comments [1] |
 Wednesday, September 26, 2007

Python - Tk Graph Example

I found a snippet to draw bar graphs in Python using Tk:
http://www.daniweb.com/code/snippet583.html

The output looks like this:


Here is a modified version that creates a bar graph in a Tk panel:

import Tkinter as tk  # this module is named 'tkinter' in Python 3

def graph_points(seq, width=375, height=325):
    root = tk.Tk()
    c = tk.Canvas(root, width=width, height=height, bg='white')
    c.pack()
    y_stretch = 15  # pixels of bar height per unit of data
    y_gap = 20      # margin below the bars
    x_stretch = 10  # space between bars
    x_width = 20    # width of each bar
    x_gap = 20      # left margin
    for x, y in enumerate(seq):
        # corners of the bar: (x0, y0) is top-left, (x1, y1) is bottom-right
        x0 = x * x_stretch + x * x_width + x_gap
        y0 = height - (y * y_stretch + y_gap)
        x1 = x * x_stretch + x * x_width + x_width + x_gap
        y1 = height - y_gap
        c.create_rectangle(x0, y0, x1, y1, fill="red")
        # label each bar with its value
        c.create_text(x0+2, y0, anchor=tk.SW, text=str(y))
    root.mainloop()

data = (18, 15, 10, 7, 5, 4, 2, 5, 8, 10, 13)
graph_points(data)
#    Comments [0] |
 Monday, September 24, 2007

Give One Get One - I Want My OLPC XO

OK.. looks like I need to plunk down $400 to get an XO.  When you buy one, they will give one away to a child in a developing nation.  Do some good, and get my own OLPC?  Too good to pass up.

For 2 weeks only, starting Nov. 12:  www.xogiving.org

#    Comments [0] |
 Monday, September 17, 2007

Old-School Pair Programming And My Inclination To Become A Tester

My first programming course was as an undergrad freshman in 1993. It was the basic introductory programming class for CS majors. The course was pretty difficult and was a great filter to separate the real CS students from the wannabes. About one-third of the students dropped the class, and out of the remaining two-thirds, many changed majors after this course was complete.

The course was taught using Scheme, with SICP as the textbook. We programmed on a VAX cluster with Ultrix (DEC's Unix flavor) as the Operating System. We had to learn the Unix shells, vi, and all sorts of fun stuff to get us up and running.

The computer lab ("the cluster") was a large sterile room with rows of green-screen dumb terminals. I remember our professor told us that the VAX had 128 MB of memory and I was blown away by how huge that was (my rippin' fast PC had 4 MB at the time).

Spending hours in the computer lab was no fun at all. But I was one of the lucky ones. I owned a brand new 486-DX33 PC running Windows 3.1. I had a blazing fast 14.4 kbps Zoom modem and could use Procomm Plus to dial into the VAX and program from the comfort of my own dorm room. I also found a Scheme interpreter that ran on DOS, giving me further options to do my work offline.

The programming assignments were brutal. All-nighters were the norm. Collaboration on the assignments was encouraged, but we were all expected to turn in our own original work. I found a fellow student that I got along with well and we decided to work together (unfortunately, I don't even remember his name... all I remember is that he was a lot smarter than me).

So the basic workflow was that we would get together, work out the basics of the assignment, get most of the algorithms working, then each take the code and finish it on our own. Since I had the bad-ass PC, we would work in my room. Two things quickly became apparent: He was a much better programmer than me, but I had a better eye for subtle details and debugging. Eventually we settled into a pattern where he would do the programming and I would look over his shoulder to give advice and input. Every few minutes, he would shoot a copy of the code to my dot-matrix printer. I would grab it, go through it line by line, and mark errors with my red pen and hand-write parts of the code that weren't correct. I would then hand the printout back to him and let him enter the changes. We iterated like this until we had all of the core code working.

For some reason, that instinct for attention to detail and debugging has always stuck with me. Because of that, my career has always been influenced by testing. I am a developer/tester, rather than just a developer. Most of the impact I have had in all of my jobs is from creating test tools and bringing in new ways to test software.

Just an interesting observation. I wonder how many others were naturally drawn to testing as soon as they started writing code?

#    Comments [2] |

Python - Yahoo Stock Quote Module

Last week I wrote a small Python module for retrieving stock prices.

It used screen scraping to get data from Google Finance.  Yahoo offers stock data in a much more digestible form which allowed me to get values without screen scraping and regular expressions.  So, I wrote a module based around this.

This new module, ystockquote, is much more comprehensive.  It exposes a Python API for retrieving all sorts of stock data from Yahoo Finance and contains the following functions:

  • get_all(symbol)
  • get_price(symbol)
  • get_change(symbol)
  • get_volume(symbol)
  • get_avg_daily_volume(symbol)
  • get_stock_exchange(symbol)
  • get_market_cap(symbol)
  • get_book_value(symbol)
  • get_ebitda(symbol)
  • get_dividend_per_share(symbol)
  • get_dividend_yield(symbol)
  • get_earnings_per_share(symbol)
  • get_52_week_high(symbol)
  • get_52_week_low(symbol)
  • get_50day_moving_avg(symbol)
  • get_200day_moving_avg(symbol)
  • get_price_earnings_ratio(symbol)
  • get_price_earnings_growth_ratio(symbol)
  • get_price_sales_ratio(symbol)
  • get_price_book_ratio(symbol)
  • get_short_ratio(symbol)

Sample Usage:


>>> import ystockquote
>>> print ystockquote.get_price('GOOG')
529.46
>>> print ystockquote.get_all('MSFT')
{'stock_exchange': '"NasdaqNM"', 'market_cap': '268.6B', 
'200day_moving_avg': '29.2879', '52_week_high': '31.84', 
'price_earnings_growth_ratio': '1.45', 'price_sales_ratio': '5.33',
'price': '28.65', 'earnings_per_share': '1.423', 
'50day_moving_avg': '28.7981', 'avg_daily_volume': '55579700',
'volume': '25330856', '52_week_low': '26.48', 'short_ratio': '1.60', 
'price_earnings_ratio': '28.65', 'dividend_yield': '1.38', 
'dividend_per_share': '0.40', 'price_book_ratio': '8.76', 
'ebitda': '20.441B', 'change': '-0.39', 'book_value': '3.315'}

The module is available here:  http://www.goldb.org/ystockquote.html
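
To show why delimited data is easier to work with than scraped HTML, here is a minimal sketch of turning one comma-separated line into a dictionary, written for Python 3. It is not the ystockquote source. The field order in FIELDS and the function name parse_quote_line are assumptions for illustration, and the sample line just reuses a few of the MSFT values shown above.

```python
import csv

# Hypothetical field order for one comma-separated quote line.
FIELDS = ('price', 'change', 'volume', 'stock_exchange')

def parse_quote_line(line, fields=FIELDS):
    """Turn one CSV line into a dict keyed by field name."""
    values = next(csv.reader([line]))  # handles quoted values for us
    return dict(zip(fields, values))

sample = '28.65,-0.39,25330856,"NasdaqNM"'
quote = parse_quote_line(sample)
print(quote['price'])
print(quote['stock_exchange'])
```

No regular expressions are needed: the csv module splits the line and strips the quoting around text fields.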

#    Comments [11] |
 Friday, September 14, 2007

Python - Stock Quote Module

I just wrote a tiny Python module for programmatically retrieving stock quotes from Google Finance:

The module:


import urllib
import re

def get_quote(symbol):
    base_url = 'http://finance.google.com/finance?q='
    content = urllib.urlopen(base_url + symbol).read()
    m = re.search('class="pr".*?>(.*?)<', content)
    if m:
        quote = m.group(1)
    else:
        quote = 'no quote available for: ' + symbol
    return quote
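
The regular expression is the fragile part of a scraper like this, so it helps to be able to exercise it without hitting the network. Below is a minimal sketch for Python 3 that applies the same pattern to a string. The sample markup is made up for the demonstration and is not a copy of the real Google Finance page, and extract_quote is a hypothetical helper name.

```python
import re

# Same pattern as in get_quote: find the element with class "pr"
# and capture the text up to the next tag.
PRICE_PATTERN = re.compile(r'class="pr".*?>(.*?)<')

def extract_quote(html):
    """Return the captured price text, or None if the pattern is not found."""
    m = PRICE_PATTERN.search(html)
    return m.group(1) if m else None

sample_html = '<span class="pr" id="ref_1">529.56</span>'  # made-up markup
print(extract_quote(sample_html))
print(extract_quote('<p>no price here</p>'))
```

If the page layout changes, the pattern simply stops matching, which is why the module falls back to a "no quote available" message.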


Sample usage:


#!/usr/bin/env python

import stockquote

print stockquote.get_quote('goog')


Output:


>> 529.56
#    Comments [8] |
 Tuesday, September 11, 2007

WebLOAD Open Source - Ain't So Open Source

"Open Source"

In the words of Inigo Montoya [The Princess Bride]:  "You keep using that word.  I do not think it means what you think it means."

A few months back, Radview Software announced that they are releasing an open source version of WebLOAD, their web performance and load testing tool.  I was very excited about this and thought it was a fantastic move that would have a big impact in the test tool market.  I am a performance engineer and a huge Free/Open Source Software advocate, so I love to see companies in the space that interests me most come around to embrace openness.

In their press release, Radview stated:

"WebLOAD Open Source, licensed under the GNU Public License (GPL) version 2, is based on WebLOAD, the company's flagship product that is already deployed at 1,600 sites. Immediately available for free download and use, WebLOAD is a commercial-grade open source project with more than 250 engineering years of product development."

Ok cool.. they used the GPL and opened up the whole shebang.  Wow, this company actually "gets it"... right?

Umm.. not quite.

If you look through the source code that is available for WebLOAD Open Source, you will notice that only code for a small subset of the product is available.  In actuality, WebLOAD Open Source is a partially proprietary tool which is marketed as Open Source Software.  The software has significant limitations in functionality and scalability.  The source code which needs to be modified to remove these restrictions is not distributed.  So what we are left with is a crippled version of the tool.

In a recent post to the WebLOAD OS Forum, someone asked to see the source code for "proxynator", which is the recording feature in WebLOAD.

The response from the Forum Admin (Amir Shoval, a Radview employee) was this:

"Currently the source code for the proxynator is not available as part of the open source code of WebLOAD."

This is in direct contradiction to what their website states:

"WebLOAD Open Source introduces a unified script authoring environment for recording, editing and debugging."

When further probed about this, he stated the following:

WebLOAD Open Source is dual licensed:
  1. the WebLOAD Load Engine is totally open sourced and hence is licensed under the GPL
  2. but the complete WebLOAD is still licensed under a proprietary license, which grants free usage in WebLOAD Open Source.

wait.. wait.. WHAT?
I thought the press release said "licensed under the GNU Public License", and "WebLOAD Open Source is a fully functional, commercial-grade performance testing product"?  Nowhere on their website or marketing materials do they talk about this dual licensing and limited availability of source code.

Now, if we look at the End User License Agreement (EULA) that applies to WebLOAD Open Source, it gets worse:

2. License Restrictions. This License does not permit you or any third party to:
(i) modify, translate, reverse engineer, decompile, disassemble (except to the extent that this restriction is expressly prohibited by law) or otherwise attempt to discover the source code of all or any portion of the Software;
(ii) modify, translate or create derivative works of all or any portion of the Software;
(iii) copy the Software (other than a single copy solely for back-up or archival purposes);
(iv) rent, lease, sell, offer to sell, distribute, or otherwise transfer rights to the Software;

OK... so we have an "open source" product that is actually dual licensed, where a large portion of the toolset is proprietary.  And furthermore, by accepting the EULA, you give up all of your rights that were granted under the GPL.  Huh?

So... to reiterate, WebLOAD Open Source is *not* open source.  A subset of it is open source: the [crippled] load engine.  Contrary to what their press release and website say, it contains proprietary components that are released in binary form with no source code.  It is rather disappointing to see a company jump on the bandwagon of open source without respecting the freedom that is supposed to come with it.

In conclusion: Is WebLOAD Open Source currently Open Source?  No
Will WebLOAD Open Source actually become Open Source?  Well.. that's up to Radview

Hey, I'm all for Freedom.  I applaud Radview for any of the code they released under the GPL.  But let's be fair, if you want to call your product open source and reap any benefits that come along with that... you gotta walk the walk.

#    Comments [6] |