goldb.org home

AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Wednesday, April 11, 2007

Radview WebLOAD goes Open Source!

OK, this is huge news: www.webload.org

The commercial performance/load test tool market is dominated by large proprietary commercial vendors (HP/Mercury, Borland/Segue, etc). Radview has a nice product called WebLOAD that competes in the space.

As of this morning, Radview announced they have released WebLOAD OS, an open source version of WebLOAD. It is full-on GPL licensed (no fake open source). I already browsed their source tree. They have a Subversion repository, and the code is in C and C++.

The Open Source performance/load test tool market doesn't offer many choices. Currently the most popular tools are JMeter and OpenSTA.

This will be exciting. I wonder how well Radview will deal with the community on this. Though if it's not good, GNU GPL certainly allows forking :)

more to come...

#    Comments [3] |
 Tuesday, April 10, 2007

Python and IEC - Stupid-Simple Windows Browser Automation

I have been using IEC lately for automating repetitive administrative tasks within my company:

IEC.py - Automating Internet Explorer with Python

IEC is a simple library with a nice API for automating an IE browser. I found it simple to work with for basic automation needs. I have also used it as the core of a small UI testing framework.

From Mayukh Bose:

IEC is a python library designed to help you automate and control an Internet Explorer window. You can use this library to navigate to web pages, read the values of various HTML elements, set the values of checkboxes, text boxes, radio buttons etc., click on buttons and submit forms.

Yeah I know.. pretty lame it only works with IE, but in the environment I was working in, the applications ran on *IE Only*.


A personal story:

My company is very analytical and detail oriented when it comes to tracking/planning project resource allocation. We track all sorts of projections, budgets, resources, etc. The workflow is basically: some business guys (no idea what they actually do) take data from some reports and enter them into some arcane hosted tracking software. This is done by entering copious amounts of data into web form after web form. Then they submit the form to run a report. Once that is finished, they cut & paste the data into MS Excel. Then they take the Excel spreadsheet and follow some wild sequence of copying, cutting, pasting, converting, running macros, graphing, etc. At the end of this, a few images are produced so some whiz-bang graphs can go into a monthly PowerPoint... wow.

So... I wrote a Python script that takes their input data, drives a web browser to do the report, screen scrapes the result, processes it, generates some fancy graphs with Matplotlib, and presents a web page with the results.  End result: Converted a multi-hour manual process into the click of an icon and 20 seconds of processing.

I could have done this with HTTP directly, but this UI automation technique made it very quick to develop; and it looked impressive ("whoa it's like.. making my browser move on its own").


To use IEC, you need the Python for Windows Extensions. If you use the ActiveState Python distribution, these are already included.

I used to use ActiveState Python for Windows programming (because I was a big fan of ActiveState Perl, where the installer and PPM package manager rocked). I recently spent close to an hour trying to get SSL (HTTPS) to work with ActiveState.  I couldn't get it to work so I ditched it for the standard Python distro.


--
Happy Hacking.

#    Comments [4] |
 Monday, April 09, 2007

Geo Location Mashup - Python, Yahoo Maps AJAX API

Mapping User Metro Concentration by IP Address

I just posted this: http://www.goldb.org/geo_maps

It is a tutorial/example showing how to create a geolocation mashup by generating HTML/JavaScript code from a Python script.  The resulting code is an HTML page with embedded JavaScript that you can open with your browser.  It works with the Yahoo Maps AJAX API to plot markers at specified locations.  I also explain how this technique can be used to create a [near] real-time map of user concentration based on IP addresses.
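
The core trick is plain string generation. Here is a minimal sketch of the idea, not the code from the tutorial: the marker data, the build_page name, and the plotMarker JavaScript function are all placeholders made up for illustration, and the actual Yahoo Maps calls are left out on purpose.

```python
import json

# hypothetical sample data: (latitude, longitude, label)
MARKERS = [
    (42.3601, -71.0589, 'Boston'),
    (40.7128, -74.0060, 'New York'),
]

PAGE_TEMPLATE = """<html>
<head><title>%(title)s</title></head>
<body>
<div id="map"></div>
<script type="text/javascript">
// marker data generated by the Python script
var markers = %(markers_json)s;
for (var i = 0; i < markers.length; i++) {
    // plotMarker is a placeholder for the real mapping API call
    plotMarker(markers[i].lat, markers[i].lon, markers[i].label);
}
</script>
</body>
</html>
"""


def build_page(markers, title='Geo Map'):
    # serialize the marker list as a JavaScript array literal
    data = [{'lat': lat, 'lon': lon, 'label': label}
            for lat, lon, label in markers]
    return PAGE_TEMPLATE % {'title': title,
                            'markers_json': json.dumps(data)}


html = build_page(MARKERS)
```

Write the html string to a file and open it in a browser. For the IP-based version, the marker list would come from whatever geolocation lookup you have on hand instead of being hard-coded.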

... feedback welcome.


It generates cool AJAXy eye-candy like this:

and this:

Since I use the AJAX control, the rendered map has a zooming, panning, dynamic, tiled interface.  Pretty Slick.

#    Comments [1] |
 Sunday, April 08, 2007

Got Scalability? Compute/Storage Grids

It seems to be all about size and massive buildouts of compute and storage grids these days... both commercially (see Google, Microsoft, Amazon, Yahoo, Sun, IBM, HP, Oracle, etc) and in academia.  The interesting thing is that the technology used is good for both distributed and centralized (tight clusters of distributed nodes) computing. Processing and storage can be pushed to the edges, or gathered centrally... it's up to you... the mechanisms are there...  it's all converging.

The world is becoming a massive digital fabric.

I'm just fascinated by the scale of the data centers, operations, and services that are being deployed.

Article from NY Times last summer (June '06):

"The best guess is that Google now has more than 450,000 servers spread over at least 25 locations around the world. The company has major operations in Ireland, and a big computing center has recently been completed in Atlanta. Connecting these centers is a high-capacity fiber optic network that the company has assembled over the last few years.

Google has found that for search engines, every millisecond longer it takes to give users their results leads to lower satisfaction. So the speed of light ends up being a constraint, and the company wants to put significant processing power close to all of its users."

Wow. Now do we understand why our systems must scale?


Related:
http://www.tbray.org/ongoing/When/200x/2006/05/24/On-Grids
http://www.globus.org/toolkit/
http://www.sun.com/service/grid/
http://www.amazon.com/gp/browse.html?node=16427261
http://www.amazon.com/gp/browse.html?node=201590011

#    Comments [0] |
 Monday, April 02, 2007

I Need Better Web Hosting

My website and blog were down most of today, after getting pounded with traffic from Reddit Programming.

The day started great... I already had 600 visitors today when I woke up for work at 7AM.  Then one of my posts started floating near the top of Reddit.  My server couldn't handle the traffic and soon fell over.  It didn't come back online until just now.

Granted, I am using ultra cheap shared hosting, so this shouldn't come as a huge surprise.  However, I am now looking for some hosting that is slightly more reliable.  Aside from the heavy traffic today, my site goes up and down intermittently all the time anyways.

Can anybody recommend some good cheap web hosting?  Basically I am looking for about 1 gig of storage and at least 3 gigs of transfer per month.  I understand that reliability and availability are something one must pay for (and are usually mutually exclusive with shared hosting).  So.. I would sacrifice some availability for price, as long as availability and reliability were decent.

I need both Windows (with ASP.NET 2.0) and Linux (with Python/Perl) hosting. These can be from a single provider, or with 2 different providers.  I have used lots of shared hosting services over the years and all of them generally suck.

.. any good hosting recommendations?

#    Comments [1] |
 Sunday, April 01, 2007

Massive Concurrency with PyPy Stackless

PyPy had its 1.0 release recently.

Now, this looks *really* interesting:

PyPy Stackless

PyPy can expose to its user language features similar to the ones present in Stackless Python: no recursion depth limit, and the ability to write code in a massively concurrent style. It actually exposes three different paradigms to choose from:
  • Tasklets and Channels
  • Greenlets
  • Plain Coroutines
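
To get a feel for the coroutine style without installing PyPy, here is a rough analogy using ordinary Python generators and a tiny round-robin scheduler. This is not the PyPy Stackless API (no tasklets, channels, or greenlets here); the worker and run_all names are made up for the sketch.

```python
from collections import deque


def worker(name, steps, log):
    # each yield hands control back to the scheduler
    for i in range(steps):
        log.append('%s step %d' % (name, i))
        yield


def run_all(tasks):
    # round-robin: resume each task in turn until all are finished
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, drop it
        queue.append(task)


log = []
run_all([worker('a', 2, log), worker('b', 3, log)])
print(log)
```

The two workers interleave their steps on a single thread, which is the basic shape of the "massively concurrent style": lots of cheap tasks that voluntarily give up control.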
#    Comments [0] |

One Laptop Per Child - More Prototype Pics and Info

I posted some pics of the latest OLPC prototypes a few weeks ago.  Well... I got to see them 2 weeks in a row; so here are some more pics of the machine up close.

... Seems the whole "hand crank" idea is gone.  There is now a pull cord on the external power supply with a 10:1 ratio (1 minute of pulling = 10 mins of computing) for manually recharging power... The keyboard is tiny and soft feeling.  The screen is small but is very viewable in direct light without backlighting (which is probably the #1 power drain on laptops).

OLPC rocks!

Me geeking out:

Old school meets new school...
Gerald J. Sussman (yes, the MIT Scheme guy) playing with the latest OLPC prototype:

Closeups:


.. these machines run a scaled down version of Fedora Linux that is loaded with Python applications.

-Corey

#    Comments [3] |
 Saturday, March 31, 2007

Digital Ethnography

(I can't even tell you how many times I've watched this video since it came out a few months ago)

For posterity...

Professor Michael Wesch:

teaching the machine.
the machine is us.

we'll need to rethink a few things...
copyright
authorship
identity
ethics
aesthetics
rhetorics
governance
privacy
commerce
love
family
ourselves

- The Machine is Us/ing Us

#    Comments [0] |
 Thursday, March 29, 2007

Python - Remove Duplicate Items From a Sequence

Say you have a sequence like:

[1, 1, 2, 2, 2, 3, 4, 4, 4]

... and you want a sequence containing all the unique items (remove duplicates) like:

[1, 2, 3, 4]


Here is a function to do it:

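def remove_dups(seq):
    # dict keys are unique, so duplicates collapse (items must be hashable)
    x = {}
    for y in seq:
        x[y] = 1
    # list() so we get a real list on any Python version
    u = list(x.keys())
    return u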


or a one-liner (keeps the original order, but gets slow on long lists):

u = [x for i, x in enumerate(seq) if x not in seq[:i]]



update: in the comments below, some other ways were suggested..

with 'set'.. like this:

u = list(set(seq))

or with a dictionary.. like this:

u = list(dict.fromkeys(seq))
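
One caveat: the set version (and the dict versions on older Pythons) won't necessarily hand the items back in their original order. If order matters, a minimal sketch like this keeps the first occurrence of each item. The remove_dups_ordered name is just for illustration.

```python
def remove_dups_ordered(seq):
    # remember what we've already emitted; items must be hashable
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(remove_dups_ordered([3, 1, 3, 2, 1, 2]))  # [3, 1, 2]
```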
#    Comments [4] |
 Wednesday, March 28, 2007

Microsoft IIS - Welcome to Last Decade (Performant CGI)

Wow..
CGI will run well on IIS
Rails will run well on IIS.

Rob Conery on running Ruby on Rails (or other CGI based platforms) on IIS:

"Rails works using CGI - basically an executable that gets run each time a request comes into a web site. Most of the frameworks out there do NOT support multi-threading, so each time a request comes in that requires anything dynamic, CGI is "instanced" and executed. If you have a lot of requests at once, this isn't really a good thing. Now some servers are built to mitigate this (Apache, Lighttpd, etc); IIS is not.

... I would imagine that in the next 6 months we'll see a great addition to IIS 6 and 7 for all the CGI-enabled platforms out there."


hmm.. good to hear. (seriously)
but damn... weren't we doing this 10 years ago with Perl/Apache? :)

#    Comments [2] |
 Sunday, March 25, 2007

Real World Web Scalability

(via reddit programming)

Very lengthy overview of performance and scalability issues for web systems by Ask Bjørn Hansen.  This presentation covers a vast range of information:

Real World Web Scalability  (warning: large PDF)

The takeaway?
Create horizontally scalable distributed systems..  always.

#    Comments [0] |

Scalability Comparison of Virtualization Tools

A report about scalability of virtualization techniques:

SCALABILITY COMPARISON OF 4 HOST VIRTUALIZATION TOOLS (QUETIER B / NERI V / CAPPELLO F)

"Virtualization tools are becoming popular in the context of Grid Computing because they allow running multiple operating systems on a single host and provide a confined execution environment.  In several Grid projects, virtualization tools are envisioned to run many virtual machines per host.  This immediately raises the issue of virtualization scalability."

4 types of virtualization tools are discussed in the context of scalability:

  • Processor Virtualization
  • Kernel Replication
  • Operating System Virtualization
  • Resource Virtualization
#    Comments [0] |

Operating System Genealogy - Timelines

Sweet...
The history of entirely too many operating systems in way too high resolution.
... but great fun for OS geeks.

Operating System Genealogy:

#    Comments [0] |
 Saturday, March 24, 2007

Free Software Foundation - 2007 Associate Member Meeting

The Free Software Foundation's annual Associate Members Meeting is always an inspiring event for me.  It serves as a sort of State of The Free Software Union; where members gather to discuss ideas and listen to speakers.  Most of the FSF Board of Directors were there to speak.

I attended the meeting today (Saturday 03/24/2007) for the 4th time in the past 5 years.

It was held at MIT (Cambridge, Massachusetts):

 

I arrived during Joshua Ginsberg's (FSF Senior System Administrator) speech on “FSF Systems Administration”.  He gave an overview of some of the systems and internal work going on at the FSF offices. Some highlights:

  • FSF now runs LinuxBIOS on new Tyan servers for FSF and GNU Project resources.  They will be contributing documentation and information to help others install a Free BIOS.
  • New and much improved FSF network infrastructure and connectivity for FSF/GNU hosted resources.
  • FSF is switching from Zope to Django (both Python powered!) for web application development...  Lots of new stuff coming soon, including contributions back to the Django community.

Next up was Brett Smith, the new GPL Compliance Engineer at the Compliance Lab.  One thing Brett mentioned was that GPL license violations are pretty much kept secret and not disclosed to the community.  FSF prefers to negotiate with violators and talk them into compliance behind closed doors.  I'm not sure I agree with this practice.  I asked Richard Stallman about this during his Q&A Session... stating that I thought this information should be released to the public.  I don't see it as an overly aggressive move and I think publicly outing companies that are GPL violators would be a good way to give exposure to Free Software and help curb future violations.  RMS doesn't quite agree with my standpoint, but he asked some FSF staff to explore generically publicizing more types of violations.

Next was Gerald Jay Sussman, speaking about "Robust Design". Gerry was a co-author of my first Computer Science book, the venerable Wizard Book (SICP), and one of the authors of Scheme (a programming language dialect of LISP).  I was able to thank him for the pain and enlightenment his texts brought me during my CS studies.

Gerry is a complete madman when he gives presentations.  Forget the powerpoints and fancy presentation gear... he just slings around old school projector slides at blazing speed.  Admittedly, the stuff he talks about is far over my head.  I'm just a lowly computer programmer.  This guy has been at MIT since 1964 studying the cutting edge of computer science, mechanics, and electrical engineering. Watching him ease through functional programming and Scheme code is a little intimidating, but the entertainment value alone is worth it.

OK.. now the person most people came to see speak... the GNU Project founder, FSF President, former MIT AI Lab hacker, Emacs/GCC/GDB author, Chief GNUisance, and St. Gnucius himself... Richard Stallman:

RMS was in a surprisingly jovial mood. He is usually sorta moody and prone to outbursts.  I saw him shout at, and absolutely berate Larry Lessig a few years ago in front of a large audience at an FSF meeting.  However, today he was in fine form and gave his speech "Free Software and Software Patents".  He delivered well and really punched home the point about the absurdity of patents when applied to software.

After RMS was Eben Moglen, FSF Chief Counsel, Columbia Law Professor, and founder of the Software Freedom Law Center.  Eben is my favorite speaker.. bar none.  He speaks with passion and insight that is truly inspiring to watch.  He gave his "After GPLv3" speech.  It was an update on the current state of the GPL revision process.  Stallman and Moglen are leading the massive effort to complete GPLv3.  I am very thankful that people like Eben Moglen are on the front lines protecting our freedom.

Eben Moglen:

Bruce Perens was in attendance: 

He seems to have taken a very strong interest in the GPLv3 recently.

... and of course there were the obligatory FSF activist signs:

RMS listening to Moglen's speech:


Now... everyone... go join the FSF and become an Associate Member.
... or at least continue your Free Software hacking and advocacy.


Goldberg... out!

#    Comments [0] |
 Friday, March 23, 2007

Python - Creating Bar Graphs with Matplotlib

Matplotlib is an open source 2D plotting library for Python.  It is very impressive and robust, but the API and documentation are maddeningly difficult to follow.

Here I have provided a function that will create a bar graph [as a png image] from a Python dictionary using the Matplotlib API.

It will auto-size the bars and auto-adjust the axis labels for you. All you need to pass into it is a dictionary data structure (and optionally a graph title and output name).


We start with a Python dictionary like this:

{'A': 70, 'B': 290, 'C': 130}


... and the function will use Matplotlib to create a graph like this:


Here is a sample script that uses my function:


#!/usr/bin/env python

from pylab import *

def main():  
    my_dict = {'A': 70, 'B': 290, 'C': 130}
    bar_graph(my_dict, graph_title='ABC')


def bar_graph(name_value_dict, graph_title='', output_name='bargraph.png'):
    figure(figsize=(4, 2)) # image dimensions  
    title(graph_title, size='x-small')
   
    # add bars
    for i, key in zip(range(len(name_value_dict)), name_value_dict.keys()):
        bar(i + 0.25 , name_value_dict[key], color='red')
   
    # axis setup
    xticks(arange(0.65, len(name_value_dict)),
        [('%s: %d' % (name, value)) for name, value in
        zip(name_value_dict.keys(), name_value_dict.values())],
        size='xx-small')
    max_value = max(name_value_dict.values())
    tick_range = arange(0, max_value, (max_value / 7))
    yticks(tick_range, size='xx-small')
    formatter = FixedFormatter([str(x) for x in tick_range])
    gca().yaxis.set_major_formatter(formatter)
    gca().yaxis.grid(which='major')
   
    savefig(output_name)


if __name__ == "__main__":
    main()


enjoy.

-Corey

#    Comments [6] |
 Thursday, March 22, 2007

Python - Convert Date/Time to Epoch

I'm not sure why, but this took me forever to figure out; so I'm posting it here for others...

Let's say you have a string representing a date and a time and you want to convert it to epoch time (# secs since the epoch).

First you will need to create a pattern for your time format, using time format directives.

For example, the pattern for:

'2007-02-05 16:15:18'

Would be:

'%Y-%m-%d %H:%M:%S'

You can then convert it to epoch like this:

int(time.mktime(time.strptime('2007-02-05 16:15:18', '%Y-%m-%d %H:%M:%S')))


Now in a script:

#!/usr/bin/env python

import time

date_time = '2007-02-05 16:15:18'
pattern = '%Y-%m-%d %H:%M:%S'
epoch = int(time.mktime(time.strptime(date_time, pattern)))
print(epoch)
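
One thing to watch: time.mktime() treats the parsed time as local time, so the result depends on the machine's timezone. If the string is actually in UTC, a sketch using calendar.timegm() from the standard library looks like this (assuming the same date string and pattern as above):

```python
import calendar
import time

date_time = '2007-02-05 16:15:18'
pattern = '%Y-%m-%d %H:%M:%S'

# timegm() is the UTC counterpart of mktime(): it reads the
# struct_time as UTC instead of local time
epoch_utc = calendar.timegm(time.strptime(date_time, pattern))
print(epoch_utc)  # 1170692118
```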
#    Comments [0] |
 Wednesday, March 21, 2007

Sun Giving GNU Credit

RMS has been on the "GNU/Linux" naming convention rant for years; urging people to give the GNU Project and its legions of contributors the credit they deserve.  After all, the bulk of the Free Software OS userland is made of GNU contributions.

One might think that a company like Sun Microsystems wouldn't grok this concept, since most GNU/Linux distributions themselves don't.


However, some folks at Sun definitely get it:

Tim Bray - Director of Web Technologies (talking about Ian Murdock joining Sun):

"As of this weekend Ian wasn’t even on the payroll yet and was already in a peppy little email debate over when to say “Linux” and when to say “GNU” and when to say both."

Simon Phipps - Chief Open Source Officer:

"the combination of the GNU operating system pioneered by Richard Stallman with the inclusive development delivered around the Linux kernel by Linus Torvalds has brought a new life and energy to the extended family tree of Unix. The popularity of GNU/Linux bears testament to the vision and skill Stallman and Torvalds exhibit."
#    Comments [0] |

New O'Reilly Book About Web Performance - Coming Soon

(Note to self: buy this book when it comes out)

Steve Souders (Chief Performance Yahoo! at Yahoo) is writing a book for O'Reilly about web performance:

High Performance Web Sites


It's great to see Performance continue to gain exposure.

-Corey

#    Comments [0] |

Google Summer of Code 2007 - No Perl for You

The Perl Foundation won't be involved in Google Summer of Code 2007.

Bill Odom:

"The short version: We submitted an application to be a mentoring organization, but we weren't accepted."

However, even without the Perl community represented, the list of mentoring organizations and projects is really good!

#    Comments [0] |
 Monday, March 19, 2007

Making Applications Scalable With Load Balancing

I am in the process of tuning a large distributed system; using an F5 BIG-IP Load Balancer to distribute traffic.

Willy Tarreau has a very good overview of load balancing options:

Making applications scalable with Load Balancing

#    Comments [0] |
 Sunday, March 18, 2007

Linux - Symmetric Multiprocessing

Tim Jones gives a brief overview of SMP and discusses working with the Linux kernel:

Linux and symmetric multiprocessing


Tim Jones:

"As processor frequencies reach their limits, a popular way to increase performance is simply to add more processors. In the early days, this meant adding more processors to the motherboard or clustering multiple independent computers together. Today, chip-level multiprocessing provides more CPUs on a single chip, permitting even greater performance due to reduced memory latency.

You'll find SMP systems not only in servers, but also desktops, particularly with the introduction of virtualization. Like most cutting-edge technologies, Linux provides support for SMP. The kernel does its part to optimize the load across the available CPUs (from threads to virtualized operating systems). All that's left is to ensure that the application can be sufficiently multi-threaded to exploit the power in SMP."
#    Comments [0] |

Going Transactionless - Scalable Data Tiers

Dan Pritchett posted his excellent "How eBay Scales" presentation a few months back.

It is a great look into a real-world massive distributed system and the evolution of its scalable architecture.  One interesting thing to notice is that eBay is a transactionless environment (meaning it doesn't use Database Transactions).

I have always seen the data layer as the difficult part to scale.  Separating logic from data and working in a purely transactionless environment can mitigate this issue.

Martin Fowler commented on this today:

"The rationale for not using transactions was that they harm performance at the sort of scale that eBay deals with. This effect is exacerbated by the fact that eBay heavily partitions its data into many, many physical databases. As a result using transactions would mean using distributed transactions, which is a common thing to be wary of.

This heavy partitioning, and the database's central role in performance issues, means that eBay doesn't use many other database facilities. Referential integrity and sorting are done in application code. There's hardly any triggers or stored procedures."
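
To make the partitioning idea concrete, here is a toy sketch of routing records to one of several databases by key. This is only an illustration of the general technique, not how eBay does it: the shard count, the shard_for name, and the use of CRC32 are all assumptions made for the example.

```python
import zlib

NUM_SHARDS = 4  # hypothetical number of physical databases


def shard_for(key, num_shards=NUM_SHARDS):
    # crc32 is stable across runs and machines, unlike the built-in hash()
    return zlib.crc32(key.encode('utf-8')) % num_shards


# in-memory stand-ins for the separate databases
shards = [dict() for _ in range(NUM_SHARDS)]


def save(key, record):
    shards[shard_for(key)][key] = record


def load(key):
    return shards[shard_for(key)].get(key)


save('user:1001', {'name': 'alice'})
save('user:1002', {'name': 'bob'})
print(load('user:1001'))
```

Anything that spans two keys (say, moving an item between users) may now touch two databases, which is exactly where a distributed transaction would otherwise come in.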
#    Comments [0] |
 Saturday, March 17, 2007

OLPC Machine Up Close at BarCamp Boston 2

I was at the BarCamp2 "unconference" today at MIT's Stata Center and got to see the OLPC machine  ... very cool.

Chris Ball had a prototype on hand.  Chris heads One Laptop Per Child's performance testing work.  I was able to chat with him for a bit and take some pics:

One thing that struck me was the size of the laptop. It is really very small.  The keys are much smaller than typical laptop keys (designed for children's hands).

Chris with the laptop:


This project fascinates me.  I can't wait for the abundance of future hackers.

#    Comments [0] |

Python3000 vs. Perl6 ... Wanna Bet?

Perl6...
Python3000...

Both are redesigns of very popular dynamic/scripting languages.  Both have very strong, though very different, communities supporting them.

Out of the gate, Perl's plans were much more ambitious, including a new generic virtual machine.  Python's plans were more pragmatic; more of a language cleanup than a drastic redesign.

Perl 6 was officially announced nearly 7 years ago and I don't see a stable production release coming *any* time soon.  On the other hand, the idea of Python3000 was sorta tossed around for a while and swung into gear 2 years ago.

Guido (Python's BDFL) has been spearheading the effort, whereas Perl's leadership structure is much more anarchic (Where is Larry Wall these days?).  Guido has been very transparent and kept the community aware of his worries.

Some people saw this as a slippery slope...

Chromatic:

"Language redesign is difficult, isn’t it?  Once you start challenging base assumptions, you find that a lot of your previous conclusions are shaky, and good luck reigning in blue-sky ideas!

See you in 2007… or 2008… or 2009.

Best wishes,
a Perl 6 hacker"

I disagree..

I'd bet anyone money that I will be hacking on a stable release of Python3000 long before I'm using a stable version of Perl6... any takers?


(disclosure: I have written lots of code in both Perl and Python and am a fan of both)

#    Comments [6] |
 Friday, March 16, 2007

JakeBrake's Level of Geekdom

Impressive...

JakeBrake is a true geek:

"I am so technical that:
  • I routinely do unit-level performance/timing tests on Cialis to see if will time-out at 4 hours.
  • For meals I eat only donuts and hotdogs; arranging them on my plate as "bites" in patterns of ones and zeros.  I use a burnt hotdog as a signed bit."


If you are into testing and performance, the Sounds of Jake Braking blog is a great read.

#    Comments [0] |