goldb.org home

AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Wednesday, May 23, 2007

StockQuote Google Gadget - Usage Stats

A few months ago I deployed my StockQuote Google Gadget, which is used for retrieving stock quotes and daily price graphs.

Behind the gadget is a remote .NET/C# service I created which scrapes stock quotes and charts from Google Finance.

You can see it and play with the demo: cgoldberg.googlepages.com

- Add my gadget to your Google Personalized Homepage
- Add my gadget to your own web page

I have been logging usage stats, just to see how many people are using it and how many transactions it is doing. Stats have been collected for about 4 months:

12000 transactions per day and growing fast.. yikes.


Update: My StockQuote gadget is no longer in service.  I received a takedown notice from Google Finance on 05/23/2007.   umm...  sorta saw that comin' :)

#    Comments [0] |
 Monday, May 21, 2007

Zed Shaw's Statistics Rant

Programmers Need To Learn Statistics Or I Will Kill Them All

Zed Shaw:

"I have a major pet peeve that I need to confess. I go insane when I hear programmers talking about statistics like they know sh-t when it’s clearly obvious they do not. I’ve been studying it for years and years and still don’t think I know anything. This article is my call for all programmers to finally learn enough about statistics to at least know they don’t know sh-t."
#    Comments [1] |
 Thursday, May 17, 2007

RESTful Web Services - 10 Years of 'Programmable Web' Books

I just got the RESTful Web Services book (Leonard Richardson & Sam Ruby, O'Reilly, 2007) in the mail today.  I've only read the beginning, but so far it is great.  In fact, it brings me back to when I first started working with the "programmable web".  I got into the programmable web back when the web was only a few years old.  I spent years doing performance/scalability testing and tuning for large Web 1.0 applications and bizarre custom Web APIs (think huge financial services rushing to get online).  Building tools to run realistic workloads through a system involves writing custom clients to simulate real user/browser interaction.  This is pretty ugly stuff when you are dealing with an application that was designed with only humans in mind (AKA all of them).  It involves lots of HTTP protocol level work.. screen scraping.. protocol sniffing and analyzing.. requests.. header mangling.. cookie handling.. redirects.. authentication.. session information parsing.. etc, etc.

Application simulation is pretty messy work.  There was no simple API to hide behind; you had to figure out what the API was for yourself.  See.. *every* web application has an API, though it might have been designed by accident.  This allowed me to see first hand how developers and frameworks butchered the use of the "Web" as a platform.  Staring at naked HTTP let me see every little bit of the hairball underneath.  Alas, any standardization around web services (or even an official name for the concept) was far off.

A friend (bearded Perl hacker) let me borrow a book to show me how Perl can do this cool web stuff:  Web Client Programming with Perl (Clinton Wong, O'Reilly, 1997).  This book helped me build my first web clients to do application simulation and testing.  There wasn't a ton of documentation at the time to do this sort of thing, so I relied heavily on this book.

So now.. 10 years later..  the Web has changed..  it has morphed into *the* distributed platform..  it is becoming organized.

As I flip through RESTful Web Services, it all just looks right..  REST looks right..   It is simple..  it is HTTP..  it is all the guts I already know.  It almost feels like a sequel to my old favorite, Web Client Programming with Perl.

I have traded Perl for Python as my preferred scripting language the past few years, but I am still building simulators, web clients, and virtual users. I am excited to work on some new stuff in this area.

#    Comments [0] |
 Wednesday, May 16, 2007

WebInject - Open Source Web Service Testing Tool Gets High Marks

InfoWorld article:

Three open source Web service testing tools get high marks

Rick Grehan of InfoWorld reviewed 3 popular open source tools for testing web services.  Rick is a contributing editor of the InfoWorld Test Center.  One of the tools he reviewed was WebInject (which I wrote).

"In this roundup, I examined three tools that purport to verify that your Web services do what they are supposed to do, that they resist graceless failure, and (in some cases) that they conduct themselves with efficiency. The tools are soapUI, TestMaker, and WebInject. All are open source, and are available for free download and incorporation into your next Web services project."
My tool (WebInject) scored pretty well in the comparison.

From the article:

WebInject

WebInject is a super-lightweight testing tool that can automate the testing of both Web services and Web applications. In fact, WebInject's ability to test XML/SOAP Web services appears to be a recent addition to the tool, as earlier versions could not readily handle the SOAP protocol.

Written in Perl, WebInject is primarily a command-line tool, though its author provides a thin Perl/Tk user interface that at least simplifies the execution of tests for those unwilling to spend too much time at the command prompt. If you're not familiar with Perl, don't panic. WebInject is built so that you can construct your tests without having to touch so much as a byte of Perl code.

WebInject is really an execution and reporting engine. Unlike the other tools, it has no IDE-style user interface, so tests must be written in an editor outside of the WebInject UI. This gives WebInject a less professional feel, but doesn't hamper the tool. I envision users of WebInject having directories filled with text files of various test “templates.” To add a new test case, the user just pops open his or her favorite editor, does some cutting, some pasting, and a bit of tweaking to alter the template to fit the specific circumstance, and ba-ding!, you've got a new test case.

...

In essence, a WebInject “project” is nothing more than an XML file filled with a set of elements strung one after the other. WebInject's simple structure lets you build tests with amazing rapidity. You must, however, have a moderately good understanding of the mechanics of SOAP protocols as well as a tool that lets you generate and capture HTTP/SOAP requests and responses. You'll need the requests to build the POST body and the responses so that you can create proper “verifypositive” and “verifynegative” regular expressions to check for success or failure. I used the Web Service Toolkit add-on for Eclipse to grab requests and responses for WebInject; once I had gotten the hang of it, I fell easily into the groove of building test cases.


Criteria        Score   Weight
Documentation   8       20%
Features        8       20%
Scalability     8       20%
Ease-of-use     8       15%
Portability     9       15%
Value           9       10%

Review Score:
Very Good 8.3
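
As a quick sanity check on that number: the excerpt doesn't say how the review score is derived, but assuming it is simply the weighted average of the criteria scores in the table above, the arithmetic works out like this (the variable names are mine, not InfoWorld's):

```python
# Criteria scores and weights copied from the InfoWorld table above.
# Assumption: the review score is the plain weighted average of these.
criteria = {
    "Documentation": (8, 0.20),
    "Features": (8, 0.20),
    "Scalability": (8, 0.20),
    "Ease-of-use": (8, 0.15),
    "Portability": (9, 0.15),
    "Value": (9, 0.10),
}

total_weight = sum(weight for _, weight in criteria.values())
weighted_score = sum(score * weight for score, weight in criteria.values())

print("total weight: %.2f" % total_weight)
print("weighted score: %.2f" % weighted_score)
```

The weights add up to 100% and the weighted average comes to 8.25, which lines up with the published 8.3 if they round up.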

Cost:
Free download - open source

Platforms
Any platform that runs Perl or has a Perl interpreter installed

Bottom Line:
Much less feature-rich than the other tools, the lightweight WebInject nonetheless bolts out of the starting gate. If you need testing that will be off the ground and flying in minutes, reach for WebInject. On the other hand, it has far fewer capabilities than the other two products in this test, and unless you want to hack the Perl code, WebInject's feature set is pretty much what you install.


visit www.webInject.org
for more of my tools, visit: www.goldb.org

#    Comments [0] |
 Tuesday, May 15, 2007

Litigate vs. Innovate: Free Advice for the Litigious

Jonathan Schwartz (CEO of Sun Microsystems) posted an excellent article describing Sun's stark choice of how to re-invent itself.  They stepped towards Free software and embraced Open Source.  Microsoft is taking a much different stance.  They are asserting patent claims over many pieces of the GNU/Linux system.

Jonathan gives some great advice in his Free Advice for the Litigious:

"No amount of fear can stop the rise of free media, or free software (they are the same, after all). The community is vastly more innovative and powerful than a single company. And you will never turn back the clock on elementary school students and developing economies and aid agencies and fledgling universities - or the Fortune 500 - that have found value in the wisdom of the open source community. Open standards and open source software are literally changing the face of the planet - creating opportunity wherever the network can reach."

Can you hear us *now*?

#    Comments [0] |
 Friday, May 11, 2007

Mnesia - Scalable Data Persistence in Erlang

SlideAway - There is a world outside of Ruby on Rails:

"Who needs Oracle/Mysql when you have Mnesia, a free, distributed, in memory database ? The ability to store native Erlang structures out of the box is so liberating: suddenly the need for your object-database mapping layer almost vanishes (well, not 100% to be fairly honest, but a big chunk of it: no need to create a 1-to-n relationship or a n-to-n relationship and a mapping table in many simple cases)

Not to mention that Mnesia supports table replication and is fully distributed, with the ability to add new 'nodes' on the fly. All of this out of the box ! (did I mention it was free too ?) This makes scaling up almost a joke. Compare this to the usual nightmares (and cost) of trying to implement a distributed Mysql/Oracle."


Awesome.

#    Comments [0] |

Mike Shaver on New RIA Tools vs. Web Standards

Via The high cost of some free tools (Mike Shaver):

"If you choose a platform that needs tools, if you give up the viral soft collaboration of View Source and copy-and-paste mashups and being able to jam jQuery in the hole that used to have Prototype in it, you lose what gave the web its distributed evolution and incrementalism. You lose what made the web great, and what made the web win. If someone tells you that their platform is the web, only better, there is a very easy test that you can use:

When the tool spits out some bundle of shining Deployment-Ready Code Artifact, do you get something that can be mashed up, styled, scripted, indexed by search engines, read aloud by screen readers, read by humans, customized with greasemonkey, reformatted for mobile devices, machine-translated, excerpted, transcluded, edited live with tools like Firebug? Or do you get a chunk of dead code with some scripted frills about the edges, frozen in time and space, until you need to update it later and have to figure out how to get the same tool setup you had before, and hope that the platform is still getting security and feature updates? (I’m talking to you, pre-VB.NET Visual Basic developers.)"

All hail "View Source".

#    Comments [0] |
 Thursday, May 10, 2007

Sticky ToolLook - Tools and System Performance with Corey Goldberg

I was recently interviewed by Joseph McAllister for his Sticky ToolLook newsletter.  Sticky ToolLook is an extension of StickyMinds.com and Better Software magazine.  I mostly talk about performance testing and tools.

The article can be found here: http://www.stickyminds.com/stickytoollook/index.asp?cd=5/10/2007


Transcript:


A Word with the Wise:
Tools and System Performance with Corey Goldberg
by Joseph McAllister

Corey Goldberg is a Boston-based software engineer who focuses on performance engineering and tool development. He also contributes to open source projects and has developed some of his own, such as WebInject. I spoke with him earlier this year about his passion for the craft of software tools.

Joseph McAllister:
What makes you passionate about performance test tools?

Corey Goldberg:
I am passionate about system performance, so tools are an integral part of that. Performance is an interesting and diverse space. It touches technology in so many ways, and the skill set it requires allows me to be close to several different technical areas at once: development, testing, analysis, design, operations, etc.

The thinking becomes pervasive, though. For example, last night I was standing in line for movie tickets, and all I could think about was how they could improve the queuing system to get better sales throughput. I regularly have conversations with my colleagues about service performance and input/output contention at our local burrito joint.

JM:
Is there a particular type of performance tool that is more "fun" to use? Is there a type that tends to offer more results?

CG:
The various types of tools all work together to form your full tool set or test suite. They all have their own fun parts. Load generation can be complex, as it involves software development alongside workload modeling. But it is also the fun part where you get to slam load through a system and watch it react.

The most satisfying tools to build are analysis and monitoring tools, especially tools with real-time monitoring and graphing. This enables you to look inside your test runs or production system and actually see what is happening in real time. Complex data sets and metrics collected from deep within a system are transformed into informative graphs as things happen. That is pretty exciting to work with.

JM:
Is there a clear division between the commercial tools you've used and the tools you've written? Do you prefer one or the other?

CG:
Yes and no. First, for terminology, I like to think of things in terms of proprietary vs. free tools. Proprietary tools tend to force you into a certain pattern of use and often don't provide the flexibility to change or extend them in ways you might want.

Lately I have been building a lot of my own tools that work alongside some commercial tools. I recently developed a reporting and analysis suite that replaced the analytics in a commercial tool we were using.

Commercial tools also offer some rich features that are sometimes not feasible to re-create in a reasonable amount of time. So you have to remember that building your own tools is only worthwhile if it is cost effective.

JM:
Describe your open source test tool, WebInject.

CG: WebInject is a test tool that I developed in Perl that is used for functional testing of Web application/services and ad-hoc monitoring of HTTP response times. It can run as its own GUI application with real-time graphing capabilities or can be integrated as a plug-in with other tools.

I was doing this type of stuff in various scripts for years, so I packaged a lot of it together and made a more generic interface that can be used across a variety of projects. I thought others might be interested in it, so I setup a SourceForge project and released it in January 2004.

The basic concept is that you define test cases in XML files that are fed through WebInject and executed against your system under test. WebInject provides a basic harness/framework that includes HTTP transport, parsing, cookie handling, authentication, SSL, etc. It gives you real-time response timing as well as functional verification using regex-based content verification and HTTP status codes.

I spend all of my time on newer tools these days, but I still keep on top of WebInject enough to facilitate others' using it and posting patches/updates to it. Other test tools have progressed a lot in the past few years, so I am sure there are lots of new options for doing this type of testing. Oddly, WebInject has become somewhat entrenched in monitoring systems. Most of the users lately seem to be people running Nagios (an open source monitoring system) that need an intelligent Web plug-in/agent.
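
As a rough illustration of the verification idea in that answer (regex-based content checks plus an HTTP status check), here is a minimal Python sketch. This is not WebInject's code: WebInject is written in Perl and reads its test cases from XML. The dictionary layout and the verify_response function below are hypothetical names made up for the example.

```python
import re

# Hypothetical test-case structure, loosely mirroring the idea of a
# positive pattern, a negative pattern, and an expected status code.
test_case = {
    "description": "home page loads",
    "verify_positive": r"Welcome",
    "verify_negative": r"(?i)error|exception",
    "expected_status": 200,
}


def verify_response(case, status, body):
    """Return a list of failure messages; an empty list means the case passed."""
    failures = []
    if status != case["expected_status"]:
        failures.append("expected HTTP %d, got %d" % (case["expected_status"], status))
    if not re.search(case["verify_positive"], body):
        failures.append("positive pattern not found: %s" % case["verify_positive"])
    if re.search(case["verify_negative"], body):
        failures.append("negative pattern found: %s" % case["verify_negative"])
    return failures


# The status and body would normally come from a real HTTP request;
# they are hard-coded here to keep the sketch self-contained.
print(verify_response(test_case, 200, "<h1>Welcome back</h1>"))
print(verify_response(test_case, 500, "Internal Server Error"))
```

Keeping the checks as data (patterns and an expected status) rather than code is what makes it possible to add a test case by editing a text file.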

JM:
What is your favorite element of creating and distributing an open source software tool?

CG:
Community feedback feels really good. I like sharing and collaborating. The feedback is also tremendously useful in terms of discovering bugs and offering suggestions, advice, or even patches of working code. I care about my craft, and I realize the only way to advance is through open collaboration.

I am also pretty influenced by the free software movement and do some volunteer work with the GNU Project. I have some core beliefs about the ethics of software freedom. Creating and distributing my own GPL-licensed software is my own little way to help that cause.

#    Comments [0] |
 Wednesday, May 09, 2007

PerfLog - Performance Analysis Tool for Web Server Logs (Python)

I wrote a small tool that I have found useful.  It is a Python script that parses and analyzes web log files (in W3C Extended Log File Format).  It creates an HTML report with data and PNG images showing graphs of things like: request throughput, error rates, HTTP method distribution, content type distribution, time-series, etc.

Many log parsing/analysis tools exist, but I was looking for something more specific to Performance than something a webmaster would want to look at.

The script is pretty basic. It was very useful for my own needs, but others might want to modify it.  If anyone has good suggestions to add to it, I am willing to enhance it at some point (or just grab my code and hack it yourself if you know Python).
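
To give a feel for the kind of parsing involved, here is a minimal sketch (not the PerfLog source) that reads W3C Extended Log File Format entries and computes a few of the metrics mentioned above. The function names and the sample log are made up for illustration, and it assumes the log's #Fields directive includes date, time, cs-method and sc-status.

```python
from collections import Counter


def parse_w3c_log(lines):
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives start with '#'; only #Fields matters for parsing.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if len(values) == len(fields):
            yield dict(zip(fields, values))


def summarize(entries):
    """Compute request count, method distribution, error rate and peak throughput."""
    entries = list(entries)
    methods = Counter(e["cs-method"] for e in entries)
    errors = sum(1 for e in entries if int(e["sc-status"]) >= 400)
    per_second = Counter(e["date"] + " " + e["time"] for e in entries)
    return {
        "requests": len(entries),
        "methods": dict(methods),
        "error_rate": errors / len(entries) if entries else 0.0,
        "peak_requests_per_second": max(per_second.values()) if per_second else 0,
    }


SAMPLE_LOG = """\
#Version: 1.0
#Fields: date time cs-method cs-uri-stem sc-status
2007-05-09 12:00:01 GET /index.html 200
2007-05-09 12:00:01 GET /missing.html 404
2007-05-09 12:00:02 POST /login 200
2007-05-09 12:00:03 GET /index.html 500
"""

report = summarize(parse_w3c_log(SAMPLE_LOG.splitlines()))
print(report)
```

From a summary like this, the per-second counts are the sort of series you would hand to Matplotlib for a throughput graph.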


Project Home

Features

  • Produces metrics and graphs from web logs (W3C Extended Log File Format)
  • Useful during performance testing and analysis
  • Output is created in XHTML/CSS with embedded PNG images
  • PerfLog is written in Python and uses Matplotlib for graphs and plotting

License

Project Info

Requirements

  • Python 2.4+
  • Matplotlib (requires Numeric or Numpy)

Platforms

  • Cross-Platform.  PerfLog will run on any system that supports Python and Matplotlib.
#    Comments [1] |
 Thursday, May 03, 2007

Mark Pilgrim on Vendor-Specific Hype

Mark Pilgrim speaks the truth about this hype going on with the new announcements of proprietary/vendor specific web stacks and runtimes (Microsoft Silverlight, Adobe Apollo, etc).  Don't get fooled again!:

"Y’all have fun. Play with your vendor-specific runtimes. Don’t call me when you wake up one morning with a pink line in the round window and your BFF vendor won’t return your calls. If you need me (but of course you won’t), I’ll be holed up in my drab unpainted toolshed around the corner, quietly building applications on the web that works."

Love it.

#    Comments [0] |
 Monday, April 30, 2007

I Am LISP?

I just took the "Which Programming Language Are You?" quiz. Was hoping to be Python.

Apparently I am LISP?

You are Lisp.  Very few people like you (Probably because you use too many parenthesis (You better stop it (Reallly)))
Which Programming Language are You?

#    Comments [2] |
 Thursday, April 19, 2007

Linus Torvalds on Competition by Technical Merit

I saw this message from Linus on the LKML and I thought it was well stated.  I love the way Linus runs the crazy bazaar of Linux Kernel development.  He stays true to technical merit and essentially bases all of his decisions on this.  (though sometimes this is in conflict with the ethics of Free Software).

Linus Torvalds from the Linux Kernel Mailing List:

"One of the most motivating things there *is* in open source is "personal pride".

It's a really good thing, and it means that if somebody shows that your code is flawed in some way (by, for example, making a patch that people claim gets better behaviour or numbers), any *good* programmer that actually cares about his code will obviously suddenly be very motivated to out-do the out-doer!

Does this mean that there will be tension and rivalry? Hell yes. But that's kind of the point. Life is a game, and if you aren't in it to win, what the heck are you still doing here?

As long as it's reasonably civil (I'm not personally a huge believer in being too polite or "politically correct", so I think the "reasonably" is more important than the "civil" part!), and as long as the end result is judged on TECHNICAL MERIT, it's all good.

We don't want to play politics. But encouraging peoples competitive feelings? Oh, yes."
#    Comments [0] |
 Wednesday, April 18, 2007

Microsoft Silverlight - Flash Killer? Lose the Geeks, Lose the Battle

Microsoft has renamed "WPF/E" to "Silverlight":

"Silverlight is a cross-browser, cross-platform plug-in for delivering the next generation of media experiences and rich interactive applications (RIAs) for the Web."

It looks like Microsoft is pushing this technology aggressively to CDN's and content distributors:

"Early supporters of the new platform include Akamai, Brightcove, Eyeblaster, Limelight, Major League Baseball, Navisite, Netflix, Skinkers, Sonic Solutions, SyncCast, Tarari, Telestream, Winnov, and more."

Silverlight will work on Windows and Mac OS X.  OK.. so no Linux support?  I think if Microsoft hopes to supplant Flash, it truly needs to be cross platform (not just Windows and OS X).

from the Silverlight FAQ:

"Microsoft is gathering feedback from customers like you on Silverlight and to help determine which platforms should be supported in the future."

Better hop to it boys.. With the proliferation of GNU/Linux, pushing a presentation framework that doesn't run on it is a large oversight.

You need the geeks on board.. lose the geeks.. lose the battle.

#    Comments [2] |
 Wednesday, April 11, 2007

Radview WebLOAD goes Open Source!

OK, this is huge news: www.webload.org

The commercial performance/load test tool market is dominated by large proprietary commercial vendors (HP/Mercury, Borland/Segue, etc). Radview has a nice product called WebLOAD that competes in the space.

As of this morning, Radview announced they have released WebLOAD OS, an open source version of WebLOAD. It is full-on GPL licensed (no fake open source). I already browsed their source tree. They have a Subversion repository.. code is in C and C++.

The Open Source performance/load test tool market doesn't offer many choices. Currently the most popular tools are JMeter and OpenSTA.

This will be exciting. I wonder how well Radview will deal with the community on this. Though if it's not good, GNU GPL certainly allows forking :)

more to come...

#    Comments [3] |
 Tuesday, April 10, 2007

Python and IEC - Stupid-Simple Windows Browser Automation

I have been using IEC lately for automating repetitive administrative tasks within my company:

IEC.py - Automating Internet Explorer with Python

IEC is a simple library with a nice API for automating an IE browser. I found it simple to work with for basic automation needs. I have also used it as the core of a small UI testing framework.

From Mayukh Bose:

IEC is a python library designed to help you automate and control an Internet Explorer window. You can use this library to navigate to web pages, read the values of various HTML elements, set the values of checkboxes, text boxes, radio buttons etc., click on buttons and submit forms.

Yeah I know.. pretty lame it only works with IE, but in the environment I was working in, the applications ran on *IE Only*.


A personal story:

My company is very analytical and detail oriented when it comes to tracking/planning project resource allocation. We track all sorts of projections, budgets, resources, etc. The workflow is basically: some business guys (no idea what they actually do) take data from some reports and enter them into some arcane hosted tracking software. This is done by entering copious amounts of data into web form after web form. Then they submit the form to run a report. Once that is finished, they cut & paste the data into MS Excel. Then they take the Excel spreadsheet and follow some wild sequence of copying, cutting, pasting, converting, running macros, graphing, etc. At the end of this, a few images are produced so some whiz-bang graphs can go into a monthly PowerPoint... wow.

So... I wrote a Python script that takes their input data, drives a web browser to do the report, screen scrapes the result, processes it, generates some fancy graphs with Matplotlib, and presents a web page with the results.  End result: Converted a multi-hour manual process into the click of an icon and 20 seconds of processing.

I could have done this with HTTP directly, but this UI automation technique made it very quick to develop; and it looked impressive ("whoa it's like.. making my browser move on its own").


To use IEC, you need the Python for Windows Extensions. If you use the ActiveState Python distribution, these are already included.

I used to use ActiveState Python for Windows programming (because I was a big fan of ActiveState Perl, where the installer and PPM package manager rocked). I recently spent close to an hour trying to get SSL (HTTPS) to work with ActiveState.  I couldn't get it to work so I ditched it for the standard Python distro.


--
Happy Hacking.

#    Comments [4] |
 Monday, April 09, 2007

Geo Location Mashup - Python, Yahoo Maps AJAX API

Mapping User Metro Concentration by IP Address

I just posted this: http://www.goldb.org/geo_maps

It is a tutorial/example showing how to create a geolocation mashup by generating HTML/JavaScript code from a Python script.  The resulting code is an HTML page with embedded JavaScript that you can open with your browser.  It works with the Yahoo Maps AJAX API to plot markers at specified locations.  I also explain how this technique can be used to create a [near] real-time map of user concentration based on IP addresses.
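
For a feel of the general technique (a Python script that writes out an HTML page with embedded JavaScript), here is a minimal sketch. It is not the code from the tutorial: build_map_page and the JavaScript plotMarker function are placeholder names I made up, standing in for whatever add-marker call the mapping API provides, and the coordinates are just sample data.

```python
import html
import json
from string import Template

# HTML skeleton with two substitution points: $title and $markers_json.
PAGE_TEMPLATE = Template("""\
<html>
<head><title>$title</title></head>
<body>
<div id="map"></div>
<script type="text/javascript">
// Marker data generated by the Python script.
var markers = $markers_json;
for (var i = 0; i < markers.length; i++) {
    // plotMarker is a placeholder for the mapping API's add-marker call.
    plotMarker(markers[i].lat, markers[i].lon, markers[i].label);
}
</script>
</body>
</html>
""")


def build_map_page(title, locations):
    """Return HTML text for a list of (label, latitude, longitude) tuples."""
    markers = [
        {"lat": lat, "lon": lon, "label": label}
        for (label, lat, lon) in locations
    ]
    return PAGE_TEMPLATE.substitute(
        title=html.escape(title),
        markers_json=json.dumps(markers),
    )


page = build_map_page(
    "User Concentration",
    [("Boston", 42.36, -71.06), ("Atlanta", 33.75, -84.39)],
)
print(page)
```

Serializing the marker list with json.dumps keeps the Python side free of hand-built JavaScript strings; the page just loops over whatever data the script emitted.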

... feedback welcome.


It generates cool AJAXy eye-candy like this:

and this:

Since I use the AJAX control, the rendered map has a zooming, panning, dynamic, tiled interface.  Pretty Slick.

#    Comments [1] |
 Sunday, April 08, 2007

Got Scalability? Compute/Storage Grids

It seems to be all about size and massive buildouts of compute and storage grids these days... both commercially (see Google, Microsoft, Amazon, Yahoo, Sun, IBM, HP, Oracle, etc) and in academia.  The interesting thing is that the technology used is good for both distributed and centralized (tight clusters of distributed nodes) computing. Processing and storage can be pushed to the edges, or gathered centrally... it's up to you... the mechanisms are there...  it's all converging.

The world is becoming a massive digital fabric.

I'm just fascinated by the scale of the data centers, operations, and services that are being deployed.

Article from NY Times last summer (June '06):

"The best guess is that Google now has more than 450,000 servers spread over at least 25 locations around the world. The company has major operations in Ireland, and a big computing center has recently been completed in Atlanta. Connecting these centers is a high-capacity fiber optic network that the company has assembled over the last few years.

Google has found that for search engines, every millisecond longer it takes to give users their results leads to lower satisfaction. So the speed of light ends up being a constraint, and the company wants to put significant processing power close to all of its users."

Wow. Now do we understand why our systems must scale?


Related:
http://www.tbray.org/ongoing/When/200x/2006/05/24/On-Grids
http://www.globus.org/toolkit/
http://www.sun.com/service/grid/
http://www.amazon.com/gp/browse.html?node=16427261
http://www.amazon.com/gp/browse.html?node=201590011

#    Comments [0] |
 Monday, April 02, 2007

I Need Better Web Hosting

My website and blog were down most of today, after getting pounded with traffic from Reddit Programming.

The day started great... I already had 600 visitors today when I woke up for work at 7AM.  Then one of my posts started floating near the top of Reddit.  My server couldn't handle the traffic and soon fell over.  It didn't come back online until just now.

Granted, I am using ultra cheap shared hosting, so this shouldn't come as a huge surprise.  However, I am now looking for some hosting that is slightly more reliable.  Aside from the heavy traffic today, my site goes up and down intermittently all the time anyways.

Can anybody recommend some good cheap web hosting?  Basically I am looking for about 1 gig of storage and at least 3 gigs of transfer per month.  I understand that reliability and availability are something one must pay for (and usually mutually exclusive with shared hosting).  So.. I would sacrifice availability for price, as long as availability and reliability were decent.

I need both Windows (with ASP.NET 2.0) and Linux (with Python/Perl) hosting. These can be from a single provider, or with 2 different providers.  I have used lots of shared hosting services over the years and all of them generally suck.

.. any good hosting recommendations?

#    Comments [1] |
 Sunday, April 01, 2007

Massive Concurrency with PyPy Stackless

(via)
PyPy had its 1.0 release recently.

Now, this looks *really* interesting:

PyPy Stackless

PyPy can expose to its user language features similar to the ones present in Stackless Python: no recursion depth limit, and the ability to write code in a massively concurrent style. It actually exposes three different paradigms to choose from:
  • Tasklets and Channels
  • Greenlets
  • Plain Coroutines
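
To illustrate what a cooperative, coroutine-style program looks like, here is a small sketch using plain Python generators and a round-robin scheduler. To be clear, this does not use the PyPy Stackless or Stackless Python APIs (no tasklets, channels or greenlets); the worker and run_round_robin names are made up, and it only shows the general shape of many lightweight tasks taking turns in one thread.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: do one step of work, then yield control."""
    for i in range(steps):
        log.append((name, i))
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task is done; do not reschedule it
        queue.append(task)


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)

# Thousands of such tasks are cheap, since each is just a generator object.
many_log = []
run_round_robin([worker(n, 1, many_log) for n in range(10000)])
print(len(many_log))
```

Each task only gives up control at its yield, so there is no preemption and no locking; that is the basic trade-off of this style.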
#    Comments [0] |

One Laptop Per Child - More Prototype Pics and Info

I posted some pics of the latest OLPC prototypes a few weeks ago.  Well... I got to see them 2 weeks in a row; so here are some more pics of the machine up close.

... Seems the whole "hand crank" idea is gone.  There is now a pull cord on the external power supply with a 10:1 ratio (1 minute of pulling = 10 mins of computing) for manually recharging power... The keyboard is tiny and soft feeling.  The screen is small but is very viewable in direct light without backlighting (which is probably the #1 power drain on laptops).

OLPC rocks!

Me geeking out:

Old school meets new school...
Gerald J. Sussman (yes, the MIT Scheme guy) playing with the latest OLPC prototype:

Closeups:


.. these machines run a scaled down version of Fedora Linux that is loaded with Python applications.

-Corey

#    Comments [3] |
 Saturday, March 31, 2007

Digital Ethnography

(I can't even tell you how many times I've watched this video since it came out a few months ago)

For posterity...

Professor Michael Wesch:

teaching the machine.
the machine is us.

we'll need to rethink a few things...
copyright
authorship
identity
ethics
aesthetics
rhetorics
governance
privacy
commerce
love
family
ourselves

- The Machine is Us/ing Us

#    Comments [0] |
 Thursday, March 29, 2007

Python - Remove Duplicate Items From a Sequence

Say you have a sequence like:

[1, 1, 2, 2, 2, 3, 4, 4, 4]

... and you want a sequence containing all the unique items (remove duplicates) like:

[1, 2, 3, 4]


Here is a function to do it:

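def remove_dups(seq):
    # Use dict keys to collect the unique items (items must be hashable).
    unique = {}
    for item in seq:
        unique[item] = 1
    # list() makes this return a real list on Python 3 as well as Python 2.
    return list(unique.keys())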


or a one-liner:

u = [x for i, x in enumerate(seq) if x not in seq[:i]]



update: in the comments below, some other ways were suggested..

with 'set'.. like this:

u = list(set(seq))

or with a dictionary.. like this:

u = list(dict.fromkeys(seq))
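
One thing to watch: the set-based approach does not promise to keep the items in their original order. If order matters, a small sketch like the following (remove_dups_ordered is just a name I picked) keeps the first occurrence of each item, assuming the items are hashable:

```python
def remove_dups_ordered(seq):
    """Return the unique items of seq, keeping first-seen order."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            seen.add(item)       # remember the item so later copies are skipped
            result.append(item)
    return result


print(remove_dups_ordered([1, 1, 2, 2, 2, 3, 4, 4, 4]))
print(remove_dups_ordered([3, 1, 3, 2, 1]))
```

The set lookup keeps this linear in the length of the sequence, rather than rescanning the result list for every item.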
#    Comments [4] |
 Wednesday, March 28, 2007

Microsoft IIS - Welcome to Last Decade (Performant CGI)

Wow..
CGI will run well on IIS
Rails will run well on IIS.

Rob Conery on running Ruby on Rails (or other CGI based platforms) on IIS:

"Rails works using CGI - basically an executable that gets run each time a request comes into a web site. Most of the frameworks out there do NOT support multi-threading, so each time a request comes in that requires anything dynamic, CGI is "instanced" and executed. If you have a lot of requests at once, this isn't really a good thing. Now some servers are built to mitigate this (Apache, Lighttpd, etc); IIS is not.

... I would imagine that in the next 6 months we'll see a great addition to IIS 6 and 7 for all the CGI-enabled platforms out there."


hmm.. good to hear. (seriously)
but damn... weren't we doing this 10 years ago with Perl/Apache? :)

#    Comments [2] |
 Sunday, March 25, 2007

Real World Web Scalability

(via reddit programming)

Very lengthy overview of performance and scalability issues for web systems by Ask Bjørn Hansen.  This presentation covers a vast range of information:

Real World Web Scalability  (warning large PDF)

The takeaway?
Create horizontally scalable distributed systems..  always.

#    Comments [0] |

Scalability Comparison of Virtualization Tools

A report about scalability of virtualization techniques:

SCALABILITY COMPARISON OF 4 HOST VIRTUALIZATION TOOLS (QUETIER B / NERI V / CAPPELLO F)

"Virtualization tools are becoming popular in the context of Grid Computing because they allow running multiple operating systems on a single host and provide a confined execution environment.  In several Grid projects, virtualization tools are envisioned to run many virtual machines per host.  This immediately raises the issue of virtualization scalability."

4 types of virtualization tools are discussed in the context of scalability:

  • Processor Virtualization
  • Kernel Replication
  • Operating System Virtualization
  • Resource Virtualization
#    Comments [0] |