goldb.org home

AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Wednesday, March 21, 2007

Sun Giving GNU Credit

RMS has been on his "GNU/Linux" naming-convention rant for years, urging people to give the GNU Project and its legions of contributors the credit they deserve. After all, the bulk of the userland in a Free Software OS is made of GNU contributions.

One might think that a company like Sun Microsystems wouldn't grok this concept, since most GNU/Linux distributions themselves don't.


However, some folks at Sun definitely get it:

Tim Bray - Director of Web Technologies (talking about Ian Murdock joining Sun):

"As of this weekend Ian wasn’t even on the payroll yet and was already in a peppy little email debate over when to say “Linux” and when to say “GNU” and when to say both."

Simon Phipps - Chief Open Source Officer:

"the combination of the GNU operating system pioneered by Richard Stallman with the inclusive development delivered around the Linux kernel by Linus Torvalds has brought a new life and energy to the extended family tree of Unix. The popularity of GNU/Linux bears testament to the vision and skill Stallman and Torvalds exhibit."

New O'Reilly Book About Web Performance - Coming Soon

(Note to self: buy this book when it comes out)

Steve Souders (Chief Performance Yahoo! at Yahoo) is writing a book for O'Reilly about web performance:

High Performance Web Sites


It's great to see performance continue to gain exposure.

-Corey


Google Summer of Code 2007 - No Perl for You

The Perl Foundation won't be involved in Google Summer of Code 2007.

Bill Odom:

"The short version: We submitted an application to be a mentoring organization, but we weren't accepted."

However, even without the Perl community represented, the list of mentoring organizations and projects is really good!

 Monday, March 19, 2007

Making Applications Scalable With Load Balancing

I am in the process of tuning a large distributed system that uses an F5 BIG-IP load balancer to distribute traffic.

Willy Tarreau has a very good overview of load balancing options:

Making applications scalable with Load Balancing

 Sunday, March 18, 2007

Linux - Symmetric Multiprocessing

Tim Jones gives a brief overview of SMP and discusses working with the Linux kernel:

Linux and symmetric multiprocessing


Tim Jones:

"As processor frequencies reach their limits, a popular way to increase performance is simply to add more processors. In the early days, this meant adding more processors to the motherboard or clustering multiple independent computers together. Today, chip-level multiprocessing provides more CPUs on a single chip, permitting even greater performance due to reduced memory latency.

You'll find SMP systems not only in servers, but also desktops, particularly with the introduction of virtualization. Like most cutting-edge technologies, Linux provides support for SMP. The kernel does its part to optimize the load across the available CPUs (from threads to virtualized operating systems). All that's left is to ensure that the application can be sufficiently multi-threaded to exploit the power in SMP."

Going Transactionless - Scalable Data Tiers

Dan Pritchett posted his excellent "How eBay Scales" presentation a few months back.

It is a great look into a real-world, massively distributed system and the evolution of its scalable architecture. One interesting thing to notice is that eBay runs a transactionless environment (meaning it doesn't use database transactions).

I have always seen the data layer as the hardest part to scale. Separating logic from data and working in a purely transactionless environment can mitigate this issue.

Martin Fowler commented on this today:

"The rationale for not using transactions was that they harm performance at the sort of scale that eBay deals with. This effect is exacerbated by the fact that eBay heavily partitions its data into many, many physical databases. As a result using transactions would mean using distributed transactions, which is a common thing to be wary of.

This heavy partitioning, and the database's central role in performance issues, means that eBay doesn't use many other database facilities. Referential integrity and sorting are done in application code. There's hardly any triggers or stored procedures."
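To make "partitioning plus application-side integrity" concrete, here is a minimal sketch under stated assumptions: the shard names, the in-memory dicts standing in for databases, and the helper functions are all hypothetical, not eBay's actual scheme. It routes each key to one partition with a stable hash and checks the parent row in application code instead of relying on a foreign key.

```python
import zlib

# Hypothetical database names standing in for physical partitions.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(key, shards=SHARDS):
    """Route a key to one partition with a stable hash.

    zlib.crc32 gives the same answer in every process, unlike the
    built-in hash() on str, which is randomized per interpreter run.
    """
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


def new_cluster():
    """In-memory stand-ins for the partitioned databases (hypothetical)."""
    return {name: {"users": set(), "orders": []} for name in SHARDS}


def add_user(cluster, user_id):
    cluster[shard_for(user_id)]["users"].add(user_id)


def add_order(cluster, user_id, order):
    """Referential integrity enforced in application code, not by a foreign key.

    Orders are partitioned by user_id so they land next to their user.
    The check and the append are not atomic; without transactions the
    application has to tolerate or repair that window.
    """
    db = cluster[shard_for(user_id)]
    if user_id not in db["users"]:
        raise KeyError("unknown user: %s" % user_id)
    db["orders"].append(dict(order, user_id=user_id))
```

Partitioning orders by the same key as their user keeps related rows on one database, which avoids the distributed transactions Fowler warns about.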
 Saturday, March 17, 2007

OLPC Machine Up Close at BarCamp Boston 2

I was at the BarCamp Boston 2 "unconference" today at MIT's Stata Center and got to see the OLPC machine... very cool.

Chris Ball had a prototype on hand.  Chris heads One Laptop Per Child's performance testing work.  I was able to chat with him for a bit and take some pics:

One thing that struck me was the size of the laptop. It is really very small.  The keys are much smaller than typical laptop keys (designed for children's hands).

Chris with the laptop:


This project fascinates me.  I can't wait for the abundance of future hackers.


Python3000 vs. Perl6 ... Wanna Bet?

Perl6...
Python3000...

Both are redesigns of very popular dynamic/scripting languages.  Both have very strong, though very different, communities supporting them.

Out of the gate, Perl's plans were much more ambitious, including a new generic virtual machine.  Python's plans were more pragmatic; more of a language cleanup than a drastic redesign.

Perl 6 was officially announced nearly 7 years ago and I don't see a stable production release coming *any* time soon.  On the other hand, the idea of Python3000 was sorta tossed around for a while and swung into gear 2 years ago.

Guido (Python's BDFL) has been spearheading the effort, whereas Perl's leadership structure is much more anarchic (Where is Larry Wall these days?).  Guido has been very transparent and kept the community aware of his worries.

Some people saw this as a slippery slope...

Chromatic:

"Language redesign is difficult, isn’t it?  Once you start challenging base assumptions, you find that a lot of your previous conclusions are shaky, and good luck reining in blue-sky ideas!

See you in 2007… or 2008… or 2009.

Best wishes,
a Perl 6 hacker"

I disagree...

I'd bet anyone money that I will be hacking on a stable release of Python3000 long before I'm using a stable version of Perl6... any takers?


(disclosure: I have written lots of code in both Perl and Python and am a fan of both)

 Friday, March 16, 2007

JakeBrake's Level of Geekdom

Impressive...

JakeBrake is a true geek:

"I am so technical that:
  • I routinely do unit-level performance/timing tests on Cialis to see if it will time-out at 4 hours.
  • For meals I eat only donuts and hotdogs; arranging them on my plate as "bites" in patterns of ones and zeros.  I use a burnt hotdog as a signed bit."


If you are into testing and performance, the Sounds of Jake Braking blog is a great read.

 Wednesday, March 14, 2007

Regex "Match" in Python vs. C#

I have been writing a lot of code in both C# and Python lately, flipping back and forth between the two languages. One thing I keep getting tripped up on is the naming in their regular expression APIs, and what counts as a "match".

So for my own disambiguation:

  • Python's re.match() is different than C#'s Regex.IsMatch()
  • Python's re.search() is similar to C#'s Regex.IsMatch()


Better explained in code:


Using Regex.IsMatch() in C# to match a pattern with some text:

using System;
using System.Text.RegularExpressions;

if (Regex.IsMatch("foobar", "bar"))
{
    Console.WriteLine("Match");
}
else
{
    Console.WriteLine("No Match");
}

this prints 'Match'


Same thing, using re.match() in Python:

import re

if re.match('bar', 'foobar'):
    print('Match')
else:
    print('No Match')

this prints 'No Match'


Oops... no match. What happened?

match() only checks if the regex matches at the beginning of the string, while search() will scan forward through the string for a match.


If you were expecting the pattern to match anywhere in the string, you need to use re.search() instead:

if re.search('bar', 'foobar'):
    print('Match')
else:
    print('No Match')

this prints 'Match'


... or else you must supply a pattern that will match from the beginning of the string:

if re.match('.*bar', 'foobar'):
    print('Match')
else:
    print('No Match')

this prints 'Match'
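A compact side-by-side of Python's three matching functions may help with the disambiguation. This sketch assumes Python 3.4+ (for re.fullmatch()); the helper names are made up for illustration. It also shows that re.match() is just re.search() anchored at the start of the string, which is the same trick (prefix the pattern with ^ or \A) that makes C#'s Regex.IsMatch() behave like re.match().

```python
import re


def which_match(pattern, text):
    """Report which of Python's three matching functions succeed."""
    return (
        bool(re.match(pattern, text)),      # anchored at the start only
        bool(re.search(pattern, text)),     # first match anywhere
        bool(re.fullmatch(pattern, text)),  # must cover the entire string
    )


def match_via_search(pattern, text):
    r"""Emulate re.match() with re.search() by anchoring at the start.

    The (?:...) group keeps the anchor applying to every alternative,
    so 'foo|bar' becomes '\A(?:foo|bar)' rather than '\Afoo|bar'.
    """
    return re.search(r"\A(?:" + pattern + ")", text)


print(which_match("bar", "foobar"))    # (False, True, False)
print(which_match("foo", "foobar"))    # (True, True, False)
print(which_match("foo.*", "foobar"))  # (True, True, True)
```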


 Monday, March 12, 2007

Bernie Velivis on Performance Testing Strategies

OpenSTA is a distributed performance testing architecture and tool.  Recently, on the opensta-users mailing list, Bernie Velivis (from Performax Inc) posted an excellent article about performance testing strategies.  I thought this information would be useful for all performance testers who are just learning the craft.  Rather than letting it sit in some arcane mailing list archive, I asked Bernie if I could re-publish it here.

enjoy.

-Corey


Bernie Velivis on Performance Testing Strategies:


PERFORMANCE TESTING STRATEGIES

The terminology used to discuss performance testing in technical publications and support forums can be ambiguous or inconsistent. Hopefully this article will help participants in the OpenSTA user support forum by providing a common frame of reference for discussing tools, testing, and test results. It may also be helpful to those new to performance testing.

CAPACITY TESTING

If your goal is to determine the CAPACITY of the system under test, start by creating a "realistic" workload consisting of a mix of the most popular transactions plus those deemed critical or known to cause problems even when executed infrequently. Pick a manageable set of transactions to emulate (considering time, budget, and goals), then determine the probability of executing each transaction, the work rate for the emulated users, and the "success criteria" for performance metrics (e.g. response time limits, concurrent users, and throughput).

One way to implement this approach is to create a master script, assign it to each VU, and have it generate random numbers and then call other scripts which model the individual workload transactions based on a table of probabilities. The scripts should be modeled with think times consistent with the way your users interact with the system. This varies greatly from one application to another, and unless you are mining log files from an application already in use, this is a somewhat subjective process. The best advice I can give in defining the workload is to get input from people who know how the application is (or will be) used, make conservative assumptions (but not so much so that the sum of all your conservative decisions is pathological), and balance the scope of the workload against the time available to complete the project. Another important consideration is the data demographics of the transactions and the size and contents of the database.
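The master-script idea can be sketched in a few lines of Python (OpenSTA itself uses its own scripting language, so this is only an illustration of the logic). The transaction names, probabilities, mean think times, and the ±50% think-time jitter are all hypothetical placeholders for the values you would derive from logs or stakeholder input.

```python
import random
import time

# Hypothetical workload table: name -> (probability, mean think time in seconds).
WORKLOAD = {
    "browse_catalog": (0.60, 5.0),
    "search": (0.25, 3.0),
    "checkout": (0.10, 8.0),
    "monthly_report": (0.05, 10.0),  # rare, but assumed to be expensive
}


def pick_transaction(rng, workload=WORKLOAD):
    """Choose the next transaction according to the probability table."""
    names = list(workload)
    weights = [workload[name][0] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def run_virtual_user(iterations, transactions, rng=None, sleep=time.sleep):
    """Master-script loop for one VU: pick a transaction, run it, then think.

    `transactions` maps each workload name to a callable that models
    that transaction (hypothetical stand-ins for the individual scripts).
    """
    rng = rng or random.Random()
    for _ in range(iterations):
        name = pick_transaction(rng)
        transactions[name]()
        # Assumption: vary think time by +/-50% around the mean.
        sleep(rng.uniform(0.5, 1.5) * WORKLOAD[name][1])
```

Passing `sleep` in as a parameter makes it easy to dry-run the mix without waiting out the think times.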

When it's time to test, increase the number of emulated users and monitor how response times, server resource utilization (CPU, disk IO, network, and memory), and throughput (the rate of tasks completed system-wide) vary with the increased load. You might construct a test that ramps up to a specific number of users, lets them run for a while, and then repeats as necessary. This way, you can observe the behavior of the system in various steady states under increasing load. Workloads containing transactions that have a low probability of being executed and/or a disproportionately large impact on the performance of other transactions usually need to run longer to reach a steady state. If you can't get repeatable results, your steady-state interval might be too short. As a rule of thumb, I would suggest a minimum ramp-up time equal to the duration of the longest-running script, and a steady-state observation period at least twice as long as the ramp-up period. I also tend to ignore response times and performance statistics gathered during the ramp-up periods and focus instead on the data collected during the steady-state periods.
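The rule of thumb above translates directly into a test schedule. The following is a minimal sketch: the function names and the (timestamp, response_time) sample format are assumptions, not OpenSTA features. It builds stepped ramp/steady intervals and then keeps only the samples collected during steady state, grouped by user level.

```python
def step_schedule(user_steps, longest_script_secs):
    """Build (start, end, users, phase) intervals for a stepped load test.

    Ramp-up lasts as long as the longest-running script; each steady-state
    period is twice the ramp-up, per the rule of thumb in the text.
    """
    ramp = float(longest_script_secs)
    steady = 2 * ramp
    schedule, t = [], 0.0
    for users in user_steps:
        schedule.append((t, t + ramp, users, "ramp"))
        t += ramp
        schedule.append((t, t + steady, users, "steady"))
        t += steady
    return schedule


def steady_state_samples(samples, schedule):
    """Group (timestamp, response_time) samples by user level,
    discarding anything collected during ramp-up."""
    result = {}
    for start, end, users, phase in schedule:
        if phase == "steady":
            result[users] = [rt for ts, rt in samples if start <= ts < end]
    return result


sched = step_schedule([10, 20], longest_script_secs=60)
# [(0.0, 60.0, 10, 'ramp'), (60.0, 180.0, 10, 'steady'),
#  (180.0, 240.0, 20, 'ramp'), (240.0, 360.0, 20, 'steady')]
```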

That's a rough outline of one approach to capacity testing, which in summary is an attempt to load up the system with VUs in a way that is indistinguishable from "real users" in order to find the capacity limit. Pick the wrong workload, however, and you might miss something very important or end up solving problems that won’t exist in the real world.

The end game here is to increase load until response times become excessive, at which point you have found the system’s capacity limit. This limit will be due to either a hardware or a software bottleneck. If time and goals allow, analyze the performance metrics captured, do some tuning, improve code efficiency or concurrency, or add some hardware resources. Make one change at a time and repeat as necessary until you meet capacity goals, find the limits of the architecture, or run out of time (which happens more than most performance engineers would like).

SOAK TESTING

The same scripts created for capacity testing can also be used for SOAK TESTING where you load up the system close to its maximum capacity and let it run for hours, days, etc. This is a great way to spot stability problems that only occur after the system has been running a long time (memory leaks are a good example of things you might find).

FAILOVER TESTING

Get the system under test into a steady state, start failing components (servers, routers, etc.), and observe how response times are affected during and after the failover and how long the system takes to transition back to a steady state, and you are on your way towards FAILOVER TESTING. (This is a gross simplification; again, there is lots of good reading material out there on failover and high-availability testing.)

STRESS TESTING

If your goal is to determine where or how (not if) the system will fail under load, then you are doing STRESS TESTING. One way to do this is to comment out the think times and increase VUs until something (hopefully not your emulator!) breaks. This is one form of stress testing, a valuable aspect of performance testing, but not the same as capacity testing. How the VUs compare to "real users" may be irrelevant as you are trying to determine how the system behaves when pushed past its limits.


A report illustrating how these concepts were used to performance test a SOAP application using OpenSTA can be downloaded here:
SamplePerformaxPerformanceReport.pdf


Bernie Velivis
Principal Consultant and President, Performax Inc
www.iPerformax.com



Zabbix - Open Source Network/Infrastructure Monitoring

I have used Nagios for several years, and it has served me well as an open source distributed monitoring system.

I just read about Zabbix, and I'm posting here so I won't forget to check it out.  Zabbix is GPL (v2) licensed and looks interesting.  I will post more once I get a chance to play with it.

 Saturday, March 10, 2007

Python - Iterating Multiple Sequences

Here are some examples of iterating through multiple sequences simultaneously in Python:


I start with 2 lists of numbers:

foos = [0, 1, 2, 3, 4]
bars = [1, 2, 3, 4, 5]

I want to create a new list that is made up of the sum of the items at each position in the original lists.  So I will end up with this:

>>> print(foobars)
[1, 3, 5, 7, 9]


Starting with an unpythonic way...
Here I use a counter to iterate through the indexes of each sequence and build a new list:

foobars = []
for i in range(len(foos)):
    foo = foos[i]
    bar = bars[i]
    foobars.append(foo + bar)


Getting more pythonic...
Here I use zip(). It lets me iterate over both sequences in lockstep, unpacking the current pair of values each time through the loop:

foobars = []
for foo, bar in zip(foos, bars):
    foobars.append(foo + bar)


The older pythonic way to do this was with map:

foobars = []
# Python 2 only: map(None, ...) pairs up items like zip, padding the shorter list with None
for foo, bar in map(None, foos, bars):
    foobars.append(foo + bar)


Getting even more pythonic and more concise...
I can combine zip with a list comprehension and do it in a one-liner like this:

foobars = [foo + bar for (foo, bar) in zip(foos, bars)]


*note: in Python 3000, zip will return an iterator (like itertools.izip does today) instead of building a list, so wrap it in list() when you need an actual list.
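For reference, here is a sketch of the same sums in Python 3 syntax (an assumption: Python 3 with only the standard library). It uses map() with operator.add, which calls the function with one item from each list, and itertools.zip_longest(), which is the Python 3 replacement for the padding behavior of Python 2's map(None, ...).

```python
from itertools import zip_longest
from operator import add

foos = [0, 1, 2, 3, 4]
bars = [1, 2, 3, 4, 5]

# zip() is lazy in Python 3, so list() materializes the pairs.
pairs = list(zip(foos, bars))  # [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]

# map() with several iterables passes one item from each to add().
foobars = list(map(add, foos, bars))  # [1, 3, 5, 7, 9]

# zip() stops at the shortest input; zip_longest() pads instead
# (with None by default, like map(None, ...); here with 0).
padded = [a + b for a, b in zip_longest([1, 2, 3], [10, 20], fillvalue=0)]
# [11, 22, 3]
```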

 Friday, March 09, 2007

Joe Barr Lays the Smack Down on OSS FUD

In his article "Joe Barr rips proprietary software vendor a new one", Joe does exactly what the title states  :)

His article was a response to an earlier piece by Roger Greene (CEO of Ipswitch), where Roger says some very confused/uninformed things about Open Source software.


One thing Joe didn't rip into was this excerpt from Roger Greene:

"The open source community claims bugs can be fixed faster for open source software than commercial software because hundreds, if not thousands, of people are looking at the code daily and can help with fixes. [ ... ] Even when those individuals generously offer their time for free, can you really afford to wait for one to agree with you on the urgency of action if your network is down?"

Huh?

That is a very odd and misleading way to look at it.  Open Source gives you the ability to modify the code yourself.  You don't have to wait for anyone.  You can hire a freelance developer or consultancy to fix it on the spot.  If you find a problem in a proprietary vendor's software, can you do the same?

No. Proprietary software puts you at the mercy of your vendor.

 Thursday, March 08, 2007

PLEAC - Programming Language Examples Alike Cookbook

I just stumbled across the PLEAC Project (Programming Language Examples Alike Cookbook).

Project Description:

"Following the great Perl Cookbook (by Tom Christiansen & Nathan Torkington, published by O'Reilly; you can freely browse an excerpt of the book here) which presents a suite of common programming problems solved in the Perl language, this project aims to gather fans of programming, in order to implement the solutions in other programming languages."


There is sample code in many popular languages.  The Python examples are really good.  They would serve as an excellent primer for someone moving from Perl to Python, or as a general Python reference with cookbook-style examples.

It is hosted at SourceForge and licensed under the GNU Free Documentation License (GFDL).

 Wednesday, March 07, 2007

Play Corey's Tunes - Last.fm

I got hooked on Last.fm last summer. Since then, I've scrobbled over 6,800 tracks... not bad! (I've been a music junkie my entire life.)

I submit my played tracks from all of my music players (Foobar2000, Winamp, iTunes, Squeezebox/SlimServer). It generates fantastic statistics and charts of my listening habits and lets me listen to streams from people with similar tastes.

Here are my top 10 artists since I started using it:


Once you scrobble enough tracks, you can start streaming custom stations based on your listening habits.  Something new they just added is the ability to embed a Last.fm player widget into your own site. I just knocked up a quick web page with the embedded player so I can listen to my own station from anywhere that has a browser (requires Flash).  Check it out and have a listen.

 Tuesday, March 06, 2007

Show Us Your Rack!

Reverend Ted just started the "Show Us Your Rack" blog campaign ;)

Well... I've seen a few nice racks in my day. I took these pics last year. Forget the tiny colo cage, this data center gives you some room to spread out:


plenty of room:


looking down the aisle:


got load balancing?:


the big dogs:

 Monday, March 05, 2007

.NET CLR - Covertly Throttling Thread Creation

Joe Duffy (from Microsoft) talking about the .NET 2.0 CLR:

"It's also worth noting that the threadpool throttles its creation of threads to 2/second once the count has exceeded the # of CPUs."


Yuck...  I don't like throttling like that behind the scenes.  It can make performance problems *very* hard to diagnose.
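A back-of-the-envelope sketch shows why this kind of throttling is so hard to spot. It is a deliberately simplified model of the rule quoted above (threads up to the CPU count are created on demand, then at most 2 more per second); the real CLR heuristics are more involved and differ between versions, so treat the numbers as illustrative only. The function name is hypothetical.

```python
def seconds_to_reach(target_threads, cpus, per_second=2.0):
    """Rough time for a throttled pool to grow to target_threads.

    Simplified model: threads up to the CPU count appear immediately;
    beyond that, new threads are injected at `per_second` per second.
    """
    extra = max(0, target_threads - cpus)
    return extra / per_second


# A burst of 50 blocking work items on a 4-CPU box:
print(seconds_to_reach(50, 4))  # 23.0
```

Under this model, a burst that needs 50 concurrent threads on a 4-CPU machine would take about 23 seconds just to get enough threads, which shows up as mysterious latency rather than as an obvious resource bottleneck.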

 Sunday, March 04, 2007

Python Parameters - Pass-By-Value or Pass-By-Reference?

Passing parameters to functions and methods.  Pass-by-value?  Pass-by-reference?  Which does your language use?

You probably learned this in your first CS class... so did I.

Then why did it take me a frakin' month to understand what Python does? :)

Well... if you look online, you will find some very ambiguous answers about Python being pass-by-reference or pass-by-value.  (which ends up boiling down to semantics and how you use certain terminology, but forget that for now)


To review, how do other languages handle this concept?

C is pass-by-value

Straightforward. You can simulate pass-by-reference with pointers.  Not much else to say here.

Java is pass-by-value

Primitive Types (non-object built-in types) are simply passed by value.  Passing Object References feels like pass-by-reference, but it isn't.  What you are really doing is passing references-to-objects by value.

OK, so what about Python?

Python passes references to objects by value (like Java), and everything in Python is an object. This sounds simple, but then you will notice that some data types seem to exhibit pass-by-value characteristics, while others seem to act like pass-by-reference... what's the deal?

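It is important to understand mutable and immutable objects. Some objects, like strings, tuples, and numbers, are immutable. They can't be altered at all: an operation that looks like it modifies one inside a function/method (e.g. n += 1) actually creates a new object and rebinds the local name to it, so the original object outside the function/method is not changed. Other objects, like lists and dictionaries, are mutable, which means you can change the object in place. Therefore, mutating such an object inside a function/method (e.g. calling append() on a list) also changes it outside, because both names refer to the same object. Rebinding the parameter name to a brand-new object, however, still has no effect on the caller.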
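A short sketch makes the distinction concrete: rebinding a parameter name never affects the caller, while mutating a shared mutable object does. The function names are made up for illustration; the code runs on Python 3.

```python
def rebind(n, items):
    n += 1                # ints are immutable: n now names a new int object
    items = items + [99]  # builds a new list; only the local name changes
    return n, items


def mutate(items):
    items.append(99)      # changes the shared list object in place


count = 1
data = [1, 2]
rebind(count, data)
print(count, data)  # 1 [1, 2]
mutate(data)
print(data)         # [1, 2, 99]
```

One gotcha: writing `items += [99]` inside rebind() would behave like mutate(), because a list's `+=` extends the existing list in place instead of building a new one.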



For entirely too much information about this topic in Python and across many other languages (Java, Scheme, C#, C, C++, Python), read the thread where these quotes come from:

Is Python By Value Or By Reference?

Alex Martelli:

The terminology problem may be due to the fact that, in python, the value of a name is a reference to an object. So, you always pass the value (no implicit copying), and that value is always a reference.
[...]
Now if you want to coin a name for that, such as "by object reference", "by uncopied value", or whatever, be my guest. Trying to reuse terminology that is more generally applied to languages where "variables are boxes" to a language where "variables are post-it tags" is, IMHO, more likely to confuse than to help.

Michael Hoffman:

Alex is right that trying to shoehorn Python into a "pass-by-reference" or "pass-by-value" paradigm is misleading and probably not very helpful. In Python every variable assignment (even an assignment of a small integer) is an assignment of a reference. Every function call involves passing the values of those references.


word.


Many Hats of a Performance Engineer

The many hats of a performance engineer:

  • tester
  • developer
  • toolsmith
  • dba
  • sysadmin
  • network/ops/tech
  • system integrator
  • architect
  • statistician
  • data visualizer
  • sadist



#!/usr/bin/env python


def main():
    me = PerformanceEngineer()
    print('just another %s:' % me.role)
    print(', '.join(me.skillz))


class PerformanceEngineer(object):

    role = 'performance engineer'
    skillz = (
        'tester',
        'developer',
        'toolsmith',
        'dba',
        'sysadmin',
        'network/ops/tech',
        'system integrator',
        'architect',
        'statistician',
        'data visualizer',
        'sadist',
    )


if __name__ == '__main__':
    main()
 Wednesday, February 28, 2007

One Laptop Per Child - It's All About the Python!

Wow. I just read something interesting about the One Laptop Per Child (OLPC) project in Guido's PyCon writeup:

"The software is far from finished.  An early version of the GUI and window manager are available, and a few small demo applications: chat, video, two games, and a web browser, and that's about it!  The plan is to write all applications in Python (except for the web browser), and a "view source" button should show the Python source for the currently running application.  In the tradition of Smalltalk (Alan Kay is on the OLPC board, and has endorsed the project's use of Python) the user should be able to edit any part of a "live" application and see the effects of the change immediately in the application's behavior."


So... they are going to be running a GNU/Linux OS (a stripped down version of Fedora), with essentially all applications in Python.

This is very cool on many levels. It is the ultimate endorsement of Python.  It also makes me think about the future...  If OLPC is successful, a few years down the road we might be looking at several million young new Open Source/Python hackers.  Nice!



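Reading Outlook/Exchange Email Programmatically with Python

With the Python for Windows Extensions (pywin32), you can talk via COM to an Exchange server and read/process your email. You must have the Outlook client installed on the machine you run this from. The script uses the MAPI.Session object from CDO (Collaboration Data Objects), so CDO must be installed as well.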

Here is a sample script that will:

  • connect to your mailbox
  • print the inbox name
  • print the message count
  • print the subjects for all your email messages


#!/usr/bin/env python

from win32com.client import Dispatch

session = Dispatch("MAPI.Session")
session.Logon('OUTLOOK')  # MAPI profile name
inbox = session.Inbox

print("Inbox name is: %s" % inbox.Name)
print("Number of messages: %d" % inbox.Messages.Count)

# CDO collections are 1-based, hence Item(i + 1)
for i in range(inbox.Messages.Count):
    message = inbox.Messages.Item(i + 1)
    print(message.Subject)

session.Logoff()


 Monday, February 26, 2007

Python 3000 Video and Slides

Guido van Rossum just published his slides from the PyCon 2007 Keynote where he discusses Python 3000.

His talk is also available on Google Video.

I'm psyched to watch this!


Boston - Back Bay Snow Pics

New England weather is crazy. In Boston, we get pounded with snow some winters. Other winters we hardly get any of the white stuff. This year has had very little snowfall, but it was coming down this morning.

I snapped a few pics on my way to work (I live in Back Bay).

Looking out my apartment window:

Looking out from my front stoop:

The view down Comm. Ave:

Copley Square, Old Trinity Church, John Hancock Building, Boston Public Library:


 Sunday, February 25, 2007

Wes Dyer on Type Systems

Awesome post by Wes Dyer about Type Systems:
Types Of Confusion


He tackles a lot of misconceptions and myths about typing.

To frame his points, he starts with some fantastic definitions.  I am all for clarifying vocabulary and I really like this:


Based on the apparent confusion, I think it is best to clarify what I mean by each of the following terms:

    Type Checking - Verifying that code respects type constraints.

    Statically Typed - Type checking occurs at compile time.

    Dynamically Typed - Type checking occurs at run time.

    Type Safe Language - A language which protects its own abstractions.

    Type Unsafe Language - A language which is not type safe.

    Strongly Typed and Weakly Typed - Depends on the author; The definitions are so many and so varied that the terms are practically useless. It seems that anyone can claim that language X is either strongly typed or weakly typed based on sound reasoning derived from one of the various definitions.

    Dynamic Language - A language which enables runtime inspection or modification of a program; most languages can do this but dynamic languages make it easy. It is common for people to refer to "dynamic languages" and mean "dynamically typed languages" as the term is defined here.

good read.
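Python makes a handy illustration of two of those definitions: by them, it is usually described as dynamically typed and type safe. In the sketch below (the function name is made up for illustration), nothing objects to the function when it is defined; the type check happens only when the offending call actually runs, and the language raises an error rather than reinterpreting a string as a number.

```python
def add_one(value):
    # No type check happens when this function is defined...
    return value + 1


print(add_one(41))  # 42

try:
    add_one("41")   # ...the check happens only when this call runs
except TypeError as exc:
    print("TypeError:", exc)
```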
