goldb.org home

AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Sunday, March 25, 2007

Real World Web Scalability

(via reddit programming)

Very lengthy overview of performance and scalability issues for web systems by Ask Bjørn Hansen.  This presentation covers a vast range of information:

Real World Web Scalability  (warning large PDF)

The takeaway?
Create horizontally scalable distributed systems..  always.

#    Comments [0] |

Scalability Comparison of Virtualization Tools

A report about scalability of virtualization techniques:

SCALABILITY COMPARISON OF 4 HOST VIRTUALIZATION TOOLS (QUETIER B / NERI V / CAPPELLO F)

"Virtualization tools are becoming popular in the context of Grid Computing because they allow running multiple operating systems on a single host and provide a confined execution environment.  In several Grid projects, virtualization tools are envisioned to run many virtual machines per host.  This immediately raises the issue of virtualization scalability."

4 types of virtualization tools are discussed in the context of scalability:

  • Processor Virtualization
  • Kernel Replication
  • Operating System Virtualization
  • Resource Virtualization
#    Comments [0] |

Operating System Genealogy - Timelines

Sweet...
The history of entirely too many operating systems in way too high resolution.
... but great fun for OS geeks.

Operating System Genealogy:

#    Comments [0] |
 Saturday, March 24, 2007

Free Software Foundation - 2007 Associate Member Meeting

The Free Software Foundation's annual Associate Members Meeting is always an inspiring event for me.  It serves as a sort of State of the Free Software Union, where members gather to discuss ideas and listen to speakers.  Most of the FSF Board of Directors were there to speak.

I attended the meeting today (Saturday 03/24/2007) for the 4th time in the past 5 years.

It was held at MIT (Cambridge, Massachusetts):

 

I arrived during Joshua Ginsberg's (FSF Senior System Administrator) speech on “FSF Systems Administration”.  He gave an overview of some of the systems and internal work going on at the FSF offices. Some highlights:

  • FSF now runs LinuxBIOS on new Tyan servers for FSF and GNU Project resources.  They will be contributing documentation and information to help others install a Free BIOS.
  • New and much improved FSF network infrastructure and connectivity for FSF/GNU hosted resources.
  • FSF is switching from Zope to Django (both Python powered!) for web application development...  Lots of new stuff coming soon, including contributions back to the Django community.

Next up was Brett Smith, the new GPL Compliance Engineer at the Compliance Lab.  One thing Brett mentioned was that GPL license violations are pretty much kept secret and not disclosed to the community.  FSF prefers to negotiate with violators and talk them into compliance behind closed doors.  I'm not sure I agree with this practice.  I asked Richard Stallman about this during his Q&A Session... stating that I thought this information should be released to the public.  I don't see it as an overly aggressive move and I think publicly outing companies that are GPL violators would be a good way to give exposure to Free Software and help curb future violations.  RMS doesn't quite agree with my standpoint, but he asked some FSF staff to explore generically publicizing more types of violations.

Next was Gerald Jay Sussman, speaking about "Robust Design". Gerry co-authored my first Computer Science book, the venerable Wizard Book (SICP), and is one of the authors of Scheme (a dialect of the LISP programming language).  I was able to thank him for the pain and enlightenment his texts brought me during my CS studies.

Gerry is a complete madman when he gives presentations.  Forget the powerpoints and fancy presentation gear... he just slings around old school projector slides at blazing speed.  Admittedly, the stuff he talks about is far over my head.  I'm just a lowly computer programmer.  This guy has been at MIT since 1964 studying the cutting edge of computer science, mechanics, and electrical engineering. Watching him ease through functional programming and Scheme code is a little intimidating, but the entertainment value alone is worth it.

OK.. now the person most people came to see speak... the GNU Project founder, FSF President, former MIT AI Lab hacker, Emacs/GCC/GDB author, Chief GNUisance, and St. Gnucius himself... Richard Stallman:

RMS was in a surprisingly jovial mood. He is usually sorta moody and prone to outbursts.  I saw him shout at and absolutely berate Larry Lessig a few years ago in front of a large audience at an FSF meeting.  However, today he was in fine form and gave his speech "Free Software and Software Patents".  He delivered well and really punched home the point about the absurdity of patents when applied to software.

After RMS was Eben Moglen, FSF General Counsel, Columbia Law Professor, and founder of the Software Freedom Law Center.  Eben is my favorite speaker.. bar none.  He speaks with passion and insight that is truly inspiring to watch.  He gave his "After GPLv3" speech.  It was an update on the current state of the GPL revision process.  Stallman and Moglen are leading the massive effort to complete GPLv3.  I am very thankful that people like Eben Moglen are on the front lines protecting our freedom.

Eben Moglen:

Bruce Perens was in attendance: 

He seems to have taken a very strong interest in the GPLv3 recently.

... and of course there were the obligatory FSF activist signs:

RMS listening to Moglen's speech:


Now... everyone... go join the FSF and become an Associate Member.
... or at least continue your Free Software hacking and advocacy.


Goldberg... out!

#    Comments [0] |
 Friday, March 23, 2007

Python - Creating Bar Graphs with Matplotlib

Matplotlib is an open source 2D plotting library for Python.  It is very impressive and robust, but the API and documentation are maddeningly difficult to follow.

Here I have provided a function that will create a bar graph [as a png image] from a Python dictionary using the Matplotlib API.

It will auto-size the bars and auto-adjust the axis labels for you. All you need to pass into it is a dictionary data structure (and optionally a graph title and output name).


We start with a Python dictionary like this:

{'A': 70, 'B': 290, 'C': 130}


... and the function will use Matplotlib to create a graph like this:


Here is a sample script that uses my function:


#!/usr/bin/env python

from pylab import *

def main():  
    my_dict = {'A': 70, 'B': 290, 'C': 130}
    bar_graph(my_dict, graph_title='ABC')


def bar_graph(name_value_dict, graph_title='', output_name='bargraph.png'):
    figure(figsize=(4, 2)) # image dimensions  
    title(graph_title, size='x-small')
   
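    # add bars (align='edge' puts each bar's left edge at i + 0.25, so with
    # the default width of 0.8 its center lands on the tick at i + 0.65)
    for i, key in enumerate(name_value_dict):
        bar(i + 0.25, name_value_dict[key], align='edge', color='red')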
   
    # axis setup
    xticks(arange(0.65, len(name_value_dict)),
        [('%s: %d' % (name, value)) for name, value in
        zip(name_value_dict.keys(), name_value_dict.values())],
        size='xx-small')
    max_value = max(name_value_dict.values())
    tick_range = arange(0, max_value, (max_value / 7))
    yticks(tick_range, size='xx-small')
    formatter = FixedFormatter([str(x) for x in tick_range])
    gca().yaxis.set_major_formatter(formatter)
    gca().yaxis.grid(which='major')
   
    savefig(output_name)


if __name__ == "__main__":
    main()


enjoy.

-Corey

#    Comments [6] |
 Thursday, March 22, 2007

Python - Convert Date/Time to Epoch

I'm not sure why, but this took me forever to figure out; so I'm posting it here for others...

Let's say you have a string representing a date and a time and you want to convert it to epoch time (the number of seconds since the epoch).

First you will need to create a pattern for your time format, using time format directives.

For example, the pattern for:

'2007-02-05 16:15:18'

Would be:

'%Y-%m-%d %H:%M:%S'

You can then convert it to epoch like this:

int(time.mktime(time.strptime('2007-02-05 16:15:18', '%Y-%m-%d %H:%M:%S')))


Now in a script:

#!/usr/bin/env python

import time

date_time = '2007-02-05 16:15:18'
pattern = '%Y-%m-%d %H:%M:%S'
epoch = int(time.mktime(time.strptime(date_time, pattern)))
print(epoch)
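
One caveat worth spelling out: time.mktime() treats the parsed time as local time, so the result depends on the timezone of the machine running the script. If the string is really a UTC timestamp, calendar.timegm() from the standard library is the matching call. Here is a minimal sketch of both (the helper names to_epoch_local and to_epoch_utc are made up for this example):

```python
import calendar
import time

PATTERN = '%Y-%m-%d %H:%M:%S'


def to_epoch_local(date_time, pattern=PATTERN):
    """Treat the string as local time (same approach as the one-liner above)."""
    return int(time.mktime(time.strptime(date_time, pattern)))


def to_epoch_utc(date_time, pattern=PATTERN):
    """Treat the string as UTC, independent of the machine's timezone."""
    return calendar.timegm(time.strptime(date_time, pattern))


if __name__ == '__main__':
    stamp = '2007-02-05 16:15:18'
    print(to_epoch_local(stamp))  # varies with the local timezone
    print(to_epoch_utc(stamp))    # 1170692118
```

Going the other way, time.localtime() or time.gmtime() plus time.strftime() with the same pattern turns the epoch value back into a string.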
#    Comments [0] |
 Wednesday, March 21, 2007

Sun Giving GNU Credit

RMS has been on the "GNU/Linux" naming convention rant for years, urging people to give the GNU Project and its legions of contributors the credit they deserve.  After all, the bulk of the Free Software OS userland is made of GNU contributions.

One might think that a company like Sun Microsystems wouldn't grok this concept, since most GNU/Linux distributions themselves don't.


However, some folks at Sun definitely get it:

Tim Bray - Director of Web Technologies (talking about Ian Murdock joining Sun):

"As of this weekend Ian wasn’t even on the payroll yet and was already in a peppy little email debate over when to say “Linux” and when to say “GNU” and when to say both."

Simon Phipps - Chief Open Source Officer:

"the combination of the GNU operating system pioneered by Richard Stallman with the inclusive development delivered around the Linux kernel by Linus Torvalds has brought a new life and energy to the extended family tree of Unix. The popularity of GNU/Linux bears testament to the vision and skill Stallman and Torvalds exhibit."
#    Comments [0] |

New O'Reilly Book About Web Performance - Coming Soon

(Note to self: buy this book when it comes out)

Steve Souders (Chief Performance Yahoo! at Yahoo) is writing a book for O'Reilly about web performance:

High Performance Web Sites


It's great to see Performance continue to gain exposure.

-Corey

#    Comments [0] |

Google Summer of Code 2007 - No Perl for You

The Perl Foundation won't be involved in Google Summer of Code 2007.

Bill Odom:

"The short version: We submitted an application to be a mentoring organization, but we weren't accepted."

However, even without the Perl community represented, the list of mentoring organizations and projects is really good!

#    Comments [0] |
 Monday, March 19, 2007

Making Applications Scalable With Load Balancing

I am in the process of tuning a large distributed system, using an F5 BIG-IP Load Balancer to distribute traffic.

Willy Tarreau has a very good overview of load balancing options:

Making applications scalable with Load Balancing

#    Comments [0] |
 Sunday, March 18, 2007

Linux - Symmetric Multiprocessing

Tim Jones gives a brief overview of SMP and discusses working with the Linux kernel:

Linux and symmetric multiprocessing


Tim Jones:

"As processor frequencies reach their limits, a popular way to increase performance is simply to add more processors. In the early days, this meant adding more processors to the motherboard or clustering multiple independent computers together. Today, chip-level multiprocessing provides more CPUs on a single chip, permitting even greater performance due to reduced memory latency.

You'll find SMP systems not only in servers, but also desktops, particularly with the introduction of virtualization. Like most cutting-edge technologies, Linux provides support for SMP. The kernel does its part to optimize the load across the available CPUs (from threads to virtualized operating systems). All that's left is to ensure that the application can be sufficiently multi-threaded to exploit the power in SMP."
#    Comments [0] |

Going Transactionless - Scalable Data Tiers

Dan Pritchett posted his excellent "How eBay Scales" presentation a few months back.

It is a great look into a real-world massive distributed system and the evolution of its scalable architecture.  One interesting thing to notice is that eBay is a transactionless environment (meaning it doesn't use Database Transactions).

I have always seen the data layer as the difficult part to scale.  Separating logic from data and working in a purely transactionless environment can mitigate this issue.

Martin Fowler commented on this today:

"The rationale for not using transactions was that they harm performance at the sort of scale that eBay deals with. This effect is exacerbated by the fact that eBay heavily partitions its data into many, many physical databases. As a result using transactions would mean using distributed transactions, which is a common thing to be wary of.

This heavy partitioning, and the database's central role in performance issues, means that eBay doesn't use many other database facilities. Referential integrity and sorting are done in application code. There's hardly any triggers or stored procedures."
#    Comments [0] |
 Saturday, March 17, 2007

OLPC Machine Up Close at BarCamp Boston 2

I was at the BarCamp2 "unconference" today at MIT's Stata Center and got to see the OLPC machine  ... very cool.

Chris Ball had a prototype on hand.  Chris heads One Laptop Per Child's performance testing work.  I was able to chat with him for a bit and take some pics:

One thing that struck me was the size of the laptop. It is really very small.  The keys are much smaller than typical laptop keys (designed for children's hands).

Chris with the laptop:


This project fascinates me.  I can't wait for the abundance of future hackers.

#    Comments [0] |

Python3000 vs. Perl6 ... Wanna Bet?

Perl6...
Python3000...

Both are redesigns of very popular dynamic/scripting languages.  Both have very strong, though very different, communities supporting them.

Out of the gate, Perl's plans were much more ambitious, including a new generic virtual machine.  Python's plans were more pragmatic; more of a language cleanup than a drastic redesign.

Perl 6 was officially announced nearly 7 years ago and I don't see a stable production release coming *any* time soon.  On the other hand, the idea of Python3000 was sorta tossed around for a while and swung into gear 2 years ago.

Guido (Python's BDFL) has been spearheading the effort, whereas Perl's leadership structure is much more anarchic (Where is Larry Wall these days?).  Guido has been very transparent and kept the community aware of his worries.

Some people saw this as a slippery slope...

Chromatic:

"Language redesign is difficult, isn’t it?  Once you start challenging base assumptions, you find that a lot of your previous conclusions are shaky, and good luck reigning in blue-sky ideas!

See you in 2007… or 2008… or 2009.

Best wishes,
a Perl 6 hacker"

I disagree..

I'd bet anyone money that I will be hacking on a stable release of Python3000 long before I'm using a stable version of Perl6... any takers?


(disclosure: I have written lots of code in both Perl and Python and am a fan of both)

#    Comments [6] |
 Friday, March 16, 2007

JakeBrake's Level of Geekdom

Impressive...

JakeBrake is a true geek:

"I am so technical that:
  • I routinely do unit-level performance/timing tests on Cialis to see if will time-out at 4 hours.
  • For meals I eat only donuts and hotdogs; arranging them on my plate as "bites" in patterns of ones and zeros.  I use a burnt hotdog as a signed bit."


If you are into testing and performance, the Sounds of Jake Braking blog is a great read.

#    Comments [0] |
 Wednesday, March 14, 2007

Regex "Match" in Python vs. C#

I have been writing a lot of code in both C# and Python lately... flipping back and forth between both languages.  One thing I keep getting tripped up on is the terminology used in regular expression syntax, and what a "match" is.

So for my own disambiguation:

  • Python's re.match() is different from C#'s Regex.IsMatch()
  • Python's re.search() is similar to C#'s Regex.IsMatch()


Better explained in code:


Using Regex.IsMatch() in C# to match a pattern with some text:

if (Regex.IsMatch("foobar", "bar"))
{
    Console.WriteLine("Match");
}
else
{
    Console.WriteLine("No Match");
}

this prints 'Match'


Same thing, using re.match() in Python:

import re

if re.match('bar', 'foobar'):
    print('Match')
else:
    print('No Match')

this prints 'No Match'


oops.. didn't get a match. What happened?

match() only checks if the regex matches at the beginning of the string, while search() will scan forward through the string for a match.


If you were expecting the pattern to match anywhere in the string, you need to use re.search() instead:

if re.search('bar', 'foobar'):
    print('Match')
else:
    print('No Match')

this prints 'Match'


... or else you must supply a pattern that will match from the beginning of the string:

if re.match('.*bar', 'foobar'):
    print('Match')
else:
    print('No Match')

this prints 'Match'
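
Putting the cases side by side in one small script (just a sketch for my own reference; the variable name text is arbitrary):

```python
import re

text = 'foobar'

# re.match() only tries at the start of the string, so 'bar' is not found.
print(re.match('bar', text) is not None)     # False

# re.search() scans forward through the string, like C#'s Regex.IsMatch().
print(re.search('bar', text) is not None)    # True

# An explicit '^' anchor makes search() behave like match().
print(re.search('^bar', text) is not None)   # False

# The match object also reports where the pattern was found.
print(re.search('bar', text).start())        # 3
```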


#    Comments [0] |
 Monday, March 12, 2007

Bernie Velivis on Performance Testing Strategies

OpenSTA is a distributed performance testing architecture and tool.  Recently, on the opensta-users mailing list, Bernie Velivis (from Performax Inc) posted an excellent article about performance testing strategies.  I thought this information would be useful for all performance testers who are just learning the craft.  Rather than letting it sit in some arcane mailing list archive, I asked Bernie if I could re-publish it here.

enjoy.

-Corey


Bernie Velivis on Performance Testing Strategies:


PERFORMANCE TESTING STRATEGIES

The terminology used to discuss Performance testing in technical publications and support forums can be ambiguous or inconsistent. Hopefully this article will help participants in the OpenSTA user support forum by providing a common frame of reference for discussing tools, testing, and test results. It may also be helpful to those new to performance testing.

CAPACITY TESTING

If your goal is to determine the CAPACITY of the system under test, start by creating a "realistic" workload consisting of a mix of the most popular transactions plus those deemed critical or known to cause problems even when executed infrequently. Pick a manageable set of transactions to emulate (considering time, budget, and goals), determine the probability of executing each transaction, the work rate for the emulated users, and the "success criteria" for performance metrics (i.e. response time limits, concurrent users, and throughput).

One way to implement this approach is to create a master script, assign it to each VU, and have it generate random numbers and then call other scripts which model the individual workload transactions based on a table of probabilities. The scripts should be modeled with think times consistent with the way your users interact with the system. This varies greatly from one application to another and unless you are mining log files from an application already in use, this is a somewhat subjective process. The best advice I can give in defining the workload is to get input from people who know how the application is (or will be) used, make conservative assumptions (but not so much so that the sum of all your conservative decisions is pathological), and balance the scope of the workload vs. time to complete the project. Another important consideration is the data demographics of the transactions and the size and contents of the database.
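
[Corey's note] The master-script idea can be sketched in a few lines. This is plain Python for illustration, not an OpenSTA script, and the transaction names and probabilities in the table are invented:

```python
import random
from collections import Counter

# Hypothetical workload table: (transaction name, probability of execution).
WORKLOAD = [
    ('browse_catalog', 0.60),
    ('search', 0.25),
    ('checkout', 0.10),
    ('monthly_report', 0.05),  # rare, but assumed to be expensive
]


def pick_transaction(rng):
    """Draw one transaction name according to the probability table."""
    names = [name for name, _ in WORKLOAD]
    weights = [weight for _, weight in WORKLOAD]
    return rng.choices(names, weights=weights, k=1)[0]


def master_script(iterations, seed=None):
    """Emulate one virtual user; return how often each transaction ran."""
    rng = random.Random(seed)
    executed = Counter()
    for _ in range(iterations):
        name = pick_transaction(rng)
        # A real script would run the transaction and then pause for a
        # think time here; this sketch only records the choice.
        executed[name] += 1
    return executed


if __name__ == '__main__':
    print(master_script(1000, seed=42))
```

Each virtual user runs the same master script, so the overall transaction mix converges on the table as the number of iterations grows.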

When it's time to test, increase the number of emulated users and monitor how response times, server resource utilization (CPU, disk IO, network, and memory), and throughput (the rate of tasks completed system wide) vary with the increased load. You might construct a test that ramps up to a specific number of users, lets them run for a while, and then repeats as necessary. This way, you can observe the behavior of the system in various steady states under increasing load. Workloads containing transactions having a low probability of being executed and/or a disproportionately large impact on the performance of other transactions usually need to run longer to reach a steady state. If you can't get repeatable results, your steady state interval might be too small. As a rule of thumb I would suggest a minimum ramp up time equal to the duration of the longest running script and the steady state observation period at least twice as long as the ramp up period. I also tend to ignore response times and performance statistics gathered during the ramp up periods and focus instead on the data collected during the steady state periods.
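
[Corey's note] The rule of thumb above is easy to turn into numbers. A minimal Python sketch, assuming response-time samples are recorded as (seconds since the step started, response time) pairs; the function names and sample data are made up:

```python
def plan_step(longest_script_secs):
    """Return (ramp_up_secs, steady_state_secs) per the rule of thumb."""
    ramp_up = longest_script_secs  # at least the longest running script
    steady_state = 2 * ramp_up     # at least twice the ramp up period
    return ramp_up, steady_state


def steady_state_only(samples, ramp_up_secs):
    """Drop samples recorded during ramp up.

    samples is a list of (seconds_since_step_start, response_time_secs).
    """
    return [(t, rt) for t, rt in samples if t >= ramp_up_secs]


if __name__ == '__main__':
    ramp_up, steady = plan_step(longest_script_secs=90)
    print(ramp_up, steady)  # 90 180

    samples = [(10, 0.4), (80, 0.9), (95, 1.1), (200, 1.2)]
    print(steady_state_only(samples, ramp_up))  # [(95, 1.1), (200, 1.2)]
```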

That's a rough outline of one approach to capacity testing which in summary is an attempt to load up the system with VUs in a way that is indistinguishable from "real users" in order to find the capacity limit. Pick the wrong workload, however, and you might miss something very important or end up solving problems that won’t exist in the real world.

The end game here is to increase load until response times become excessive at which point you have found the system’s capacity limit. This limit will be due to either a hardware or software bottleneck. If time and goals allow, analyze the performance metrics captured, do some tuning, improve code efficiency or concurrency, or add some hardware resources. Make one change at a time and repeat as necessary until you meet capacity goals, find the limits to the architecture, or run out of time (which happens more than most performance engineers would like).
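
[Corey's note] Finding the limit from the steady-state results might look something like this in Python. It assumes one averaged response time per load step, and the measurements below are invented for illustration:

```python
def capacity_limit(results, max_response_secs):
    """Return the highest user count whose response time stays acceptable.

    results is a list of (concurrent_users, response_time_secs), one entry
    per steady-state step, in order of increasing load. Returns None if
    even the first step is too slow.
    """
    limit = None
    for users, response_time in results:
        if response_time > max_response_secs:
            break  # response times have become excessive; stop here
        limit = users
    return limit


if __name__ == '__main__':
    # Made-up measurements: (concurrent users, response time in seconds).
    steps = [(50, 0.8), (100, 1.1), (150, 1.9), (200, 4.7)]
    print(capacity_limit(steps, max_response_secs=2.0))  # 150
```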

SOAK TESTING

The same scripts created for capacity testing can also be used for SOAK TESTING where you load up the system close to its maximum capacity and let it run for hours, days, etc. This is a great way to spot stability problems that only occur after the system has been running a long time (memory leaks are a good example of things you might find).

FAILOVER TESTING

To get started with FAILOVER TESTING, get the system under test into a steady state and start failing components (servers, routers, etc). Observe how response times are affected during and after the failover, and how long the system takes to transition back to steady state. (This is a gross simplification; there is lots of good reading material out there on failover and high availability testing.)

STRESS TESTING

If your goal is to determine where or how (not if) the system will fail under load, then you are doing STRESS TESTING. One way to do this is to comment out the think times and increase VUs until something (hopefully not your emulator!) breaks. This is one form of stress testing, a valuable aspect of performance testing, but not the same as capacity testing. How the VUs compare to "real users" may be irrelevant as you are trying to determine how the system behaves when pushed past its limits.


A report illustrating how these concepts were used to performance test a SOAP application using OpenSTA can be downloaded here:
SamplePerformaxPerformanceReport.pdf


Bernie Velivis
Principal Consultant and President, Performax Inc
www.iPerformax.com


#    Comments [0] |

Zabbix - Open Source Network/Infrastructure Monitoring

I have used Nagios for several years, and it has served me well as an open source distributed monitoring system.

I just read about Zabbix, and I'm posting here so I won't forget to check it out.  Zabbix is GPL (v2) licensed and looks interesting.  I will post more once I get a chance to play with it.

#    Comments [0] |
 Saturday, March 10, 2007

Python - Iterating Multiple Sequences

Here are some examples of iterating through multiple sequences simultaneously in Python:


I start with 2 lists of numbers:

foos = [0, 1, 2, 3, 4]
bars = [1, 2, 3, 4, 5]

I want to create a new list that is made up of the sum of the items at each position in the original lists.  So I will end up with this:

>>> print(foobars)
[1, 3, 5, 7, 9]


Starting with an unpythonic way...
Here I use a counter to iterate through the indexes of each sequence and build a new list:

foobars = []
for i in range(len(foos)):
    foo = foos[i]
    bar = bars[i]
    foobars.append(foo + bar)


Getting more pythonic...
Here I use zip. Zip allows me to iterate each sequence simultaneously, assigning the current sequence values each time through the loop:

foobars = []
for foo, bar in zip(foos, bars):
    foobars.append(foo + bar)


The older pythonic way to do this was with map (passing None as the function, which only works in Python 2):

foobars = []
for foo, bar in map(None, foos, bars):
    foobars.append(foo + bar)


Getting even more pythonic and more concise...
I can combine zip with a list comprehension and do it in a one-liner like this:

foobars = [foo + bar for (foo, bar) in zip(foos, bars)]


*note:  in Python 3000, zip will stay but will return an iterator instead of a list, behaving the way itertools.izip does today.
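
One difference between the zip and map(None, ...) versions only shows up when the lists have different lengths: zip stops at the shortest sequence, while map(None, ...) pads the shorter one with None. The sketch below is written for Python 3, where zip is a built-in that returns an iterator and itertools.zip_longest provides the padding behavior (the fillvalue of 0 is just my choice for this example):

```python
from itertools import zip_longest

foos = [0, 1, 2, 3, 4]
bars = [1, 2, 3]

# zip stops as soon as the shortest sequence runs out.
print([foo + bar for foo, bar in zip(foos, bars)])
# [1, 3, 5]

# zip_longest keeps going and pads the shorter sequence with fillvalue.
print([foo + bar for foo, bar in zip_longest(foos, bars, fillvalue=0)])
# [1, 3, 5, 3, 4]
```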

#    Comments [2] |
 Friday, March 09, 2007

Joe Barr Lays the Smack Down on OSS FUD

In his article: "Joe Barr rips proprietary software vendor a new one", Joe does exactly what the title states  :)

His article was a response to an earlier piece by Roger Greene (CEO of Ipswitch), where Roger says some very confused/uninformed things about Open Source software.


One thing Joe didn't rip was this excerpt from Roger Greene:

"The open source community claims bugs can be fixed faster for open source software than commercial software because hundreds, if not thousands, of people are looking at the code daily and can help with fixes. [ ... ] Even when those individuals generously offer their time for free, can you really afford to wait for one to agree with you on the urgency of action if your network is down?"

Huh?

That is a very odd and misleading way to look at it.  Open Source gives you the ability to modify the code yourself.  You don't have to wait for anyone.  You can hire a freelance developer or consultancy to fix it on the spot.  If you find a problem in a proprietary vendor's software, can you do the same?

No.. proprietary software puts you at the mercy of your vendor.

#    Comments [0] |
 Thursday, March 08, 2007

PLEAC - Programming Language Examples Alike Cookbook

I just stumbled across the PLEAC Project (Programming Language Examples Alike Cookbook).

Project Description:

"Following the great Perl Cookbook (by Tom Christiansen & Nathan Torkington, published by O'Reilly; you can freely browse an excerpt of the book here) which presents a suite of common programming problems solved in the Perl language, this project aims to gather fans of programming, in order to implement the solutions in other programming languages."


There is sample code in many popular languages.  The Python examples are really good.  They would serve as an excellent primer for someone moving from Perl to Python, or as a general Python reference with cookbook-style examples.

It is hosted at SourceForge and licensed under the GNU Free Documentation License (GFDL).

#    Comments [0] |
 Wednesday, March 07, 2007

Play Corey's Tunes - Last.fm

I got hooked on Last.fm last summer.  Since then, I've scrobbled over 6,800 tracks.. not bad!  (I've been a music junkie my entire life).

I submit my played tracks from all of my music players (Foobar2000, Winamp, iTunes, Squeezebox/SlimServer).  It populates fantastic statistics and charts of my listening habits and lets me listen to streams from people with similar tastes.

Here are my top 10 artists since I started using it:


Once you scrobble enough tracks, you can start streaming custom stations based on your listening habits.  Something new they just added is the ability to embed a Last.fm player widget into your own site. I just knocked up a quick web page with the embedded player so I can listen to my own station from anywhere that has a browser (requires Flash).  Check it out and have a listen.

#    Comments [0] |
 Tuesday, March 06, 2007

Show Us Your Rack!

Reverend Ted just started the "Show Us Your Rack" blog campaign ;)

Well.. I've seen a few nice racks in my day.  I took these pics last year.  Forget the tiny colo cage, this data center gives you some room to spread out:


plenty of room:


looking down the aisle:


got load balancing?:


the big dogs:

#    Comments [0] |
 Monday, March 05, 2007

.NET CLR - Covertly Throttling Thread Creation

Joe Duffy (from Microsoft) talking about the .NET 2.0 CLR:

"It's also worth noting that the threadpool throttles its creation of threads to 2/second once the count has exceeded the # of CPUs."


Yuck...  I don't like throttling like that behind the scenes.  It can make performance problems *very* hard to diagnose.
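
To get a feel for what that throttle could mean, here is a back-of-the-envelope model in Python. It assumes (my simplification, not a documented algorithm) that threads up to the CPU count are created immediately and every additional thread is created at 2 per second:

```python
def seconds_to_reach(threads_needed, cpu_count, rate_per_sec=2.0):
    """Estimate how long the pool takes to grow to threads_needed.

    Simplified model: threads up to cpu_count cost no waiting; beyond
    that, creation is limited to rate_per_sec threads per second.
    """
    throttled = max(0, threads_needed - cpu_count)
    return throttled / rate_per_sec


if __name__ == '__main__':
    # 50 blocking work items queued at once on a 4-CPU box.
    print(seconds_to_reach(50, cpu_count=4))  # 23.0
```

Under that model, a burst of blocking work items shows up as many seconds of queueing delay that never appears in any CPU or disk counter.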

#    Comments [2] |
 Sunday, March 04, 2007

Python Parameters - Pass-By-Value or Pass-By-Reference?

Passing parameters to functions and methods.  Pass-by-value?  Pass-by-reference?  Which does your language use?

You probably learned this in your first CS class... so did I.

Then why did it take me a frakin' month to understand what Python does? :)

Well... if you look online, you will find some very ambiguous answers about Python being pass-by-reference or pass-by-value.  (which ends up boiling down to semantics and how you use certain terminology, but forget that for now)


To review, how do other languages handle this concept?

C is pass-by-value

Straightforward. You can simulate pass-by-reference with pointers.  Not much else to say here.

Java is pass-by-value

Primitive Types (non-object built-in types) are simply passed by value.  Passing Object References feels like pass-by-reference, but it isn't.  What you are really doing is passing references-to-objects by value.

OK, so what about Python?

Python passes references to objects by value (like Java), and everything in Python is an object. This sounds simple, but then you will notice that some data types seem to exhibit pass-by-value characteristics, while others seem to act like pass-by-reference... what's the deal?

It is important to understand mutable and immutable objects. Some objects, like strings, tuples, and numbers, are immutable.  Altering them inside a function/method will create a new instance and the original instance outside the function/method is not changed.  Other objects, like lists and dictionaries, are mutable, which means you can change the object in-place.  Therefore, altering a mutable object in-place inside a function/method will also change the original object outside.
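
A short sketch makes the difference concrete (the function names are made up for this example):

```python
def append_item(items):
    items.append(4)        # mutates the list the caller passed in


def rebind_list(items):
    items = items + [4]    # builds a new list; only the local name changes


def add_suffix(text):
    text += '!'            # strings are immutable, so this creates a new one
    return text


if __name__ == '__main__':
    numbers = [1, 2, 3]
    append_item(numbers)
    print(numbers)               # [1, 2, 3, 4]

    rebind_list(numbers)
    print(numbers)               # [1, 2, 3, 4]  (unchanged by rebind_list)

    greeting = 'hello'
    print(add_suffix(greeting))  # hello!
    print(greeting)              # hello
```

Note that rebind_list leaves the caller's list alone even though lists are mutable: assigning to the parameter name only rebinds the local reference.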



For entirely too much information about this topic in Python and across many other languages (Java, Scheme, C#, C, C++, Python), read the thread where these quotes come from:

Is Python By Value Or By Reference?

Alex Martelli:

The terminology problem may be due to the fact that, in python, the value of a name is a reference to an object. So, you always pass the value (no implicit copying), and that value is always a reference.
[...]
Now if you want to coin a name for that, such as "by object reference", "by uncopied value", or whatever, be my guest. Trying to reuse terminology that is more generally applied to languages where "variables are boxes" to a language where "variables are post-it tags" is, IMHO, more likely to confuse than to help.

Michael Hoffman:

Alex is right that trying to shoehorn Python into a "pass-by-reference" or "pass-by-value" paradigm is misleading and probably not very helpful. In Python every variable assignment (even an assignment of a small integer) is an assignment of a reference. Every function call involves passing the values of those references.


word.

#    Comments [0] |