
AS OF MAY 2008, THIS BLOG IS NO LONGER BEING UPDATED.
Visit the new blog at: http://coreygoldberg.blogspot.com



 Tuesday, November 27, 2007

Python - Extracting Files From Zip Archives

Here is a way to unzip files in Python. If you have a zip archive containing multiple files, you can extract them all into the current directory like this:

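import zipfile

# Open the archive in binary mode and write each member to the current directory.
fh = open('foo.zip', 'rb')
z = zipfile.ZipFile(fh)
for name in z.namelist():
    outfile = open(name, 'wb')
    outfile.write(z.read(name))  # z.read() loads the whole member into memory
    outfile.close()
fh.close()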
Comments [6]
Tuesday, November 27, 2007 3:50:11 PM (Eastern Standard Time, UTC-05:00)
This approach won't work for excessively large .ZIP files, however. Since each file is expanded in memory, you're limited by how much memory the host platform's OS can give the process. I needed to programmatically expand a 30 GB zip file, and of course Python choked. I ended up using an external zip tool and refactored the code to work with the unzipped file instead.
Nick Danger
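
The memory limit described above comes from z.read(), which returns a whole member as a single string. A minimal sketch of a chunked alternative follows, assuming a current Python 3 (3.6 or later, for ZipInfo.is_dir()), a flat archive like the one in the post, and member names you trust. The function name extract_streaming and its dest_dir and chunk_size parameters are made up for illustration.

```python
import os
import shutil
import zipfile


def extract_streaming(zip_path, dest_dir=".", chunk_size=1024 * 1024):
    """Extract a flat zip archive into dest_dir, copying one chunk at a time."""
    with zipfile.ZipFile(zip_path) as z:
        for info in z.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            # Assumption: flat archive with trusted names, so a plain join is enough.
            target = os.path.join(dest_dir, info.filename)
            # z.open() returns a file-like object that decompresses on demand,
            # so only chunk_size bytes are held in memory at any moment.
            with z.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst, chunk_size)
```

Calling extract_streaming('foo.zip') should behave like the loop in the post for a flat archive, without holding any member fully in memory.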
Tuesday, November 27, 2007 4:17:27 PM (Eastern Standard Time, UTC-05:00)
@Nick:

Interesting, I see what you mean. I have used this successfully to unzip files of a few gigabytes, but have never tried to go beyond that.

-Corey
Tuesday, November 27, 2007 4:56:59 PM (Eastern Standard Time, UTC-05:00)
Additionally, permissions won't be set correctly (e.g. read-only, executable). Also, you will need to create subdirectories if the contents are in subdirectories. Zip archives also often have directories as entries, so you need to be careful with those.
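
A rough sketch of handling both points, assuming a current Python 3 (3.6 or later) and an archive created on a Unix-like system. ZipFile.extract() was added to zipfile after this thread was written; it creates missing parent directories and handles directory entries. Restoring permissions from the high 16 bits of ZipInfo.external_attr relies on the convention Unix zip tools follow, not on anything zipfile does for you. The function name extract_with_modes is hypothetical.

```python
import os
import zipfile


def extract_with_modes(zip_path, dest_dir="."):
    """Extract an archive, then reapply Unix permission bits where recorded."""
    with zipfile.ZipFile(zip_path) as z:
        for info in z.infolist():
            # extract() creates parent directories as needed and returns
            # the path it actually wrote.
            target = z.extract(info, dest_dir)
            # Assumption: the archiver stored st_mode in the high 16 bits
            # of external_attr; the value is 0 when nothing was recorded.
            mode = (info.external_attr >> 16) & 0o7777
            # Leave directories alone so a read-only directory cannot block
            # extraction of the files inside it.
            if mode and not info.is_dir():
                os.chmod(target, mode)
```

On Windows, os.chmod() can only toggle the read-only flag, so the executable bit is not restored there.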

And just to be hyper-annoying, the tarfile module has a completely different API. I don't know which one came first, but the second one should at least have copied the method names, e.g. it is namelist() in zipfile and getnames() in tarfile. zipfile only extracts to memory, while tarfile wants to extract to disk.
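
To make the naming mismatch concrete, here is a small sketch that lists member names from either kind of archive. The helper name list_members is made up for illustration. It assumes a Python version in which both ZipFile and TarFile work as context managers.

```python
import tarfile
import zipfile


def list_members(path):
    """Return the member names of a zip or tar archive."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as z:
            return z.namelist()  # zipfile spells it namelist()
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as t:
            return t.getnames()  # tarfile spells it getnames()
    raise ValueError("not a zip or tar archive: %r" % (path,))
```

Both branches return a list of strings, so the caller no longer has to care which module read the archive.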
Tuesday, November 27, 2007 5:07:56 PM (Eastern Standard Time, UTC-05:00)
@Roger:

So I guess my example only works for zip files that do not contain nested directories, and where permissions don't need to be preserved.

Thanks!
Friday, January 04, 2008 5:46:25 PM (Eastern Standard Time, UTC-05:00)
Not to mention... you can't unzip anything that would require more than 2 GB of memory. You mentioned you were using this on files of a "few" GB, but I can't get it to unzip even a zip file that is a mere 500 MB...
Friday, January 04, 2008 9:58:01 PM (Eastern Standard Time, UTC-05:00)
I can't remember the exact file size I unzipped, but it was certainly bigger than 500 MB, and was definitely over a GB.