I have some utilities to read and write gzip files in an application I maintain. I’ve been periodically running into “buffer errors” in Zlib’s GzipReader class. I’ve been working around the errors in various ways, but today I finally got stopped cold and had to figure out what was going on. It turns out that the RubyGems people have also seen this error and wrote some code to solve it. So I adapted their code to my purposes, and I’ll share what I’ve got in hopes that someone else can use it too.
The code is relatively straightforward. You open a gzip file as a regular File IO object in binary mode. You then use the low level gzip reader to pull the unzipped data into a StringIO object. Then you can stream that StringIO into an output file. Here are some assumptions:
unzipped_filename = string holding path/filename of where you’re saving the unzipped file
zipped_filename = string holding path/filename of where your gzipped file is located
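require 'zlib'
require 'stringio'

File.open(unzipped_filename, 'wb') do |unzipped_file|
  # The block form closes the gzip file even if reading raises
  File.open(zipped_filename, 'rb') do |gz|
    zis = nil
    begin
      # Low-level reader: pull the unzipped data into a StringIO
      zis = Zlib::GzipReader.new(gz)
      io = StringIO.new(zis.read)
    ensure
      # finish closes the reader but leaves the underlying File open
      zis.finish if zis
    end
    unzipped_file.write(io.read)
  end
  # Make sure the data is actually on disk before leaving
  unzipped_file.flush
  unzipped_file.fsync
end
GC.start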
That’s all there is to it! I call flush and fsync to make sure your file is actually on disk before leaving the routine. Also, GC.start will right away clean up a lot of the RAM this function used, which matters if you’re dealing with extremely large files (I have a unit test for this against a random 100 megabyte ASCII file that works now). Performance isn’t bad either!
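A round-trip check along the lines of that unit test could look something like the sketch below. It is only an illustration: the names `gunzip_file` and `round_trip_ok?` are made up for this example, the sample is far smaller than 100 megabytes so it runs quickly, and the StringIO step is skipped for brevity.

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical wrapper around the routine shown above.
def gunzip_file(zipped_filename, unzipped_filename)
  File.open(unzipped_filename, 'wb') do |unzipped_file|
    File.open(zipped_filename, 'rb') do |gz|
      zis = Zlib::GzipReader.new(gz)
      begin
        unzipped_file.write(zis.read)
      ensure
        zis.finish
      end
    end
    unzipped_file.flush
    unzipped_file.fsync
  end
end

# Hypothetical check: gzip random printable ASCII, unzip it, compare.
def round_trip_ok?(size)
  Dir.mktmpdir do |dir|
    original = Array.new(size) { rand(32..126).chr }.join
    zipped   = File.join(dir, 'sample.txt.gz')
    unzipped = File.join(dir, 'sample.txt')
    Zlib::GzipWriter.open(zipped) { |w| w.write(original) }
    gunzip_file(zipped, unzipped)
    File.binread(unzipped) == original
  end
end

puts round_trip_ok?(1_000_000)
```

Raising the size argument moves this closer to the large-file case, at the cost of a slower test.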
Note: If you’re using Gnu-Win32 utils (at least) and you gzip a file using maximal compression, don’t expect Ruby’s Zlib to work correctly. I’ve found that using standard compression results in gzip files that Ruby can read more reliably (i.e. without “buffer error” problems).
Addendum 9/22/07: I’m still getting torqued on a few cases where Zlib returns a buffer error for gzipped files on Windows. I have no workaround, but the solutions above have reduced the number of errors I’m getting. I’m about ready to just shell out to `gzip` and forget about code-driven solutions.
Addendum 5/20/08: I have swapped out my Ruby code and now just shell to do all this:
`gzip -d #{zipped_filename}`
It seems to do the job. Also, zipping files has never caused a problem; it’s unzipping that often fails in Ruby on Windows. Probably because in zip mode Ruby is feeding the OS at its own pace, but in unzip mode the OS is throwing bytes at Ruby.
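One caveat with the backtick form is that a filename containing spaces or shell metacharacters gets interpreted by the shell. A minimal sketch of a safer variant, assuming `gzip` is on the PATH, passes the arguments to `system` as separate strings so no shell is involved. The helper names `gunzip_command` and `gunzip_via_shell` are hypothetical.

```ruby
# Hypothetical helper: build the argument list for `gzip -d`.
def gunzip_command(zipped_filename)
  ['gzip', '-d', zipped_filename]
end

# Hypothetical helper: run gzip without going through a shell.
def gunzip_via_shell(zipped_filename)
  # system returns true on exit status 0, false on a non-zero status,
  # and nil if the command could not be started at all.
  ok = system(*gunzip_command(zipped_filename))
  raise "gzip -d failed for #{zipped_filename}" unless ok
  # gzip -d replaces foo.gz with foo, so return the new path.
  zipped_filename.sub(/\.gz\z/, '')
end
```

Checking the return value also surfaces failures that the backtick form would silently ignore.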
I’m running into this same error when trying to install rails with gems (Ruby 1.8.6-25). How would I go about applying your fix in my situation?
Try this - I’ve heard it fixes a lot of issues like the one you’re having:
gem update --system
Thank you, the ‘gem update --system’ solved my issues.
I am still getting Zlib::BufError after doing “gem update --system”.
Taowen: Ruby’s win zlib blows is all I can say. Perhaps try to get on the newest revision of Ruby. Or try switching to JRuby or Rubinius - I hear JRuby is pretty far along though I’m not sure about Windows. Rubinius finally booted Rails recently but again I’m not sure about Windows. Hope that helps!
Experimenting with this resulted in the following working code:
It seems that gzio.read() does not always work when the whole result is requested at once. Sounds like an overflow somewhere…
Hm, a correction after having processed more files: 8192 bytes seems to be OK, while 16384 does not. So change the ‘65536’ in the previous post to 8192.
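For anyone following along, the chunked-read idea being discussed here can be sketched roughly as below. This is a minimal sketch and not the code from the earlier comment: the helper name `gunzip_in_chunks` and its parameters are hypothetical, and the default chunk size is simply the 8192 figure mentioned above.

```ruby
require 'zlib'

# Hypothetical helper: decompress src_path into dest_path, asking the
# reader for at most chunk_size bytes at a time instead of everything.
def gunzip_in_chunks(src_path, dest_path, chunk_size = 8192)
  File.open(dest_path, 'wb') do |out|
    Zlib::GzipReader.open(src_path) do |gz|
      # read(n) returns nil once the end of the stream is reached
      while (chunk = gz.read(chunk_size))
        out.write(chunk)
      end
    end
  end
end
```

The chunk size is a parameter rather than a constant because reports differ on which sizes are safe, so it may need tuning per environment.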
Rutger: My experience tells me that what you’ve found is not a definite solution but just a workaround that works in certain test cases. I’ve come up with a number of solutions that seem to reduce the incidence of errors, only to be surprised that on certain architectures, with other gzip files, the lib starts failing again.
So I greatly appreciate the code - I think I have also experimented with reading only pieces of the zip file into a buffer and found that the dang thing still crashes in some cases. Keep us posted on your mileage! For now I’m sticking with:
You’re right, running it against a large number of .gz files I found that 8192 is still too high ;)
2048 seems to work for at least the 58000 cases I’ve got here, so I’m going to stick with that. The gzip solution makes it too slow for me (58000 times starting a process on Windows is no fun), and makes the installer larger since I need to bundle gzip.exe and cygwin1.dll with it.
Thanks for the tips Rutger. Just fyi you can get native compiled gzip without the cygwin layer from the most excellent gnuwin32 libs: http://gnuwin32.sourceforge.net/packages/gzip.htm