Hruške, jabuke, jablane, čežane.

Have you ever wondered why the Best Neighbour is never home when you come to visit?

A replacement for bzip2 compression

Posted on May 15th, 2008 in debian, linux

I was looking for a replacement for the old and slow bzip2. Not that it doesn’t work; it’s just too slow for my use case, and gzip just doesn’t save enough space. And, after all, why settle for less than you’re able to achieve?

So I checked out the Linux compression utilities, looking for one that fits my requirements:

  • it should be included in Debian repositories
  • it should compress approximately as well as bzip2 does in similar time, but should decompress faster
  • if possible, it should be free software

The only real contender is 7-zip, which uses the very efficient LZMA algorithm for compression. It can be quite slow, though. So the idea is to fine-tune the utility so that the archive takes at most as much space as bzip2 would, while decompressing faster. p7zip, the Unix port, has compression settings similar to gzip’s, ranging from 1 to 9. I tested some of them to find the optimal setting for my use case and ran some benchmarks. I used three test files, all of them tar archives but with different contents: test case 1 was 40MB of text, test case 2 was a roughly 200MB image of a recent Haiku OS, and test case 3 was essentially a 70MB bunch of Java JAR files.
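
A rough sketch of this kind of size and time comparison in Python. It assumes the standard-library bz2 and lzma modules as stand-ins for the bzip2 and p7zip command-line tools; they did not produce the numbers in this post, and lzma presets are not guaranteed to match 7z compression levels. The function names and the sample data are made up for illustration.

```python
import bz2
import lzma
import time


def benchmark_compression(data: bytes, presets=(1, 5, 9)):
    """Compress `data` with bzip2 and several LZMA presets.

    Returns a list of (label, compressed_size, seconds) tuples,
    with the bzip2 result always first.
    """
    results = []

    start = time.perf_counter()
    bz_size = len(bz2.compress(data, compresslevel=9))
    results.append(("bzip2 -9", bz_size, time.perf_counter() - start))

    for preset in presets:
        start = time.perf_counter()
        size = len(lzma.compress(data, preset=preset))
        results.append((f"lzma -{preset}", size, time.perf_counter() - start))

    return results


def normalize_to_bzip2(results):
    """Express every compressed size as a fraction of the bzip2 size."""
    bz_size = results[0][1]
    return {label: size / bz_size for label, size, _ in results}


if __name__ == "__main__":
    # Hypothetical sample input; a real test would read a tar file instead.
    sample = b"The quick brown fox jumps over the lazy dog.\n" * 20000
    results = benchmark_compression(sample)
    for label, ratio in normalize_to_bzip2(results).items():
        print(f"{label:10s} {ratio:.2f}x the bzip2 size")
```

A ratio below 1.0 means the archive came out smaller than the bzip2 one. Highly repetitive sample data like the above is not representative, so real tar files should be used for any conclusion.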

This is how the archives compressed. Numbers are normalized to bzip2, for comparison.
[Figure: 7-zip vs. bzip2, archive size comparison]

Time needed for compression:
[Figure: time needed for compression, by algorithm and test case]

Time needed for decompression:
[Figure: time needed for decompression, by algorithm and test case]

You can see that 7-zip always decompresses faster, and that in general, higher 7z compression levels make the archive decompress faster. Interesting.
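
Decompression time can be measured in the same spirit. This is a minimal sketch, again assuming Python's bz2 and lzma modules rather than the actual bzip2 and p7zip binaries, so absolute timings will differ from the charts; `time_call` and `decompression_times` are hypothetical helper names.

```python
import bz2
import lzma
import time


def time_call(func, payload, repeats=3):
    """Return the best wall-clock time of `repeats` calls to func(payload)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(payload)
        best = min(best, time.perf_counter() - start)
    return best


def decompression_times(data: bytes, presets=(1, 5, 9)):
    """Map each label to the seconds needed to decompress `data` again."""
    times = {"bzip2 -9": time_call(bz2.decompress, bz2.compress(data, 9))}
    for preset in presets:
        packed = lzma.compress(data, preset=preset)
        # Sanity check: the round trip must give back the original bytes.
        assert lzma.decompress(packed) == data
        times[f"lzma -{preset}"] = time_call(lzma.decompress, packed)
    return times
```

Taking the best of a few runs reduces noise from other activity on the machine; it does not replace a quiet test box.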

Some more info:

  • 7zip was Debian package p7zip-full 4.57~dfsg.1-1
  • bzip2 was Debian package 1.0.5-0.1
  • test machine was a 2.16GHz MacBook with 2GB RAM, running nothing but the tests
  • frequency scaling was off
  • all files were first cached in RAM by doing “cat file > /dev/null” so disk I/O was not the bottleneck
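
The cache warm-up step can also be done from a script. A small sketch of the same idea as “cat file > /dev/null”, assuming Python 3.8 or newer; `warm_page_cache` is a hypothetical helper, and whether the file actually stays cached depends on the operating system and on free memory.

```python
def warm_page_cache(path, chunk_size=1 << 20):
    """Read `path` once and discard the bytes, like `cat file > /dev/null`.

    Returns the number of bytes read.
    """
    total = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files do not need to fit in memory.
        while chunk := handle.read(chunk_size):
            total += len(chunk)
    return total
```

Calling it once on each test file before starting the timers mirrors the procedure described in the list above.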
