A quite efficient compression algorithm: Blosc

I recently learned about a new compression algorithm, Blosc (https://www.blosc.org/trac). It is very fast, but by design it trades away compression ratio for that speed.

I ran a performance comparison between Blosc and the Deflate compression algorithm built into .NET.

Here is my PC environment:

  • CPU: Intel Xeon CPU E5-1620 3.6 GHz
  • RAM: 16.0 GB

For Blosc, I used the following settings:

  • Number of threads: 1

    • Set to a single thread for a fair comparison. Using multiple threads improves throughput considerably.
  • Compression level: 9

    • The compression level must be a number between 0 (no compression) and 9 (maximum compression).
  • Shuffle: No

  • Type size: 8
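
To make the shuffle and type size settings more concrete, here is a minimal pure-Python sketch of a byte shuffle. It is an illustration only, not Blosc's implementation: the function names `shuffle` and `unshuffle` and the sample data are hypothetical, and `zlib` stands in as the compressor. The idea is that, for elements of `typesize` bytes, all first bytes are grouped together, then all second bytes, and so on, which tends to put similar bytes next to each other.

```python
import struct
import zlib


def shuffle(data: bytes, typesize: int) -> bytes:
    """Group byte 0 of every element, then byte 1, and so on."""
    if len(data) % typesize != 0:
        raise ValueError("data length must be a multiple of typesize")
    return b"".join(data[i::typesize] for i in range(typesize))


def unshuffle(data: bytes, typesize: int) -> bytes:
    """Inverse of shuffle(): interleave the byte planes back into elements."""
    if len(data) % typesize != 0:
        raise ValueError("data length must be a multiple of typesize")
    count = len(data) // typesize
    out = bytearray(len(data))
    for i in range(typesize):
        out[i::typesize] = data[i * count:(i + 1) * count]
    return bytes(out)


# Hypothetical sample: 10,000 slowly increasing 64-bit integers (type size 8).
values = struct.pack("<10000q", *range(1_000_000, 1_010_000))
shuffled = shuffle(values, 8)

# Compare compressed sizes with and without the shuffle step.
plain_size = len(zlib.compress(values, 9))
shuffled_size = len(zlib.compress(shuffled, 9))
print(f"plain: {plain_size} bytes, shuffled: {shuffled_size} bytes")
```

The benchmark in this post ran with shuffle turned off, so the sketch only shows what the option would change, not what was measured.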

 

DataSet               Blosc compression     Blosc compression   Deflate compression   Deflate compression
                      throughput (MB/s)     ratio               throughput (MB/s)     ratio

[advwks_cust.dat]     220.3                 2.8                 33.7                  4.4
[advwks_fact.dat]     451.1                 4.9                 58.4                  14.5
[experian_fact.dat]   285.0                 3.0                 26.7                  6.2
[jitb_fact.dat]       290.7                 2.2                 16.9                  3.9
[mssales_fact.dat]    501.9                 6.0                 41.7                  10.4
[mssales_prod.dat]    530.0                 5.8                 50.5                  10.1
[skype_fact.dat]      213.0                 2.1                 15.9                  4.0
[synthetic_int.dat]   359.4                 3.9                 15.7                  5.7
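
The post does not show the benchmark harness, so the following is only a sketch of how throughput and compression ratio could be measured. It makes several assumptions: Python's `zlib` (a Deflate implementation) stands in for the .NET one, 1 MB is taken as 2^20 bytes, the ratio is uncompressed size divided by compressed size, and the `benchmark` helper and sample data are hypothetical.

```python
import time
import zlib
from typing import Callable, Tuple


def benchmark(compress: Callable[[bytes], bytes], data: bytes,
              repeats: int = 5) -> Tuple[float, float]:
    """Return (throughput in MB/s, compression ratio) for one compressor."""
    best = float("inf")
    compressed = b""
    for _ in range(repeats):
        start = time.perf_counter()
        compressed = compress(data)
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading from the timer
    throughput_mb_s = len(data) / (1024 * 1024) / best
    ratio = len(data) / len(compressed)
    return throughput_mb_s, ratio


# Hypothetical sample data: 1 MiB of a repeating byte pattern.
sample = bytes(range(256)) * 4096
throughput, ratio = benchmark(lambda d: zlib.compress(d, 9), sample)
print(f"{throughput:.1f} MB/s, ratio {ratio:.1f}")
```

Taking the best of several runs reduces noise from other processes. Any other compressor can be measured the same way by passing a different `compress` callable.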

Comments

  • Anonymous
    June 05, 2014
    An interesting property of Blosc is its built-in multithreaded compression. However, its exposed compression API is not thread safe. That is the opposite of most other compression algorithms, which have no built-in multithreading but do have a thread-safe API. This may make Blosc harder to integrate into an existing framework that already does multithreaded compression.

  • Anonymous
    December 11, 2014
    Adding multithreading to compression libraries is a trivial task. Blosc is no more than Shuffle+Compression. You can add shuffle+Multithreading to any compression library.
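
As a rough illustration of the point made in the last comment, multithreading can be layered on top of a compressor whose API is thread safe by splitting the input into chunks and compressing each chunk independently. This is a sketch under assumptions, not how Blosc does it: `zlib` is the stand-in compressor, and `compress_chunks`, `decompress_chunks`, the chunk size, and the worker count are all hypothetical choices.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compress_chunks(data: bytes, chunk_size: int = 1 << 20,
                    workers: int = 4, level: int = 6) -> List[bytes]:
    """Split data into fixed-size chunks and compress them in a thread pool."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so blocks line up with chunks.
        return list(pool.map(lambda c: zlib.compress(c, level), chunks))


def decompress_chunks(blocks: List[bytes]) -> bytes:
    """Decompress each block and join the results in order."""
    return b"".join(zlib.decompress(b) for b in blocks)


# Hypothetical sample data, about 3 MB.
data = b"abc" * 1_000_000
blocks = compress_chunks(data, chunk_size=1 << 16)
restored = decompress_chunks(blocks)
print(len(blocks), "blocks,", sum(len(b) for b in blocks), "compressed bytes")
```

Because each chunk is compressed on its own, matches cannot cross chunk boundaries, so smaller chunks generally cost some compression ratio in exchange for more parallelism.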