
Text data and Compression

Many developers believe that text data always compresses well, and that any text can be shrunk to a small fraction of its size. If that were true, we could re-encode any compressed output as text and compress it again, eventually shrinking gigabytes of data to a few kilobytes. Unfortunately, this belief is wrong. To demonstrate it, I wrote a console application in .NET that uses GZipStream from the .NET class library to compress and decompress data. I compressed a string with GZipStream, which gave me a byte[] as output. I then converted the byte[] to a base64 string, which is pure text, and compressed that base64 string again to show that not all text compresses well. Attached is the code.
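For readers without the attached solution, the experiment can be sketched roughly as follows. This is a Python stand-in, not the original .NET code: it uses Python's gzip module in place of GZipStream, and the function name compress_rounds and the sample text are made up for illustration.

```python
import base64
import gzip


def compress_rounds(text, rounds=3):
    """Repeatedly gzip the text and base64-encode the compressed bytes.

    Each round's base64 string becomes the next round's input.
    Returns one (text_len, compressed_len, base64_len) tuple per round.
    """
    results = []
    current = text
    for _ in range(rounds):
        compressed = gzip.compress(current.encode("utf-8"))
        encoded = base64.b64encode(compressed).decode("ascii")
        results.append((len(current), len(compressed), len(encoded)))
        current = encoded  # pure text again, fed back into the next round
    return results


if __name__ == "__main__":
    # Hypothetical sample text, not the 717-character string from the post.
    sample = "Text data does not always compress well. " * 10
    for i, (t, c, b) in enumerate(compress_rounds(sample), start=1):
        print(f"round {i}: text={t} compressed={c} base64={b}")
```

The exact numbers depend on the input and on the gzip implementation, so they will not match the figures from the .NET run. The thing to watch is how the compressed size in the second and later rounds compares with the compressed size from the round before it.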

If you run the console application, this is the output you will see:

[Screenshot: console window]

The original text I used for this test is 717 characters long. Compressing it produces a byte[] of length 454, and converting that byte[] to a base64 string gives a string of length 608. The program repeats this compress-and-encode step in a loop, feeding each base64 string back in as the next input. Here is the test solution I wrote.
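The jump from 454 to 608 is just base64 overhead: every 3 bytes become 4 characters, with padding at the end. Here is a small Python check of that arithmetic (the helper name base64_length is made up for illustration):

```python
import base64


def base64_length(n_bytes):
    """Length of the padded base64 encoding of n_bytes bytes."""
    return 4 * ((n_bytes + 2) // 3)


# 454 compressed bytes -> 608 base64 characters, as reported above.
print(base64_length(454))

# Cross-check against the real encoder using 454 dummy bytes.
print(len(base64.b64encode(bytes(454))))
```

Base64 therefore gives back about a third of the size that compression saved, before the next round even starts.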

Comments

  • Anonymous
    March 29, 2008
    Thought I would bring this back again and see if it was popular or not this time around. So many things