Using a MemoryStream with GZipStream [Lakshan Fernando]

We’ve seen cases where customers have run into issues when using a MemoryStream with GZip compression. The problem can be frustrating to debug, so I thought I would blog about it in the hope that others can avoid a similar issue. The code in question looks like this:

        Byte[] compressedBuffer;
        MemoryStream stream = new MemoryStream();
        using (GZipStream zip = new GZipStream(stream, CompressionMode.Compress))
        {
            // compress data
            ...
            // Don't read the MemoryStream data before the GZipStream is closed, since it
            // doesn't yet contain the complete compressed data. GZipStream writes additional
            // data, including footer information, when it is disposed.
            compressedBuffer = stream.ToArray(); // WRONG
        }
        // CORRECT CODE: call compressedBuffer = stream.ToArray() here, after the GZipStream is disposed

The problem arises because the data in the MemoryStream is not complete when ToArray is called before the GZipStream is closed. GZipStream writes any remaining compressed data and the footer information to the underlying stream when it is closed. The data in the MemoryStream is still accessible even after it has been closed: both the ToArray and GetBuffer methods return valid data after the MemoryStream has been disposed. This is less of an issue when another stream such as FileStream is used for compression, since the file is generally closed well before decompression, and it is fine to re-open the file at that point.
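To make the difference concrete, here is a minimal sketch that compresses the same bytes twice: once reading the MemoryStream inside the using block, and once after the GZipStream has been disposed. The class and method names (GZipMemoryStreamSketch, CompressTooEarly, Compress, Decompress) are made up for illustration; only the GZipStream and MemoryStream calls come from the framework.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class GZipMemoryStreamSketch
{
    // WRONG: snapshots the MemoryStream while the GZipStream is still open,
    // so the footer (and possibly buffered compressed data) is missing.
    public static byte[] CompressTooEarly(byte[] data)
    {
        MemoryStream stream = new MemoryStream();
        using (GZipStream zip = new GZipStream(stream, CompressionMode.Compress))
        {
            zip.Write(data, 0, data.Length);
            return stream.ToArray();
        }
    }

    // CORRECT: reads the MemoryStream only after the GZipStream is disposed.
    // ToArray still works even though disposing the GZipStream closed the stream.
    public static byte[] Compress(byte[] data)
    {
        MemoryStream stream = new MemoryStream();
        using (GZipStream zip = new GZipStream(stream, CompressionMode.Compress))
        {
            zip.Write(data, 0, data.Length);
        }
        return stream.ToArray();
    }

    // Decompresses a complete gzip buffer back into the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream zip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = zip.Read(chunk, 0, chunk.Length)) > 0)
            {
                output.Write(chunk, 0, read);
            }
            return output.ToArray();
        }
    }

    public static void Main()
    {
        byte[] original = Encoding.UTF8.GetBytes("hello hello hello hello");

        byte[] tooEarly = CompressTooEarly(original);
        byte[] complete = Compress(original);

        // The early snapshot is shorter because the footer has not been written yet.
        Console.WriteLine("Too early: {0} bytes, complete: {1} bytes",
            tooEarly.Length, complete.Length);

        // Only the complete buffer is expected to round-trip.
        Console.WriteLine(Encoding.UTF8.GetString(Decompress(complete)));
    }
}
```

How much data is missing from the early snapshot depends on the runtime version and the input size, so the sketch only relies on it being shorter than the complete buffer.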

Comments

  • Anonymous
    May 10, 2006
    no
  • Anonymous
    May 10, 2006
    Isn't it really a non-intuitive behaviour? What is the reason for this strange design decision?
  • Anonymous
    May 11, 2006
    I agree that this behavior is not obvious - it would be useful to reconsider the design.

    The GZip class itself could provide a ToArray method; this would eliminate the need to concern ourselves with the state of the Stream object, either within or outside the scope of the using statement. The semantics of ToArray could include appending the footer info to the array without closing the zip stream.

    Also...
    the code:

    using (GZipStream zip = new GZipStream(stream, CompressionMode.Compress))

    will set the internal _leaveOpen field to a value of false, which means that when the zip closes it also closes the underlying stream, so when the code exits the scope of the 'using' statement it closes the MemoryStream - attempts to access the length throw an exception. It should be changed to:

    using (GZipStream zip = new GZipStream(stream, CompressionMode.Compress, true))
    {
    }
    // access MemoryStream object here...
    buffer = stream.ToArray();

    It would also be goodness if the ZIP classes provided better compression - I can only get about a 78% compression ratio on a pure text file - I was getting about a 98% ratio using other libraries. Will this be improved in future versions?


  • Anonymous
    May 15, 2006
    It sounds like your customers are seeing the GZipStream retain some data due to compression block sizes. Would it be better to call the Flush method rather than relying on the implicit flush during close?
  • Anonymous
    May 16, 2006
    Thanks a lot for your comments.

    The footer information is written at GZipStream.Dispose() time, which means that the MemoryStream doesn't have complete data until the GZipStream is closed. As mentioned earlier, this is not an issue with FileStream, since the common usage involves calling Close before decompressing the bits. It does seem unintuitive when using the MemoryStream.

    We are looking into addressing performance and size issues of GZipStream in a future release.
    Thanks
    Lakshan Fernando
    BCL Team
  • Anonymous
    October 10, 2006
    While working on compressing view state for web pages I encountered an issue in using the compression...