DotNetZip users: Please Test Unicode support (free zip library for .NET)

DotNetZip is an open-source library that allows any .NET application to read and create zip files.  If you want your ASP.NET page to grab an uploaded zipfile and unpack it on the server, DotNetZip can help you.  If you want to generate a zipfile from a server-based app like an ASP.NET page and send it as a download, DotNetZip can be the library you need.  If you have a Windows Forms app or a WPF app that needs to read or write zipfiles, DotNetZip can help.  If you have an agent application that watches for files to be dropped into a directory, then zips or unzips them, DotNetZip is the thing you need.

DotNetZip has been available as an open source project on CodePlex for about a year. In that time I've added a bunch of features as requested by people who are using it - support for passwords, support for stream-based interfaces, zipfile comments, the ability to remove entries from zip files, progress events for zipping up or unzipping, finer control over the use of compression, and so on. 

The most requested feature for the DotNetZip library is Unicode support: people want to be able to zip up files whose names contain non-ASCII characters.

I've produced a preliminary release with Unicode support, and I'm asking people to try it out, test it, and tell me whether it does what they need it to do.  The thing is, Unicode support was added to the PKWare specification for zip files only recently, and few tools out there properly support it.  In particular, the "compressed folders" feature in Windows XP and Windows Vista does not support the PKWare Unicode encoding: if you open a zip file that uses it in Windows, the non-ASCII filenames will not come out the way you expect.  And it's not just Windows; most tools don't support Unicode zip files.  As a result, I don't know how to test the Unicode support effectively.

Because zip tools that handle Unicode are few and far between, I used a fallback approach in the library.  In the latest prelim release, DotNetZip by default uses the IBM437 code page to encode non-ASCII characters.  This is not in the PKWare spec, but the zip files people have sent me use this encoding.  IBM437 in zip files may be a quiet, unspecified, de facto standard that is "good enough" for many people.  Of course, in IBM437 mode you cannot encode Chinese characters, which is a huge hole.  But IBM437 does cover many (though not all) accented characters, such as those with umlauts and tildes, used in a lot of languages written with the Latin alphabet.
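
To see in advance whether a given filename will survive the IBM437 fallback, you can check whether every one of its characters exists in code page 437. The sketch below uses Python's built-in `cp437` codec as a stand-in for IBM437. It is not DotNetZip code, and the helper name `unencodable_chars` is hypothetical.

```python
def unencodable_chars(filename: str, codepage: str = "cp437") -> list:
    """Return the characters in `filename` that `codepage` cannot represent.

    Uses Python's standard codecs ("cp437" is IBM437). An empty list means
    the whole name can be stored in that code page without loss.
    """
    missing = []
    for ch in filename:
        try:
            ch.encode(codepage)  # strict mode raises on unmappable characters
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    for name in ["Müller.txt", "señor.doc", "dDåÅæÆøØ¥¢.pdf", "中文.txt"]:
        print(name, "->", unencodable_chars(name) or "OK in IBM437")
```

Names with umlauts, tildes, and other common accents pass the check. Chinese characters fail entirely, and so do some Latin-alphabet letters: for example, ø and Ø have no slot in code page 437.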

And now I'm asking for your help.  I'd like everyone who uses DotNetZip to try out the new Unicode and IBM437 support.  It is in the latest v1.6 prelim release, available on the Releases tab.

For tests: 
Use the library to zip up files that have "Unicode filenames", or more accurately, filenames with characters that cannot be represented in 7-bit ASCII.  These might be Latin characters with umlauts, tildes, and so on, as well as characters from non-Latin scripts: Hebrew, Greek, Cyrillic, and of course, Chinese.
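
To get a repeatable set of input files for these tests, a short script can create one file per script family. This is a minimal Python sketch. The sample names and the `make_test_files` helper are hypothetical, chosen only to cover the cases listed above. Each file holds its own name as UTF-8 text, so a name that gets mangled during zipping or unzipping is easy to spot afterwards.

```python
import pathlib
import tempfile

# Hypothetical sample names, one for each kind of character mentioned above.
SAMPLE_NAMES = [
    "Müller-umlaut.txt",    # Latin letter with umlaut
    "mañana-tilde.txt",     # Latin letter with tilde
    "שלום-hebrew.txt",      # Hebrew
    "Ελληνικά-greek.txt",   # Greek
    "Привет-cyrillic.txt",  # Cyrillic
    "中文-chinese.txt",      # Chinese
]


def make_test_files(directory: pathlib.Path) -> list:
    """Create one small file per sample name; each file contains its own name."""
    created = []
    for name in SAMPLE_NAMES:
        path = directory / name
        path.write_text(name, encoding="utf-8")
        created.append(path)
    return created


if __name__ == "__main__":
    target = pathlib.Path(tempfile.mkdtemp(prefix="zip-unicode-test-"))
    for p in make_test_files(target):
        print(p)
```

After a zip/unzip round trip, compare each extracted file's name with its contents. A mismatch points to the exact name that did not survive.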

See if it works: do the files zip up and unzip properly?  Do they open in Explorer the way you expect?  Does the library behave intuitively, the way you would like it to behave?  Can the zip files be read by other tools and libraries, such as 7-Zip or WinRAR?  That kind of thing.
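
One way to check what another reader sees is to list the entries of a zip file written by DotNetZip using a second, independent implementation. The sketch below uses Python's standard `zipfile` module, which is not part of DotNetZip. For each entry, it reports whether bit 11 of the general purpose flags is set. Bit 11 is the PKWare "language encoding flag", which marks a filename as UTF-8. The `describe_entries` helper and the script name used below are illustrative only.

```python
import io
import sys
import zipfile

# General purpose bit 11 ("language encoding flag") in the PKWare spec:
# when set, the entry's filename and comment are UTF-8.
UTF8_FLAG = 0x800


def describe_entries(archive):
    """Yield (filename, utf8_flag_set) for every entry in `archive`.

    `archive` may be a path or a binary file object. Python's zipfile
    decodes names as UTF-8 when bit 11 is set and, by default, as
    code page 437 when it is not.
    """
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            yield info.filename, bool(info.flag_bits & UTF8_FLAG)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        archive = sys.argv[1]  # e.g. a zip file produced by DotNetZip
    else:
        # No argument: build a small demo archive with Python's own writer,
        # which sets bit 11 only for names that are not pure ASCII.
        archive = io.BytesIO()
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("plain.txt", "ASCII name")
            zf.writestr("Müller.txt", "non-ASCII name")
    for name, utf8 in describe_entries(archive):
        print(("UTF-8 flag  " if utf8 else "no flag     ") + name)
```

Running it as `python inspect_zip.py some.zip` against archives made with UseUnicode on and off produces two listings. Comparing them shows which entries carry the UTF-8 flag and whether the names decode as you expect.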

Check the documentation for the UseUnicode property on the ZipFile class to see the options and the details of the implementation.  I encourage you to run all your tests with that flag both ON and OFF, to see how it behaves and which behavior you prefer.

If you find something that breaks, or surprises you, please report it.  If you are really ambitious, you can write a test case that reproduces the problem you observed and attach it to a new workitem.

The new Unicode support in the library is easy to describe, but it is hard for me to test, and hard for me to know whether I am testing the right things.  I need help with this before I can declare the Unicode support useful, stable, and interoperable.

Check it out!

Comments

  • Anonymous
    December 04, 2008
    I want to zip files containing Danish special characters, e.g. dDåÅæÆøØ¥¢.pdf. I am using the IBM437 encoding to zip the file, but it replaced Ø with O. I read the article you referenced for code pages, http://en.wikipedia.org/wiki/Code_page_437. In this article CP437 is recommended for Ø, but it doesn't solve my problem. Please send me a solution as soon as possible.

  • Anonymous
    December 11, 2008
    This comment list is probably a bad place to ask for help.  If you are using DotNetZip, then use the forums on www.codeplex.com/DotNetZip to ask your question. If you are using something else to create the zip file, then you should ask your question at a different forum, too.