Long Paths in .NET, Part 3 of 3 [Kim Hamilton]
Updated 6/10/08 2:20pm: clarified details of proposed solution
Here it is, Part 3 of the long path series, which started over a year ago. I apologize for leaving you hanging; the BCL team has been busy lighting up the web. Because of the delay, I’ll summarize the compatibility concerns as context for the proposed solution.
Summary of Compat Concerns
Recall from Part 1 that one way to bypass the MAX_PATH limit with Win32 File APIs is to prepend \\?\ to the file name. This allows you to create paths that are only subject to NTFS restrictions, and the length limit is 32K. However, \\?\ has another side-effect -- it bypasses all Win32 file name canonicalization.
BCL gets a lot of requests for long path support. Some specifically request that we allow the \\?\ prefix. This brings up the question: is \\?\ requested because it allows longer paths, or do users also want to create paths that don’t conform to Win32 naming conventions? Our investigation indicates that, while there are specialized areas where non-canonical file names are useful, the overwhelming majority of users just want longer paths.
Why is this distinction important? If you just want longer paths, you don’t necessarily want the side effect of turning off all Win32 file naming conventions. For example:
- ‘/’ will no longer be converted to ‘\’ ; instead it’s part of the file name
- Trailing white spaces will no longer be removed; they’re also part of the file name
- You can create file names with reserved device names.
File names like these are problematic for other apps, and the problem is independent of path length: using \\?\ you can create a file name shorter than MAX_PATH that doesn’t adhere to Win32 naming conventions. I expect that a great majority of users will want the framework to enforce canonicalization, at least as the default behavior.
Note that the above statements are a commentary on unbridled use of \\?\. The problem could still be resolved as follows: behind the scenes, we first canonicalize the path using GetFullPathName (which isn’t subject to the MAX_PATH restriction) and then prepend \\?\. Perhaps non-canonical names could be allowed on an opt-in basis.
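To make the "canonicalize, then prefix" idea concrete, here is a minimal sketch. It is only an illustration under stated assumptions, not the framework's implementation: CanonicalizeAndPrefix and the class name are hypothetical, it is Windows-only, and it handles drive-letter paths only (UNC paths would need the \\?\UNC\ form).

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Text;

static class CanonicalLongPath
{
    // Win32 GetFullPathNameW: converts '/' to '\' and resolves ".." and relative segments.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern uint GetFullPathNameW(
        string lpFileName, uint nBufferLength, StringBuilder lpBuffer, IntPtr lpFilePart);

    // Hypothetical helper (not a BCL API): canonicalize first, then add \\?\.
    // Assumes a drive-letter path; UNC paths are not handled in this sketch.
    public static string CanonicalizeAndPrefix(string path)
    {
        // The caller already opted in to \\?\; leave the path untouched.
        if (path.StartsWith(@"\\?\", StringComparison.Ordinal))
            return path;

        var buffer = new StringBuilder(260);
        uint length = GetFullPathNameW(path, (uint)buffer.Capacity, buffer, IntPtr.Zero);
        if (length > buffer.Capacity)
        {
            // Buffer too small: the return value is the required size, terminator included.
            buffer = new StringBuilder((int)length);
            length = GetFullPathNameW(path, (uint)buffer.Capacity, buffer, IntPtr.Zero);
        }
        if (length == 0)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        return @"\\?\" + buffer.ToString();
    }
}
```

Because canonicalization runs before the prefix is added, the resulting name still follows Win32 conventions; only the length restriction is lifted.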
Either way, suppose .NET lets you create paths up to 32K in length. Now you have a new problem: you have a file that, most likely, no other app on your system can use, because to open it an app would have to support the \\?\ syntax. Furthermore, many .NET APIs won’t even be able to work with this file: recall from Part 1 that this syntax only works with the Win32 file-related APIs, but not with general Win32 functions that accept paths (e.g. LoadLibrary).
Goals
This blog series has focused fairly heavily on nuances of the \\?\ prefix, simply because it’s commonly viewed as the workaround to the MAX_PATH limitation. Let’s switch focus to some reasonable goals in the absence of a unified solution (exposed, for example, by Win32 APIs).
1. Reduce the incidence of hitting the path length limit when opening existing files, and do so in a compatible way; i.e. the path can only be modified into a form that Win32 APIs accept as legal.
2. For users that need consistent access to long paths, even if that involves bypassing Win32 restrictions, allow access to longer/non-canonical (tbd) file names.
Because of the compat concerns with Goal 2, we don’t want users to “accidentally” use this solution.
Proposed Solution
Fortunately, the Vista shell has provided a precedent of allowing longer path names in a compatible way. It’s called auto-path shrinking and it attempts to squeeze a file name into MAX_PATH by shrinking the long file names into the short name equivalents piecewise behind the scenes. Before describing that, note that the proposed solution is a hybrid approach:
1. Try to squeeze the file name into MAX_PATH characters using auto-path shrinking. This is only used for existing files, and for paths that don't have the \\?\ prefix (see below).
2. Allow use of the \\?\ prefix for creating as well as opening (in general, allow this for every operation corresponding to a Win32 file API that supports it). We will not attempt to add the \\?\ prefix behind the scenes; at most we'll provide a helper to do so, such as AddLongPathPrefix. In any case, the user must intentionally request this and not stumble into using \\?\ by accident. This part is TBD: we think it makes sense to expose as an option whether we should always enforce Win32 file name restrictions other than length, and enforcing file naming rules would be the default.
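As a rough idea of what an opt-in helper could look like, here is a sketch. The post only names AddLongPathPrefix; the class name, signature, and behavior below are assumptions for illustration. It expects a fully qualified, already canonicalized path and does pure string manipulation, so it never touches the file system.

```csharp
using System;

static class LongPathHelper
{
    private const string LongPathPrefix = @"\\?\";
    private const string UncLongPathPrefix = @"\\?\UNC\";
    private const string DevicePrefix = @"\\.\";

    // Hypothetical helper: the name comes from the post, the body is an assumption.
    public static string AddLongPathPrefix(string fullPath)
    {
        if (fullPath == null) throw new ArgumentNullException(nameof(fullPath));

        // Already prefixed, or a \\.\ device path: return it unchanged.
        if (fullPath.StartsWith(LongPathPrefix, StringComparison.Ordinal) ||
            fullPath.StartsWith(DevicePrefix, StringComparison.Ordinal))
            return fullPath;

        // UNC paths (\\server\share\...) take the \\?\UNC\server\share\... form.
        if (fullPath.StartsWith(@"\\", StringComparison.Ordinal))
            return UncLongPathPrefix + fullPath.Substring(2);

        // Drive-letter paths simply get the prefix prepended.
        return LongPathPrefix + fullPath;
    }
}
```

Keeping this as an explicit call is what makes the prefix an intentional choice: code that never calls the helper never ends up with a \\?\ path.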
Let's describe auto-path shrinking a bit more. If you pass in a file name that exceeds MAX_PATH:
C:\alongdirectoryname\anotherlongdirectoryname\...
It will try to shrink it under the MAX_PATH limit by using the short name equivalents:
C:\alongd~1\anothe~1\...
This solution may seem odd at first (beyond the ironic spin that we’re coming full circle to short file names). But it’s very compelling for adoption in the framework, since a path of this form is a valid Win32 file name and is therefore acceptable to Win32 APIs.
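A simplified sketch of the shrinking step is below. It is not the Vista shell's algorithm and not the planned framework code: TryShrink is a hypothetical name, and rather than shrinking piecewise it asks Win32's GetShortPathNameW to convert every component at once. It assumes a Windows machine, an existing drive-letter path, and short name generation being enabled.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class PathShrinkSketch
{
    // MAX_PATH counts the terminating null, so usable paths are at most 259 characters.
    private const int MaxPath = 260;
    private const string LongPathPrefix = @"\\?\";

    // Win32 GetShortPathNameW: returns the 8.3 form of an existing path.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern uint GetShortPathNameW(
        string lpszLongPath, StringBuilder lpszShortPath, uint cchBuffer);

    // Hypothetical helper: returns a form of the path that fits in MAX_PATH,
    // or null if the path cannot be shrunk far enough.
    public static string TryShrink(string fullPath)
    {
        // Short enough already: nothing to do, and no Win32 call is made.
        if (fullPath.Length < MaxPath)
            return fullPath;

        // The \\?\ prefix lets the API accept an input longer than MAX_PATH.
        var buffer = new StringBuilder(MaxPath + LongPathPrefix.Length);
        uint length = GetShortPathNameW(LongPathPrefix + fullPath, buffer, (uint)buffer.Capacity);
        if (length == 0 || length > buffer.Capacity)
            return null; // the path does not exist, or the short form is still too long

        string shrunk = buffer.ToString();
        if (shrunk.StartsWith(LongPathPrefix, StringComparison.Ordinal))
            shrunk = shrunk.Substring(LongPathPrefix.Length);

        return shrunk.Length < MaxPath ? shrunk : null;
    }
}
```

If short name generation is disabled, the components come back unchanged and the helper returns null, which is exactly the failure mode discussed in the analysis section.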
Some important clarifications:
- Note that this solution wouldn’t require you to actually use short file names. When you create a file on an NT-based system, a short name is generated and associated with it*. You would be able to pass “normal” (long file name) file names to .NET APIs, and the short name conversion would happen behind the scenes. (Just wanted to make it clear that we’re not reverting to the pre-Windows 95 days.)
- Right now, you can’t pass short names to .NET APIs to get around the MAX_PATH restriction. During canonicalization, we expand short names to long names, and throw a PathTooLongException if the long name exceeds MAX_PATH.
* This raises two issues. First, users can turn off short file name generation via a registry value; this is discussed below. Second, you’ll notice this solution is NT-focused, but Silverlight can run on Macs. As part of the long path effort, we also intend to honor platform-specific path limits instead of enforcing the Windows MAX_PATH everywhere (as we do currently).
Analysis
Allowing use of \\?\ will likely require a permission demand greater than FileIOPermission, perhaps even a demand for full trust. However, for many apps that need to work with long paths, this isn't a problem. We should investigate ways to relax this demand for partial trust scenarios like isolated storage.
Let’s look at some pros and cons of auto-shrinking:
Cons:
- Doesn't work if user has turned off short file names. (But this is uncommon.)
- Given that shrinking is happening behind the scenes, it could make path length issues even more confusing to users. \\?\ allows a consistent higher limit of 32K. Examples and explanations are given below.
Pros:
- A path name that can be shrunk into MAX_PATH as above is accepted by any Win32 API (clarification: assuming the name is given to the Win32 API in shrunk form. Win32 APIs do not perform auto-shrinking)
- This is an appealing "try to make it work" solution (if the user hasn't provided \\?\). Until there’s a standard solution in the Win32 APIs, it’s better from a maintenance and usability perspective if we’re all using a similar approach.
We’re curious to hear your feedback about this approach.
Comments
Anonymous
June 10, 2008
PingBack from http://blogs.msdn.com/bclteam/archive/2007/03/26/long-paths-in-net-part-2-of-3-long-path-workarounds-kim-hamilton.aspx
Anonymous
June 10, 2008
Kim, whether you allow \\?\ in parallel with path shrinking is mostly irrelevant. You would still be introducing more complex path handling rules, relying on short names, and adding a lot of potential complexity to .NET's behaviour. The drawbacks are still there. Moreover, if you give a simple, partial solution and a solid, harder-to-use one, most programmers will choose the first. The option to explicitly use \\?\ would be mostly ignored. On the other hand, if you focus on using \\?\ behind the scenes, you could try to create an easy-to-use, cleaner API, and try to hide the quirks of the underlying Win32 calls.
Anonymous
June 10, 2008
Note that I rearranged the post to highlight that \\?\ is indeed part of the solution and that auto-shrinking helps "make it work" for opening existing files when \\?\ is not used. I have to agree that my mention of \\?\ support was rather buried before -- basically it was the closing sentence of the "solution" part. I guess I can't expect everyone to pore over every sentence. Hope it's clearer now. :)
Anonymous
June 10, 2008
Please be aware that 8.3 filename generation has to be disabled for any application that creates and reads files rapidly, and has more than a few thousand files per directory. Otherwise the performance is terrible -- there is a KB about it which I can't find right now. So disabling 8.3 generation is much more common than you might think.
Anonymous
June 10, 2008
IMO you should prepend \\?\ automatically for all .NET APIs that support it (while still enforcing the other path restrictions, of course). .NET APIs that wrap Win32 functions without \\?\ support should use the proposed automatic path shrinking if the path is too long, so that they'll also be able to handle long paths whenever possible. Of course you will still have problems with other applications that don't support long file paths, but these applications would not work with path shrinking either, if they don't shrink paths themselves. Also, while the proposed solution will only work if short file name generation is not deactivated, this solution would still be able to work in most (or many) cases. What do you think?
Anonymous
June 10, 2008
Eric, I don't think you understand the algorithm that is being proposed. It wouldn't be blindly truncating and appending "~1" to the file name. It would use the correct short name for a given long name. I can't believe how anybody could imagine it working any other way. As to the original question, given that \\?\ will be supported anyway, I'm not sure what the benefit of path shrinking would be. It would work some times, but not all times, and so you'd still need to work around the times it doesn't work with \\?\. I think path shrinking should be limited only to those cases that won't be fixable by supporting \\?\ -- namely LoadLibrary and that sort of thing.
Anonymous
June 10, 2008
orcmid: Notice the comments above about how short name generation can be turned off. While it's true that turning off short name generation won't delete the short names that already exist, it obviously won't generate any new ones, so it'll only help you in some situations. I believe the main problem with shrinking the paths is that it only works some of the time, and it's totally arbitrary whether it'll work on any given path (so your application might be OK on an operating system that has short name generation turned on, but if your customer has it turned off, it might not work for them... I imagine it's going to be very difficult to track those sorts of problems down -- better to fail consistently).
Anonymous
June 10, 2008
As part of the long-term plan, have you considered deprecating strings-as-paths altogether and using the Uri class instead?
Anonymous
June 10, 2008
If you're changing the filename parsing in the BCL, what about allowing names that specify alternate NTFS streams? i.e. c:MyFile.Txt:extra_metadata At the moment (well, last time I tried) it will get upset about the extra : character.
Anonymous
June 11, 2008
The source of the main title is an inside joke I am probably not going to ever explain within the blog.
Anonymous
June 11, 2008
Urk about short-name generation being disabled. So, I agree, this is a SYSTEM issue and in particular a matter with regard to the file system. I guess I don't know what the use case is that has exceeding MAX_PATH be locally useful while remaining globally safe. More fodder: Zip files and directory-simulating Zip access, and then there's the part: protocol and OPC um, the file: URI, oh, and who gets to figure out all the threat-modeling of this combined with NTFS streams, hiding root kits, etc. and what about using long-path injection as a form of buffer-overrun exploit, although I suspect that would be hard (assuming the Win32 APIs never change and neither does the MAX_PATH constant, for obvious down-level protection reasons). Just tossing things around here. The bigger problem with this conversation is that the inside one is completely separated from the outside one and who knows what well-trodden ground we are revisiting. So I think I will shut up now. I hate write-only feedback channels.
Anonymous
June 12, 2008
Hi, I tried your sample code from your blog but I cannot get it to work on XP SP2. Is the \\?\ mechanism not supported for really long file names?
    StringBuilder sb = new StringBuilder("C:\SubDir\SubDir");
    for(int i=0;i<60;i++)
        sb.Append("FileName");
    sb.Append(".txt");
    LongWriter.TestCreateAndWrite(sb.ToString());
The directory does already exist. I get as return value: The filename, directory name, or volume label syntax is incorrect. Is there a known limitation with the \\?\ approach? Yours, Alois Kraus