Why does Windows still place so much importance on filenames?

Earlier today, Adrian Kingsley-Hughes posted a rant (his word, not mine) about the fact that Windows still relies on text filenames.

The title says it all really. Why is it that Windows still place so much importance on filenames.

Take the following example - sorting out digital snaps. These are usually automatically given daft filenames such as IMG00032.JPG at the time they are stored by the camera. In an ideal world you’d only ever have one IMG00032.JPG on your entire system, but the world is far from perfect. Your camera might decide to restart its numbering system, or you might have two cameras using the same naming format. What happens then?

I guess I’m confused. I could see a *very* strong argument against Windows’ dependency on file extensions, but I’m totally mystified about why having filenames is such a problem.

At some level, Adrian’s absolutely right – it IS possible to have multiple files on the hard disk named “recipe.txt”.  And that’s bad.  But is it the fault of Windows for allowing multiple files to have colliding names? Or is it the fault of the user for choosing poor names?  Maybe it’s a bit of both.

What would a better system look like? Well, Adrian gives an example of what he’d like to see:

Why? Why is the filename the deciding factor? Why not something more unique? Something like a checksum? This way the operating system could decide is two files really are identical or not, and replace the file if it’s a copy, or create a copy if they are different. This would save time, and dramatically reduce the likelihood of data loss through overwriting.

But how would that system work?  What if we did just that.  Then you wouldn’t have two files named recipe.txt (which is good).

Unfortunately that solution introduces a new problem: You still have two files.  One named “2B1015DB-30CA-409E-9B07-234A209622B6” and the other named “5F5431E8-FF7C-45D4-9A2B-B30A9D9A791B”. It’s certainly true that those two files are uniquely named and you can always tell them apart.  But you’ve also lost a critical piece of information: the fact that they both contain recipes.

That’s the information that the filename conveys.  It’s human specific data that describes the contents of the file.  If we were to go with unique monikers, we’d lose that critical information.
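To make that trade-off concrete, here is a minimal Python sketch of a store keyed by a content checksum instead of a filename. The `content_id` helper, the choice of SHA-256, and the in-memory `store` dictionary are assumptions made for illustration; they are not taken from Adrian’s post or from anything Windows does.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a hex SHA-256 digest as a stand-in for a content-derived identifier."""
    return hashlib.sha256(data).hexdigest()


pumpkin = b"Pumpkin pie: pumpkin, eggs, sugar, cinnamon"
apple = b"Apple pie: apples, butter, sugar, cinnamon"

# Hypothetical store keyed by content identifier instead of by filename.
store = {content_id(pumpkin): pumpkin, content_id(apple): apple}

# Storing an identical copy changes nothing: same bytes, same identifier.
store[content_id(pumpkin)] = pumpkin

for key in store:
    # The key tells the two files apart, but nothing in it says "recipe".
    print(key[:16] + "...")
```

The two recipes never collide and an exact duplicate is absorbed silently, which is the benefit being asked for. The keys, however, are just as opaque to a person as the GUID-style names above.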

But I don’t actually think that the dependency on filenames is really what’s annoying him.  It’s just a symptom of a different problem. 

Adrian’s rant is a perfect example of jumping to a solution without first understanding the problem. It also shows why it’s so hard for Windows UI designers to figure out how to solve customer problems: this example is a customer complaint asking that we remove filenames from Windows. Obviously something happened to annoy Adrian that was related to filenames, but the question is: what? He doesn’t describe the problem, but we can hazard a guess about what happened from his text:

Here’s an example. I might have two files in separate folders called recipe.txt, but one is a recipe for a pumpkin pie, and the other for apple pie. OK, it was dumb of me to give the files the same name, but it’s in situations like this that the OS should be helping me, not hindering me and making me pay for my stupidity. After all, Windows knows, without asking me, that the files, even if they are the same size and created at exactly the same time, are different. Why does Windows need to ask me what to do? Sure, it doesn’t solve all problems, but it’s a far better solution than clinging to the notion of filenames as being the best metric by which to judge whether files are identical or not.

The key information here is the question: “Why does Windows need to ask me what to do?”  My guess is that he had two “recipe.txt” files in different directories and copied a recipe.txt from one directory to the other.  When you do that, Windows presents you with the following dialog:

[Image: Windows Copy Dialog]

My suspicion is that he’s annoyed because Windows is forcing him to make a choice about what to do when there’s a conflict. The problem is that there’s no one answer that works for all users and all scenarios. Even in my day-to-day work I’ve had reason to choose all three options, depending on what’s going on. From the rant, it appears that Adrian would like it to choose “Copy, but keep both files” by default. But what happens if you really *do* want to replace the old recipe.txt with a new version? Maybe you edited the file offline on your laptop and you’re bringing the new copy back to your desktop machine. Or maybe you’re copying a bunch of files from one drive to another (I do this regularly when I sync my music collection from home and work). In that case, you want to ignore the existing copy of the file (or maybe you want to copy the file over to ensure that the metadata is in sync).

Windows can’t figure out what the right answer is here – so it prompts the user for advice about what to do.
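The three outcomes can be sketched in a few lines of Python. This is an illustrative sketch only, not how Explorer is implemented: the `Choice` enum, the `copy_file` and `unique_name` helpers, the `ask` callback, and the "name (2).ext" naming scheme are all assumptions made for the example.

```python
import shutil
from enum import Enum
from pathlib import Path
from typing import Callable, Optional


class Choice(Enum):
    REPLACE = "copy and replace"
    SKIP = "don't copy"
    KEEP_BOTH = "copy, but keep both files"


def unique_name(dst: Path) -> Path:
    """Find a free name such as 'recipe (2).txt' next to dst (assumed scheme)."""
    n = 2
    while True:
        candidate = dst.with_name(f"{dst.stem} ({n}){dst.suffix}")
        if not candidate.exists():
            return candidate
        n += 1


def copy_file(src: Path, dst_dir: Path,
              ask: Callable[[Path, Path], Choice]) -> Optional[Path]:
    """Copy src into dst_dir, deferring to ask() when the name is already taken."""
    dst = dst_dir / src.name
    if dst.exists():
        choice = ask(src, dst)  # the code cannot know the user's intent
        if choice is Choice.SKIP:
            return None
        if choice is Choice.KEEP_BOTH:
            dst = unique_name(dst)
        # Choice.REPLACE falls through and overwrites dst.
    shutil.copy2(src, dst)  # copy2 also carries timestamps across
    return dst
```

The point of the `ask` parameter is that the right branch depends on what the user is trying to do. Hard-coding any one of the three as the default quietly does the wrong thing in the other two scenarios.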

Btw, Adrian’s answer to his rhetorical question is “the reason is legacy”. Actually that’s not quite it. The reason is that filenames provide valuable information for the user that would be lost if we went away from them.

Next time I want to spend a bit of time brainstorming about ways to solve his problem (assuming that the problem I identified is the real problem – it might not be). 

 

 

PS: I’m also not sure why he picked on Windows here. Every operating system I know of has similar dependencies on filenames. I think that’s another indication that he’s jumping on a solution without first describing the problem.

Comments

  • Anonymous
    February 04, 2011
    What I would venture is that using the file name as THE identity token of the file in the file system is what is causing this guy's trouble. Arguably the name is just metadata about the file, a very important part, but still just metadata, no different than the last write date or the permissions. One could argue that the user should be able to "name" the file whatever she wants, independently of how the OS determines the identity of the file; copying the two files called recipe.txt to the same folder should then be just a matter of annoyance to the user because she doesn't know anymore which one is which. This could also be extended to the usage of the extension paradigm to "tag" the file type, which should also be part of the metadata, not part of the file identity. Even folders could be thought of as mere views of the underlying data then, whether the same folder is in two folders or the file is copied being a bit more natural to express. Now the interesting thought experiment here is how to design an API to deal with a file system such as this: one would open the file by its ID token, which would be resolved after the user picks a file in some sort of FileOpenDialog UI. Almost like one imagines the file system internal API must be, after the directory is resolved to the actual entry in the MFT. The apps would now deal with those IDs directly instead of through the "view" of directories and file entries.

  • Anonymous
    February 04, 2011
    @Nobody.  I want to talk about that particular issue in the post after the next one. There are some interesting challenges involving user expectations to that solution.

  • Anonymous
    February 04, 2011
    The comment has been removed

  • Anonymous
    February 04, 2011
    @Weeble: You're describing the Windows XP experience.  The file copy dialog was dramatically improved for Windows Vista.  How do you handle the "copying an updated file from the laptop" scenario if you never replace the existing file (where you do want to overwrite the file)?  What about the "updating my media library from home" scenario (where you don't want to overwrite the file)? Forcing the user to come back and clean up after the copy command can also result in a poor experience.  People would say "@#$@#$ windows, why doesn't it understand that I wanted to overwrite the file?" These decisions are tricky, which is why I decided to write the followup post.

  • Anonymous
    February 04, 2011
    What's the names of the different photos in your iPad/iPhone/iPod Touch Photos app? What are the names of the files for the notes in your phone's note-taking app? The names of the save-game files? The MP3s in your music library app? By framing the question of filenames in a filesystem context, you risk prematurely jumping to conclusions. What if, from the perspective of the user, there is no filesystem? Without a filesystem, you don't need names. Perhaps you need tags, dates, camera model, author, etc. etc. Perhaps these things are more or less convenient. Perhaps there is still a filesystem behind the scenes, but the user doesn't necessarily need to know that.

  • Anonymous
    February 04, 2011
    The comment has been removed

  • Anonymous
    February 04, 2011
    The comment has been removed

  • Anonymous
    February 04, 2011
    That dialog was one of the best changes MS made to the explorer I can think of. There are scenarii for all three options (and I could think of a 4th that allows the user to specify a new name when keeping both instead of the default behavior). And I don't see how it doesn't give me all the information I need to make the right decision. Now if someone wants to complain about the XP dialog boxes, just go on, I doubt anyone would want to stop you. Also actually I totally DON'T agree that it's a bad thing to have several files with the same name on a disk - readme.txts or config files come to my mind. It's not just the filename but the absolute path that has lots of information. If I have two files that can be distinguished based on some tag, I can just as easily adjust the filename. Also how would I specify a specific file if there could be several with the same name in one position? Specify the distinguishing tag? Sounds like he had a specific problem and generalized from it without thinking about the hundreds of scenarios where his "obvious solution" wouldn't work.

  • Anonymous
    February 04, 2011
    The comment has been removed

  • Anonymous
    February 04, 2011
    The comment has been removed

  • Anonymous
    February 04, 2011
    The comment has been removed

  • Anonymous
    February 04, 2011
    I thought I made it clear but I shall try to restate - the system that I was considering is quite obviously utterly incompatible with existing applications and file systems and would only work with an ecosystem of applications designed to support it from the ground up. It needs a mechanism for programs and scripts to communicate and store file identity other than filenames. It needs a mechanism to distinguish replacing the content of a file and creating a new file with the same name. Obviously it is not practical to retrofit this behaviour into a traditional file-system. That doesn't mean it's not useful to consider it as a theoretical way to manage documents. I think we are agreed that it looks like such a system would be awkward when we really want to "copy and replace the equivalent documents" or "copy only the documents that don't have equivalents" or some mix of both. I guess I'm just saying that I'd be interested to experiment with such a system to see how painful this is and if there are other ways to resolve those problems than by using filenames as identities. There seems to be some elegance to being able to say that "copying" a document is just that, no more and no less, as opposed to "copying and replacing". Elegance isn't an end to itself, but I find it can mean something is at least worth a second look.

  • Anonymous
    February 04, 2011
    I think you missed the point. When you copy a folder with a file named "recipe.txt" over a folder which already contains a file named "recipe.txt", it would be better if Windows knew whether these files are identical. So, if you merge some folders, for example, you already have images from your camera in your pictures folder. Now you are on vacation and download some pictures from your camera to your notebook. When you are back, you would copy the pictures folder from your notebook over your pictures folder on your desktop. But because you didn't remember, "Ah, I already downloaded image IMG00032.JPG to my desktop before I started my vacation and forgot to delete it on the camera", you downloaded the file again to your notebook. And now you will see the prompt, Windows asking you what to do. If Windows already knew, "Hey, these files are the same...", there would be nothing to ask ;-) The problem might be that it takes too much time: to calculate a checksum, you need to read the entire file. If you then decide you want to copy the file, you need to read it again.

  • Anonymous
    February 04, 2011
    I posted this reply/suggestion to the original: I'm not sure what you really expect Windows (or any other OS; they all act the same in this regard) to do here. Why don't you use a decent file manager and have it make the filenames properly unique as it moves them off the camera? For example, have it prefix them with the date & time of when they are being moved. Then they will not clash with existing filenames even if the camera has reset its counter.

  • Anonymous
    February 04, 2011
    @voo > And I don't see how it doesn't give me all the information I need to make the right decision. By this I meant that if I have two files called recipe.txt I might need to see the contents of both to make the right decision. Maybe they are entirely different recipes and I want both. Maybe they've both been edited since they were copied and the last modified date and size aren't enough to pick one over the other. And I know I have in the past started such an operation, chosen to replace or keep a few files and then gotten to a point where I realise there's no good answer for some file and I really want to roll-back the operation, but by then it's too late. (I don't think it's undo-able... or is it?)

  • Anonymous
    February 04, 2011
    The comment has been removed

  • Anonymous
    February 04, 2011
    This discussion makes me think of WinFS.

  • Anonymous
    February 05, 2011
    I suppose two niceties could be to detect when the file contents are identical, and maybe tie in the preview pane so one could examine both files.

  • Anonymous
    February 06, 2011
    The rant in question may have more to do with a pressing deadline and a need for copy than a real grievance. It sounds more like a polemic of convenience.

  • Anonymous
    February 06, 2011
    @Larry, I think the problem with the "copying an amended file from a laptop" scenario is that there really shouldn't be a checksum, but a GUID.  When I copy the file to my laptop, it uses the GUID, when I copy it back, the system notes that (a) it was copied to the laptop on date X, (b) the file on the master system hasn't been changed since date X and (c) these files have the same GUID and therefore does an overwrite. The problem here is that you need to have (a) a mechanism to do intentional forking of files (File | Save As ?) and (b) a real conflict resolution mechanism for merging back two files that have both been amended since they were separated. Of course, now you're headed towards having a DVCS instead of a filesystem.  But maybe that's what would be appropriate for document files if they weren't binary blobs that the DVCS can't see into and therefore can't do merging properly. Certainly, using a VCS or a DVCS for code and a DMS for documents at work has made me regard filesystems as being a bit primitive for user-facing documents.  You'll notice that Google Docs doesn't really emulate a file system.

  • Anonymous
    February 07, 2011
    @Evan, yes, what I was thinking is that each file gets an ID when it's created and that is how it's manipulated, this ID being independent from the identity as perceived by the user. Another of the limitations of filenames is the fact that you can only use "legal" characters in them. I can't name my file "My 1\2 work done" because it so happens that the OS decided eons ago that the '\' character is the "path separator", whatever that means in a GUI world. Try to explain that to somebody who is not very computer savvy with a straight face.

  • Anonymous
    February 07, 2011
    The comment has been removed

  • Anonymous
    February 08, 2011
    > "In many cases, users don't care about the filename. When dealing with the photos on my camera, the camera automatically fills in the date" I actually really hate the random identifiers that cameras apply but understand at the moment there isn't a better solution. I do care about the file name, but it is overly burdensome to apply it. In my ideal world cameras would have a lot better built-in tagging capability (other than date, and occasionally GPS coordinates, which are utterly un-user-friendly) and allow me to specify a file name based on the metadata in the same way various utilities let me rename MP3s based on tag info. At the same time, while Windows is getting better at using metadata, it would be nice if it could go further - functionality like Windows Live Photo Gallery really should be built into the shell. > "When you copy a folder with a file named "recipe.txt" over a folder, which already contains a file named "recipe.txt" it would be better if Windows would know, if these files are identical." In a perfect world sure, but that comes with some real-world tradeoffs that are huge. A 5K text file can be diffed in a matter of milliseconds (especially if you are only checksum diffing) but even that becomes very complicated if you want to start making intelligent decisions based on things like trailing whitespace, differing Unicode quote marks, etc. Expand that out to a 200 meg PowerPoint which can have all sorts of very subtle differences (Office properties metadata, minute differences in specifying formatting between Office versions, document change metadata, etc). Comparing binary identity is easy, but comparing context identity is incredibly hard and not at all cheap to process.

  • Anonymous
    February 08, 2011
    I wonder if Adrian has children. I wonder if he gave them names.

  • Anonymous
    February 09, 2011
    I don't understand how you're supposed to use a system where filenames don't have to be unique -- how do you know which file you want to open when they have the same name? If I want to open a recipe, I need to be able to figure out which recipe to open. You could argue that a thumbnail of the document would show me which one I want, but I would imagine that most recipes would look similar in thumbnail form. Can you imagine a cookbook where the recipes weren't uniquely named? It would be very difficult to use. You'd have to use meaningless metadata like page number to find a specific recipe. Of course you could argue that the user should be allowed to create as many files of the same name as they want, but then most users would end up with hundreds of files all called "Untitled" because it's easier. Then people would be complaining about how hard it gets to use computers because filenames are so often the same and it's hard to know which "recipe" they're looking for. By forcing unique names at file creation time, the computer is saving users lots of pain later on.

  • Anonymous
    February 10, 2011
    An enterprising person could prototype such a thing today. Ignore the actual filenames, and use the file id (MFT record number). Stuff your non-unique filename in a named data stream.   Write yourself a file browser that showed the non-unique name. Mind you, I'm not so sure I'd want to use it.

  • Anonymous
    February 10, 2011
    Maybe there should be more options in the file copy dialog. I just looked at how Total Commander handles this and there are at least two useful options: "Replace all older" and "Replace all shorter". This would probably solve many situations where people synchronize files. Also a visual feedback option might be OK - there are preview handlers in new Windows - why not display a part of the file in the copy dialog? XnView will present you an option like this if you try to overwrite files with the same name. "After all, Windows knows, without asking me, that the files, even if they are the same size and created at exactly the same time, are different." - he seems not to see the many problems here. When are they different? When every letter is different? Or one letter is different? What about binary files? It would result in totally unhelpful and unpredictable choices (for a user who hasn't read the, probably several pages long, documentation of this feature).

  • Anonymous
    February 10, 2011
    The comment has been removed

  • Anonymous
    February 13, 2011
    I'd like to add a thought. Perhaps you should have a look at a situation where this issue regularly occurs: When I take pictures with my digital camera, the system starts with a filename like IMG000001.jpg. It continues to count till infinity as long (!!!) as I don't change the card on which pics are stored. If I do, it starts again at IMG000001.jpg. When I come back from holidays and I store the pics on my harddrive, I have to store the pics from the second card, in a separate folder to not lose half of my pics. As far as I know (and I am no expert), other OSs will recognise that these are indeed different files and store them without regard to the file name...

  • Anonymous
    February 21, 2011
    @dave: Really all that needs to be done is to ignore the underlying filename in the GUI, instead display names based on a combination of already-present metadata plus additional user-specified metadata (through the new GUI). See, it sounds to me like what Adrian is after (mostly) is simply a new front-end; that is, a replacement for or upgrade of Explorer. This "Pretty-Explorer" would display files using metadata, without regard to the underlying filename. A logical extension of how image files can be displayed as thumbnails, or how music files can be displayed as artist/track information instead of raw filenames. For example, with the new system, you might see a folder full of several text documents called "Untitled", with additional metadata - like the date - displayed in proximity (alongside, in a preview bar, something like that). The user doesn't care what the actual filesystem names are, since they operate on the documents. This still doesn't fully solve the issue of overwriting versus "side-by-side" copying. One could base the initial "guess" as to the type of copy operation solely on the new set of metadata - perhaps first switch on the specified "file type", using different logic for different files. If the metadata used in the check says the files are different, then they are copied side-by-side, with the underlying file system names being automatically modified, if necessary. On the other hand, if the metadata says the files are the same - perhaps they are both of type "Word Document" and both have the title "Apple Pie Recipe" - then the system would prompt the user as to whether they want the "old" file overwritten, or to simply have the "new" file exist side-by-side after the operation. This discussion really raises a lot of interesting UX questions. 
    I feel that we are going down this road, but one must remember that there always also needs to be a way to represent the system as it "really is", for those times when either (a) you have a power-user who can work more efficiently with the "raw" view or (b) you need access to the system to aid in debugging, system recovery, etc. @Martin: What OS uses a filesystem that allows for non-unique fully-qualified filenames? (That is, the filename with absolute path, taking into account any case-sensitivity, etc.) I can't imagine how something like that could even function, unless again the "filenames" you are seeing are not really filenames, but rather metadata displayed to the user when viewing from some front-end.

  • Anonymous
    April 01, 2011
    I think much of this discussion brings up SOME of the motivating reasons for WinFS. I hope that gets resurrected some day. As far as the UX, I'd like to see Explorer (and every other program with the same pattern) keep track of "user-resolvable conflicts" as it continues to copy the remaining files that do not conflict. As it hits these conflicts, it should update a dialog containing a listView/dataGrid showing the conflicts, and allowing me to check a radio button of what to do for each, or for a multi-selection. That way, I would make these decisions as all the non-conflicting files continue to copy/move. Then, once I've made my decisions on each of these listed conflicts, I could hit "Apply". Additionally, the context menu for each file/conflict listed would allow me all the same verbs that an Explorer window allows. Just my two cents worth.

  • Anonymous
    April 25, 2011
    I think there should be a tab in Explorer that displays the MD5 of a file, so the user can sort files having the same MD5 and delete the duplicates.

  • Anonymous
    April 27, 2011
    Maybe the solution for Adrian's real problem would be an explorer option that allows proper batch-renaming of files (then you could use the explorer sort order by date, or by album title or whatnot). NB: Windows Explorer can batch-rename files, but only in a very limited way. And for more than ten files it breaks the name sort order (file(1).txt, file(10).txt, file(2).txt, ...).
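
The ordering described in the last comment (file(1).txt, file(10).txt, file(2).txt) is what a plain character-by-character comparison produces. As a rough Python sketch, a numeric-aware sort key avoids it; the `natural_key` helper below is hypothetical and is not a claim about how Explorer sorts.

```python
import re


def natural_key(name: str):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


names = ["file(10).txt", "file(2).txt", "file(1).txt"]

print(sorted(names))                   # plain string order
print(sorted(names, key=natural_key))  # numeric-aware order
```

Zero-padding the counter during a batch rename (file(01).txt, file(02).txt, ...) gets the same result without needing a special sort.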