Jon Udell questions the value and direction of WinFS

Jon Udell at InfoWorld is doing a series of blog entries on Longhorn.  Feedster just discovered his first one, from Wednesday, on the justification for WinFS defining a new way to manage metadata.

It's a well-written entry, and deserves a well-thought-out response.  I did want to get at least one quick response out into the blogosphere, though, because I think there's a misleading statement towards the end:

two powerful trends point to a brighter future for this scenario: the growing use of open XML file formats, and the steady advance of databases that can index and search XML content. WinFS embraces neither trend, and that looks to me like a looming headache. Personal information management, in Longhorn, will be a walled garden with its own notion of schema, and its own query language. To give users the benefit of finding stuff, Longhorn-style, developers will have to implement the Longhorn model.

Jon seems to have missed a few key entries in MSDN about WinFS's support for XML APIs, as well as the support for metadata handlers that copy metadata between WinFS and the filestream, precisely so that there is no walled garden -- if you're using Longhorn, you see WinFS properties, if you take the file somewhere else, you see EXIF headers or whatever other metadata format your file type supports.

XML formats with well-defined, licensed schemas are certainly a great step towards a world of open data interchange.  But XML files alone don't make it easier for users to find, relate and act on their information.

Jon's contention is that full text search over XML files is good enough, but is it really?  I did a series of blog entries on WinFS scenarios back in February, and I don't think Jon's full text search approach would really enable these things.  Take the simple media scenario, where I want to add background music to a movie by browsing through my media library.  As it happens, I've recently started using iTunes to manage my music, and iTunes stores its metadata in “iTunes Music Library.xml” on my hard drive.  So let's say I wanted to search for jazz music to add.  Here's a little snippet of what iTunes's XML format looks like for one of my jazz CDs:

<dict>
    <key>Composer</key>
    <string>Lee Morgan</string>
    <key>Album</key>
    <string>Cornbread</string>
    <key>Genre</key>
    <string>Jazz</string>
    <key>Kind</key>
    <string>MPEG audio file</string>
    <key>Location</key>
    <string>file://localhost/D:/files/Music/Lee%20Morgan/Cornbread/01%20Cornbread.mp3/ </string>
</dict>

So what would full text search do for me over this file?  If I searched for “jazz”, it would certainly show me a result for “iTunes Music Library.xml”, since that file contains many instances of the string “jazz”.  It would also probably return other documents on my system that mention jazz, like emails, or papers I may have written in school.  How exactly does this help me find the right piece of music to add to my movie?
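
To make that concrete, here is a minimal Python sketch of a naive full text search. The file names and contents are made up for illustration, and `full_text_search` is a hypothetical helper, not part of any real indexer. It only shows the shape of the result: a list of files, not a list of songs.

```python
# Hypothetical in-memory "file system": file name -> text content.
# Names and contents are invented for illustration only.
FILES = {
    "iTunes Music Library.xml": (
        "<dict><key>Album</key><string>Cornbread</string>"
        "<key>Genre</key><string>Jazz</string></dict>"
    ),
    "school-paper.txt": "A short history of jazz in New Orleans.",
    "budget.txt": "Rent, groceries, utilities.",
}


def full_text_search(files, term):
    """Return the names of files whose text contains the term (case-insensitive)."""
    needle = term.lower()
    return sorted(name for name, text in files.items() if needle in text.lower())


hits = full_text_search(FILES, "jazz")
print(hits)  # a list of file names that mention "jazz"
```

The library file and the school paper both come back as hits, and neither hit tells the movie editor which .mp3 to load.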

To help at all, the programmer who built the movie editing program would have to add in a bunch of smarts about how to index this particular iTunes file, understanding the key/string pairs, and also recognizing that Location has a particularly special meaning.  To make a really performant system, you'd probably need to make a smart indexer as well, so that if you change the genre of only one song in your collection of 4000, the indexer doesn't have to recrawl the entire XML file to update itself.
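
As a rough sketch of the format-specific smarts described above, the Python below parses the snippet's key/string pairs, decodes Location into a path, and builds a small genre index. The helper names (`parse_track`, `location_to_path`, `build_genre_index`) are hypothetical, and the sketch assumes a flat `<dict>` like the one shown; it does not handle the full library file, and it does not attempt the incremental re-indexing problem at all.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import unquote, urlparse

# The snippet from the post, used here as sample input.
SNIPPET = """
<dict>
  <key>Composer</key><string>Lee Morgan</string>
  <key>Album</key><string>Cornbread</string>
  <key>Genre</key><string>Jazz</string>
  <key>Kind</key><string>MPEG audio file</string>
  <key>Location</key><string>file://localhost/D:/files/Music/Lee%20Morgan/Cornbread/01%20Cornbread.mp3/ </string>
</dict>
"""


def parse_track(dict_element):
    """Turn alternating <key>/<string> children into a Python dict."""
    children = list(dict_element)
    track = {}
    for key_el, value_el in zip(children[0::2], children[1::2]):
        track[key_el.text] = (value_el.text or "").strip()
    return track


def location_to_path(location):
    """Decode the file:// URL stored under Location into a plain path."""
    path = unquote(urlparse(location).path)
    return path.strip("/")


def build_genre_index(tracks):
    """Map lower-cased genre -> list of decoded file paths."""
    index = defaultdict(list)
    for track in tracks:
        genre = track.get("Genre", "").lower()
        index[genre].append(location_to_path(track["Location"]))
    return index


track = parse_track(ET.fromstring(SNIPPET.strip()))
index = build_genre_index([track])
print(index["jazz"])
```

Every line of this is knowledge about one application's private format, which is exactly the work each developer would have to repeat for each metadata source.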

Perhaps Jon's point is that the file format of the music itself should support XML, and we should replace .mp3 with a pretend new XML-media file format, .xm3.  Well, now you've still got to worry about the schema definition of .xm3, and you've got to also worry about what to do if your user happens to prefer media encoded with Windows Media Player or Ogg or whatever.  What you want is a common storage engine, and you want a shared schema with strongly typed metadata.  That's WinFS.

I could go on more here, but wow, this was supposed to be the simple scenario!  I also wrote up a more complicated event planner scenario.  The key value of an event planner app is that it relates together content that would otherwise be completely unrelated.  If I want to find the presentation on Longhorn that I gave to Infoworld, full text search doesn't help at all, because “Longhorn” and “Infoworld” likely never appear together in any document.  Perhaps they appear together in a calendar entry, and full text search might help me find that one entry.  But then how would I find the agenda for that Infoworld meeting, or the notes I took from that meeting, or the presentation I gave?
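
Here is a small sketch of the difference that stored relationships make. It is plain Python with hypothetical names (`Item`, `relate`, `find_related`) and invented data; it is not the WinFS API, only an illustration of following links from the one item that matches both keywords to the items related to it.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A stored item with a kind, a title, and links to related items."""
    kind: str
    title: str
    # Excluded from repr/compare because relationships are cyclic.
    related: list = field(default_factory=list, repr=False, compare=False)


def relate(a, b):
    """Record a two-way relationship between two items."""
    a.related.append(b)
    b.related.append(a)


# Invented data: only the calendar entry mentions both keywords.
meeting = Item("calendar", "Longhorn briefing at Infoworld")
deck = Item("presentation", "Longhorn overview")
notes = Item("notes", "Questions from the editors")
agenda = Item("agenda", "Agenda for Tuesday")

for doc in (deck, notes, agenda):
    relate(meeting, doc)

ITEMS = [meeting, deck, notes, agenda]


def find_related(items, keywords, kind):
    """Find items of `kind` linked to any item whose title has all keywords."""
    wanted = [k.lower() for k in keywords]
    results = []
    for item in items:
        if all(k in item.title.lower() for k in wanted):
            results.extend(r for r in item.related if r.kind == kind)
    return results


found = find_related(ITEMS, ["Longhorn", "Infoworld"], "presentation")
print([item.title for item in found])
```

A keyword match alone finds only the calendar entry; the presentation, notes, and agenda are reachable only because the relationships were recorded somewhere.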

Anyways, like I said at the start, Jon's well-written entry deserves a well-written response, but this is what I came up with off the top of my head.  If we're doing a bad job about explaining the end-user benefits of WinFS, keep in mind that so far we've tried to really focus the message on developers, since it will be a while still before a home user needs to think about Longhorn.  If you're a developer and you're interested, there's plenty of WinFS info up on MSDN today.  Take a read through, and see if you reach the same conclusion Jon did -- “There's no question that Longhorn aims for lock-in”.

Comments

  • Anonymous
    June 07, 2004
    Jeremy,
    You missed his point almost completely.

  • Anonymous
    June 07, 2004
    Jeremy,

    I remember giving almost exactly the same example to you:

    <full text search doesn't help at all, because “Longhorn” and “Infoworld” likely never appear together in any document>

    just with different keywords, back in September/October last year, when talking about 'Save With' instead of 'Save As', and I still agree completely.

    I think the main problem facing Microsoft and you in particular is to provide powerful demonstrations of the power of WinFS, since it is a conceptual sell, not a standard benefits sell. Your scenarios are just the first step, I think...

    I decided to stop working on my Relational File System after seeing WinFS... because:
    1) WinFS looked like it was going to be significantly better technically.
    2) But more importantly it was so hard to explain to customers; even MS is going to struggle to explain the benefits to consumers, and you and MS have 2 years to do so.

    Good luck.
    Alex

  • Anonymous
    June 08, 2004
    Today’s Windows file system cannot be trusted to remember where I put a file (location of icon in window) or even that I wanted the window to be displayed in the graphical icon mode. Why should I look forward to trusting a future Windows file system with even more information?
    (Yes, I know that is an available feature ... What I do not know is how to fix a system after that feature breaks).

  • Anonymous
    June 08, 2004
    I've been waiting for this battle to ensue.

  • Anonymous
    June 08, 2004
    Excellent rebuttal and well thought out. Some of the other commenters need to go back and read Jon's blog entry as he does indeed assert that full text search is good enough.

  • Anonymous
    June 09, 2004
    Pete: where does he assert that? I reread the article and all I saw was a statement that "The power of pervasive free-text search, by the way, is something that Microsoft seems consistently to underestimate," which is not the same thing.

  • Anonymous
    June 09, 2004
    Jeremy: "Perhaps Jon's point is that the file format of the music itself should support XML, and we should replace .mp3 with a pretend new XML-media file format, .xm3."

    Think file-format independent XML metadata where the association between data (eg, the mp3 bits) and the metadata is made via a file-system directory,

    http://weblog.infoworld.com/udell/2003/03/05.html#a627

  • Anonymous
    June 10, 2004
    Danny Ayers on WinFS and RDF/semantic web

  • Anonymous
    June 11, 2004
    'If we're doing a bad job about explaining the end-user benefits of WinFS, keep in mind that so far we've tried to really focus the message on developers, since it will be a while still before a home user needs to think about Longhorn.'

    OK, let's say you are coming from the business side (with deep technology knowledge, but not hard-core developer) and trying to understand Longhorn so you can decide if you want to put resources into Longhorn development. I do think end-user benefits would be important to understand the bigger business value picture.

  • Anonymous
    June 15, 2004
    I've got hours of thoughts on this whole thing, so for the sanity of others I'll limit this post to just one or two.

    Firstly, with regards to the Jazz Music example, free text search can be just fine if you limit the scope of the search to mp3s - or "My Music". You wouldn't then come up with any Word document containing jazz, as it would be out of scope for the search. What I don't understand is how it would be any different using WinFS, as the user's experience of the search is "Jazz" - so how would your (Jeremy) example be any better? Which leads me to believe that perhaps I don't understand your example, because now you are talking about a developer of a movie editing package adding "smarts" relating to the iTunes package. Could you elaborate a little on your example, in particular on the user's experience and not necessarily how easy it is to code?

    My second point is a general one on WinFS and, indeed Longhorn's marketing messaging... could you please stop citing examples of pictures and music?!! I see very few examples of the benefit to corporate workgroups, especially when working with files on a mapped network drive. Is there any light you can shine on this for us (me) Jeremy?
