Why is MSN Desktop Search so slow at indexing my disk? And other tips...
A quick follow-up to four points, since "Dogfooding MSN Desktop Search" seems to be getting a lot of attention.
First, Joe Wilcox of Jupiter Research is disappointed with the initial indexing speed of desktop search, because it took 2.5 hours to index 7GB of data and 800MB of email on his laptop. I’m not sure if that’s considered fast, slow, or in-between, but I do know one way to speed it up — defragment your hard disk first! This applies to all the desktop search utilities out there: MSN, Google, Yahoo, X1, Copernic, the lot. Their indexers have to crack open every file to look inside it. If that file is in nice consecutive blocks on your disk, it’ll be slurped up in a flash, but if the blocks are scattered hither and yon, your disk head isn’t going to thank you.
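To put those numbers in perspective, here is a back-of-the-envelope calculation in Python. It is only a rough sketch: the inputs are the rounded figures quoted above, the 1GB = 1024MB conversion is an assumption, and the function name is made up for illustration.

```python
# Average indexing rate implied by the figures quoted above.
# Assumption: 1 GB = 1024 MB; the inputs are rounded, so treat the
# result as an order-of-magnitude estimate only.

def indexing_rate_mb_per_min(data_gb: float, email_mb: float, hours: float) -> float:
    """Return the average indexing rate in MB per minute."""
    total_mb = data_gb * 1024 + email_mb
    return total_mb / (hours * 60)

rate = indexing_rate_mb_per_min(data_gb=7, email_mb=800, hours=2.5)
print(f"{rate:.1f} MB/min, {rate / 60:.2f} MB/s")
```

That works out to roughly 53MB per minute, or a little under 1MB per second, averaged over the whole run. It says nothing about where the time actually went, only what the overall rate was.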
Second, the indexer is very cautious, and will immediately back off if it thinks either you or another application is using the machine. So the fastest way to build an index is:
- Defragment your hard disk
- Start up just Outlook or Outlook Express (so that it can index your mailbox)
- Install MSN Desktop Search
- Leave it alone!
- If you can’t leave it alone, right-click on the magnifying-glass icon in your system tray and select “Index Now”
Third, the two videos of the MSN Desktop Search team on Channel 9 are long, but if you want to get any idea of the history or future direction of the tool, they’re well worth checking out. The Silicon Valley team show off all the capabilities of the desktop bar, while the Redmond team talk about the big picture and shipping product. Scary tidbit: some of the core code is apparently left over from Cairo, way back in 1991!
Finally, I’d previously suggested leaving feedback on the msnsearch blog, but there’s an even easier way from within desktop search itself — right-clicking on the MSN butterfly button and selecting Help->Send Feedback will take you to the MSN Toolbar feedback page. Of course, then you don’t get to read the cool blog comments, like the team fixing customer problems, or readers reverse-engineering the registry hacks shown on the video!
Oh, and Scoble, do we have a record yet for the greatest number of items indexed? I just saw an email from one internal user who’s got 1.2 million…
Comments
- Anonymous
December 13, 2004
I loved the quick response that I got when I posted the problem I was having with Outlook on the MSN Search blog. Looks like a very enthusiastic and committed team of devs. :)
- Anonymous
December 13, 2004
"but I do know one way to speed it up — defragment your hard disk first! "
8GB in 2.5 hours is 53MB per MINUTE. I think even in PIO mode 4 a hard drive can read all that data fully fragmented, which means the disk is definitely not the bottleneck. Or is the process doing very silly I/O, like the I/O decision Win32 makes when copying a large file: copy 4KB blocks, no matter how much RAM is available? If you have 100MB of RAM to spare, the indexer can load 100MB of files into memory at once, then start indexing. Already on the Amiga we knew that if you do I/O and processing in parallel with two (2) buffers (one being read into while the other is processed, what an invention!) you can achieve very fast performance without having the processes block each other.
- Anonymous
December 15, 2004
Frans - I'm pretty sure that the indexer is NOT doing very silly I/O. If you go watch the Channel 9 video, you'll see Chris McConnell commenting that he's been working on indexers since 1996, and I can attest to the fact that he's plenty smart. There are clearly several variables at work, because I've seen some people say they indexed half of an 80GB disk in less than an hour, and others say they had to leave the thing running all night and it still wasn't finished. I'll also bet that the performance characteristics of each indexer are closely guarded secrets of the major players, so we might not see a full analysis coming from the inside anytime soon. Nothing to stop anyone on the outside running a performance comparison, though (although better read those click-through EULAs first, just to be sure you can publish it! :->)
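For anyone who wants to experiment with the two-buffer idea raised earlier in this thread, here is a minimal Python sketch. It is purely illustrative and makes no claim about how MSN Desktop Search or any other indexer does its I/O. The names `process_double_buffered` and `handle_chunk` are made up for this example, and the in-memory stream stands in for a real file.

```python
import io
import queue
import threading

def process_double_buffered(stream, handle_chunk, buf_size=1 << 20):
    """Read `stream` with two reusable buffers: one is being filled by a
    reader thread while the other is being processed by the caller."""
    free = queue.Queue()    # buffers ready to be filled
    filled = queue.Queue()  # (buffer, byte_count) pairs ready to process
    for _ in range(2):
        free.put(bytearray(buf_size))

    def reader():
        while True:
            buf = free.get()
            n = stream.readinto(buf)  # 0 signals end of stream
            filled.put((buf, n))
            if not n:
                break

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()

    total = 0
    while True:
        buf, n = filled.get()
        if not n:
            break
        handle_chunk(memoryview(buf)[:n])  # only the valid part of the buffer
        total += n
        free.put(buf)  # hand the buffer back to the reader
    worker.join()
    return total

# Hypothetical stand-in for "indexing": count newlines in each chunk.
line_count = 0

def count_lines(chunk):
    global line_count
    line_count += bytes(chunk).count(b"\n")

data = b"some words to index\n" * 1000
total = process_double_buffered(io.BytesIO(data), count_lines, buf_size=4096)
print(total, line_count)
```

To compare against a plain read-then-process loop, swap the `io.BytesIO` for a file opened in binary mode and time both versions on the same data. Whether the overlap helps will depend on how expensive the processing step is relative to the reads.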