A bit about WinInet's Index.dat

Since a recent digg article and its underlying Wikipedia entry seem a little confused about index.dat, I’d like to give some more detail about what it is and what we have changed in IE7/Vista’s version of WinInet. As Jeffdav explained a while back, the index.dat file is a store for web-related things: the URL content cache, cookies, RSS feeds, and visited links. Each of these collections, called a container, has its own index.dat file that lives in the user profile.

First, let’s talk about these containers in a bit more detail:

On most machines the biggest and most important container is the URL content cache index.dat. It lives (on Vista) at \Users\<user>\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.IE5\index.dat. Cacheable content that we fetch from the web, such as pages and images, is placed into this cache until it expires. The rules for whether content is cacheable and when entries expire are complex enough to warrant their own blog post, but the common reasons that content doesn’t go in the cache are that the server tells us not to cache it via response headers, or that the user tells us not to save any SSL resources to disk via the “Do not save encrypted pages to disk” option under Internet Options->Advanced->Security. Each cache entry has the URL and a file name, which lets us quickly find previously retrieved URLs and serve that content out of the content container. If a user just deletes all the files in the directory, the index.dat file will still contain all the URLs and paths until we notice that a cache entry is missing its file and delete that entry from index.dat.
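The lazy cleanup described above can be pictured with a small Python sketch. It is only an illustration: a plain dictionary stands in for the URL-to-file-name mapping, the function name `prune_missing_entries` is made up for this example, and the real index.dat uses its own binary format and its own timing for when entries get checked.

```python
import os
import tempfile


def prune_missing_entries(index, cache_dir):
    """Drop index entries whose backing file no longer exists on disk.

    `index` maps URL -> file name, standing in for the URL/file-name pairs
    that the real index.dat stores in its own binary format.
    Returns the list of URLs that were removed.
    """
    removed = []
    for url, file_name in list(index.items()):
        if not os.path.exists(os.path.join(cache_dir, file_name)):
            del index[url]
            removed.append(url)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        # One entry keeps its file; the other file was "deleted by the user".
        with open(os.path.join(cache_dir, "logo.gif"), "wb") as f:
            f.write(b"GIF89a")
        index = {
            "http://example.com/logo.gif": "logo.gif",
            "http://example.com/page.htm": "page.htm",
        }
        print(prune_missing_entries(index, cache_dir))
        print(sorted(index))
```

Until something like this pass runs, the index still lists every URL, which is why deleting the files by hand does not by itself empty index.dat.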

The visited container is a listing of the URLs that you click on when web browsing, which is how IE can do URL autocompletion and show the links that you have visited in a different color. On my Vista box this container is located in \Users\<user>\AppData\Local\Microsoft\Windows\History\History.IE5\index.dat. Visited only needs to know about each URL once, since you have either visited the site or you haven’t.

The history containers are a set of containers for the different date ranges that IE displays, like today, yesterday, last week, etc. These containers are in \Users\<user>\AppData\Local\Microsoft\Windows\History\History.IE5\MShist01<date><date>\index.dat. Again, the top-level links that you visit are stored in these containers. When the date shifts, IE does the bookkeeping, often by merging these buckets.
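As a rough illustration of how the two <date> fields in the folder name describe a bucket, here is a Python sketch that parses such a name. The post does not spell out the date encoding, so the YYYYMMDD format, the helper names, and the "daily"/"weekly" labels are all assumptions made for this example.

```python
import re
from datetime import date

# Assumed layout: "MSHist01" + start date + end date, each as YYYYMMDD.
_FOLDER_RE = re.compile(r"^mshist01(\d{8})(\d{8})$", re.IGNORECASE)


def parse_history_folder(name):
    """Return the (start, end) dates encoded in a history folder name, or None."""
    match = _FOLDER_RE.match(name)
    if match is None:
        return None
    start, end = (
        date(int(s[:4]), int(s[4:6]), int(s[6:8])) for s in match.groups()
    )
    return start, end


def describe_range(start, end):
    """Label a bucket by its span in days (labels are illustrative only)."""
    days = (end - start).days
    if days == 1:
        return "daily"
    if days == 7:
        return "weekly"
    return f"{days}-day"
```

Under that assumption, a folder covering a single day and a folder covering a week differ only in how far apart the two dates are, which fits the idea of merging daily buckets into a wider one when the date shifts.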

The cookie container maps the cookie URLs to individual cookie files. It is stored in \Users\<user>\AppData\Roaming\Microsoft\Windows\Cookies\index.dat. The index.dat contains the associated URL, the path to the cookie data, and other cookie metadata. You might notice that, unlike the other containers, this container is under a path called Roaming. This has to do with a domain feature (roaming user profiles) that copies your preferences from machine to machine on a domain. Cookies are one of those types of settings.

You might have also seen that, starting in Vista, almost all the containers have a Low\ directory with another index.dat. These are specially marked directories that IE in protected mode can access. We completely partition IE’s data between protected mode and normal mode. By design, normal mode only accesses the normal cache, cookies, etc., and by design and OS protection, protected mode only accesses the Low\ versions. The “how” of this partitioning is discussed on MSDN.
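To make the partitioning concrete, here is a small Python sketch that builds the index.dat path for a container in either mode. The container roots come from the paths quoted earlier in this post; the exact position of the Low directory (directly under each container root) and the `index_path` helper itself are assumptions for illustration.

```python
from pathlib import PureWindowsPath

# Container roots relative to the user profile, taken from the paths in the post.
CONTAINER_ROOTS = {
    "content": PureWindowsPath(r"AppData\Local\Microsoft\Windows\Temporary Internet Files"),
    "history": PureWindowsPath(r"AppData\Local\Microsoft\Windows\History"),
    "cookies": PureWindowsPath(r"AppData\Roaming\Microsoft\Windows\Cookies"),
}

# Sub-directory that holds index.dat below each root ("" means the root itself).
CONTAINER_LEAF = {"content": "Content.IE5", "history": "History.IE5", "cookies": ""}


def index_path(profile, container, protected_mode):
    """Build the index.dat path for a container in normal or protected mode.

    Assumption: the Low directory sits directly under the container root.
    """
    path = PureWindowsPath(profile) / CONTAINER_ROOTS[container]
    if protected_mode:
        path = path / "Low"
    leaf = CONTAINER_LEAF[container]
    if leaf:
        path = path / leaf
    return path / "index.dat"
```

The point of the sketch is only that the two modes never resolve to the same file: every lookup goes through the mode flag, so protected mode and normal mode each see their own store.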

It’s important to note that pretty much all modern web browsers have to keep these types of data stores. Firefox (1.5.0.6 at least) uses different file formats for each of its index.dat equivalents, but they are there. The equivalent of the cache container index.dat is in \Users\<user>\AppData\Local\Mozilla\Firefox\Profiles\59kuzm1n.default\Cache with the _CACHE_* files. The other containers are in the Roaming version of the directory, over in \Users\<user>\AppData\Roaming\Mozilla\Firefox\Profiles\59kuzm1n.default\. The history and visited containers are probably combined into one container, history.dat, and the cookies container is cookies.txt.

There is one thing pretty special about WinInet and hence the index.dat files: they are OS components that many applications use, including Explorer. That means that they were highly optimized for sharing data between processes. Each application’s copy of WinInet opens the file with sharing for read and write, but not for delete. As long as any program is using WinInet, the index.dat file can’t be deleted. If you could delete it, the applications actively using the file would probably crash or start corrupting data in memory. This also means that many applications leave their own footprints in the different containers. For example, when Windows Media Player downloads an mp3 from a URL on the web, that file can end up in WinInet’s content cache.
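In Win32 terms, "sharing read and write, but not delete" corresponds to the share-mode argument of CreateFile. The Python sketch below shows what such an open call could look like through ctypes. It is not WinInet's actual code: the function name `open_shared_no_delete` is made up, the access flags are a guess, and the call itself only works on Windows.

```python
import sys

# Win32 constants as defined in the Windows SDK headers.
GENERIC_READ = 0x80000000
GENERIC_WRITE = 0x40000000
FILE_SHARE_READ = 0x00000001
FILE_SHARE_WRITE = 0x00000002
FILE_SHARE_DELETE = 0x00000004
OPEN_EXISTING = 3
FILE_ATTRIBUTE_NORMAL = 0x80

# Share read and write with other processes, but deliberately not delete.
SHARE_MODE = FILE_SHARE_READ | FILE_SHARE_WRITE


def open_shared_no_delete(path):
    """Open `path` so other processes can read and write it but not delete it.

    Returns a raw Win32 handle; the caller must close it with CloseHandle.
    """
    if sys.platform != "win32":
        raise OSError("CreateFileW is only available on Windows")
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateFileW.argtypes = [
        wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD, wintypes.LPVOID,
        wintypes.DWORD, wintypes.DWORD, wintypes.HANDLE,
    ]
    kernel32.CreateFileW.restype = wintypes.HANDLE
    handle = kernel32.CreateFileW(
        path, GENERIC_READ | GENERIC_WRITE, SHARE_MODE, None,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, None,
    )
    # INVALID_HANDLE_VALUE is (HANDLE)-1.
    if handle == wintypes.HANDLE(-1).value:
        raise ctypes.WinError(ctypes.get_last_error())
    return handle
```

While any process holds a handle opened this way, a delete attempt from another process fails with a sharing violation, which matches the behavior people run into when they try to remove index.dat by hand.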

So what’s new in IE7? Well, the first thing is that IE made the interface for clearing these files much simpler with “Cover My Tracks”. As part of this effort WinInet made a bunch of improvements. The first improvement was in entry deletion. Those of you who remember the FAT file system on DOS might find the concepts behind this problem familiar. In DOS, when you delete a file, the file’s data is still around, and special tools can undelete it unless some new files have already been written over it. The way we used to delete entries in the index.dat file was pretty similar: the old URL data was marked free but was still there, at least until it was overwritten by a new entry. In IE7 we now zero out the entry. Another problem was that some applications (cough Outlook Express cough) would write temporary files, like attachments, into the cache file directory to allow other applications to open them. If the index.dat file didn’t know about the file, we wouldn’t clean it up. Now when you use the “Delete Files…” button we delete everything in the directory, regardless of whether it’s in index.dat or not. There is one more feature in this area that I should mention even though it is not new. When we attempt to delete an entry from the cache but can’t delete the actual storage file, we still remove the entry from index.dat and put the file on a list of things to periodically try to clean up.
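The difference between marking an entry free and zeroing it out can be shown with a toy Python sketch. The fixed 64-byte record with a one-byte flag is a hypothetical layout invented for this example; it is not the real index.dat format, and the function names are made up as well.

```python
RECORD_SIZE = 64  # hypothetical fixed record size, not the real index.dat layout
FLAG_FREE = 0
FLAG_USED = 1


def add_entry(store, slot, url):
    """Write `url` into `slot` of the bytearray `store` and mark it used."""
    start = slot * RECORD_SIZE
    payload = url.encode("ascii")[: RECORD_SIZE - 1].ljust(RECORD_SIZE - 1, b"\0")
    store[start] = FLAG_USED
    store[start + 1 : start + RECORD_SIZE] = payload


def delete_mark_free(store, slot):
    """Old behavior as described: flip the flag, leave the URL bytes behind."""
    store[slot * RECORD_SIZE] = FLAG_FREE


def delete_zero_out(store, slot):
    """IE7 behavior as described: overwrite the whole record with zeros."""
    start = slot * RECORD_SIZE
    store[start : start + RECORD_SIZE] = bytes(RECORD_SIZE)


if __name__ == "__main__":
    store = bytearray(RECORD_SIZE * 2)
    add_entry(store, 0, "http://example.com/old-style")
    add_entry(store, 1, "http://example.com/new-style")
    delete_mark_free(store, 0)
    delete_zero_out(store, 1)
    print(b"old-style" in store)  # the URL is still readable in the raw bytes
    print(b"new-style" in store)  # the URL is gone
```

With the mark-free approach anyone who opens the file in a hex editor can still read the URL until the slot is reused; with the zero-out approach there is nothing left to read.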

Any Questions?

    -- Ari Pernick

Comments

  • Anonymous
    August 04, 2006
    Continuing the discussion in the previous post, of course index.dat is not a secret record of any kind,...

  • Anonymous
    August 04, 2006

    I think this post is misleading. The ability to delete cookies has been available in Internet Explorer just about forever. The real problem behind index.dat is that, whether or not the indexes inside are still relevant, it keeps named URLs forever. This is a privacy issue. Any application can read index.dat and figure out which sites I visit, without me knowing.

    As a user, I want to be able to turn on a hypothetical "auto-delete" of everything, either any time the web browser is restarted, or when Windows is restarted, or even on a schedule. I don't think IE7 is going to provide any of this, unless I have missed something.

  • Anonymous
    August 04, 2006
    Why is the directory still called IE5? It's exactly those kind of legacy thingies which frustrate me about MS software. The fact that UA string additions can be stored in HKLM\Software\Microsoft\Windows\CurrentVersion\Internet Settings\5.0\User Agent\Post Platform as well as HKLM\Software\Microsoft\Windows\CurrentVersion\Internet Settings\User Agent\Post Platform is just another example of this.

  • Anonymous
    August 06, 2006
    I agree on the legacy comment by Jorrit-- why not start using a new folder name? Isn't there a "system variable" for that system directory, much like there is one for "MyDocuments" and "Windows" ?

    Also, I remember there being issues with the index.dat becoming corrupted or full. This is what leads to the "right click-> view source -> nothing happens." bug, right?

    That's why I use CacheSentry [http://www.enigmaticsoftware.com/cachesentry/] sometimes.. but wish it would just be fixed in the source!

  • Anonymous
    August 07, 2006
    In my previous post I tried to explain a bit about what the index.dat files are and what has changed...

  • Anonymous
    August 07, 2006
    I've responded to a number of the questions asked here on the next post: http://blogs.msdn.com/wndp/archive/2006/08/07/WinInet_Index_dat_Q_and_A.aspx.

    -- Ari

  • Anonymous
    August 08, 2006
    The mysterious history file.

  • Anonymous
    August 08, 2006
    FYI, if you want to decode the contents of the Index.dat file, here's a forensics tool to do that. (Useful if you want to see what websites may have done nefarious things to Internet Explorer):
    http://www.foundstone.com/index.htm?subnav=resources/navigation.htm&subcontent=/resources/proddesc/pasco.htm

  • Anonymous
    August 09, 2006
    Does this mean that a forensics company or the police would have to read the HDD with special tools (and might need to physically open it up) to get at visited URLs after the user has easily cleared them with IE7? Or maybe they would try to get the URLs from the ISP instead. But someone might have been using your wireless access, and now it's harder to tell the police it wasn't you, since it was your IP but you had cleared the cache just yesterday.

  • Anonymous
    August 27, 2006
    I have to analyze visitors' history on a local network, but I can't find a way to pass file locations to the WinINet caching functions. I have a set of index.dat files on a server, in different folders. How can I read them using WinINet?

  • Anonymous
    September 11, 2006
    What about people with older computers or operating systems who can't get IE7? How do they delete that info stored about them?

  • Anonymous
    September 12, 2006
    myob: I'm sorry to hear that you are not running XP or 2003. I'm not aware of an easy way to clean up these files on previous versions of IE.

  • Anonymous
    January 29, 2008
    Where do I go on the site to get the forensic tool to decode the index?

  • Anonymous
    June 26, 2008
    I created a batch file (below) that cleans up a bunch of stuff including clearing IE's history, cookies, temporary internet files, and index.dat files. I run it (via a shortcut) whenever needed. Also, I'll periodically log off then log in as a different user (with admin privs), which releases the index.dat files for that profile. Then upon running the batch file, all the index.dat files are deleted. After logging back in as the regular user, the files are recreated.

    rmdir /S /Q "C:\Documents and Settings\user\Application Data\Adobe"
    rmdir /S /Q "C:\Documents and Settings\user\Application Data\Macromedia"
    rmdir /S /Q "C:\Documents and Settings\user\Application Data\Microsoft\Internet Explorer\UserData"
    del /F /A:H /Q "C:\Documents and Settings\user\Local Settings\Application Data\*.db"
    del /F /Q "C:\Documents and Settings\user\Local Settings\Application Data\Microsoft\Media Player"
    del /F /Q "C:\Documents and Settings\user\Local Settings\Application Data\Microsoft\Windows Media\11.0"
    del /F /Q "C:\Documents and Settings\user\Application Data\Microsoft\Media Player\*.wpl"
    del /F /Q "C:\Documents and Settings\user\Application Data\Microsoft\Office\Recent"
    del /F /A:H /Q "C:\Documents and Settings\user\Application Data\Microsoft\Office\Shortcut Bar\*.tmp"
    del /F /A:H /Q "C:\Documents and Settings\user\Cookies"
    rmdir /S /Q "C:\Documents and Settings\user\Local Settings\History"
    rmdir /S /Q "C:\Documents and Settings\user\Local Settings\Temporary Internet Files"
    del /F /Q "C:\Documents and Settings\user\Local Settings\Temp"
    rmdir /S /Q "C:\Documents and Settings\user\Local Settings\Temp"
    mkdir "C:\Documents and Settings\user\Local Settings\Temp"
    rundll32.exe InetCpl.cpl,ClearMyTracksByProcess 255
    pause
    exit

  • Anonymous
    August 28, 2009
    You can read these files using index.dat Viewer™, available for free at http://www.pointstone.com/products/ ~Hawk~

  • Anonymous
    December 28, 2009
    Index.dat Suite by Ur I.T. Mate Group and Steven Burn (2007) is supposed to delete all index.dat files, or at least those that you want to delete. I have run it several times with no problems.

  • Anonymous
    February 21, 2010
    I work with IE8 on Vista. After deleting cookies from IE using the "Delete History" button, I still have a non-empty index.dat in my Roaming\Microsoft\Windows\Cookies folder. I can see my whole browsing history using Notepad. The index.dat is NOT filled with blanks, as I expected. "Unlocker" doesn't help delete the file, and even "Index.dat Suite" doesn't help (nothing changed after reboot) :-/

  • Anonymous
    February 22, 2010
    kit10: Did you leave "preserve favorite website data" checked?

  • Anonymous
    March 30, 2010
    So...here is a question about this file.... My current field of work depends on being able to reconstruct browsing history, and if the suspect used IE, I go after the index.dat file. I've noticed some odd output from some forensics tools regarding this file, though. I'll see large holes in the history when evaluating this output. Am I not processing all the files, or is this something particular to the index.dat file? As an example, I'll see URLs for one month and then the next URL has a date attached to it that is several months later. I have no reason to believe that browsing didn't occur, but somehow the index.dat file didn't record it? Can you explain this behavior?