TFS Administrator chores – dealing with the space offender

 

These are the days of cheap storage, but even cheap storage can run out. And a Team Foundation Server that stores artifacts in its (multiple) databases may use up your disk space faster than you expected (and if you want to know what to expect, refer to this classic post by Buck Hodges on database size calculations).

If that happens, the most probable culprit is the version control database (TfsVersionControl), in other words, all the files that people check in to version control. File size matters because TFS stores only the differences for each new revision of a “small” file, but every new revision of a “large” file gets a full-blown copy (by default TFS considers a file to be large if it is over 16 MB; read more on that topic in my previous post).
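To put numbers on that, here is a back-of-the-envelope sketch of what full-copy storage can cost. It is an illustration only: the 700 MB image and the five revisions are made-up figures, the helper names are mine, and compression is ignored, so treat the result as a rough upper bound rather than what TFS will actually report.

```python
# Rough storage arithmetic for "large" files (a sketch, not TFS's real accounting).

# 16 MB in bytes: the default cutoff mentioned above.
LARGE_FILE_THRESHOLD = 16 * 1024 * 1024


def is_large(file_size_bytes: int) -> bool:
    """True if the file is over the threshold, i.e. stored in full per revision."""
    return file_size_bytes > LARGE_FILE_THRESHOLD


def full_copy_storage(file_size_bytes: int, revisions: int) -> int:
    """Upper-bound bytes used when every revision is a full copy (no compression)."""
    return file_size_bytes * revisions


# Hypothetical example: a 700 MB CD image checked in five times.
iso_size = 700 * 1024 * 1024

print(LARGE_FILE_THRESHOLD)                       # 16777216
print(is_large(iso_size))                         # True
print(full_copy_storage(iso_size, 5) / 1024**3)   # about 3.42 (GB)
```

So a single image that gets "updated" a handful of times can quietly claim a few gigabytes on its own.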

There are several ways of making sure that your users do not fill up your version control with memory dumps, images of installation CDs and such. Mind you – I am not saying that large files do not belong in version control; I am saying that adding a large file should be a) a conscious step and b) “revisionless” (i.e. with no versioning).

Myself, I have always been ambivalent about storing large binary thingies in source control: on one hand, you get all content in one place (which is mighty convenient for builds etc.); on the other hand, many users will probably check in content that does not belong in source control. So here is my hit list of measures to deal with large files in version control:

  • Educate your users – make sure your average user understands that a DVD ISO added to version control ends up being transmitted to and stored in the database; perhaps what the user is looking for is a file server, not version control
  • Make users aware of their actions – it is possible to write a check-in policy that alerts the user at check-in time that the files being checked in are large and perhaps should not be in version control. And then, even if the user decides to override the policy, you can run a report on policy overrides
  • Monitor your storage – if high-level prevention and low-level prevention fail, you can query the database to identify the offending files. The query below (with the usual caveats – it is AS IS etc.) will give you a list of large files in the database (it looks only at the latest version of each file, not at the total size of all its versions):
 DECLARE @LargeFile int;
 -- return files larger than 16 MB
 SET @LargeFile = 16 * 1024 * 1024; 
  
 USE TfsVersionControl; -- use source control DB 
 SELECT -- item path 
     Versions.ParentPath + Versions.ChildItem AS ItemPath,
     -- size of latest version in DB 
     Files.CompressedLength AS DatabaseSize, 
     -- size of original file
     Files.FileLength AS [Size], 
     -- whether item deleted
     CASE WHEN Versions.DeletionId = 0 THEN 0 
         ELSE 1 END AS Deleted 
 FROM tbl_Version Versions
     -- join to table with sizes
     INNER JOIN tbl_File Files
         ON Versions.FileId = Files.FileId 
 WHERE -- get item latest version 
     Versions.VersionTo = 2147483647 
     -- return only large files
     AND Files.CompressedLength > @LargeFile 
 ORDER BY ItemPath;
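
The query catches large files after they are already in the database. A real check-in policy, as in the second item of the list, is written against the TFS policy API, which I am not showing here. As a rough stand-in, a small script can scan a local workspace folder for files over the same threshold before anything is checked in. This is a minimal sketch under my own assumptions: `find_large_files` is a hypothetical helper, not part of TFS, and it only looks at file sizes on disk.

```python
import os
from pathlib import Path

# Same 16 MB cutoff as the query above.
LARGE_FILE_THRESHOLD = 16 * 1024 * 1024


def find_large_files(root, threshold=LARGE_FILE_THRESHOLD):
    """Return (path, size_in_bytes) pairs for files under root that are
    larger than threshold, biggest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                size = path.stat().st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            if size > threshold:
                hits.append((str(path), size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Calling `find_large_files` on the root of a workspace folder returns the candidates worth a second look; whether they really belong in version control is still a human decision.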

I would be happy to hear your horror stories from running the above query; mine was nothing more than a bunch of ISO images checked in :)

Thanks for reviewing the query go to Chandru Ramakrishnan.

Comments

  • Anonymous
    March 30, 2009
    Thanks for this useful query. However, 16 * 1024 * 1024 * 1024 would make 16 GB.

  • Anonymous
    March 31, 2009
    Thanks Thys, fixed that

  • Anonymous
    April 01, 2009
    In my previous post I talked about management of large files in TFS version control database. Today I’d