Command logging in TFS

By default, all commands in TFS 2010 are logged in two tables: tbl_Command and tbl_Parameter.  tbl_Command contains every command that was executed against TFS, while tbl_Parameter is only populated if a command fails or takes a long time to execute.  This is very useful for diagnosing performance issues with your system.  However, if you are writing custom code and you have a bug such as a non-terminating loop around a failing command, many rows can be written to these tables in a short period of time, and all of this logging can cause the TFS DB to expand rapidly.
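
As a quick illustration, you can eyeball recent activity by querying tbl_Command directly.  This is only a sketch: the column names below (CommandId, Application, Command, StartTime, ExecutionTime, Status) are assumptions based on a TFS 2010 collection database, so verify them against your own schema first.

 -- Peek at the 50 most recent logged commands (column names assumed from a
 -- TFS 2010 collection database; verify against your schema before running)
 SELECT TOP 50 CommandId, Application, Command, StartTime, ExecutionTime, Status
 FROM tbl_Command WITH (NOLOCK)
 ORDER BY StartTime DESC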

My customer recently found that their TFS DB grew by around 100GB over the course of a weekend.  This sudden, unexpected increase was caused by a failing command that ran in a loop all weekend, adding 2-3 million rows to each of the tbl_Command and tbl_Parameter tables.  The resulting 100GB of growth then caused failures across the backup maintenance cycle as the backup volume ran out of disk space.  Running the following SQL, we quickly discovered that the tbl_Parameter table alone had become 20% of the DB size!

 

 ------------------ SQL Server SCRIPT - START ------------------
 -- Create a temp table to hold per-table size information
 CREATE TABLE #temp(
     tbl_id int IDENTITY (1, 1),
     tbl_name varchar(128),
     rows_num int,
     data_space decimal(15,2),
     index_space decimal(15,2),
     total_size decimal(15,2),
     percent_of_db decimal(15,2),
     db_size decimal(15,2))

 -- Collect the name, row count, and data/index space for every table
 EXEC sp_msforeachtable @command1 = "insert into #temp(rows_num, data_space, index_space) exec sp_mstablespace '?'",
                        @command2 = "update #temp set tbl_name = '?' where tbl_id = (select max(tbl_id) from #temp)"

 -- Compute each table's total size and the total database size
 UPDATE #temp
 SET total_size = (data_space + index_space), db_size = (SELECT SUM(data_space + index_space) FROM #temp)

 -- Compute each table's percentage of the total database size
 UPDATE #temp
 SET percent_of_db = (total_size / db_size) * 100

 -- Return the results, largest tables first
 SELECT *
 FROM #temp
 ORDER BY total_size DESC

 -- Comment out the following line if you want to do further querying
 DROP TABLE #temp
 ------------------ SQL Server SCRIPT - FINISH ------------------
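
If you do comment out the DROP, you can then drill into the two logging tables directly with something along these lines (sp_msforeachtable typically records names in the '[dbo].[tbl_Command]' form, hence the wildcards):

 -- Focus on just the two command-logging tables
 SELECT tbl_name, rows_num, total_size, percent_of_db
 FROM #temp
 WHERE tbl_name LIKE '%tbl_Command%' OR tbl_name LIKE '%tbl_Parameter%'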
 

The default retention period for this logging is 14 days, and a nightly job cleans up these tables according to that retention period.  Under normal workloads this isn't a problem, but after a sudden increase in the size of these two tables, you may wish to remove the entries before the retention period expires.
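
If you do decide to clean up manually rather than wait for the nightly job, a batched delete keeps the transaction log manageable.  The sketch below is only illustrative: it assumes tbl_Parameter references tbl_Command through a CommandId column and that StartTime records when the command ran, so verify the schema, and (as discussed in the comments below) test against a restored copy before running anything like it on a production collection database.

 -- A sketch of a batched manual cleanup (assumes tbl_Parameter joins to
 -- tbl_Command on CommandId and that StartTime is the command's start time;
 -- verify against your schema and test on a restored copy first)
 DECLARE @cutoff datetime
 SET @cutoff = DATEADD(day, -1, GETUTCDATE())  -- pick a cutoff that covers the runaway window

 -- Delete parameter rows first so no orphaned child rows are left behind
 WHILE 1 = 1
 BEGIN
     DELETE TOP (10000) p
     FROM tbl_Parameter p
     JOIN tbl_Command c ON c.CommandId = p.CommandId
     WHERE c.StartTime < @cutoff
     IF @@ROWCOUNT = 0 BREAK
 END

 -- Then delete the command rows themselves
 WHILE 1 = 1
 BEGIN
     DELETE TOP (10000) FROM tbl_Command WHERE StartTime < @cutoff
     IF @@ROWCOUNT = 0 BREAK
 END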

Grant Holliday wrote instructions on his blog for how you can adjust the retention period of this logging.

Comments

  • Anonymous
    April 30, 2014
    We had the exact same experience: an out-of-control loop of failed operations on the TFS 2010 server all weekend resulting in an IMMENSE number of rows added to tbl_Parameter and tbl_Command. We are debating whether to manually truncate the tables or to let prc_PruneCommands attempt its cleanup after 14 days. On one hand, I'm afraid the stored proc will fail if it doesn't remove the records intelligently. On the other, I'm not sure how to safely specify the records to delete. We're leaning toward wait-and-see and monitoring the 12am job runs over the weekend. But we would love a definitive answer as to whether the cleanup proc will fail!

  • Anonymous
    April 30, 2014
    Tod - your best bet for proving this would be to test it on a separate system.  That's what we did at my customer: we took a backup, restored it to a test system, and verified there that the cleanup worked.