SMS heartbeat discovery and delete aged discovery task

From time to time we get calls from admins who have updated the frequency at which the 'delete aged discovery' task runs and then find valid computers start to disappear from collections.  I thought I'd spend a few minutes discussing how adjusting the 'delete aged discovery' settings can impact what information is retained in the database.

Consider an example scenario.  Let's assume an SMS admin decides to purge all systems from the database that have not reported for 3 days.  To do this, the admin adjusts the retention value on the 'delete aged discovery' task to 3 days.  With this set, the admin moves on to the next project.  The next day, the admin comes in and notices that the number of systems in the database has dropped to a level well below what would have been expected based on the deletion settings configured.

The problem in the above example is that the admin has set this new deletion interval without understanding how SMS determines whether a machine is a candidate for removal.  The most common conflict in this scenario is between the deletion interval and the heartbeat discovery interval. 

To simplify things, let's assume that the only discovery method enabled on this site is heartbeat discovery and that the heartbeat discovery interval has been left at the default of 7 days.  In this case, 'delete aged discovery' has been set to act well before some of the SMS clients will have generated their regular heartbeat to update the database.  As such, SMS will see those systems as aged and will proceed to remove them - which also removes any associated inventory data.  It is a good rule of thumb to ensure the 'delete aged discovery' interval is at least twice the heartbeat interval.  So, if heartbeat discovery is set for 7 days, then a good interval for 'delete aged discovery' might be 14 days.  This will help prevent removing systems that just haven't sent a heartbeat yet or that may be offline for a period of time.  Of course, the shorter the heartbeat interval, the less applicable the rule of thumb becomes, as simply doubling the interval may not be sufficient.
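To make the arithmetic concrete, here is a minimal Python sketch of the comparison described above.  It is not how SMS implements the task internally; the function names and the dates are made up purely for illustration, and the only values taken from this post are the 7-day heartbeat default, the 3-day retention from the example, and the 'at least twice' rule of thumb.

```python
from datetime import datetime, timedelta


def is_aged(last_heartbeat, now, delete_after_days):
    """Hypothetical check: True if the record is older than the retention window."""
    return now - last_heartbeat > timedelta(days=delete_after_days)


def follows_rule_of_thumb(heartbeat_days, delete_after_days):
    """True if the delete interval is at least twice the heartbeat interval."""
    return delete_after_days >= 2 * heartbeat_days


# Illustrative dates: a healthy client that last sent a heartbeat 5 days ago,
# which is perfectly normal on a 7-day heartbeat schedule.
now = datetime(2024, 1, 10)
last_heartbeat = datetime(2024, 1, 5)

print(is_aged(last_heartbeat, now, delete_after_days=3))   # True: would be removed
print(is_aged(last_heartbeat, now, delete_after_days=14))  # False: would be kept
print(follows_rule_of_thumb(7, 3))    # False
print(follows_rule_of_thumb(7, 14))   # True
```

With a 3-day retention value, the healthy client in this sketch is treated as aged even though its next heartbeat is not due for another two days.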

This problem can get a bit more complex when other discovery methods are introduced.  If, for example, two or three discovery methods are enabled that should each discover a particular system, then the system will not be deleted as long as at least one of those methods has discovered the client within the time period configured for the deletion task.
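The 'at least one method' behavior can be sketched the same way.  Again, this is only an illustration of the logic described in this post and not SMS code; the function name, the method labels, and the dates are all hypothetical.

```python
from datetime import datetime, timedelta


def should_delete(last_seen_by_method, now, delete_after_days):
    """Hypothetical check: delete only if every method's latest record is aged.

    last_seen_by_method maps an illustrative discovery method label to the
    time that method last discovered the system.
    """
    cutoff = now - timedelta(days=delete_after_days)
    return all(seen < cutoff for seen in last_seen_by_method.values())


now = datetime(2024, 1, 10)

# The heartbeat record is 6 days old, but another method saw the system yesterday.
mixed = {
    "heartbeat": datetime(2024, 1, 4),
    "network": datetime(2024, 1, 9),
}
print(should_delete(mixed, now, delete_after_days=3))  # False: one record is recent

# Every record is older than the 3-day window.
stale = {
    "heartbeat": datetime(2024, 1, 4),
    "network": datetime(2024, 1, 2),
}
print(should_delete(stale, now, delete_after_days=3))  # True: all records are aged
```

In other words, the most recent discovery from any enabled method is what keeps the system in the database.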

There are also other deletion tasks, like the 'delete inactive client discovery data' and 'clear undiscovered clients' - but that's enough for this entry.

-Steve
