Condividi tramite


Temporal, where art thou data? – Establishing the Temporal Basics part 1

by Mike Boswell - Data Platform Solution Architect.

At a recent SQL event I presented on the use cases for Temporal Tables.

Imagine the business problem where you would like to keep the history of dimension attributes without any additional ETL code.

Let's take a retail website where products table attributes are changing on a daily basis (this is our dimension table) and orders (this is our fact table) need to have the correct product attributes. For example, you might want to look at the products sold at the start of the month and then compare at the end of the month. Temporal tables will allow you to report on data as it was in the past and then compare to current date.

Other areas temporal tables can solve is the ability to track changes made by users without having to implement triggers onto tables. Again, temporal tables can handle data audit without you having to make changes to your applications. Combining temporal tables with the increased auditing capability of SQL Server 2016 can not only show you what action was performed but how it affected the data.

Finally, what about the "help" I need to recover some deleted rows? The usual answer would be to restore a backup, extract the rows and then import them back in. With temporal enabled it is possible to pull the data back from the history table and improve your Recovery Time Objective (RTO).

Basics

In 2011 there was an updated ANSI standard issued for SQL which included the temporal definition (https://en.wikipedia.org/wiki/SQL:2011). Microsoft have implemented their version using the "PERIOD FOR SYSTEM_TIME" and "SYSTEM_VERSION". For the full breakdown of the architecture you could read https://msdn.microsoft.com/en-us/library/dn935015.aspx but as I like to understand how the feature works by using it I hope you will read on.

There are three key points to a temporal table:

  1. You must have a datetime2 column to store valid start and the end dates in. This is to help SQL Server answer when the row was valid to and from.
  2. The end date should have a default constraint of '9999-12-31 23:59:59.99'
  3. There will always be a current table and a history table.

I'd recommend creating a history table or naming it in the CREATE/ALTER statement before you enable temporal tables. This will stop you ending up with a number of random tables starting MSSQL_TemporalHistoryFor_<current_temporal_table_object_id>_[suffix] which are inconsistent with your development naming strategy.

Before we can set off demoing the basics, please download the AdventureWorksDW2016CTP3 databases from https://www.microsoft.com/download/details.aspx?id=49502. Once restored on SQL Server 2016 RC0 or later, run this setup script:

/* DISCLAIMER: This code is not supported under any Microsoft standard support program or service. This code and information are provided "AS IS" without warranty of any kind, either expressed or implied. The entire risk arising out of the use or performance of the script and documentation remains with you. Furthermore, Microsoft or the author shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, including, without limitation, damages for loss of business profits, business interruption, loss of business information or other pecuniary loss even if it has been advised of the possibility of such damages. */  USE [AdventureworksDW2016CTP3]GO  /*     If Temporal system-versioned tables already exist, first drop versioning */  IFEXISTS(SELECT * FROMsys.tables WHERE [Name] ='WebProducts'AND temporal_type = 2)     ALTERTABLE [dbo].[WebProducts] SET (SYSTEM_VERSIONING = OFF);  /*     Drop all objects from the sample if exist:         view         stored procedures         tables */  DROPTABLEIFEXISTS [dbo].[WebProducts]; DROPTABLEIFEXISTS [dbo].[WebProducts_History]; DROPTABLEIFEXISTS [dbo].[FactSQLSatSales]; DROPSEQUENCEIFEXISTS [dbo].[WebProductsSeq];  --Create Table CREATETABLE [dbo].[WebProducts](     [ProductKey] [int] NOTNULL,     [ProductAlternateKey] nvarchar(25)NULL,     [ProductSubcategoryKey] [int] NULL,     [EnglishProductName] [nvarchar](50)NOTNULL,     [SafetyStockLevel] [smallint] NULL,     [ReorderPoint] [smallint] NULL,     [ListPrice] [money] NULL, CONSTRAINT [PK_WebProducts_ProductKey] PRIMARYKEYCLUSTERED (     [ProductKey] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE =OFF, IGNORE_DUP_KEY=OFF, ALLOW_ROW_LOCKS=ON, ALLOW_PAGE_LOCKS=ON) ON [PRIMARY]) GO  SETANSI_PADDINGON GO  ALTERTABLE [dbo].[WebProducts] WITHCHECKADDCONSTRAINT [FK_WebProducts_DimProductSubcategory] FOREIGNKEY([ProductSubcategoryKey]) REFERENCES [dbo].[DimProductSubcategory]([ProductSubcategoryKey]) GO  ALTERTABLE [dbo].[WebProducts] CHECKCONSTRAINT [FK_WebProducts_DimProductSubcategory]GO  insertinto [dbo].[WebProducts]    SELECT [ProductKey]    ,[ProductAlternateKey]     ,[ProductSubcategoryKey],[EnglishProductName],[SafetyStockLevel],[ReorderPoint],[ListPrice]FROM [AdventureworksDW2016CTP3].[dbo].[DimProduct]go GO  CREATETABLE [dbo].[FactSQLSatSales](     [ProductKey] [int] NOTNULL,     [SalesOrderNumber] [nvarchar](20)NOTNULL,     [SalesOrderLineNumber] [tinyint] NOTNULL,     [RevisionNumber] [tinyint] NULL,     [OrderQuantity] [smallint] NULL,     [UnitPrice] [money] NULL,     [InsertDate] datetime2(0)NULL, CONSTRAINT [PK_FactSQLSatSales_SalesOrderNumber_SalesOrderLineNumber] PRIMARYKEYCLUSTERED (     [SalesOrderNumber] ASC,     [SalesOrderLineNumber] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE =OFF, IGNORE_DUP_KEY=OFF, ALLOW_ROW_LOCKS=ON, ALLOW_PAGE_LOCKS=ON) ON [PRIMARY])ON [PRIMARY] GO

The above script creates two tables, one which is going to be used as a product dimension (WebProducts) and the other which will be used to store our facts. The amount of data and columns has been cut down to allow us to demonstrate the feature easily.

Currently the dbo.WebProducts table is a type 0 dimension and history isn't preserved. However, the business would like to start keeping the history in a separate table while the original dimension table keeps the latest dimension member version. To enable this change, without having to make any changes to the ETL process we first have to enable the dbo.WebProducts table with SYSTEM_VERSIONING. Before we do this we are going to create our history table:

CREATETABLE [dbo].[WebProducts_History](     [ProductKey] [int] NOTNULL,     [ProductAlternateKey] nvarchar(25)NULL,     [ProductSubcategoryKey] [int] NULL,     [EnglishProductName] [nvarchar](50)NOTNULL,     [SafetyStockLevel] [smallint] NULL,     [ReorderPoint] [smallint] NULL,     [ListPrice] [money] NULL,     [SysStartTime] datetime2(0)NOTNULL, [SysEndTime] datetime2(0)NOTNULL, );

Notice that the table has the two extra columns which we spoke of before? ([SysStartTime] and [SysEndTime]). The data types on this must be a datetime2(0) as this is the only data type which is supported by temporal tables to store the valid from and too range in.

The next recommendation is to create a columnstore index upon the history table. The reason is that if you are expecting a high number of rows and lots of changes to be pushed to the history table then this will usually give you the best compression ratio. If you decide against a columnstore index be aware that history tables use PAGE compression by default. Foreign Keys on the history table are not supported.

CREATECLUSTEREDCOLUMNSTOREINDEX IX_WebProducts_HistoryON dbo.WebProducts_History;

The next task is to add two additional columns ([SysStartTime] and [SysEndTime]) and make sure that these are HIDDEN. When I say hidden, SQL Server knows about them but they are handled as internal columns and not returned as outputs unless explicitly named.

ALTERTABLE [dbo].[WebProducts]ADD SysStartTime datetime2(0) GENERATEDALWAYSASROWSTARTHIDDEN CONSTRAINT DF_SysStart DEFAULTSYSUTCDATETIME() , SysEndTime datetime2(0) GENERATEDALWAYSASROWENDHIDDEN CONSTRAINT DF_SysEnd DEFAULTCONVERT(datetime2 (0),'9999-12-31 23:59:59'), PERIODFORSYSTEM_TIME (SysStartTime, SysEndTime); GO

With the two extra columns added we can enable our [dbo].[WebProducts] table using the keyword 'SYSTEM_VERSIONING = ON':

ALTERTABLE [WebProducts]SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = [dbo].[WebProducts_History])); go

We are telling SQL Server to store all the history in the previous table we created and a number of rules are now applied by SQL Server:

  1. Inserts to [dbo].[WebProducts] will go into [dbo].[WebProducts].
  2. Updates to [dbo].[WebProducts] will move the old row to [dbo].[WebProducts_History] and insert a row into [dbo].[WebProducts].
  3. Deletes to [dbo].[WebProducts] will move the old row to [dbo].[WebProducts_History] and there will be no row in [dbo].[WebProducts].
  4. Merge operations invoke one of the rules above.
  5. Once SYS_VERSIONING is switched on you cannot delete, insert or update rows held in the history table
  6. For SELECT statements that affect the history table, SELECT permission is required on both the current table and the history table.

Let's test the theory about HIDDEN columns. If we issue:

SELECT* from [WebProducts];

We will get all the columns minus the [SysStartTime] and [SysEndTime].

If we try and name these in a SELECT then they will appear:

SELECT*, [SysStartTime], [SysEndTime]from [WebProducts];

 

If we want to see these by default in our output without naming them then we can unhide them using an alter table command:

ALTERTABLE dbo.[WebProducts]ALTERCOLUMN SysStartTime DROPHIDDEN; ALTERTABLE dbo.[WebProducts]ALTERCOLUMN SysEndTime DROPHIDDEN;

A 'SELECT *' will now output the fields:

SELECT* from [WebProducts];

 

Let's test out what happens to INSERTS to the table:

INSERTINTO [dbo].[WebProducts]([ProductKey] ,[ProductAlternateKey],[ProductSubcategoryKey],[EnglishProductName],[SafetyStockLevel],[ReorderPoint],[ListPrice]) VALUES (1 ,'BK-R50R-64' ,2,'Road-650 Red, 64' ,5,1,209) GO

Our theory tells us that it should land into the current table and the history table should be unaffected:

select*from [dbo].[WebProducts]select*from [dbo].[WebProducts_History]

 

We can see that we have a new row in the current table and no rows exist in the history table at the moment. The SysStartTime field has been set to the datetime the row was inserted. As all the rows are valid, and in the current table, the SysEndTime has been set to the year 9999. Next, let's see what happens when we issue a DELETE.

DELETEFROM [dbo].[WebProducts] where [ProductKey]=1

This time when we select from both the current and history table we see that the row has moved. It is no longer in the current table but has moved to the History table:

The important thing to note is that the SysEndTime has been changed to the datetime of when the row was deleted. If we now wanted to recover this row it would be easy to reference the history table and to insert it back into the current table.

Moving onto Updates, we update one of the remaining rows in the [dbo].[WebProducts] to see the effect on the history table:

UPDATE [dbo].[WebProducts] set [EnglishProductName]=[EnglishProductName]+' SQL Sat'where productkey=322

Now when we look at [dbo].[WebProducts] and in the history table we can see that the [dbo].[WebProducts] is holding the most recent change but as opposed to us losing the history, like a Dimensions Type 0 update, the previous state has been copied to the history table giving us a Dimensions Type 4 out of the box.

 

Time Travel

Onwards to some time travel to see the current state of the [dbo].[WebProducts] as of particular times.

Let's start with the datetime of '2016-03-15 18:55:26'. If we take into account the SysEndTime of the records, in the history table, then we should see that the record with product key 1 and 322 will be returned from the history table. This will be unioned with product key 324.

 

SELECT [ProductKey],     [ProductSubcategoryKey],     [EnglishProductName],     [SafetyStockLevel],     [ReorderPoint],     [ListPrice],     [SysStartTime], [SysEndTime]    from [WebProducts]    FORSYSTEM_TIMEASOF'2016-03-15 18:55:26'

 

 

In the next example we will use the datetime of '2016-03-15 18:57:26'. If we take into account the SysEndTime of the records, in the history table, then we should see that the record with product key 322 will be returned from the history table. This will be unioned with product key 324 from the current table.

 

SELECT [ProductKey],     [ProductSubcategoryKey],     [EnglishProductName],     [SafetyStockLevel],     [ReorderPoint],     [ListPrice],     [SysStartTime], [SysEndTime]    from [WebProducts]    FORSYSTEM_TIMEASOF'2016-03-15 18:57:26'

 

 

 

There are other SYSTEM_TIME functions, which can be used to time travel, such as CONTAINED IN, BETWEEN, ALL and FROM <> TO <>.

 

Meta Data

In SQL Server Management Studio you will see the [dbo].[WebProducts] tells you it is 'System-Versioned' and expanding the node shows you the related history table. Note you can only have one related history table.

 

Under the table properties you can see Table temporal type:

To check on the status using monitoring tools Microsoft have added some additional columns to allow you to see the related object_id of the history table. Running this script:

SELECT T1.object_id, T1.name as TemporalTableName,SCHEMA_NAME(T1.schema_id)AS TemporalTableSchema, T2.name as HistoryTableName,SCHEMA_NAME(T2.schema_id)AS HistoryTableSchema, T1.temporal_type_descFROMsys.tables T1 LEFTJOINsys.tables T2 ON T1.history_table_id = T2.object_id WHERE T1.temporal_type <> 0ORDERBY T1.temporal_type desc; go

 

Gives us the following output to analyse:

For SQL Server 2016 another system table has been introduced called sys.periods. This table defines the temporal table type – at the moment temporal tables only support SYSTEM_TIME_PERIOD which could prohibit some adoption of the technology. This is because Developers might want control over the dates which are inserted into the start and end times.

select name, period_type, period_type_desc, object_name(object_id)as obj_name fromsys.periods;

 

Conclusion

In our first look at Temporal tables you will have seen how easy it is to take an existing table and capture history. Previously this would have involved setting up triggers which would have increased the time to deploy a solution. A few lines of code and you are ready to go!

In my next post I will try to address how you can use this for Type 2 Dimensions.