It's Never Too Late to Partition Your Tables

Only a few developers design their tables for partitioning from the ground up, simply because they do not envision that a table might one day become so big that partitioning would be a real advantage. And even if they do, the fact that partitioning is an Enterprise-only feature (at least until SQL Server 2016 SP1) keeps them from using it.

Recently, I was involved in a customer case with an audit table into which around 10 events were logged per second, all day long. That makes for a little less than 1 million entries per day. The retention period was 30 days, so once every day they had to delete about 1 million entries from this 30-million-row table. This daily delete job was a constant blocker for all other transactions trying to insert into the table, due to the exclusive table lock held by the delete job. Not to mention the huge amount of data added to the transaction log with each delete.

Since the customer was running Enterprise Edition, the obvious solution was to partition the table by day. The daily delete job would then be replaced by a job that switches the oldest partition out into an archive table, merges the lowest partition boundary, and drops the archive table (since this was SQL Server 2014, the newer TRUNCATE TABLE ... WITH (PARTITIONS ...) wasn't yet available). But partitioning the existing table by rebuilding the clustered index would have required a maintenance window of several minutes. A price the customer was not willing to pay; at least, not without evaluating alternatives.

So, I came up with a solution that partitions the table within milliseconds and without any major disruption for the transactions running in parallel. You think that's impossible? Stay tuned!

The crucial point is to dismiss the idea of partitioning the existing data and instead partition only the forthcoming data. That is, all existing data goes into one big partition, whereas all new data goes into a new partition that is added every day. It's important that all indexes already meet the requirements for partition-aligned indexes, i.e. the clustered index and all unique indexes must contain the partitioning column as a key column. In our case, the table only had a clustered index on the column we intended to partition by, so all requirements were already met.
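If you are unsure whether your indexes meet this requirement, a quick metadata check can help. The following query is only a sketch (using the AuditTab table and create_date column names introduced below); it lists all unique and clustered indexes on the table that do not contain the prospective partitioning column as a key column:

-- Sketch: list clustered/unique indexes on dbo.AuditTab that are missing
-- the prospective partitioning column (create_date) as a key column.
SELECT i.name AS index_name, i.type_desc, i.is_unique
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.AuditTab')
  AND (i.is_unique = 1 OR i.index_id = 1)   -- unique indexes and the clustered index
  AND NOT EXISTS (
        SELECT 1
        FROM sys.index_columns AS ic
        JOIN sys.columns AS c
          ON c.object_id = ic.object_id AND c.column_id = ic.column_id
        WHERE ic.object_id = i.object_id
          AND ic.index_id = i.index_id
          AND ic.is_included_column = 0      -- key columns only
          AND c.name = N'create_date'
      );

An empty result means all relevant indexes already contain the partitioning column as a key column.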

We started with a partition layout consisting of three RANGE RIGHT partitions. The leftmost partition was intentionally left empty, to allow for painless partition merging. The middle partition would store all rows currently in the non-partitioned table. And starting the next day, all new rows would go into the rightmost partition. From then on, the customer would expand the partition layout by one new partition each day until reaching the final layout of 30 data partitions. At that point, they would start dropping the oldest partition each day, beginning with the big partition we created at the outset.

How to code this solution

In the following, I'll outline the steps for partitioning a table named AuditTab by the column create_date of type datetime2.

Let's start by creating the partition function and scheme to be used for partitioning our table. First, we must determine the date of the oldest entry in the table. Although not strictly necessary, we round it down to 00:00 of that day (we want all boundary values of our partition function to be plain dates without any time portion). Since we want the next partition to start the following day, we take that date as the upper boundary. Although there should not be any record in the table with a date in the future, we adjust the upper bound just in case (it would probably be better to raise an error in that case, but for clarity, we omit error handling here).

DECLARE @start_dt datetime2, @end_dt datetime2;

SELECT @start_dt = MIN(create_date) FROM dbo.AuditTab;
SET @start_dt = DATEADD(dd, DATEDIFF(dd, 0, @start_dt), 0);

SELECT @end_dt = MAX(create_date) FROM dbo.AuditTab;
SET @end_dt = CASE WHEN @end_dt < GETDATE() THEN GETDATE() ELSE @end_dt END;
SET @end_dt = DATEADD(dd, DATEDIFF(dd, 0, @end_dt) + 1, 0);

CREATE PARTITION FUNCTION PF_Date (datetime2)
    AS RANGE RIGHT FOR VALUES (@start_dt, @end_dt);

CREATE PARTITION SCHEME PS_Date
    AS PARTITION PF_Date TO ([PRIMARY], [PRIMARY], [PRIMARY]);

Our AuditTab table was created in the PRIMARY filegroup. It's important that the middle partition is mapped to the same filegroup in which the original table resides.
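If in doubt, the filegroup of the existing table can be verified from the metadata. A minimal sketch, again assuming the names used in this article:

-- Sketch: determine the filegroup holding dbo.AuditTab's clustered index.
SELECT fg.name AS filegroup_name
FROM sys.indexes AS i
JOIN sys.filegroups AS fg ON fg.data_space_id = i.data_space_id
WHERE i.object_id = OBJECT_ID(N'dbo.AuditTab')
  AND i.index_id = 1;   -- the clustered index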

Next, we create a check constraint on the table enforcing that all current rows (plus those added until we actually partition the table) fall within the boundaries of the middle partition. This is the most expensive operation in the whole process. Although it takes only a few seconds on a table of this size, the schema modification lock acquired while creating the check constraint blocks all DML operations on the table. Therefore, we perform this operation ahead of time (preferably during a period of lower insert activity).

DECLARE @check nvarchar(max), @sql nvarchar(max);

SET @check = N' CHECK (create_date >= ''' + CONVERT(nvarchar, @start_dt, 112)
           + N''' AND create_date < ''' + CONVERT(nvarchar, @end_dt, 112) + N''')';
SET @sql = N'ALTER TABLE dbo.AuditTab ADD CONSTRAINT chk_date' + @check;
EXEC sp_executesql @sql;

Now, we create a temporary table with a structure identical to our original table: in the same filegroup, with the same indexes, and the same constraints. This is necessary because later we want to switch the data from the original table into this temp table. That way, we can partition the then-empty original table and afterwards switch the data back in from the temp table. This has the advantage that we never alter the underlying database object: object ownership, all permission settings, and other constructs that reference the table, like triggers or extended properties, remain unaffected.

CREATE TABLE dbo.AuditTab_Temp (…, create_date datetime2 NOT NULL, …) ON [PRIMARY];
CREATE CLUSTERED INDEX ix ON dbo.AuditTab_Temp (create_date);

SET @sql = N'ALTER TABLE dbo.AuditTab_Temp ADD' + @check;
EXEC sp_executesql @sql;

Now, we perform the actual transformation of the original table into a partitioned table. All operations are grouped within one transaction, so if anything fails, the whole transaction is rolled back. First, we switch the original table into the temporary table we just created. Then, we partition the now-empty original table by rebuilding the clustered index. Finally, we switch the temp table back into the middle partition of the now-partitioned original table. Because we created the check constraint first on the original table (where the expensive validation of existing rows took place) and then on the still-empty temporary table, no expensive constraint validation has to happen inside the transaction after the data has been switched into the temporary table.

BEGIN TRAN;

ALTER TABLE dbo.AuditTab SWITCH TO dbo.AuditTab_Temp;

CREATE CLUSTERED INDEX ix_AuditTab ON dbo.AuditTab (create_date)
    WITH (DROP_EXISTING = ON)
    ON PS_Date (create_date);

ALTER TABLE dbo.AuditTab_Temp SWITCH TO dbo.AuditTab PARTITION 2;

COMMIT TRAN;

Note that this transaction consists only of metadata changes and therefore runs very fast. In principle, nothing within this transaction has the potential to fail, since the check constraint guarantees that all data fits into the target partition. Nevertheless, it's recommended to put this transaction within a TRY block and roll it back in the CATCH block.
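A minimal sketch of such error handling, simply wrapping the transaction from above, might look like this:

BEGIN TRY
    BEGIN TRAN;

    ALTER TABLE dbo.AuditTab SWITCH TO dbo.AuditTab_Temp;

    CREATE CLUSTERED INDEX ix_AuditTab ON dbo.AuditTab (create_date)
        WITH (DROP_EXISTING = ON)
        ON PS_Date (create_date);

    ALTER TABLE dbo.AuditTab_Temp SWITCH TO dbo.AuditTab PARTITION 2;

    COMMIT TRAN;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRAN;
    THROW;   -- rethrow so the caller sees the original error
END CATCH;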

Finally, we drop the additional objects we created, i.e. the temp table and the check constraint.

DROP TABLE dbo.AuditTab_Temp;
ALTER TABLE dbo.AuditTab DROP CONSTRAINT chk_date;
ALTER TABLE dbo.AuditTab SET (LOCK_ESCALATION = AUTO);

We also set the LOCK_ESCALATION option of the original table to AUTO, which causes lock escalation to go to the partition level instead of the table level. Thereby, if saving space is a must, we may continue deleting the data older than 30 days from the big partition on a daily basis, but now without having to lock the whole table. The deletes then do not affect the partitions where the new inserts are occurring, which increases concurrency.
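If the customer indeed chose to keep trimming the big partition this way, a batched delete along these lines would keep each transaction, and with it each escalated partition lock, short. This is only a sketch; the batch size of 10,000 is an illustrative assumption:

-- Sketch: trim rows older than 30 days from the big partition in small batches,
-- keeping each transaction (and its locks) short. Batch size is an assumption.
DECLARE @cutoff datetime2 = DATEADD(dd, DATEDIFF(dd, 0, GETDATE()) - 30, 0);

WHILE 1 = 1
BEGIN
    DELETE TOP (10000) FROM dbo.AuditTab
    WHERE create_date < @cutoff;

    IF @@ROWCOUNT = 0 BREAK;
END;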

Defining the partition management job

Now that the table is partitioned, the only thing left is to define the daily partition management activities. We create a SQL Agent job for this, which runs every day before midnight, starting on the day we partitioned the table. This job consists of one T-SQL step executing the following batch:

DECLARE @fid int, @num_part int, @first_val sql_variant;

SELECT @fid = function_id, @num_part = fanout
FROM sys.partition_functions
WHERE name = 'PF_Date';

IF @num_part >= 32
BEGIN
    CREATE TABLE dbo.AuditTab_Archive (…, create_date datetime2 NOT NULL, …);
    CREATE CLUSTERED INDEX ix ON dbo.AuditTab_Archive (create_date);

    ALTER TABLE dbo.AuditTab SWITCH PARTITION 2 TO dbo.AuditTab_Archive;
    DROP TABLE dbo.AuditTab_Archive;

    SELECT @first_val = value
    FROM sys.partition_range_values
    WHERE function_id = @fid AND boundary_id = 1 AND parameter_id = 1;

    ALTER PARTITION FUNCTION PF_Date() MERGE RANGE (@first_val);
END;

ALTER PARTITION SCHEME PS_Date NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION PF_Date()
    SPLIT RANGE (DATEADD(dd, DATEDIFF(dd, 0, GETDATE()) + 2, 0));

As long as we haven't reached the final target of 32 partitions (note there is one empty partition at the beginning and one at the end), we just add a new partition, mapped to the PRIMARY filegroup, by splitting the last partition at the date that follows the next day. Hence, starting at midnight, all new records go into the partition preceding the new one (the last partition is always left empty, so the SPLIT operation is always performed on an empty partition). Once we have reached 30 partitions with data, we drop the partition holding the oldest data, which always happens to be partition 2, by switching the data out into an archive table (which is subsequently dropped) and merging the empty partitions 1 and 2 using the lowest boundary value of the partition function.
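To keep an eye on the evolving partition layout, a monitoring query along these lines can be handy. This is a sketch only; it shows each partition of the table with its lower boundary (a RANGE RIGHT boundary value belongs to the partition on its right) and row count:

-- Sketch: show partition number, lower boundary, and row count for dbo.AuditTab.
SELECT p.partition_number,
       prv.value AS lower_boundary,   -- NULL for the first partition
       p.rows
FROM sys.partitions AS p
JOIN sys.partition_functions AS pf
  ON pf.name = 'PF_Date'
LEFT JOIN sys.partition_range_values AS prv
  ON prv.function_id = pf.function_id
 AND prv.parameter_id = 1
 AND prv.boundary_id = p.partition_number - 1
WHERE p.object_id = OBJECT_ID(N'dbo.AuditTab')
  AND p.index_id = 1
ORDER BY p.partition_number;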

Conclusion

If you intend to partition a table and it is acceptable to partition only the future data, leaving all existing data in one big partition, the strategy outlined above makes it possible to partition even huge tables within seconds and without major disruption to the parallel workload.

By the way, this strategy has also been implemented in the new version of PTF (see https://ptf.codeplex.com/) by setting the parameter @fast_mode = 1 in the procedure PartitionTable. Hence, executing the following command in PTF would perform the identical steps:

EXEC ptf.PartitionTable 'dbo.AuditTab', 'PS_Date', 'create_date', @fast_mode = 1