Freigeben über


Implementing LOB Storage in Memory Optimized Tables

Introduction

Memory optimized tables do not have off-row or large object (LOB) storage, and the row size is limited to 8060 bytes. Thus, storing large binary or character string values can be done in one of two ways:

  • Split the LOB values into multiple rows
  • Store the LOB values in a regular non-memory optimized (file based) table

This document will provide a mechanism for implementing the split of LOB values into multiple rows, and outline several mechanisms for retrieving the data.

The alternative solution of using a file based table for overflow data will also be discussed, along with options relating to error checking, and durability.

Associated with this document is a set of scripts covering all the TSQL code presented in this document, along with some simple testing scripts.

Approach Overview

The approach to be taken will allow for the processing of LOB data such that when the data length is greater than a predefined upper limit, segment size, it is broken down into smaller chunks.

If the data fits into a single segment then the process will default to operating on just a primary table with a single row for the key identifier. In this instance reading the data means just returning the data solely from the primary table.

If the data is too large for a single segment then it will be broken down into multiple segments and each part saved as a row with an associated part number; linked to the primary element via the key identifier. In this instance, reading the data then becomes a determination of how many segments the data is stored in, and either concatenating them into a single field or return the individual parts through a SELECT statement.

The procedures to support this approach would be as follows:

  • Insert a LOB column when the data length is less than a predefined segment size
  • Insert a LOB column when the data length is greater than the predefined segment size and a (max) column definition is needed
  • Provide Upsert operations that allow an entry to be made after an insert has previously been made; or to insert data if the current state is not known
  • Provide a SELECT process that returns the collection of LOB segments; taking into consideration the fact the data may not be segmented
  • Read a LOB column when the size is less than a predefined segment size
  • Read a LOB column when the size is greater than the predefined segment size, returning the data in a (max) datatype definition

When implementing your own solution consideration should be given to the predefined segment size. The goal should be such that this size is as small as possible but big enough that a large percentage of the applications data can possibly fit into a single segment.

If most of your LOB data is less than 2K in size then this should be predefined segment size. Always using a maximum segment size of 8K could result in a lot of wasted memory, and consequently affect performance. For small transient data, such as session data and small encrypted data elements, adjusting this value can impact the overall performance gains one will get with memory optimized tables.

Of course, if you are saving large LOB data elements, such that most of your LOB data is larger than the maximum segment size of 8K, then this optimization will not be feasible. Thus it will be better to have a segment size of 8K to reduce the number of segments the data needs to be split into.

The other added advantage of having the data fit into a single segment is that the data handling Stored Procedures can all be natively compiled. In this case the application would need to know the size of the data it is processing, which may or may not be possible.

The primary approach outlined may not be suitable for storing very large quantities of LOB data which are accessed infrequently; say large document or images. In this instance a standard file based table approach may be more suitable. However for highly transient data such as session data, encrypted authentication codes, etc. using this approach can dramatically speed up access.

Implementation

For this implementation I have created a new database, called [MemBinaryStore]. In reality the object definitions below would be placed into your own application database:

CREATE DATABASE [MemBinaryStore]

    CONTAINMENT = NONE        

    ON PRIMARY (NAME = [MemBinaryStore_Primary], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Prim.mdf', SIZE = 10 MB, FILEGROWTH = 10 MB),

    FILEGROUP [MemBinaryStore_Mod] CONTAINS MEMORY_OPTIMIZED_DATA (NAME = [MemBinaryStore_Mod], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore.dir'),

    FILEGROUP [MemBinaryStore_Data] (NAME = [MemBinaryStore_Data1], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Data1.ndf', SIZE = 10 MB, FILEGROWTH = 10 MB),

        (NAME = [MemBinaryStore_Data2], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Data2.ndf', SIZE = 10 MB, FILEGROWTH = 10 MB)

    LOG ON (NAME = [MemBinaryStore_Log1], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Log1.ldf', SIZE = 5 MB, FILEGROWTH = 5 MB)

    COLLATE Latin1_General_100_CI_AS;

GO

 

ALTER DATABASE [MemBinaryStore]

MODIFY FILEGROUP [MemBinaryStore_Data] DEFAULT;

GO

 

One must remember Memory Optimized tables have a restriction on the code page for (var)char columns. These columns must use code page 1252, hence the reason the collation has been specified.

Table Objects

This implementation uses two base tables. A Primary table that holds the base binary information and also all the necessary data when the LOB data can fit into a single segment. An Extended table that holds any necessary LOB data segments; multiple records existing for each primary entry when the data is split into segments.

CREATE TABLE [dbo].[BinaryItem]

(

    [BinaryItemId] binary(16) NOT NULL        

        CONSTRAINT [PK_BinaryItem] PRIMARY KEY NONCLUSTERED HASH

        WITH (BUCKET_COUNT = 1000000),

    [ItemCreated] datetime NOT NULL,

    [BinaryData] varbinary(6000) NULL,

    [BinaryExParts] int NOT NULL

)

WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

GO

 

CREATE TABLE [dbo].[BinaryItemEx]

(

    [BinaryItemId] binary(16) NOT NULL

        INDEX IDX_BINARYITEMEX_ID HASH

        WITH (BUCKET_COUNT = 1000000),

    [BinaryPart] int NOT NULL,

    [BinaryDataEx] varbinary(6000) NOT NULL,

        CONSTRAINT [PK_BinaryItemEx] PRIMARY KEY NONCLUSTERED HASH

        ([BinaryItemId], [BinaryPart])

        WITH (BUCKET_COUNT = 20000000)

)

WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

GO

 

When defining your tables care should be taken in defining the correct bucket count. As a very rough guideline, the bucket count should be set to 1-2X the maximum expected cardinality of the table.

One of the differences in defining the [BinaryItemEx] table is the use of the IDX_BINARYITEMEX_ID index. If this were a file based table the Index would not be needed as it would be covered by the Primary Key index. However, with hash based indexes this is not the case, as outlined in this blog post "Troubleshooting Common Performance Problems with Memory-Optimized Hash Indexes" (https://blogs.technet.com/b/dataplatforminsider/archive/2013/10/09/troubleshooting-common-performance-problems-with-memory-optimized-hash-indexes.aspx).

In this instance I have chosen a key type of binary(16), rather than a varchar() or uniqueidentifier datatype. This is mostly to avoid specifying a binary collation that is needed for the primary key and index specifications; such as "nvarchar(34) collate Latin1_General_100_BIN2".

This collation requirement is necessary because indexes on (n)(var)char columns can only be specified with BIN2 collations.

Write Procedures

The process for inserting or updating LOB items is dependent on the length of the LOB data. Calling application should determine what procedure to call based on the size of the data being passed in.

If the data is less than the selected segment size then it is just a simple INSERT operation, which can be performed using a natively compiled Stored Procedure:

CREATE PROCEDURE [dbo].[InsertBinaryItem]

(

@BinaryItemId uniqueidentifier,

@BinaryData varbinary(6000)

)

WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER

AS

BEGIN ATOMIC WITH

(

    TRANSACTION ISOLATION LEVEL = SNAPSHOT,

    LANGUAGE = N'English'

)

    DECLARE @now datetime = GETUTCDATE();

    DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));

 

    INSERT INTO [dbo].[BinaryItem]

    ([BinaryItemId], [ItemCreated], [BinaryData], [BinaryExParts])

    VALUES(@binaryId, @now, @BinaryData, 0);

END

You can of course define the primary key to be whatever datatype is applicable for your application.

In the instance that the data length is larger than the defined segment size you have to pass in the data using a (max) datatype definition. In this case the procedure will have to chunk the data down and INSERT it into the extensions table:

CREATE PROCEDURE [dbo].[InsertBinaryItemEx]

(

    @BinaryItemId uniqueidentifier,

    @BinaryData varbinary(max)

)

AS

BEGIN

    DECLARE @now datetime = GETUTCDATE();

    DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));

 

    DECLARE @maxSegmentLength int = 6000;

    DECLARE @binaryDataLength int = DATALENGTH(@BinaryData);

    DECLARE @maxSegment int = CEILING(@binaryDataLength / 1.0 / @maxSegmentLength);

    DECLARE @segmentValue varbinary(6000);

 

    -- Insert the header primary

    INSERT INTO [dbo].[BinaryItem] WITH (SNAPSHOT)

        ([BinaryItemId], [ItemCreated], [BinaryData], [BinaryExParts])

        VALUES(@binaryId, @now, NULL, @maxSegment);

 

    -- Now insert the chunked binary data

    DECLARE @part int = 1;

    WHILE (@part <= @maxSegment)

    BEGIN

        SET @segmentValue = SUBSTRING(@BinaryData, ((@part -1) * @maxSegmentLength + 1), @maxSegmentLength);

 

        INSERT [dbo].[BinaryItemEx] WITH (SNAPSHOT)

            ([BinaryItemId], [BinaryPart], [BinaryDataEx])

            VALUES (@binaryId, @part, @segmentValue);

 

        SET @part += 1;

    END

END

In this case the binary header element is still written but the binary block is NULL. Also the part count, [BinaryExParts], is defined as the number of segments the data has been split into.

Breaking down the LOB column is merely a simple case of using the SUBSTRING, which can be used on binary data, to define each segment.

For the INSERT/UPDATE, upsert operations, the process follows the same pattern. In the case of the data being less than the selected segment size an UPDATE operation is first performed. If no entry is found then an INSERT operation is performed:

CREATE PROCEDURE [dbo].[UpsertBinaryItem]

(

    @BinaryItemId uniqueidentifier,

    @BinaryData varbinary(6000)

)

WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER

AS

BEGIN ATOMIC WITH

(

    TRANSACTION ISOLATION LEVEL = SNAPSHOT,

    LANGUAGE = N'English'

)

    DECLARE @now datetime = GETUTCDATE();

    DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));

    DECLARE @itemIdFound binary(16) = NULL;

 

    -- Delete any extended data

    DELETE FROM [dbo].[BinaryItemEx]

    WHERE [BinaryItemId] = @binaryId;

 

    -- First try an update as the primary may already exist

    UPDATE [dbo].[BinaryItem]

    SET @itemIdFound = [BinaryItemId],

        [BinaryData] = @BinaryData,

        [BinaryExParts] = 0

    WHERE [BinaryItemId] = @binaryId;

 

    -- If no primary found the perform the insert

    IF (@itemIdFound IS NULL)

    BEGIN

        INSERT INTO [dbo].[BinaryItem]

        ([BinaryItemId], [ItemCreated], [BinaryData], [BinaryExParts])

        VALUES(@binaryId, @now, @BinaryData, 0);

    END

END

In this case one has to remember to ensure that any spurious extended binary segments are first deleted before the new values are upserted.

For the case of the data being larger than the selected segment size the process is very similar, except that the data is chunked as in the case of an insert operation:

CREATE PROCEDURE [dbo].[UpsertBinaryItemEx]

(

    @BinaryItemId uniqueidentifier,

    @BinaryData varbinary(max)

)

AS

BEGIN

    DECLARE @now datetime = GETUTCDATE();

    DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));

    DECLARE @itemIdFound binary(16) = NULL;

 

    DECLARE @maxSegmentLength int = 6000;

    DECLARE @binaryDataLength int = DATALENGTH(@BinaryData);

    DECLARE @maxSegment int = CEILING(@binaryDataLength / 1.0 / @maxSegmentLength);

    DECLARE @segmentValue varbinary(6000);

 

    IF (@binaryDataLength <= 6000)

    BEGIN

        SET @segmentValue = SUBSTRING(@BinaryData, 1, @binaryDataLength);

        EXEC [dbo].[UpsertBinaryItem]@BinaryItemId, @segmentValue

    END

    ELSE

    BEGIN

 

        -- Delete any existing binary segments

        DELETE FROM [dbo].[BinaryItemEx] WITH (SNAPSHOT)

        WHERE [BinaryItemId] = @binaryId;

 

        -- Update the header

        UPDATE [dbo].[BinaryItem] WITH (SNAPSHOT)

        SET @itemIdFound = [BinaryItemId],

            [BinaryData] = NULL,

            [BinaryExParts] = @maxSegment

        WHERE [BinaryItemId] = @binaryId;

 

        -- If no header found then create one

        IF (@itemIdFound IS NULL)

        BEGIN

            INSERT INTO [dbo].[BinaryItem] WITH (SNAPSHOT)

                ([BinaryItemId], [ItemCreated], [BinaryData], [BinaryExParts])

                VALUES(@binaryId, @now, NULL, @maxSegment);

        END

 

        -- Insert the binary segments

        DECLARE @part int = 1;

        WHILE (@part <= @maxSegment)

        BEGIN

            SET @segmentValue = SUBSTRING(@BinaryData, ((@part -1) * @maxSegmentLength + 1), @maxSegmentLength);

 

            INSERT [dbo].[BinaryItemEx] WITH (SNAPSHOT)

                ([BinaryItemId], [BinaryPart], [BinaryDataEx])

                VALUES (@binaryId, @part, @segmentValue);

 

            SET @part += 1;

        END                 

    END

END

As in the case of the insert operations, it is only for when the data is less than the selected segment size that the Stored Procedures can be natively compiled. As such, ensuring that a large portion of the data fits into this selected segment size will give the best performance gains.    

Access Procedures

The most efficient way to access the binary data is just to return binary segments, through a SELECT statement, and have the application layer combine them back into a single element. The process to do this has to look at the primary element for the specified number of elements and return the corresponding data:

CREATE PROCEDURE [dbo].[GetBinaryDataParts]

(

    @BinaryItemId uniqueidentifier

)

WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER

AS

BEGIN ATOMIC WITH

(

    TRANSACTION ISOLATION LEVEL = SNAPSHOT,

    LANGUAGE = N'English'

)

    DECLARE @now datetime = GETUTCDATE();

    DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));

    DECLARE @binaryData varbinary(6000);

    DECLARE @binaryParts int = 0;

 

    -- Get the binary data and see if parts exists

    SELECT

        @binaryData = [BinaryData],

        @binaryParts = [BinaryExParts]

FROM [dbo].[BinaryItem]

WHERE [BinaryItemId] = @binaryId;

 

    -- If parts exists then return these otherwise just return the single element

    IF (@binaryData IS NULL AND @binaryParts > 0)

    BEGIN

        SELECT CAST([BinaryItemId] AS uniqueidentifier) AS [BinaryItemId], [BinaryPart], [BinaryDataEx] AS [BinaryData]

        FROM [dbo].[BinaryItemEx]

        WHERE [BinaryItemId] = @binaryId

        ORDER BY [BinaryItemId], [BinaryPart];

    END

    ELSE

    BEGIN

        SELECT @BinaryItemId AS [BinaryItemId], 0 AS [BinaryPart], @binaryData AS [BinaryData];

    END

END

The beauty of this approach is that the called Stored Procedure can be natively compiled.

Also, the application calling this Stored Procedure then has the option of better handling the necessary memory allocations for concatenating the LOB segments into a single element.

If a SELECT process is not optimal, as in the case that the application requires the data to be concatenated, one can use a Stored Procedure with an OUTPUT field.

In the advent that the application knows that data size is less than the specified segment size there is the option of directly calling a natively compiled Stored Procedure:

CREATE PROCEDURE [dbo].[GetBinaryData]

(

    @BinaryItemId uniqueidentifier,

    @BinaryData varbinary(6000) OUTPUT

)

WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER

AS

BEGIN ATOMIC WITH

(

    TRANSACTION ISOLATION LEVEL = SNAPSHOT,

    LANGUAGE = N'English'

)

    DECLARE @now datetime = GETUTCDATE();

    DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));

 

    -- Get the binary data for output

    SELECT

        @BinaryData = [BinaryData]

    FROM [dbo].[BinaryItem]

    WHERE [BinaryItemId] = @binaryId;

END

In the case that the data size is unknown, or it is larger than the specified segment size, the Stored Procedure has to perform the concatenation into a max datatype, and thus cannot be natively compiled:

CREATE PROCEDURE [dbo].[GetBinaryDataEx]

(

    @BinaryItemId uniqueidentifier,

    @BinaryData varbinary(max) OUTPUT

)

AS

BEGIN

    DECLARE @now datetime = GETUTCDATE();

    DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));

 

    DECLARE @maxSegmentLength int = 6000;

    DECLARE @blobSegment varbinary(6000);

    DECLARE @binaryParts int;

    DECLARE @count int = 0;

 

    SET @BinaryData = NULL;

 

    -- Get the binary primary and see if parts exists

    SELECT

        @BinaryData = [BinaryData],

        @binaryParts = [BinaryExParts]

    FROM [dbo].[BinaryItem]

    WHERE [BinaryItemId] = @binaryId;

 

    -- If parts exist then build the binary element

    IF (@BinaryData IS NULL AND @binaryParts > 0)

    BEGIN

        WHILE (@count <= @binaryParts)

        BEGIN

            SELECT @blobSegment = [BinaryDataEx]

            FROM [dbo].[BinaryItemEx] WITH (SNAPSHOT)

            WHERE [BinaryItemId] = @binaryId AND [BinaryPart] = @count;

      

            IF (@BinaryData IS NULL)

                SET @BinaryData = CAST(@blobSegment AS varbinary(max));

            ELSE

                SET @BinaryData = @BinaryData + CAST(@blobSegment AS varbinary(max));

 

            SET @count += 1;

        END

    END

END

In this case the LOB concatenation is performing simply using the plus operator. Of course this concatenation is only necessary if extended binary segments are found.

In this scenario one does have the option of providing internal natively compiled procedures to perform the UPDATE and SELECT operations, which will provide some further performance improvements.

From the performance perspective it is probably better to take the original SELECT approach and have the application layer handle the necessary data concatenation; as it can better handle the memory allocations necessary to perform this operation.

Alternative Considerations

Archiving Considerations

Whenever persisting data it is always important to consider how the data is going to be archived. In this case one could take the approach of delete data elements that are older than a fixed period of time; say 24 hours.

This approach would be rather blunt. An alternative would be to update the primary element with a last accessed time. This would allow for elements to be removed based on a defined life expectancy. In this scenario the [GetBinaryDataParts] procedure would be as follows:

CREATE PROCEDURE [dbo].[GetBinaryDataParts]

(

    @BinaryItemId uniqueidentifier

)

WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER

AS

BEGIN ATOMIC WITH

(

    TRANSACTION ISOLATION LEVEL = SNAPSHOT,

    LANGUAGE = N'English'

)

    DECLARE @now datetime = GETUTCDATE();

    DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));

    DECLARE @binaryData varbinary(6000);

    DECLARE @binaryParts int = 0;

 

    -- Get the binary data and see if parts exists

    UPDATE [dbo].[BinaryItem]

    SET [ItemAccessed] = @now,

        @binaryData = [BinaryData],

        @binaryParts = [BinaryExParts]

    WHERE [BinaryItemId] = @binaryId;

 

    -- If parts exists then return these otherwise just return the single element

    IF (@binaryData IS NULL AND @binaryParts > 0)

    BEGIN

        SELECT CAST([BinaryItemId] AS uniqueidentifier) AS [BinaryItemId], [BinaryPart], [BinaryDataEx] AS [BinaryData]

        FROM [dbo].[BinaryItemEx]

        WHERE [BinaryItemId] = @binaryId

        ORDER BY [BinaryItemId], [BinaryPart];

    END

    ELSE

    BEGIN

        SELECT @BinaryItemId AS [BinaryItemId], 0 AS [BinaryPart], @binaryData AS [BinaryData];

    END

END

The SELECT operation has been replaced with an UPDATE that modifies a [LastAccessed] time.

Similarly the non-natively compiled procedures that perform the segment concatenation would be as follows:

CREATE PROCEDURE [dbo].[GetBinaryDataEx]

(

    @BinaryItemId uniqueidentifier,

    @BinaryData varbinary(max) OUTPUT

)

AS

BEGIN

    DECLARE @now datetime = GETUTCDATE();

    DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));

 

    DECLARE @maxSegmentLength int = 6000;

    DECLARE @blobSegment varbinary(6000);

    DECLARE @binaryParts int;

    DECLARE @count int = 0;

 

    SET @BinaryData = NULL;

 

    -- Get the binary primary and see if parts exists

    UPDATE [dbo].[BinaryItem] WITH (SNAPSHOT)

    SET [ItemAccessed] = @now,

        @BinaryData = [BinaryData],

        @binaryParts = [BinaryExParts]

    WHERE [BinaryItemId] = @binaryId;

 

    -- If parts exist then build the binary element

    IF (@BinaryData IS NULL AND @binaryParts > 0)

    BEGIN

        WHILE (@count <= @binaryParts)

        BEGIN

            SELECT @blobSegment = [BinaryDataEx]

            FROM [dbo].[BinaryItemEx] WITH (SNAPSHOT)

            WHERE [BinaryItemId] = @binaryId AND [BinaryPart] = @count;

      

            IF (@BinaryData IS NULL)

                SET @BinaryData = CAST(@blobSegment AS varbinary(max));

            ELSE

                SET @BinaryData = @BinaryData + CAST(@blobSegment AS varbinary(max));

 

            SET @count += 1;

        END

    END

END

Of course the Insert and Upsert operations would also need to appropriately set the last accessed time.

To archive the data the usual approach with file based tables is to perform the deletion in chunks. Namely, the first set of X records are identified and deleted within a transaction. This process is then repeated within a loop until no more records to delete are found:

CREATE TYPE [dbo].[BinaryKeys]

AS TABLE

(

    [BinaryItemId] binary(16) NOT NULL PRIMARY KEY NONCLUSTERED

HASH WITH (BUCKET_COUNT=100000)

) WITH (MEMORY_OPTIMIZED=ON);

GO

 

CREATE PROCEDURE [dbo].[PurgeArchivedEntriesChunk]

(

@ArchiveHours int = 24

)

AS

BEGIN

BEGIN TRY

 

SET NOCOUNT ON;

 

-- Define the Archive window

DECLARE @archiveTime datetime = DATEADD(HOUR, -@ArchiveHours, GETUTCDATE());

 

-- Define a table to hold the items to purge within each loop

DECLARE @entryPurge [dbo].[BinaryKeys];

 

-- Define the binary reference from which to start the purging

DECLARE @minReference binary(16);

SELECT @minReference = (

SELECT TOP(1) [BinaryItemId]

            FROM [dbo].[BinaryItem]

WHERE [ItemCreated] < @archiveTime

ORDER BY [BinaryItemId] ASC

);

 

DECLARE @continue int;

DECLARE @purgeCount int;

 

SET @continue = 1;

SET @purgeCount = 1000;

 

WHILE (@continue = 1)

BEGIN

 

-- define the entries to be purged in this cycle

DELETE FROM @entryPurge;

 

INSERT INTO @entryPurge

SELECT TOP(@purgeCount) [BinaryItemId]

FROM [dbo].[BinaryItem]

WHERE [ItemCreated] < @archiveTime

AND [BinaryItemId] >= @minReference

ORDER BY [BinaryItemId] ASC;

 

-- If subset selected make the last loop

IF (@@ROWCOUNT < @purgeCount)

BEGIN

SET @continue = 0;

END

 

-- Run in a transaction to ensure consistency

BEGIN TRANSACTION;

 

-- Delete the BinaryItem and Ex tables             

            DELETE FROM [dbo].[BinaryItemEx]

FROM @entryPurge purge

INNER JOIN [dbo].[BinaryItemEx] items WITH (SNAPSHOT)

ON items.[BinaryItemId] = purge.[BinaryItemId];

                  

DELETE FROM [dbo].[BinaryItem]

FROM @entryPurge purge

INNER JOIN [dbo].[BinaryItem] items WITH (SNAPSHOT)

ON items.[BinaryItemId] = purge.[BinaryItemId];

 

COMMIT TRANSACTION;

 

-- Define the next lowest PDR to be processed

SELECT @minReference = (

SELECT TOP(1) [BinaryItemId] FROM @entryPurge

ORDER BY [BinaryItemId] DESC

);

 

END

END TRY

BEGIN CATCH

IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;

 

THROW;

END CATCH;    

END

 

In this instance one does have the option of using an In Memory Table Type for saving the necessary deletion keys.

When operation on file based tables one would usually ensure rowlocks are issued and the deadlock priority is set to low. All this is designed to ensure that only a small section of the records within the table are locked to ensure that the table can still handle a high transaction rate; without the purging inadvertently blocking critical active transactions.

However in the case of In Memory table, as one does not have to worry about locking, one again has the option of taking the simple deletion approach:

CREATE PROCEDURE [dbo].[PurgeArchivedEntries]

(

    @ArchiveHours int = 24

)

AS

BEGIN

    BEGIN TRY

 

        SET NOCOUNT ON;

 

        BEGIN TRANSACTION;

 

        DECLARE @archiveTime datetime = DATEADD(HOUR, -@ArchiveHours, GETUTCDATE());

 

        DELETE [dbo].[BinaryItemEx]

        FROM [dbo].[BinaryItemEx] itemex WITH (SNAPSHOT)

            INNER JOIN [dbo].[BinaryItem] item WITH (SNAPSHOT)

            ON itemex.[BinaryItemId] = item.[BinaryItemId]

        WHERE [ItemCreated] < @archiveTime;

 

        DELETE [dbo].[BinaryItem] WITH (SNAPSHOT)

        WHERE [ItemCreated] < @archiveTime;

 

        COMMIT TRANSACTION;

END TRY

BEGIN CATCH

IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;

 

THROW;

END CATCH;

END

 

Unfortunately one cannot yet natively compile this procedure as using the FROM clause in a DELETE statement is not supported.

Error Checking and Retries

The Stored Procedures outlined above do not take into consideration error checking and retry logic. The approach to either can be that it is the responsibility of the calling application or of the actual Stored Procedures.

The associated scripts included with this document include a version of the Stored Procedures that perform both error checking and retry logic. As an example here is the [UpsertBinaryItem] Stored Procedure:

CREATE PROCEDURE [dbo].[UpsertBinaryItem]

(

@BinaryItemId uniqueidentifier,

@BinaryData varbinary(6000)

)

WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER

AS

BEGIN ATOMIC WITH

(

    TRANSACTION ISOLATION LEVEL = SNAPSHOT,

    LANGUAGE = N'English'

)

    DECLARE @retry int = 3;

 

    DECLARE @now datetime = GETUTCDATE();

    DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));

    DECLARE @itemIdFound binary(16) = NULL;

 

    WHILE (@retry > 0)

    BEGIN

        BEGIN TRY

            -- Delete any extended data

            DELETE FROM [dbo].[BinaryItemEx]

            WHERE [BinaryItemId] = @binaryId;

 

            -- First try an update as the primary may already exist

            UPDATE [dbo].[BinaryItem]

            SET @itemIdFound = [BinaryItemId],

                [ItemAccessed] = @now,

                [BinaryData] = @BinaryData,

                [BinaryExParts] = 0

            WHERE [BinaryItemId] = @binaryId;

 

            -- If no primary found the perform the insert

            IF (@itemIdFound IS NULL)

            BEGIN

                INSERT INTO [dbo].[BinaryItem] ([BinaryItemId], [ItemCreated], [ItemAccessed], [BinaryData], [BinaryExParts])

                VALUES(@binaryId, @now, @now, @BinaryData, 0);

            END

 

            SET @retry = 0;

        END TRY

        BEGIN CATCH

            SET @retry -= 1;

 

            IF (@retry <= 0 OR (error_number() <> 41302 AND error_number() <> 41305 AND error_number() <> 41325 AND error_number() <> 41301))

            BEGIN

                -- insert custom error handling for other error conditions here

                ;THROW

            END

        END CATCH

    END

END

This code covers the common errors associated with using an optimistic concurrency control mechanism, as used by the In Memory OLTP engine.

In this case I have set the retry count to be 3, but this can easily be configured to suit your application and environment requirements.

Durability

In the samples above I have defined the tables to have a durability option of SCHEMA_AND_DATA. In the case of transient session data one could again improve performance using just the SCHEMA_ONLY option.

File Based Extension

In addition to taking the approach outlined above there is the option that if the data size is larger than the specified segment size a normal file based table can be used for saving the LOB data. In this case the table definitions would be more like the following:

CREATE TABLE [dbo].[BinaryItem]

(

    [BinaryItemId] binary(16) NOT NULL        

        CONSTRAINT [PK_BinaryItem] PRIMARY KEY NONCLUSTERED HASH

        WITH (BUCKET_COUNT = 1000000),

    [ItemCreated] datetime NOT NULL,

    [BinaryData] varbinary(6000) NULL,

    [IsExtended] int NOT NULL

)

WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

GO

 

CREATE TABLE [dbo].[BinaryItemEx]

(

    [BinaryItemId] binary(16) NOT NULL,

    [TotalLength] int NOT NULL,

    [BinaryDataEx] varbinary(max) NOT NULL,

    CONSTRAINT [PK_BinaryItemEx] PRIMARY KEY CLUSTERED

        ([BinaryItemId])

)

ON [DEFAULT] TEXTIMAGE_ON [MemBinaryStore_Binary];

In this case the number of segments is not required, but rather just an indicator of whether extended data exists.

One could even take the approach of defining the database such that the LOB data is saved into a dedicated performant filegroup:

CREATE DATABASE [MemBinaryStore]

    CONTAINMENT = NONE        

    ON PRIMARY (NAME = [MemBinaryStore_Primary], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Prim.mdf', SIZE = 10 MB, FILEGROWTH = 10 MB),

    FILEGROUP [MemBinaryStore_Mod] CONTAINS MEMORY_OPTIMIZED_DATA (NAME = [MemBinaryStore_Mod], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore.dir'),

    FILEGROUP [MemBinaryStore_Data] (NAME = [MemBinaryStore_Data1], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Data1.ndf', SIZE = 10 MB, FILEGROWTH = 10 MB),

        (NAME = [MemBinaryStore_Data2], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Data2.ndf', SIZE = 10 MB, FILEGROWTH = 10 MB),

    FILEGROUP [MemBinaryStore_Binary] (NAME = [MemBinaryStore_Binary1], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Binary1.ndf', SIZE = 10 MB, FILEGROWTH = 10 MB),

        (NAME = [MemBinaryStore_Binary2], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Binary2.ndf', SIZE = 10 MB, FILEGROWTH = 10 MB)

    LOG ON (NAME = [MemBinaryStore_Log1], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Log1.ldf', SIZE = 5 MB, FILEGROWTH = 5 MB)

    COLLATE Latin1_General_100_CI_AS;

In this case the extended processing Stored Procedures would just have to be modified to insert a single row of LOB data and return this single row.

Conclusion

Although memory optimized tables do not have off-row or large object (LOB) storage, and the row size is limited to 8060 bytes, this does not mean it they cannot be used for LOB storage. This can be achieved easily in one of two ways:

  • Split the LOB values into multiple rows
  • Store the LOB values in a regular non-memory optimized (file based) table

Splitting the LOB data into multiple rows affords one the option of obtaining the possible performance gains of switching to memory optimized tables.

If you are in the scenario where a high percent of the LOB data fits into 1 or few segments the approach outlined above will provide all the possible advantages of memory optimized tables whilst providing a safety value for possible large data.

If in this scenario the application can also be modified to stitch back together the LOB segments then the process can also rely mostly on natively compiled procedures; again providing a performance boost.

The outlined may not be suitable for storing very large quantities of LOB data which are accessed infrequently; say large document or images. In this instance a standard file based table approach may be more suitable.

Written by Carl Nolan