Appending Data Using SQL 2008 Filestream

SQL Server 2008 has a new feature called Filestream, which allows you to save large binary files in the file system, instead of in the database. This is targeted directly at the scenario that many document management and web applications have today where they save some metadata in the database, with a column holding the path to the actual file. However, there is no transactional context between the two; therefore, the data in the database can be changed to no longer point to the actual location, or the file moved without the pointer being updated. Users want to be able to have documents, pictures, video, etc. in the file system with the streaming support the file system provides, but have the metadata be consistent. This is exactly what the Filestream feature is targeted for.

When a large binary file gets 'updated', most users and applications replace the file in its entirety (you do not usually update part of an image through the database). Filestream does not currently support in-place updates. An update to a column with the filestream attribute is therefore implemented by creating a new zero-byte file, which then has the entire new data value written to it. When the update is committed, the file pointer is changed to point to the new file, leaving the old file to be deleted at garbage collection time. Garbage collection happens at a checkpoint under the simple recovery model, and at a backup or log backup otherwise.

Code that updates large value data types (varchar(max), nvarchar(max), and varbinary(max)) may use the .Write(offset) clause of UPDATE. However, since an in-place update of a column with the filestream attribute is not supported, the .Write(offset) clause raises an error against such a column. So even though filestream is only a storage attribute, the same update code does not work against both storage mechanisms, and the application needs to know which one sits underneath.
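To make the difference concrete, here is a minimal sketch of an append through the .WRITE clause from client code. The table dbo.Documents and its columns Id and Body are hypothetical names used only for illustration. Against a regular varbinary(max) column this appends the chunk; per the behavior described above, the same statement should fail when Body carries the filestream attribute.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

static class WriteClauseSketch
{
    // Hypothetical schema: dbo.Documents(Id int, Body varbinary(max)).
    // In .WRITE(expression, @Offset, @Length), a NULL @Offset appends the
    // expression to the end of the existing value and @Length is ignored.
    const string AppendSql =
        "UPDATE dbo.Documents SET Body.WRITE(@chunk, NULL, 0) WHERE Id = @id;";

    public static bool TryAppend(SqlConnection conn, int id, byte[] chunk)
    {
        using (var cmd = new SqlCommand(AppendSql, conn))
        {
            // Size -1 maps the parameter to varbinary(max).
            cmd.Parameters.Add("@chunk", SqlDbType.VarBinary, -1).Value = chunk;
            cmd.Parameters.Add("@id", SqlDbType.Int).Value = id;
            try
            {
                cmd.ExecuteNonQuery();
                return true;
            }
            catch (SqlException ex)
            {
                // Any SQL error lands here, including the one raised when
                // Body is stored with the FILESTREAM attribute.
                Console.Error.WriteLine("Partial update rejected: " + ex.Message);
                return false;
            }
        }
    }
}
```

A caller that gets false back would have to fall back to a filestream-aware code path, which is exactly the storage knowledge the paragraph above says the application ends up needing.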

Many of the update actions taken against large data types are actually appends. And many logging scenarios continuously append to an ever-growing file. Filestream offers an option for this case, to avoid the scenario where the original data is pulled to the client, data appended, and then a new file written with the combined data – leaving the old file to be garbage collected later.

To avoid this round trip, a file system control code (FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT) can be issued through DeviceIoControl to trigger a server-side copy of the old content, so the data does not move over the network. This control is not yet documented; the DeviceIoControl control code constant that is needed is 599392. It will be added to the Filestream API documentation at a future date.

Example code:

using System;
using System.Runtime.InteropServices;

[DllImport("kernel32.dll", SetLastError = true)]
static extern bool DeviceIoControl(
    IntPtr hDevice,
    uint dwIoControlCode,
    IntPtr lpInBuffer,
    uint nInBufferSize,
    [Out] IntPtr lpOutBuffer,
    uint nOutBufferSize,
    ref uint lpBytesReturned,
    IntPtr lpOverlapped);

IntPtr hFile = IntPtr.Zero;
uint bytesReturned = 0;

// hFile = mySafeHandle.DangerousGetHandle(); or something similar

// 599392 is FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT; no input or output buffer is passed
bool result = DeviceIoControl(hFile, 599392, IntPtr.Zero, 0, IntPtr.Zero, 0, ref bytesReturned, IntPtr.Zero);
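For context, the call above needs a handle to the filestream file. The following is a rough sketch of how the whole append might be put together, not a verified implementation. It assumes a hypothetical table dbo.Documents(Id, Body) where Body has the filestream attribute, a connection using Windows authentication, and the OpenSqlFilestream export from sqlncli10.dll (SNAC 10). Because the control code is undocumented, the required access mode and the need to seek to the end before writing are assumptions.

```csharp
using System;
using System.ComponentModel;
using System.Data.SqlClient;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class FilestreamAppendSketch
{
    const uint FetchOldContent = 599392; // FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT
    const uint ReadWriteAccess = 2;      // SQL_FILESTREAM_READWRITE (assumed to be required)

    [DllImport("sqlncli10.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern SafeFileHandle OpenSqlFilestream(
        string path, uint desiredAccess, uint openOptions,
        byte[] txContext, UIntPtr txContextLength, IntPtr allocationSize);

    // Same API as in the article, declared here with a SafeFileHandle.
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool DeviceIoControl(
        SafeFileHandle hDevice, uint dwIoControlCode,
        IntPtr lpInBuffer, uint nInBufferSize,
        IntPtr lpOutBuffer, uint nOutBufferSize,
        out uint lpBytesReturned, IntPtr lpOverlapped);

    public static void Append(SqlConnection conn, int id, byte[] chunk)
    {
        using (SqlTransaction tx = conn.BeginTransaction())
        {
            string path;
            byte[] context;
            using (var cmd = new SqlCommand(
                "SELECT Body.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
                "FROM dbo.Documents WHERE Id = @id;", conn, tx))
            {
                cmd.Parameters.AddWithValue("@id", id);
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    if (!reader.Read())
                        throw new InvalidOperationException("Row not found.");
                    path = reader.GetString(0);   // throws if Body is NULL
                    context = (byte[])reader[1];
                }
            }

            using (SafeFileHandle handle = OpenSqlFilestream(
                path, ReadWriteAccess, 0,
                context, (UIntPtr)(uint)context.Length, IntPtr.Zero))
            {
                if (handle.IsInvalid)
                    throw new Win32Exception(Marshal.GetLastWin32Error());

                // Ask the server to copy the old content into the new file.
                uint bytesReturned;
                if (!DeviceIoControl(handle, FetchOldContent, IntPtr.Zero, 0,
                                     IntPtr.Zero, 0, out bytesReturned, IntPtr.Zero))
                    throw new Win32Exception(Marshal.GetLastWin32Error());

                // Assumption: the new data is written after the copied content.
                using (var stream = new FileStream(handle, FileAccess.ReadWrite))
                {
                    stream.Seek(0, SeekOrigin.End);
                    stream.Write(chunk, 0, chunk.Length);
                }
            }
            tx.Commit();
        }
    }
}
```

The point of the design is that only the appended bytes cross the network; the existing content is copied on the server. The old file is still left behind for garbage collection.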

Because Filestream is minimally logged, a workload that consists only of filestream activity usually does not generate enough log to trigger a checkpoint, so the old files do not get deleted. In a non-simple recovery mode the log will not grow, but the database will grow on disk until those files are garbage collected. Garbage collection runs in a background thread, and that activity can itself impact the server.

Therefore, it is a best practice to issue a checkpoint manually if the database is in simple recovery mode, and otherwise to maintain a regular transaction log backup schedule, to avoid the file growth.
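As a rough illustration of that practice, the sketch below checks the recovery model of the current database and then either issues a CHECKPOINT or takes a log backup. The class name, method name, and backupPath parameter are hypothetical. The log backup assumes a full backup already exists and that the path is writable by the SQL Server service account.

```csharp
using System;
using System.Data.SqlClient;

static class FilestreamGcSketch
{
    public static void TriggerGarbageCollection(SqlConnection conn, string backupPath)
    {
        // DATABASEPROPERTYEX returns SIMPLE, FULL, or BULK_LOGGED for 'Recovery'.
        string recovery;
        using (var cmd = new SqlCommand(
            "SELECT CAST(DATABASEPROPERTYEX(DB_NAME(), 'Recovery') AS nvarchar(20));",
            conn))
        {
            recovery = (string)cmd.ExecuteScalar();
        }

        if (recovery == "SIMPLE")
        {
            // Simple recovery: a manual checkpoint lets old files be collected.
            using (var cmd = new SqlCommand("CHECKPOINT;", conn))
            {
                cmd.ExecuteNonQuery();
            }
        }
        else
        {
            // Full or bulk-logged recovery: a log backup plays the same role.
            using (var cmd = new SqlCommand(
                "DECLARE @db sysname = DB_NAME(); BACKUP LOG @db TO DISK = @path;",
                conn))
            {
                cmd.Parameters.AddWithValue("@path", backupPath);
                cmd.ExecuteNonQuery();
            }
        }
    }
}
```

In practice the log backup would more likely come from a scheduled job than from application code; the sketch only shows which statement applies to which recovery model.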

Other quick notes:

Filestream deletes are significantly faster than blob deletes.

If an application commonly needs to delete large data values, filestream will be more scalable, because only the metadata is changed and the file is then garbage collected asynchronously. For instance, a delete of 6GB of varbinary(max) data that took 6 seconds happens in milliseconds as a filestream delete.

Filestream maximum size is different for SNAC 9 and SNAC 10.

Files accessed through SNAC 9 (SQL Server Native Client 9) have a maximum size of 2GB. SNAC 10 supports unlimited size.

Note: Altering an existing column that lacks the filestream attribute so that it has the attribute is not supported.

This means that moving data from a blob storage mechanism to the filestream mechanism requires that the data be copied to a new column that has the filestream attribute.
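One way such a copy might look is sketched below. The table dbo.Documents and the column names Body and BodyFs are hypothetical. The sketch assumes the table already has a unique ROWGUIDCOL column and that a FILESTREAM filegroup is available to it. Each statement runs as its own batch so that the new column exists before it is referenced.

```csharp
using System.Data.SqlClient;

static class FilestreamMigrationSketch
{
    // Add a filestream column, copy the blob data, then swap the names.
    static readonly string[] Steps =
    {
        "ALTER TABLE dbo.Documents ADD BodyFs varbinary(max) FILESTREAM NULL;",
        "UPDATE dbo.Documents SET BodyFs = Body;",
        "ALTER TABLE dbo.Documents DROP COLUMN Body;",
        "EXEC sp_rename 'dbo.Documents.BodyFs', 'Body', 'COLUMN';"
    };

    public static void Migrate(SqlConnection conn)
    {
        foreach (string sql in Steps)
        {
            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.CommandTimeout = 0; // the copy can run long on large tables
                cmd.ExecuteNonQuery();
            }
        }
    }
}
```

If a step fails partway through, the original column is still in place until the DROP COLUMN step has run, so the copy can be inspected or retried. For very large tables the single UPDATE would probably need to be batched.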
