FILESTREAM Overview
Much of the data that is created every day is unstructured data, such as text documents, images, and videos. This unstructured data is often stored outside the database, separate from its structured data, and that separation complicates data management. Alternatively, if the unstructured data is kept in structured storage inside the database, file streaming capabilities and performance can be limited.
FILESTREAM integrates the SQL Server Database Engine with an NTFS file system by storing varbinary(max) binary large object (BLOB) data as files on the file system. Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data.
FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.
Note: FILESTREAM data is not encrypted even when transparent data encryption is enabled.
For a walkthrough that shows how to use FILESTREAM, see Getting Started with FILESTREAM Storage.
When to Use FILESTREAM
In SQL Server, BLOBs can be standard varbinary(max) data that stores the data in tables, or FILESTREAM varbinary(max) objects that store the data in the file system. The size and intended use of the data determine whether you should use database storage or file system storage. If all of the following conditions are true, you should consider using FILESTREAM:
Objects that are being stored are, on average, larger than 1 MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.
For smaller objects, storing varbinary(max) BLOBs in the database often provides better streaming performance.
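The checklist above can be expressed as a small decision helper. This is a minimal sketch, not part of SQL Server: the function name should_consider_filestream is hypothetical, and treating "1 MB" as 1,048,576 bytes is an assumption, since the guidance does not define the unit precisely.

```python
# Hypothetical helper encoding the "consider FILESTREAM" checklist.
# ASSUMPTION: 1 MB is taken as 1,048,576 bytes; the guidance only says "1 MB".
ONE_MB = 1024 * 1024


def should_consider_filestream(
    avg_object_bytes: float,
    fast_read_access_important: bool,
    uses_middle_tier: bool,
) -> bool:
    """Return True only when all three conditions from the guidance hold."""
    return (
        avg_object_bytes > ONE_MB
        and fast_read_access_important
        and uses_middle_tier
    )


if __name__ == "__main__":
    # Average 5 MB images served through an application server.
    print(should_consider_filestream(5 * ONE_MB, True, True))  # True
    # Average 200 KB documents: in-database varbinary(max) is often better.
    print(should_consider_filestream(200 * 1024, True, True))  # False
```

The conditions are combined with "and" because the guidance asks for all of them to be true; a real decision would also weigh the operational considerations listed under FILESTREAM Storage.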
FILESTREAM Storage
FILESTREAM storage is implemented as a varbinary(max) column whose data is stored as BLOBs in the file system. The size of each BLOB is limited only by the volume size of the file system; the 2-GB size limit of standard varbinary(max) values does not apply to BLOBs that are stored in the file system.
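The size rule can be illustrated with a small check. This is a sketch under stated assumptions: can_store_blob is a hypothetical name, the standard limit is taken as 2^31 - 1 bytes (the documented maximum storage size of varbinary(max)), and "limited only by the volume size" is approximated as "must fit in the volume's free space".

```python
# Standard varbinary(max) values are capped at 2^31 - 1 bytes (the "2-GB" limit).
VARBINARY_MAX_BYTES = 2**31 - 1
GB = 1024**3


def can_store_blob(blob_bytes: int, use_filestream: bool, volume_free_bytes: int) -> bool:
    """Return True if a BLOB of blob_bytes fits under the limit that applies.

    ASSUMPTION: for FILESTREAM, "limited only by the volume size" is modeled
    as "must fit in the free space of the (hypothetical) volume".
    """
    if use_filestream:
        return blob_bytes <= volume_free_bytes
    return blob_bytes <= VARBINARY_MAX_BYTES


if __name__ == "__main__":
    print(can_store_blob(3 * GB, use_filestream=False, volume_free_bytes=500 * GB))  # False
    print(can_store_blob(3 * GB, use_filestream=True, volume_free_bytes=500 * GB))   # True
```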
To store a column's data in the file system, specify the FILESTREAM attribute on a varbinary(max) column. The Database Engine then stores all data for that column in the file system rather than in the database file.
FILESTREAM data must be stored in FILESTREAM filegroups. A FILESTREAM filegroup is a special filegroup that contains file system directories instead of the files themselves. These file system directories are called data containers. Data containers are the interface between Database Engine storage and file system storage.
When you use FILESTREAM storage, consider the following:
When a table contains a FILESTREAM column, each row must have a nonnull unique row ID.
FILESTREAM data containers cannot be nested.
When you are using failover clustering, the FILESTREAM filegroups must be on shared disk resources.
FILESTREAM filegroups can be on compressed volumes.
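Two of the considerations above lend themselves to a pre-deployment check. The sketch below is hypothetical (neither function exists in SQL Server): it flags data container directories that are nested inside one another, and it verifies that a set of row IDs, modeled here as Python UUIDs, contains no nulls and no duplicates. Windows path semantics are assumed, so comparisons are case-insensitive.

```python
import uuid
from pathlib import PureWindowsPath
from typing import List, Optional, Tuple


def find_nested_containers(container_paths: List[str]) -> List[Tuple[str, str]]:
    """Return (outer, inner) pairs where one data container lies inside another."""
    paths = [PureWindowsPath(p) for p in container_paths]
    nested = []
    for outer in paths:
        for inner in paths:
            # parents lists every ancestor directory of inner.
            if outer != inner and outer in inner.parents:
                nested.append((str(outer), str(inner)))
    return nested


def row_ids_valid(row_ids: List[Optional[uuid.UUID]]) -> bool:
    """Each row needs a non-null row ID, and no ID may appear twice."""
    return all(r is not None for r in row_ids) and len(set(row_ids)) == len(row_ids)


if __name__ == "__main__":
    # The thumbs directory sits inside the photos container, so it is reported.
    print(find_nested_containers([r"D:\fs\photos", r"D:\fs\photos\thumbs", r"E:\fs\docs"]))
    print(row_ids_valid([uuid.uuid4(), uuid.uuid4()]))
```

The checks are deliberately offline; they do not replace the validation the Database Engine itself performs when the filegroup or table is created.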
Integrated Security
In SQL Server, FILESTREAM data is secured just like other data is secured: by granting permissions at the table or column level. If a user has permission to the FILESTREAM column in a table, the user can open the associated files.
Note: Encryption is not supported on FILESTREAM data.
Only the SQL Server service account is granted NTFS permissions to the FILESTREAM container. We recommend that no other account be granted permissions on the data container.
Integrated Management
Because FILESTREAM is implemented as a varbinary(max) column and integrated directly into the Database Engine, most SQL Server management tools and functions work without modification for FILESTREAM data. For example, you can use all backup and recovery models with FILESTREAM data, and the FILESTREAM data is backed up with the structured data in the database. If you do not want to back up FILESTREAM data with relational data, you can use a partial backup to exclude FILESTREAM filegroups.
Dual Programming Model to Access BLOB Data
After you store data in a FILESTREAM column, you can access the files by using Transact-SQL transactions or by using Win32 APIs.
Transact-SQL Access
By using Transact-SQL, you can insert, update, and delete FILESTREAM data:
You can use an insert operation to prepopulate a FILESTREAM field with a null value, an empty value, or relatively short inline data. However, large amounts of data are more efficiently streamed into the file by using Win32 interfaces.
When you update a FILESTREAM field, you modify the underlying BLOB data in the file system. When a FILESTREAM field is set to NULL, the BLOB data associated with the field is deleted. You cannot use a Transact-SQL chunked update, implemented as the .WRITE clause of the UPDATE statement, to perform partial updates to the data.
When you delete a row or delete or truncate a table that contains FILESTREAM data, you delete the underlying BLOB data in the file system.
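The Transact-SQL rules above can be summarized as a toy in-memory model. This is not SQL Server code: FilestreamCellModel is a hypothetical class, and a Python dict stands in for the data container, keyed by a row ID. Setting a value replaces the backing BLOB, setting NULL (None) deletes it, deleting the row deletes it, and the chunked .WRITE update is rejected.

```python
from typing import Dict, Optional


class FilestreamCellModel:
    """Toy model of one FILESTREAM cell and its backing file (not SQL Server code)."""

    def __init__(self, container: Dict[str, bytes], row_id: str):
        self.container = container  # stands in for the FILESTREAM data container
        self.row_id = row_id

    def set_value(self, value: Optional[bytes]) -> None:
        """INSERT/UPDATE: replace the BLOB, or delete it when the value is NULL."""
        if value is None:
            self.container.pop(self.row_id, None)
        else:
            self.container[self.row_id] = bytes(value)

    def chunked_write(self, data: bytes, offset: int) -> None:
        """The chunked .WRITE update cannot be used on FILESTREAM data."""
        raise NotImplementedError("chunked .WRITE updates are not supported for FILESTREAM")

    def delete_row(self) -> None:
        """DELETE/TRUNCATE: the underlying BLOB is removed along with the row."""
        self.container.pop(self.row_id, None)


if __name__ == "__main__":
    files: Dict[str, bytes] = {}
    cell = FilestreamCellModel(files, "row-1")
    cell.set_value(b"")         # prepopulate with an empty value
    cell.set_value(b"payload")  # full replacement of the BLOB
    cell.set_value(None)        # NULL deletes the BLOB
    print(files)                # {}
```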
File System Streaming Access
The Win32 streaming support works in the context of a SQL Server transaction. Within a transaction, you can use FILESTREAM functions to obtain a logical UNC file system path of a file. You then use the OpenSqlFilestream API to obtain a file handle. This handle can then be used by Win32 file streaming interfaces, such as ReadFile() and WriteFile(), to access and update the file by way of the file system.
Because file operations are transactional, you cannot delete or rename FILESTREAM files through the file system.
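The access pattern described above is an ordered protocol: start a transaction, obtain the path and transaction context, open a handle with OpenSqlFilestream, stream with ReadFile()/WriteFile(), close the handle, and then end the transaction. The sketch below is a hypothetical checker over step labels invented for this illustration; the labels are not API names, and it only validates an order you describe to it. Repeated read/write calls on one handle are allowed.

```python
from typing import List, Optional

# Hypothetical step labels; each comment names the real mechanism it stands for.
EXPECTED_STEPS = (
    "begin_transaction",        # BEGIN TRANSACTION in Transact-SQL
    "select_path_and_context",  # PathName() and GET_FILESTREAM_TRANSACTION_CONTEXT()
    "open_sql_filestream",      # OpenSqlFilestream returns a Win32 handle
    "stream_io",                # ReadFile()/WriteFile() on that handle
    "close_handle",             # close the Win32 handle
    "commit",                   # COMMIT TRANSACTION
)


def check_access_sequence(steps: List[str]) -> Optional[str]:
    """Return None if steps follow EXPECTED_STEPS, else a description of the first problem."""
    collapsed: List[str] = []
    for step in steps:
        if step == "stream_io" and collapsed and collapsed[-1] == "stream_io":
            continue  # several reads/writes on one handle count as one phase
        collapsed.append(step)
    if tuple(collapsed) == EXPECTED_STEPS:
        return None
    for i, (got, want) in enumerate(zip(collapsed, EXPECTED_STEPS)):
        if got != want:
            return f"step {i}: expected {want!r}, got {got!r}"
    return f"expected {len(EXPECTED_STEPS)} steps, got {len(collapsed)}"
```

For example, a sequence that commits without closing the handle is reported at step 4, where close_handle was expected.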
Statement Model
FILESTREAM file system access models a Transact-SQL statement by using file open and close: the statement starts when a file handle is opened and ends when the handle is closed. For example, when a write handle is closed, any AFTER trigger that is registered on the table fires as if an UPDATE statement had completed.
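The statement model can be sketched as a handle whose close() plays the role of statement completion. FilestreamWriteHandleModel is a hypothetical class, and the "triggers" are ordinary Python callables standing in for AFTER triggers; the point is that many writes on one handle produce a single trigger firing, at close time.

```python
from typing import Callable, List


class FilestreamWriteHandleModel:
    """Toy model: opening the handle starts a 'statement'; closing it ends the statement."""

    def __init__(self, after_update_triggers: List[Callable[[bytes], None]]):
        self._triggers = after_update_triggers  # stand-ins for AFTER triggers
        self._buffer = bytearray()
        self.closed = False

    def write(self, data: bytes) -> None:
        if self.closed:
            raise ValueError("handle is closed")
        self._buffer += data

    def close(self) -> None:
        # Closing the write handle completes the "UPDATE", so triggers fire once here.
        if not self.closed:
            self.closed = True
            for trigger in self._triggers:
                trigger(bytes(self._buffer))

    def __enter__(self) -> "FilestreamWriteHandleModel":
        return self

    def __exit__(self, *exc_info) -> bool:
        self.close()
        return False  # do not suppress exceptions


if __name__ == "__main__":
    fired: List[bytes] = []
    with FilestreamWriteHandleModel([fired.append]) as handle:
        handle.write(b"part 1, ")
        handle.write(b"part 2")
    print(fired)  # [b'part 1, part 2']
```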
Storage Namespace
In FILESTREAM, the Database Engine controls the physical file system namespace of the BLOBs. The intrinsic function PathName provides the logical UNC path of the BLOB that corresponds to each FILESTREAM cell in the table. The application uses this logical path to obtain the Win32 handle and operate on the BLOB data by using regular Win32 file system interfaces. The function returns NULL if the value of the FILESTREAM column is NULL.
Transacted File System Access
The intrinsic function GET_FILESTREAM_TRANSACTION_CONTEXT() returns a token that represents the current transaction that the session is associated with. The transaction must have been started and not yet committed or aborted. By obtaining this token, the application binds FILESTREAM file system streaming operations to a started transaction. The function returns NULL if no transaction has been explicitly started.
All file handles must be closed before the transaction commits or aborts. If a handle is left open beyond the transaction scope, additional reads against the handle will cause a failure; additional writes against the handle will succeed, but the actual data will not be written to disk. Similarly, if the database or instance of the Database Engine shuts down, all open handles are invalidated.
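The transaction-context and handle-lifetime rules can be modeled in a few lines. Everything here is hypothetical: TxModel, HandleModel, and the 16-byte random token are inventions for illustration (the real context token is opaque), and get_filestream_transaction_context only mirrors the documented behavior of returning NULL (None) without an active transaction. The model reproduces the documented asymmetry: after the transaction ends, reads on a leftover handle fail, while writes appear to succeed but persist nothing.

```python
import secrets
from typing import List, Optional


class TxModel:
    """Toy transaction; the random token stands in for the opaque transaction context."""

    def __init__(self) -> None:
        self.token = secrets.token_bytes(16)  # ASSUMPTION: arbitrary illustrative size
        self.active = True
        self.open_handles: List["HandleModel"] = []

    def end(self) -> List["HandleModel"]:
        """Commit or abort; return handles that were (incorrectly) left open."""
        leaked = [h for h in self.open_handles if not h.closed]
        self.active = False
        return leaked


class HandleModel:
    def __init__(self, tx: TxModel, data: bytes = b"") -> None:
        self.tx = tx
        self.data = bytearray(data)
        self.closed = False
        tx.open_handles.append(self)

    def read(self) -> bytes:
        if not self.tx.active:
            raise OSError("read on a handle that outlived its transaction")
        return bytes(self.data)

    def write(self, chunk: bytes) -> int:
        if not self.tx.active:
            return len(chunk)  # reports success, but nothing is persisted
        self.data += chunk
        return len(chunk)

    def close(self) -> None:
        self.closed = True


def get_filestream_transaction_context(tx: Optional[TxModel]) -> Optional[bytes]:
    """Mirrors GET_FILESTREAM_TRANSACTION_CONTEXT(): None when no transaction is active."""
    return tx.token if tx is not None and tx.active else None
```

Checking the list returned by end() is one way an application could detect handles it forgot to close before committing.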
Transactional Durability
With FILESTREAM, when a transaction commits, the Database Engine ensures durability for FILESTREAM BLOB data that was modified through file system streaming access.
Isolation Semantics
Isolation semantics are governed by Database Engine transaction isolation levels. For file system access, only the read-committed isolation level is supported. When FILESTREAM data is accessed by using Transact-SQL, repeatable read, serializable, and snapshot isolation are also supported. Dirty reads are not supported.
File system open operations do not wait for locks. Instead, they fail immediately if they cannot access the data because of transaction isolation: the streaming API calls fail with ERROR_SHARING_VIOLATION if the open operation cannot continue because of an isolation violation.
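The fail-fast behavior can be illustrated with a non-blocking lock acquisition. This is a simplified sketch: FilestreamBlobModel and SharingViolation are hypothetical, the model treats write access as exclusive (real isolation rules are richer), and 32 is the numeric value of the Win32 ERROR_SHARING_VIOLATION code. The key detail is acquire(blocking=False), which returns immediately instead of waiting.

```python
import threading
from typing import Callable

ERROR_SHARING_VIOLATION = 32  # Win32 error code value


class SharingViolation(OSError):
    """Raised instead of waiting when the open cannot proceed."""


class FilestreamBlobModel:
    """Toy BLOB that allows one writer at a time (a simplification of real isolation)."""

    def __init__(self) -> None:
        self._writer_lock = threading.Lock()

    def open_for_write(self) -> Callable[[], None]:
        # Non-blocking: never wait for the lock; fail right away instead.
        if not self._writer_lock.acquire(blocking=False):
            raise SharingViolation(ERROR_SHARING_VIOLATION, "ERROR_SHARING_VIOLATION")
        return self._writer_lock.release  # call this to "close" the handle
```

Because the open never blocks, an application that hits this error typically decides for itself whether to retry, back off, or report the conflict.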
Partial Updates
To make partial updates, the application can issue a device file system control code, FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT, to fetch the old content into the file that the opened handle references. This triggers a server-side copy of the old content. For better application performance, and to avoid potential time-outs when you work with very large files, we recommend that you use asynchronous I/O.
If the FSCTL is issued after the handle has been written to, the last write operation will persist, and prior writes that were made to the handle are lost.
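Why the fetch matters can be shown with a toy handle. WriteHandleModel is hypothetical, and it assumes (as the need for the FSCTL implies) that a handle opened for write starts out referencing an empty file. fetch_old_content stands in for issuing FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT before any write; the write-before-fetch case described above is not modeled.

```python
class WriteHandleModel:
    """Toy write handle; ASSUMPTION: a newly opened write handle references an empty file."""

    def __init__(self, server_copy: bytes) -> None:
        self._server_copy = server_copy  # committed BLOB content on the server
        self.content = bytearray()

    def fetch_old_content(self) -> None:
        # Stands in for FSCTL_SQL_FILESTREAM_FETCH_OLD_CONTENT (server-side copy).
        self.content = bytearray(self._server_copy)

    def write_at(self, offset: int, data: bytes) -> None:
        end = offset + len(data)
        if len(self.content) < end:
            # Writing past the end zero-fills the gap, as with a sparse write.
            self.content.extend(b"\0" * (end - len(self.content)))
        self.content[offset:end] = data


if __name__ == "__main__":
    handle = WriteHandleModel(b"hello world")
    handle.fetch_old_content()
    handle.write_at(6, b"WORLD")
    print(bytes(handle.content))  # b'hello WORLD'
```

Without the fetch, the same write_at(6, ...) would leave six zero bytes followed by b"WORLD", which is why partial updates need the old content first.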
Write-Through from Remote Clients
Remote file system access to FILESTREAM data is enabled over the Server Message Block (SMB) protocol. If the client is remote, no write operations are cached on the client side; write operations are always sent to the server, where the data can be cached. We recommend that applications running on remote clients consolidate small write operations into fewer, larger ones.
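Consolidating writes amounts to client-side buffering. The sketch below is hypothetical: CoalescingWriter batches small writes into chunks before handing them to a send callable that stands in for one remote write, and the 64 KiB default is an illustrative assumption, not a recommended value. Python's io.BufferedWriter provides similar behavior for raw file objects.

```python
from typing import Callable


class CoalescingWriter:
    """Batch small writes into chunks of chunk_size bytes before sending them."""

    def __init__(self, send: Callable[[bytes], None], chunk_size: int = 64 * 1024) -> None:
        self._send = send              # performs one (remote) write operation
        self._chunk_size = chunk_size  # ASSUMPTION: illustrative size only
        self._pending = bytearray()

    def write(self, data: bytes) -> None:
        self._pending += data
        # Send full chunks as soon as they are available.
        while len(self._pending) >= self._chunk_size:
            self._send(bytes(self._pending[: self._chunk_size]))
            del self._pending[: self._chunk_size]

    def flush(self) -> None:
        """Send whatever is left; call this before closing the handle."""
        if self._pending:
            self._send(bytes(self._pending))
            self._pending.clear()
```

For example, ten 2-byte writes through a writer with chunk_size=8 reach the server as three operations of 8, 8, and 4 bytes instead of ten.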
Creating memory mapped views (memory mapped I/O) by using a FILESTREAM handle is not supported. If memory mapping is used for FILESTREAM data, the Database Engine cannot guarantee consistency and durability of the data or the integrity of the database.