次の方法で共有


SQL Server Instant File Initialization: SetFileValidData (Windows) vs fallocate (Linux)

SQL Server Books Online documents Instant File Initialization and the associated security considerations.   This blog highlights the underlying file system implementations and differences in behavior between Windows and Linux.

SQL Server performs the following API calls when creating or extending (growing) data and log files.

  1. If the file is a log file (LDF) SQL Server always writes a known pattern (other than zeros) to the allocated space.
  2. If the file is a data file (MDF/NDF) SQL Server checks the instant file initialization and trace flag 1805 settings to decide if a pattern should be written to the allocated space.

Trace flag: 1805 (undocumented in SQL Server Books Online): Disables instant file initialization for data files.
Note: The stamping is usually performed in 4MB chunks for optimal performance and alignment with both Windows and Linux file system page and block size alignments.

Windows

Windows (ntfs and refs) file systems have two members key to instant file initialization.

EOF – End Of File Location

VDL – Valid Data Length Location

Empty FileWhen a file is first created the EOF and VDL both point to the beginning of the file.EOF = 0VDL = 0 image
SetEndOfFileSetEnfOfFile extends the file, acquiring space from the I/O device, and adjusting the EOF value.   Notice that the VDL value remains unchanged.EOF = 10GBVDL = 0 image
SetFileValidDataSetFileValidData is used to move the VDL.  Once the VDL is moved all data before the VDL offset if considered to have been written (even if no writes took place) and reads of the space before the VDL may return stale data from the I/O device.  Data after the VDL is considered invalid and zeros are returned for read requests.EOF = 10GBVDL = 1GBNote: Carefully reads the SQL Server Books Online documentation related to security considerations. image
Write beyond the current VDL (WriteFile*) When a write occurs, beyond the VDL offset, Windows moves the VDL to accommodate the write and also writes zeros to any offsets between the previous VDL and the start of the write request.EOF=10GBPrev VDL = 1GBVDL = 5GB image
Instant File Initialization (New File) Instead of incremental changes to the VDL, SQL Server uses the fast allocation capability of SetEndOfFile and then calls SetFileValidData with the same offset.  All data prior the VDL is considered written (valid) by the Windows file system.  Windows does not write zeros nor will SQL Server stamp a pattern (in data files) when Instant File Initialization is enabled.   The internal, SQL Server, database allocation structures track SQL Server, data file allocations and valid data read activities. image
Instant File Initialization (Grow) Growing a file, using Instant File Initialization, performs SetEndOfFile and SetFileValidData with the new offset.  Windows treats the data between the previous offset of new offset as valid. image

 

Linux

Linux support file allocation with the fallocate system call (ABI.)  The Windows API calls are mapped to Linux ABI calls as follows.

  • CreateFile – Uses Linux open
  • SetEndOfFile – Uses Linux fallocate
  • SetFileValidData – Noop on Linux

The main difference between Windows and Linux file systems is the tracking of extents, not a valid data length (VDL.)  On Linux each extent contains a flag indicating if it has been written to the I/O device.

Empty FileWhen the file is first created the EOF = 0 and the covering extent is set to (N) for not written.A read of a not-written extent always returns zeros on Linux. Linux does not engage the I/O device but simply fills the return buffer with zeros for the space tracked by the not-written extent.Hint: Check the documentation of the Linux file system for extent sizes and adjustments.   The default size usually aligns with a memory page size boundary (often 4K) which aligns well with SQL Server 8K pages and 64K extent boundaries for optimal performance. image
SetEndOfFileFile size growth takes place with an fallocate invocation.  Linux acquires space from the I/O device and establishes the EOF as well as the tracking extent meta data, indicating not-written.  The fallocate acquires space just as SetEndOfFile acquires space for a Windows file system, allowing fast creation of large files. The difference is the SetFileValidData. Linux does not provide the ability to set the extent tracking to ‘written’ without an actual write.Reference unimplemented Linux File System request to mark extents as written (security concerns): https://www.spinics.net/lists/linux-ext4/msg31637.htmlPerformance Consideration: If fallocate is not supported for the target file system, SQL Server uses ftruncate.   Contrary to its name, the ftruncate ABI can be used to grow a file but is a thin-provisioned operation (space is not acquired only the metadata updated.)  When ftruncate is required SQLPAL write zeros to the file to acquire the physical space and provide fallocate, zero read behavior to SQLPAL processes. image
WriteWhen the first write takes place the the metadata data for the extent(s) are also updated.  The extent may be split for tracking written and not-written data or the write extended, by the Linux kernel, to write zeros to the space on disk so the entire extent can be marked as written.Note: On most Linux file systems this results in two (2) write requests but may be more depending on the write size and offset alignment.  1 request for the file data and 1 request for the metadata change. image

 

Pay Me Now Or Later Decision

On Windows the SetFileValidData is a single metadata operation.  Once the VDL is established a write (sequential or random) does not require additional metadata updates as the VDL == EOF.  On Linux the write requires the extent update(s) requiring a write for the data and a write for the metadata.

You can create and grow a file as quickly on Linux as you can on Windows.   However, the first write on Linux results in the metadata maintenance.  

  • If database creation speed is a concern use Instant File Initialization and allow the first writes to encounter the additional overhead.  
    Note: The overhead is often hidden by SQL Server as most writes take place with background processes such as Checkpoint or Lazy Writer.  Bulk loading is an exception as write can occur on the active SQL Server session.
  • If the creation speed can be elongated you can enable –T1805 causing the database, data file space to be stamped during creation and growth.   The stamping is optimized in large chunks and becomes the write path encountering the first write data and metadata operations.   Once the location has been written (stamped) the writes no longer require the additional metadata write.

Note: If the file system does not support fallocate the file is stamped with zeros by SQLPAL.   The log file (LDF) always stamps a known pattern on top of the zeros and you can safely retain Instant File Initialization for the data files as the metadata has already been updated when SQLPAL zeroed the space.

Bob Dorr - Principal Software Engineer SQL Server