NTFS And 4K Disks

Since the 1960s, hard disks have used 512 bytes as the default read/write block size.  Recently, drive manufacturers have been moving toward a larger block size to improve performance and reliability.  Two types of disks with a 4KB physical block size are currently available: disks that emulate 512-byte sectors, and disks that expose the 4KB block size directly.

 

Disks with 4KB block size and 512 bytes per sector emulation

For performance reasons, drive manufacturers have already produced disks with 4KB native block size, which use firmware to emulate 512 bytes per sector.  Because of the emulated 512 byte sector size, the file system and most disk utilities will be blissfully unaware that they are running on a 4KB disk.  As a result, the on-disk structures will be completely unaffected by the underlying 4KB block size.  This allows for improved performance without altering the bytes per sector presented to the file system.  These disks are referred to as 512e (pronounced “five-twelve-eee”) disks.

 

Disks with 4KB block size without emulation

When the logical bytes per sector value is extended to 4KB without emulation, the file system has to adjust to the new environment.  NTFS is already capable of functioning in this environment, provided that no attached file system filter drivers make false assumptions about sector size.  Below are the highlights of what you should expect to see on a disk with a 4KB logical sector size.

 

1. It will not be possible to format with a cluster size that is smaller than the 4KB native block size.  This is because cluster size is defined as a multiple of sector size.  This multiple is always a power of two (2^n).

2. File records will grow to match the 4KB logical block size, up from the previous size of 1KB.  This improves scalability to some degree, but the downside is that each NTFS file record will require 4KB or more in the MFT.

3. Sparse and compressed files will continue to have 16 clusters per compression unit.

4. Since file records are 4 times their normal size, it will be possible to encode more mapping pairs per file record.  As a result, larger files can be compressed with NTFS compression without running into file system limitations.

5. Since the smallest allowable cluster size is 4KB, and NTFS compression is not supported on cluster sizes larger than 4KB, NTFS compression will only work on volumes with a 4KB cluster size.

6. Bytes per index record will be unaffected by the 4K block size since all index records are 4KB in size.  The on-disk folder directory structures will be completely unaffected by the new block size, but a performance increase may be seen while accessing folder structure metadata.

7. The BIOS Parameter Block (BPB) will continue to have the same format as before, but the only positive value for clusters per File Record Segment (FRS) will be 1.  In the case where clusters per FRS is 1, the FRS byte size is computed by the following equation:

FRS size in bytes = ClustersPerFRS × SectorsPerCluster × BytesPerSector = 1 × 1 × 4096 = 4096

 

NTFS BIOS Parameter Block Information

 

  BytesPerSector      :        4096

  Sectors Per Cluster :           1

  ReservedSectors     :           0

  Fats                :           0

  RootEntries         :           0

  Small Sectors       :           0 ( 0 MB )

  Media Type          :         248 ( 0xf8 )

  SectorsPerFat       :           0

  SectorsPerTrack     :          63

  Heads               :         255

  Hidden Sectors      :          64

  Large Sectors       :           0 ( 0 MB )

 

  ClustersPerFRS      :           1

  Clust/IndxAllocBuf  :           1

  NumberSectors       :                50431 ( 196.996 MB )

  MftStartLcn         :                16810

  Mft2StartLcn        :                    2

  SerialNumber        :  8406742282501311868

  Checksum            :                    0 (0x0)
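For readers who want to decode these fields themselves, here is a minimal Python sketch. It is not the tool used to produce the dump above. It assumes the commonly documented NTFS boot sector layout, with the BPB starting at byte offset 0x0B; the field names, `parse_ntfs_bpb`, and the synthetic sector built by `make_sample_boot_sector` are illustrative names chosen here, filled with the values from the dump.

```python
import struct

# Assumed layout of the NTFS BPB, starting at byte offset 0x0B of the boot
# sector. "<" means little-endian with no padding; "3x" skips the pad bytes
# that follow each signed one-byte "clusters per ..." field.
BPB_FORMAT = "<HBHBHHBHHHIIIQQQb3xb3xQI"
BPB_FIELDS = (
    "bytes_per_sector", "sectors_per_cluster", "reserved_sectors", "fats",
    "root_entries", "small_sectors", "media_type", "sectors_per_fat",
    "sectors_per_track", "heads", "hidden_sectors", "large_sectors",
    "unused", "number_sectors", "mft_start_lcn", "mft2_start_lcn",
    "clusters_per_frs", "clusters_per_index_buffer", "serial_number",
    "checksum",
)


def parse_ntfs_bpb(boot_sector: bytes) -> dict:
    """Decode the BPB fields from the first bytes of an NTFS boot sector."""
    if boot_sector[3:11] != b"NTFS    ":
        raise ValueError("OEM ID is not 'NTFS    '")
    values = struct.unpack_from(BPB_FORMAT, boot_sector, 0x0B)
    return dict(zip(BPB_FIELDS, values))


def make_sample_boot_sector() -> bytes:
    """Build a synthetic sector holding the values from the dump above."""
    sector = bytearray(512)
    sector[3:11] = b"NTFS    "
    struct.pack_into(
        BPB_FORMAT, sector, 0x0B,
        4096, 1, 0, 0, 0, 0, 0xF8, 0, 63, 255, 64, 0, 0,
        50431, 16810, 2, 1, 1, 8406742282501311868, 0,
    )
    return bytes(sector)


bpb = parse_ntfs_bpb(make_sample_boot_sector())
for name in ("bytes_per_sector", "sectors_per_cluster",
             "clusters_per_frs", "mft_start_lcn"):
    print(f"{name:26}: {bpb[name]}")
```

The two "clusters per" fields are read as signed bytes (`b`) on purpose, since a negative value changes how the record size is interpreted.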

 

If the cluster size is larger than the FRS size, then ClustersPerFRS will be a negative number stored as a signed byte, as shown in the example below (0xf4 is -12 decimal).  In this case, the record size is computed with the equation:

FRS size in bytes = 2^(-ClustersPerFRS) = 2^12 = 4096

 

In short, NTFS will always force a 4096 byte file record size on a disk with a 4KB sector size, regardless of the cluster size.

 

NTFS BIOS Parameter Block Information

 

  BytesPerSector      :        4096

  Sectors Per Cluster :           4

  ReservedSectors     :           0

  Fats                :           0

  RootEntries         :           0

  Small Sectors       :           0 ( 0 MB )

  Media Type          :         248 ( 0xf8 )

  SectorsPerFat       :           0

  SectorsPerTrack     :          63

  Heads               :         255

  Hidden Sectors      :          64

  Large Sectors       :           0 ( 0 MB )

 

  ClustersPerFRS      :         f4

  Clust/IndxAllocBuf  :         f4

  NumberSectors       :                50431 ( 196.996 MB )

  MftStartLcn         :                 4202

  Mft2StartLcn        :                    1

  SerialNumber        :  7270585088516976380

  Checksum            :                    0 (0x0)
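The two cases for ClustersPerFRS can be sketched in a few lines of Python. This is an illustration rather than NTFS source code: the function name `frs_size_bytes` is made up here, and the positive case assumes the record size is simply the cluster count multiplied by the cluster size in bytes.

```python
def frs_size_bytes(clusters_per_frs_raw: int,
                   sectors_per_cluster: int,
                   bytes_per_sector: int) -> int:
    """Return the file record segment size implied by the BPB fields.

    clusters_per_frs_raw is the raw byte from the BPB (0..255), which is
    interpreted as a signed 8-bit value.
    """
    value = clusters_per_frs_raw
    if value >= 0x80:
        value -= 0x100          # e.g. 0xf4 becomes -12

    if value > 0:
        # Positive: the record spans a whole number of clusters.
        return value * sectors_per_cluster * bytes_per_sector
    if value < 0:
        # Negative: the record size is 2 raised to the absolute value.
        return 1 << -value
    raise ValueError("ClustersPerFRS must not be zero")


# First dump: 4KB sectors, 1 sector per cluster, ClustersPerFRS = 1
print(frs_size_bytes(0x01, 1, 4096))
# Second dump: 4KB sectors, 4 sectors per cluster, ClustersPerFRS = 0xf4
print(frs_size_bytes(0xF4, 4, 4096))
```

Both calls yield 4096, which matches the point made above: the file record size stays at 4KB whether the volume uses 4KB or 16KB clusters.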

 

8. Aside from the 4KB file record size, there are a few other things to know about 4KB drives.  The code for implementing update sequence arrays (USAs) has always worked on a 512 byte assumed sector size and it will continue to do so.  Since file records are 4 times their normal size, the update sequence arrays for file records now contain 9 entries instead of 3.  One array entry is required for the sequence number (02 00 at offset 0x0030 in the dump below) and eight array entries for the trailing bytes (offsets 0x0032 through 0x0041), one for each 512 byte stride.  On disk, the last two bytes of each stride hold the sequence number instead (02 00 at offsets 0x01fe, 0x03fe, and so on).  The original purpose of the USA is to allow NTFS to detect torn writes.  Since the file record size is now equal to the block size, the hardware is capable of writing the entire file record at once, rather than in two parts.

 

    _MULTI_SECTOR_HEADER MultiSectorHeader {

            ULONG      Signature             : 0x454c4946 "FILE"

            USHORT     SequenceArrayOffset   : 0x0030

            USHORT     SequenceArraySize     : 0x0009

    }

 

 

0x0000   46 49 4c 45 30 00 09 00-dd 24 10 00 00 00 00 00   FILE0...Ý$.....

0x0010   01 00 01 00 48 00 01 00-b0 01 00 00 00 10 00 00   ....H...°......

0x0020   00 00 00 00 00 00 00 00-06 00 00 00 00 00 00 00   ................

0x0030   02 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x0040   00 00 00 00 00 00 00 00-10 00 00 00 60 00 00 00   ...........`...

0x0050   00 00 18 00 00 00 00 00-48 00 00 00 18 00 00 00   .......H......

0x0060   f8 f1 5b 89 36 d2 cb 01-f8 f1 5b 89 36 d2 cb 01   øñ[‰6ÒË.øñ[‰6ÒË.

0x0070   f8 f1 5b 89 36 d2 cb 01-f8 f1 5b 89 36 d2 cb 01   øñ[‰6ÒË.øñ[‰6ÒË.

0x0080   06 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x0090   00 00 00 00 00 01 00 00-00 00 00 00 00 00 00 00   ................

0x00a0   00 00 00 00 00 00 00 00-30 00 00 00 68 00 00 00   ........0...h...

0x00b0   00 00 18 00 00 00 03 00-4a 00 00 00 18 00 01 00   .......J......

0x00c0   05 00 00 00 00 00 05 00-f8 f1 5b 89 36 d2 cb 01   ........øñ[‰6ÒË.

0x00d0   f8 f1 5b 89 36 d2 cb 01-f8 f1 5b 89 36 d2 cb 01   øñ[‰6ÒË.øñ[‰6ÒË.

0x00e0   f8 f1 5b 89 36 d2 cb 01-00 00 01 00 00 00 00 00   øñ[‰6ÒË.........

0x00f0   00 00 01 00 00 00 00 00-06 00 00 00 00 00 00 00   ................

0x0100   04 03 24 00 4d 00 46 00-54 00 00 00 00 00 00 00   ..$.M.F.T.......

0x0110   80 00 00 00 48 00 00 00-01 00 40 00 00 00 01 00   €...H.....@.....

0x0120   00 00 00 00 00 00 00 00-ff 00 00 00 00 00 00 00   ........ÿ.......

0x0130   40 00 00 00 00 00 00 00-00 00 10 00 00 00 00 00   @..............

0x0140   00 00 10 00 00 00 00 00-00 00 10 00 00 00 00 00   ..............

0x0150   22 00 01 aa 41 00 ff ff-b0 00 00 00 50 00 00 00   "..ªA.ÿÿ°...P...

0x0160   01 00 40 00 00 00 05 00-00 00 00 00 00 00 00 00   ..@.............

0x0170   01 00 00 00 00 00 00 00-40 00 00 00 00 00 00 00   ........@.......

0x0180   00 20 00 00 00 00 00 00-08 10 00 00 00 00 00 00   . .............

0x0190   08 10 00 00 00 00 00 00-21 01 a9 41 21 01 fd fd   .......!.©A!.ýý

0x01a0   00 69 b4 05 80 fa ff ff-ff ff ff ff 00 00 00 00   .i´.€úÿÿÿÿÿÿ....

0x01b0   00 00 10 00 00 00 00 00-22 00 01 aa 41 00 ff ff   ......."..ªA.ÿÿ

0x01c0   b0 00 00 00 50 00 00 00-01 00 40 00 00 00 05 00   °...P.....@.....

0x01d0   00 00 00 00 00 00 00 00-01 00 00 00 00 00 00 00   ................

0x01e0   40 00 00 00 00 00 00 00-00 20 00 00 00 00 00 00   @........ ......

0x01f0   08 10 00 00 00 00 00 00-08 10 00 00 00 00 02 00   ..............

.

.

.

0x03c0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x03d0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x03e0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x03f0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 02 00   ................

.

.

.

0x05d0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x05e0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x05f0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 02 00   ................

.

.

.

0x07d0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x07e0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x07f0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 02 00   ................

.

.

.

0x09d0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x09e0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x09f0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 02 00   ................

.

.

.

0x0bd0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x0be0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x0bf0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 02 00   ................

.

.

.

0x0dd0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x0de0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x0df0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 02 00   ................

.

.

.

0x0fd0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x0fe0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00   ................

0x0ff0   00 00 00 00 00 00 00 00-00 00 00 00 00 00 02 00   ................
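To make the torn-write check concrete, here is a minimal Python sketch of how an update sequence array is stamped before a write and checked after a read. It follows the layout visible in the dump above (array offset and entry count at bytes 4 through 7 of the record, 512 byte strides), but it is an illustration and not the NTFS implementation; `stamp_fixups`, `apply_fixups`, and the sample record are names and data invented for this example.

```python
import struct

SEQUENCE_STRIDE = 512  # the USA stride stays 512 bytes, even on 4KB disks


def stamp_fixups(record: bytes, usn: int) -> bytes:
    """Write side: save each stride's last two bytes, then stamp the USN."""
    buf = bytearray(record)
    usa_offset, usa_count = struct.unpack_from("<HH", buf, 4)
    struct.pack_into("<H", buf, usa_offset, usn)      # entry 0: the USN
    for i in range(1, usa_count):
        end = i * SEQUENCE_STRIDE
        saved = usa_offset + 2 * i
        buf[saved:saved + 2] = buf[end - 2:end]       # keep the real bytes
        struct.pack_into("<H", buf, end - 2, usn)     # stamp the stride end
    return bytes(buf)


def apply_fixups(record: bytes) -> bytes:
    """Read side: verify every stride end, then restore the saved bytes."""
    buf = bytearray(record)
    usa_offset, usa_count = struct.unpack_from("<HH", buf, 4)
    if (usa_count - 1) * SEQUENCE_STRIDE != len(buf):
        raise ValueError("USA entry count does not match the record size")
    usn = buf[usa_offset:usa_offset + 2]
    for i in range(1, usa_count):
        end = i * SEQUENCE_STRIDE
        if buf[end - 2:end] != usn:
            raise ValueError(f"torn write detected in stride {i}")
        saved = usa_offset + 2 * i
        buf[end - 2:end] = buf[saved:saved + 2]
    return bytes(buf)


# Hypothetical 4KB file record: "FILE" signature, USA at 0x30, 9 entries.
plain = bytearray(4096)
plain[0:4] = b"FILE"
struct.pack_into("<HH", plain, 4, 0x30, 9)
plain[0x3FE:0x400] = b"\xAB\xCD"   # pretend data at the end of stride 2

on_disk = stamp_fixups(bytes(plain), usn=2)
restored = apply_fixups(on_disk)

# Simulate a torn write: the last stride still carries an older USN.
torn = bytearray(on_disk)
torn[0xFFE:0x1000] = b"\x01\x00"
try:
    apply_fixups(bytes(torn))
    torn_detected = False
except ValueError:
    torn_detected = True
```

With a 4KB record the loop runs eight times, which is where the nine array entries come from: one for the sequence number plus one per 512 byte stride.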

 

I don’t actually own a 4KB disk, but I was able to give you this preview thanks to a nifty tool called VStorControl.  Vstor is a tool which allows you to create virtualized SCSI disks with arbitrary block sizes and is available for download with the Windows 7 SDK.

 

That’s all for now,

Dennis Middleton “The NTFS Doctor”

Comments

  • Anonymous
    July 05, 2011
    Thanks for an informative post. 1 question: How does one print out the "BIOS Parameter Block" as you have done ? Is that done in Windbg ?   -Alex [The tool used to dump the BPB is Secinspect.exe.  It is available from http://download.microsoft.com]

  • Anonymous
    July 09, 2012
    "One byte is required for the sequence number (blue)..." Please, correct this. Maybe you meant "One array entry" Thanks for the very useful post. [Good catch, we'll get this fixed. Thank you.]

  • Anonymous
    November 12, 2014
    The main problem I see is that it takes 4K of FRS data for each file on the volume. If you have 5 million files on your NTFS volume, you have to read in about 20 GB from disk just to look at all the files in the MFT. If you want to keep all that data in memory in order to perform lots of fast searches for files, you need 20 GB of RAM. What a waste of space and how slow is that! [It is sometimes necessary to choose hardware that is appropriate for the intended usage pattern.  In your example the best solution may be to use a disk with a smaller block size.]

  • Anonymous
    February 02, 2016
    The problem is getting worse all the time. It is now (2016) very economical to buy a computer that has a 4 TB disk drive and 16 GB of RAM. The disk drive can easily hold 10 million files on it (assuming an average file size of 400 KB). But the MFT is now 40 GB in size for such a volume formatted with NTFS. It takes 400 seconds (nearly 7 minutes) to read it from disk (assuming an average transfer speed of 100 MB/second). It also takes 40 GB of RAM just to cache the MFT so you are not paging it in and out of memory constantly to do file searches. This also assumes that you don't need to read in all the directory information that is stored in other structures. [The entire MFT should never be cached in RAM, so memory usage and the time to read the entire table should not be relevant.  NTFS works quite successfully with volumes much larger than 4TB.]