SQL 2016 - It Just Runs Faster: Automatic Soft NUMA
As hardware continues to expand and evolve, SQL Server testing and customer reports have highlighted the need to partition activities for optimal scaling. Partitioning-based designs are a common way to localize activity and improve performance and scalability. One example of how SQL Server leverages partitioning is the CMemThread object.
For thread safety, various synchronization mechanisms are used (spinlock, latch, mutex, semaphore, …). For discussion purposes, let's focus on a spinlock, using empirical, computer-industry testing results. A highly contended spinlock does not scale well beyond 8 CPUs. Only one owner is possible, so with N CPUs contending, the other N-1 spinners burn CPU without acquiring the lock.
The SQL Server product team studies and tracks internal structures and partitioned designs. Many common SQL Server structures are designed with various partitioning schemes rooted in the NUMA layout of the machine.
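To make the idea of partitioning a contended structure concrete, here is a minimal Python sketch. It is not how CMemThread or any SQL Server structure is implemented; the `PartitionedCounter` class, the `run_workers` helper, and the choice of keying partitions by a node-like id are all assumptions made for illustration. The point is only that callers mapped to different partitions never compete for the same lock.

```python
import threading


class PartitionedCounter:
    """Hypothetical example: one lock-protected counter per partition (e.g. per node)."""

    def __init__(self, partitions):
        self._locks = [threading.Lock() for _ in range(partitions)]
        self._counts = [0] * partitions

    def add(self, partition_id, amount=1):
        # Only callers mapped to the same partition contend for this lock.
        slot = partition_id % len(self._locks)
        with self._locks[slot]:
            self._counts[slot] += amount

    def total(self):
        # A global read has to visit every partition; that is the trade-off.
        total = 0
        for slot, lock in enumerate(self._locks):
            with lock:
                total += self._counts[slot]
        return total


def run_workers(counter, workers, increments):
    """Start `workers` threads; worker i uses partition id i."""

    def work(partition_id):
        for _ in range(increments):
            counter.add(partition_id)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.total()
```

With a single partition every worker queues on one lock; with more partitions the same work is spread across several locks, at the cost of a slower global read.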
NUMA-partitioned designs first appeared in SQL Server 2000 and evolved further in SQL Server 2005. As SMP (single-node) machines advanced beyond 8 CPUs, scalability issues were uncovered and design changes were made to address them. NUMA partitioning became the standard choice for combating those scalability issues. During SQL Server 2000 and 2005 development, 8 CPUs per NUMA node was a high-end system. Today, hardware advancements place 18 cores in a single NUMA node and expose SMP-like scalability issues within a single NUMA node.
Even today, Soft NUMA can be used to divide a physical node into multiple logical nodes, presenting a different layout to the entire SQL Server and adjusting the partitioning to optimize scalability and performance. Microsoft recommends using Soft NUMA on newer NUMA systems with large CPU counts to increase performance.
During startup, SQL Server 2016 interrogates the hardware layout and automatically configures Soft NUMA on systems reporting more than 8 CPUs per NUMA node. The partitioning triggers various adjustments throughout the database engine, improving scalability and performance. The Automatic Soft NUMA logic considers logical CPU ratios, total CPU counts, and other factors, attempting to create soft, logical nodes containing 8 or fewer CPUs each.
- SQL Error log: Automatic soft-NUMA was enabled because SQL Server has detected hardware NUMA nodes with greater than 8 logical processors.
- DMV: The softnuma_configuration_desc column in sys.dm_os_sys_info can have one of the three values: OFF / ON / MANUAL
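The goal of creating soft nodes with 8 or fewer CPUs each can be sketched with some simple arithmetic. The Python below is only an illustration under stated assumptions: the function name `soft_node_sizes` and the even-split rule are hypothetical, and the real Automatic Soft NUMA logic also weighs logical CPU ratios and other factors that this sketch ignores.

```python
from math import ceil


def soft_node_sizes(cpus_in_hardware_node, max_cpus_per_soft_node=8):
    """Hypothetical even split of one hardware node into soft nodes.

    Returns a list with the CPU count of each soft node. This is NOT the
    SQL Server algorithm, only a sketch of the "8 or fewer CPUs" goal.
    """
    if cpus_in_hardware_node < 1:
        raise ValueError("a node needs at least one CPU")
    if cpus_in_hardware_node <= max_cpus_per_soft_node:
        # Small enough already: leave the hardware node as a single node.
        return [cpus_in_hardware_node]
    # Smallest number of soft nodes that keeps every node within the limit.
    node_count = ceil(cpus_in_hardware_node / max_cpus_per_soft_node)
    base, extra = divmod(cpus_in_hardware_node, node_count)
    # Spread the remainder so node sizes differ by at most one CPU.
    return [base + 1] * extra + [base] * (node_count - extra)
```

Under these assumptions an 18-CPU hardware node would be split into three soft nodes of 6 CPUs, while an 8-CPU node would be left alone.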
Your mileage may vary, but here is a test result from the SQL Server 2016 test harness: "With HT aware auto soft-NUMA, we get up-to 30% gain in query performance when DOP is set to the number of physical cores on a socket (12 in this case) using Automatic Soft NUMA."
The automatic Soft NUMA behavior is hyperthread (HT/logical processor) aware. When determining the optimal node layout, the logical CPU information is queried and used to prevent groupings of logical-only and physical-only nodes, which could lead to performance variations across the nodes.
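One way to picture a hyperthread-aware layout is to assign whole physical cores to soft nodes, so the sibling logical processors of a core always land in the same node and every node gets the same mix of hardware threads. The Python sketch below assumes exactly that; the function `ht_aware_soft_nodes`, its input shape, and the even split are hypothetical and are not taken from the SQL Server implementation.

```python
from math import ceil


def ht_aware_soft_nodes(cores, max_cpus_per_node=8):
    """Hypothetical HT-aware grouping.

    `cores` is a list of physical cores; each entry is the list of logical
    CPU ids that belong to that core. Whole cores are assigned to a node,
    so sibling logical CPUs are never split across soft nodes.
    """
    if not cores:
        return []
    threads_per_core = max(len(core) for core in cores)
    # How many whole cores fit in one soft node without passing the limit.
    max_cores_per_node = max(1, max_cpus_per_node // threads_per_core)
    node_count = ceil(len(cores) / max_cores_per_node)
    base, extra = divmod(len(cores), node_count)

    nodes, start = [], 0
    for index in range(node_count):
        size = base + (1 if index < extra else 0)
        chunk = cores[start:start + size]
        # Flatten the chosen cores into the node's logical CPU list.
        nodes.append([cpu for core in chunk for cpu in core])
        start += size
    return nodes
```

For a 12-core socket with two logical processors per core, described as `[[0, 1], [2, 3], ...]`, this sketch yields three soft nodes of 8 logical CPUs, each built from 4 complete cores.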
Furthermore, many of the background processes are created within each node, so partitioning into additional nodes scales background processing. For example, each node contains a worker that listens for network activity and performs encryption activities. The additional nodes created by a Soft NUMA configuration increase the number of listeners, scaling network and encryption capabilities.
'It Just Runs Faster' - Apply SQL Server 2016, and SQL Server internally leverages Soft NUMA partitioning to achieve double-digit performance gains.
Nitin Verma - Principal SQL Server Developer
Bob Dorr - Principal SQL Server Escalation Engineer
Comments
- Anonymous
June 17, 2016
The comment has been removed
- Anonymous
June 21, 2016
It can help but I don't think you will see as much of a gain in SQL 2012. Other work done on CMemThread (reference -T8048) and such activities combine to make SQL 16 faster.
- Anonymous
June 20, 2016
The current Intel Xeon E5-2600 v4 and E5-4600 v4 families have up to 22 physical cores per processor. The current Intel Xeon E7-8800 v4 family has up to 24 physical cores per processor.
- Anonymous
August 13, 2016
SQL 2016 optimizes utilization of the CPU cores, giving a substantial performance gain without investing a penny in extra hardware!
- Anonymous
February 13, 2019
Thanks Bob. I am hoping you can clarify for me what your settings actually were in your test case: "With HT aware auto soft-NUMA, we get up-to 30% gain in query performance when DOP is set to the number of physical cores on a socket (12 in this case) using Automatic Soft NUMA." Does this mean that you had MAXDOP set to 12 (which you were able to do thanks to the auto soft-NUMA)? I have a SQL Server 2017 server with 10 cores, and soft-NUMA is enabled at startup. So is the recommendation to go ahead and set MAXDOP to 10, instead of the previously recommended 8 in KB2806535?