SQL Server 2008 SQLOS
Here are some notes on “SQL Server 2008 SQLOS” I took while attending an advanced class on SQL Server taught by Gert Drapers (from https://dbproj.com/ and https://blogs.msdn.com/gertd/).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without some pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
Processes
- Processes (context, access token, virtual address space descriptors, handle table)
- SQL is a user mode process
- Threads (unit of execution, access token, shares process address space, can synchronize)
- Threads can have the same access token as process, or another one (impersonation)
- Every process starts with one thread - “main”
- Context switches
- Fibers (lightweight thread, need a thread to schedule fibers, fibers share same thread context)
Scheduling
- Pre-emptive scheduler (Windows decides when and how long you run)
- See https://windowsitpro.com/Articles/Index.cfm?IssueID=22&ArticleID=302
- Windows Internals Book
- See https://technet.microsoft.com/en-us/sysinternals/bb963901.aspx
Hardware
- Defining: Processor, CPU, Socket, Core, Logical Processor, NUMA, NUMA node
- Example: Notebook, dual-core, hyper-threaded = 2 cores, 4 logical processor
- Windows has 64 threads of execution in Windows Server 2008 – R2 will change that.
- NUMA node – group of logical processors and cache that are “near each other”
- SMP – Symmetric Multi-Processing, Front Bus to access memory, contention
- SMP layout – CPU, Northbridge (memory, graphics), Shouthbridge (PCI)
NUMA
- NUMA = Non-Uniform Memory Access / Architecture, ccNUMA
- NUMA nodes – contain own RAM, CPUs, memory controller, talk to other nodes
- Goal: minimize/eliminate front-bus contention
- Performance penalty to access memory in other nodes (foreign node)
- Applications need to be NUMA-aware to take advantage
- Soft-NUMA – try out how things behave – only performs on NUMA hardware
- Support for NUMA – BIOS setting: Node interleaving disabled = NUMA
- How does the OS knows? ACPI and SRAT – Static Resource Affinity Table
- If Multiple nodes > 1 – You have NUMA, Number of procs – actual sockets populated
- Thread scheduling with NUMA (how “ideal processor” is chosen)
- Important for the app to know the nodes and available memory on each
- https://www.microsoft.com/whdc/archive/numa_isv.mspx
Windows Server 2008 and NUMA
- Optimizations for NUMA, like dividing kernel’s non-paged memory across nodes
- Hardware has evolved and the OS added constructs to help apps
- See https://msdn.microsoft.com/en-us/library/aa363804.aspx
Windows Server 2008 R2
- Support for greater than 64 logical processors
- See https://code.msdn.microsoft.com/64plusLP
- Groups of CPUs, backward compatible with 64-bit affinity
- New APIs to use groups, set the affinity mask and use more than 64 LPs
- Nehalem System Architecture – See Intel QuickPath Architecture
- See https://www.intel.com/technology/quickpath/demo/demo.htm
- Example: HP – 64 dual-core Itanium
- Example: Unisys – 32 dual core, hyper-threaded Xeon
SQL Server Scheduling
- SQL Server 6.x scheduling – multi-threaded – worker thread pool – OS pre-empts threads
- SQL Server 7/2000 – User mode schedulers, 1 per LP – UMS work queues – fewer content switches
- - Users sessions assigned to UMS scheduler, round robin, no re-balancing
- - Bulk Insert is single threaded, not in parallel within the same scheduler
- - Memory: memtoleave, threads, bpool – simple calculations to divide the memory
- SQL Server 2005/2008 – SQLOS
SQLOS
- User mode layer between SQL Server and the Windows OS.
- Thread mode or fiber mode. Default is thread.
- See https://msdn.microsoft.com/en-us/library/ms189248.aspx
- Scheduling/memory/buffer pool/hosting/exceptions/locks/latching
- SQLOS Diagram – Protocols/Query Compilation and Execution Engine/Storage Engine/SQLOS
- SQLCLR, Reporting Services are hosted process in SQL OS
- NUMA – SQLOS directly maps to the hardware architecture, understands NUMA
- NUMA – SQLOS (only one) / SOS_MemoryNode / SOS_CPUNode / SOS_Scheduler / SOS_Task
- NUMA – Performance counters on per-node basis
- SMP – Only one SOS_MemoryNode
- See https://blogs.msdn.com/slavao/archive/2005/02/05/367816.aspx
- See https://blogs.msdn.com/slavao/articles/441058.aspx
- See https://www.redbooks.ibm.com/redpapers/pdfs/redp4093.pdf
- Hot-add CPU? Depends – Windows Server 2003 or 2008, SQL Server 2005 or 2008
- I/O Port – Per SOS_CPUNode
- See https://blogs.msdn.com/slavao/archive/2006/04/12/575185.aspx
SQLOS Architecture
- SOS_OS – Singleton – Manages nodes, system level information
- SOS_Node – 1 per NUMA node (or 1 in SMP) – Memory node, Scheduler Management
- SOS_Scheduler – CPU abstraction, binds tasks and worker threads, manages affinity
- Affinity. See https://msdn.microsoft.com/en-us/library/ms187104.aspx
- SOS_Tasks – Executes request from user, executes on a worker, abortable, timer tasks
Scheduler
- Queue : Runnable / Pending / Timer / IO
- Worker Pool, Current Work, Idle Worker, Wake Event, Abort Tasks List
- States – New, Pending, Runnable, Running, Suspended, Preemptive, Done, Monitor
- Non-preemptive – task runs until – yield, wait on sync. object, quantum expires
- Preemptive – threads not in SQLOS control – external code – wait stat skewed
- I/O completion – Worker on I/O completion port
- See https://msdn.microsoft.com/en-us/library/aa365198.aspx
- Scheduler Monitor – Preemptive/long running tasks, non-yielding, deadlock, maintenance
- Dedicated Admin Connection – DAC – Only one, reserved memory at startup, own scheduler
- DAC: Connection using sqlcmd.exe –a or server name = admin:servername
- Backup– also has its scheduler
- Ring buffer – Key construct for communication between multiple producers/consumers
DMVs
- SELECT * FROM sys.dm_os_sys_info
- SELECT * FROM sys.dm_os_schedulers
- SELECT * FROM sys.dm_os_workers
- SELECT * FROM sys.dm_os_tasks
- SELECT * FROM sys.dm_os_waiting_tasks
- SELECT * FROM sys.dm_os_wait_stats
- - Look at wait_time_ms – signal_wait_time_ms, not just wait_time_ms
- - Clear waits with dbcc sqlperf('sys.dm_os_wait_stats',clear)
- - More detailed wait stats – In SQL Server 2008 with XEvents
- SELECT * FROM sys.dm_os_threads
- SELECT * FROM sys.dm_io_pending_io_requests
- See https://msdn.microsoft.com/en-us/library/ms176083.aspx
OS Memory Management
- Windows provide virtual memory services (except for AWE and PAE)
- 32-bit gives you up to 4GB – 64-bit, in theory, up to 16 EB
- Three buckets: committed memory, reserved memory (not physically used) and free memory
- Committed: working set (RAM), paged out (pagefile) and mapped (shared components, DLLs)
- If you try to access reserved or free memory: exception
Memory Concepts
- Allocation: Reserve with VirtualAlloc() in 64KB chunks, allocation in 4KB/8KB chunks
- See https://msdn.microsoft.com/en-us/library/aa366887(VS.85).aspx
- Memory Limits for Windows - See https://msdn.microsoft.com/en-us/library/aa366778(VS.85).aspx
- /3GB – Instead of a 2/2 split for the 4GB, uses a 3/1 split – more pressure on kernel memory
- /PAE – Access to memory between 4GB and 64GB – 36 bits to address memory
- AWE – How an application can reach beyond the 4GB limit, when using /PAE
- See https://msdn.microsoft.com/en-us/library/aa366796(VS.85).aspx
- Case: Comparing 32bit, 32-bit (AWE), 32-bit (/3GB), 32-bit (AWE/3GB) and 64-bit
- WOW – Running 32-bit applications on 64-bit Windows
- See https://msdn.microsoft.com/en-us/library/aa384209(VS.85).aspx
- 64-bit – 16TB total, 8TB for kernel, 8TB application.
SQL Server Memory
- 32-bit: MemToLeave, thread stacks, page cache, plan cache, query workspace, locks, SQL Server, OS
- 64-bit: No MemToLeave, more thread stacks space
- Lock pages in memory – Only EE – See https://support.microsoft.com/kb/918483
- See https://blogs.msdn.com/psssql/archive/2007/10/18/do-i-have-to-assign-the-lock-privilege-for-local-system.aspx
- Large pages in 64-bit – Only EE – T834 – See https://support.microsoft.com/kb/920093
- Case: Running 64-bit, EE and 16 GB – By default, you are not running with large pages
- Careful – Large pages can lead to increased memory fragmentation, set SQL memory to a fixed size
- See https://msdn.microsoft.com/en-us/library/ms143432.aspx
- Calculating SQL Server max memory
- See https://msdn.microsoft.com/en-us/library/ms178067.aspx
- Configuring Windows pagefile – Minimum size in x64 is 8GB. Physical memory + 4GB
- SQL Server will swap to page file if – Not 64-bit, Not using EE, not setting “lock pages in memory”
- See https://support.microsoft.com/kb/889654
SQL Server Memory Management
- Memory Cycle: Memory broker / resource monitor / cache / heap / memory allocation
- Components: SQLSO / Memory Node / SOS_Node / Memory Clerk / Memory Object
- Components: Buffer Pool / Resource Monitor / Caching Framework / Memory Pools and Brokers
- Allocators: Single Page / Multi-Page / Large Page / …
- See https://support.microsoft.com/kb/907877
- Buffer pool: shared, every node has a piece of it.
- Buffer pool: reserved up front, committed/mapped on demand. Has a clerk, but not an allocator
- Stolen pages: piece of buffer pool “borrowed” for other purposes, like procedure cache
- Memory clerk: The actual page allocator, the one that actually gives you the memory
- Allows us to know where the memory is going, caches are memory clerks too
- Memory objects: memory allocators for arbitrary size
- Caching framework – common way to build caches
- Cache Store – User Store – Clock Algorithm – Clock Hands – Clock Entry Info
DMVs
- DBCC MEMORYSTATUS – A good starting point – Overview on a per node and per clerk details
- See https://support.microsoft.com/kb/271624
- sys.dm_os_memory_clerks – multiple types, per node – By default not fully populated
- See https://technet.microsoft.com/en-us/library/ms175019.aspx
- sys.dm_os_memory_objects
- See https://msdn.microsoft.com/en-us/library/ms179875.aspx
- sys.dm_os_memory_pools
- See https://msdn.microsoft.com/en-us/library/ms175022.aspx
- sys.dm_os_memory_cache%
- sys.dm_os_ring_buffers – Documented via KB articles
- See https://support.microsoft.com/kb/920093
Resource Monitor
- Monitors a set of processes and indicators, using a simple state machine
- Also Important when you change things like max memory, affinity mask, etc.
- Runs per node, on its own hidden scheduler, non-preemptive.
- Sends notifications to clerks – “Hey! We’re running low on resources!”
- Monitors: Low physical resources / Low Virtual Address Space / High physical resources
- Reactive or Proactive: Memory node fails to allocate 4MB / Resource monitor itself tries to allocate 4MB
- Internal pressure: Example: SHRINK notification from broker / Cache over 75% of target
- External pressure: Example: Signaled by OS via events – LowMemoryCondition
- See https://msdn.microsoft.com/en-us/library/aa490194.aspx
- Careful – Trend to consolidate multiple instances of SQL on a single, large box
- See https://blogs.msdn.com/slavao/archive/2005/02/19/376714.aspx
- Memory pressure:
- See https://blogs.msdn.com/sqlprogrammability/archive/2007/01/16/9-0-memory-pressure-limits.aspx
Memory Broker
- Dynamic memory distribution: buffer pool / optimizer / query execution / caches
- Always running, tunes memory consumption
- DMV: sys.dm_os_memory_brokers
- See https://msdn.microsoft.com/en-us/library/bb522548.aspx
Books of interest
- SQL Server 2005 Practical Troubleshooting
- Ken Henderson & others
- https://www.amazon.com/SQL-Server-2005-Practical-Troubleshooting/dp/0321447743
- SQL Server 2008 Internals
- Kalen Delaney & others
- https://www.microsoft.com/learning/en/us/Books/12967.aspx
Bonus topic: Extended Events
- Captures event information, like SQL Server Profiler, but more lightweight on capture
- Managed using CREATE/ALTER/DROP EVENT SESSION
- Very flexible way to determine which events are capture.
- Can save to another database, file., later use ETW tools to view, correlate with Windows events
- See https://msdn.microsoft.com/en-us/library/bb630319.aspx
- See https://www.microsoft.com/whdc/DevTools/tools/EventTracing.mspx