SQL Server 2008 DR, Testing, Waits and Queues
Here are some notes on “SQL Server 2008 DR, Testing, Waits” I took while attending an advanced class on SQL Server taught by Ron Talmage (from https://www.solidq.com/na/MentorDetail.aspx?Id=38).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without some pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
Disaster Recovery
- RTO (Recovery Time Objective), RPO (Recovery Point Objective), RTA (Recovery Time Actual)
- Backup Media (disk/tape) and Location (local, remote, offsite)
- Recovery Technology (Backup/Restore, Mirroring, Log Shipping, Clustering, Replication)
DR and HA
- What’s the difference between HA and DR?
- How would you prepare for a disaster?
- What happens if you lose the entire data center?
- Primary data center, standby data center.
- Consider risks, cost, time to invest.
- Big question: What are you trying to protect against?
Class discussion
- If you have an RPO that requires very small data loss, what to do?
- Small RPO? Large RPO? Small RTO?
- How far apart can the two nodes be?
- Synchronous or asynchronous replication?
- Consider: Clustering, Mirroring, SAN Replication, GeoClustering, Log Shipping
- Planned downtime: New hardware, Upgrade, Patches, Versions, Consolidation
- Unplanned downtime: Disasters
- How to flip a switch to the DR site?
- Virtualization support for SQL Server - https://support.microsoft.com/KB/956893
- Server Virtualization Validation Program (SVVP) - https://technet.microsoft.com/en-us/library/ms143506.aspx
- Windows Server 2008 Failover Cluster Configuration Program (FCCP) - See https://www.microsoft.com/windowsserver2008/en/us/failover-clustering-program-overview.aspx
Technologies
- VDI - COM-based interface with SQL Server 7 and later
- VDI - Supports snapshot backups using SAN-based technologies
- VSS - https://blogs.technet.com/josebda/archive/2007/10/10/the-basics-of-the-volume-shadow-copy-service-vss.aspx
- DPM - www.microsoft.com/dpm
- Log Shipping - https://msdn.microsoft.com/en-us/library/ms187103.aspx
- Database Mirroring - https://msdn.microsoft.com/en-us/library/bb934127.aspx
Benchmarking and baselining
- Load-testing, benchmarking and baselining
- Meet performance requirements
- Verify correctness
- Determine system limits
- Determine impact of changes
Class Discussion
- Baseline - Ground zero, known state, Point of comparison
- Benchmark - Measurement under specified conditions, goal, used to estabilish baseline
- TPC - https://www.tpc.org/, TPC-E for OLTP, TPC-H for DW/OLAP. Reports trans/sec and response time
- Load test - Application of a load, looking for a sufficient or expected load
- Stress test - Increase load. Done with concrete in construction.
- Saturation test - Add load until it fails.
- Endurance test – With specified load for a long time
White Paper
- Tuning the Performance of Backup Compression in SQL Server 2008
- https://sqlcat.com/technicalnotes/archive/2008/04/21/tuning-the-performance-of-backup-compression-in-sql-server-2008.aspx
- Looking at the white paper
- - Backup throughput to NUL with varying BUFFERCOUNT (what’s the baseline?)
- - Disk throughput and latency during backup compression (disk saturated, increasing queue)
- - Database snapshot creating time with varying workload (what’s the baseline? Load test.)
Performance counters
- Monitoring Resource Usage - https://msdn.microsoft.com/en-us/library/ms191246.aspx
- See https://blogs.msdn.com/jimmymay/archive/2008/10/15/perfmon-objects-counters-thresholds-utilities-for-sql-server.aspx
- See https://msevents.microsoft.com/cui/WebCastEventDetails.aspx?culture=en-US&EventID=1032357640&CountryCode=US
Tools
- SQLIO, IOMeter, SQLIOSim, RML utilities, VSTS, 3rd party tools
- SQLIO – https://www.microsoft.com/downloads/details.aspx?familyid=9A8B005B-84E4-4F24-8D65-CB53442D9E19&displaylang=en
- IOMeter – https://www.iometer.org/
- SQLIOSim - https://blogs.msdn.com/sqlserverstorageengine/archive/2006/10/06/SQLIOSim-available-for-download.aspx
- SQLIOSim - correctness and stress tool, simulates data and log file activity, does not require SQL
- See https://sqlblog.com/blogs/kevin_kline/archive/2007/06/28/understanding-sqliosim-output.aspx
- RML utilities – OSTRESS, ReadTrace, Reporter - https://support.microsoft.com/kb/887057
- See https://blogs.msdn.com/psssql/archive/2007/12/18/rml-utilities-for-microsoft-sql-server-released.aspx
- See https://sqlcat.com/technicalnotes/archive/2008/02/01/precision-performance-for-microsoft-sql-server-using-rml-utilities-9-0.aspx
Waits and Queues
- Wait occurs when thread has to wait for a resource
- How waits reporting works on SQL Server – waiter queue and wait types
- SQL Server 2000 – 76 types of waits
- SQL Server 2005 – 201 types of waits
- SQL Server 2008 – 484 types of waits
- Query with sys.dm_os_wait_stats, reset with DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR)
- See https://msdn.microsoft.com/en-us/library/ms179984.aspx
- See https://blogs.msdn.com/sqlcat/archive/2005/09/05/461199.aspx
- SQL Server 2005 Waits and Queues - White Paper
https://download.microsoft.com/download/4/7/a/47a548b9-249e-484c-abd7-29f31282b04d/Performance_Tuning_Waits_Queues.doc
From the white paper, regarding OLTP workloads
- Database Design
- High Frequency queries having a high number of table joins (>4)
- Frequently updated tables having # indexes (>3)
- Big IOs - Table Scans, Range Scans (>1)
- Unused Indexes
- CPU
- Signal Waits (>25%)
- Plan reuse (<90%)
- Parallelism: Cxpacket waits (>5%)
- Memory
- Page life expectancy (<300 sec)
- Page life expectancy (Drops by 50%)
- Memory Grants Pending (>1)
- SQL cache hit ratio (<90%)
- Disk
- Average Disk sec/read (>20 ms) – Comment: this should be lower (>5, >8)
- Average Disk sec/write (>20 ms) -- Comment: this should be lower (>5, >8)
- Big IOs - Table Scans, Range Scans (>1)
- ASYNCH_IO_COMPLETION, IO_COMPLETION, LOGMGR, WRITELOG, PAGEIOLATCH_x (in top 2)
- Low bytes per sec
- Blocking
- Block percentage (>2%)
- Block process report (30 sec)
- Average Row Lock Waits (>100ms)
- LCK_M_% (in top 2)
- 5, High number of deadlocks (>5 per hour)
- Network
- High network latency + many round trips to DB (Output queue length >2)
- Network bandwidth used up (Packets Outbound Discarded, Packets Outbound Errors, Packets Received Discarded, Packets Received Errors)
Troubleshooting Performance Problems in SQL Server 2005
https://download.microsoft.com/download/1/3/4/134644fd-05ad-4ee8-8b5a-0aed1c18a31e/TShootPerfProbs.doc
Diagnosing Transaction Log Performance Issues and Limits of the Log Manager
https://sqlcat.com/technicalnotes/archive/2008/12/09/diagnosing-transaction-log-performance-issues-and-limits-of-the-log-manager.aspx
Comments
- Anonymous
January 01, 2003
PingBack from http://servercoach.com/?p=1273 - Anonymous
January 01, 2003
Here are some notes on “SQL Server 2008 I/O Performance” I took while attending an advanced class on