Use SQL Server replay tools to reproduce and resolve customer issues

For many ISVs that run into issues at customer sites it is sometimes difficult to isolate underlying problems, especially on 24x7 production environments, where limitations apply to real time troubleshooting and live debugging. In situations like this, constructing a repro scenario in a separate environment would be ideal to minimize impact to live production system, and to speed up the resolution process.

SQL Server Profiler

Allow me introduce SQL Profiler, which offers replay trace functionality. Well, it’s not something new. First shipped in SQL Server 7.0, the feature has gone through many improvements in later releases. You can use the tool to take a captured trace as input and replay it against test databases. It helps identify issues that can be reproduced by replaying the events in the trace. In SQL Server 2005 and 2008, the replay function can be configured to use multiple threads (up to 64) to replay workloads.

Advantages:

1. SQL Server profiler is a built-in tool with full support of Microsoft product team. It works out of box.

2. Easy to set up and run. Capture a trace using predefined replay template with all required events, and replay it against the original database(s). The target machine needs to meet certain requirements: https://msdn2.microsoft.com/en-us/library/ms175525.aspx

3. In addition to multi-threaded replay, it also provides option of step through to replay events in the order they were traced.

Disadvantages:

1. Certain events can’t be replayed including replication, events involving GUID, session binding events, operations on Blobs using bcp, full-text, READTEXT, WRITETEXT, and etc. See BOL for more details: https://msdn2.microsoft.com/en-us/library/ms188238.aspx

2. The tool does not support multiple machine replay (running multiple instances of Profiler from multiple machines to replay the trace).

3. Profiler GUI tool is client side tracing and might be intrusive and generate significant performance overhead when capturing events for replay. Be careful of what events to capture and consider using server side tracing (sp_trace_create).

 

RML Utilities

Starting in SQL Server 2000, the SQL Server Customer Support Services team (CSS) started a project of a similar tool, called Ostress, with higher flexibility and scalability to help troubleshoot some of the more challenging SQL problems. The latest version is packaged in “Replay Markup Language(RML) Utilities” supporting both SQL 2000 and SQL 2005: https://support.microsoft.com/kb/944837. The tool can replay multi-threaded events as profiler does but with multiple machine replay support. It can simulate up to 1000 concurrent threads. The tool has a component called Ostress, which takes a TSQL batch file and “stress” tests it by opening an arbitrary number of connections and iterating the TSQL batch in each connection configurable number of loops. This is useful when the workload can be characterized as the same or similar batch from various number of users (connections).

 

Advantages:

1. The tool offers both replay and stress test options.

2. It supports multiple machine replay (multiple instances of OStress) with up to 1000 concurrent threads.

3. OStress supports 7.0, 2000, and 2005 trace formats.

Disadvantages:

1. The tool is provided as is, no technical support from Microsoft. But you can submit questions via contact in readme of the tool.

2. Requires extra steps to process trace file and convert to RML format before being replayed.

3. Does not support MARS replay.

 

Recommendation

When to use SQL profiler and when to use RML Utilities? If you have a workload that can be replayed/reproduced with no or low concurrency requirement (<64 concurrent threads), use profiler which offers flexibility of step through or multi-threaded replay options. If you need to replay a workload with high concurrency requirement (> 64 threads) or an isolated batch that can be “stress” tested for simulation, use RML Utilities. Keep in mind, for concurrency replay, full sync of ordered events is very hard to replay and no tools exist today to exactly duplicate the original trace. So the issues that occured on a traced source server might not be reproduced consistently afterwards even on same environment.

 

In summary, both tools mentioned above are very useful for database replay, which can be helpful in diagnosinig and resolving SQL Server issues.  For simulation testing of multi-tier application environments, consider a load-test tool such as Visual Studio (Team edition) or 3rd-party vendor products.

Comments