SQL Server Performance Analysis Part III
PART III: "How To" get down to whatever is bringing SQL to its knees...
Now let’s get down to pinpointing the troublemaker:
In most cases, when SQL’s performance is degraded, we never know what exactly is causing the issue. In most cases, we also don’t know how to express SQL Server's performance issue. It is mostly expressed as: "users say that it's slow to retrieve data" or "users are getting timeouts" or just plainly: "sql is slow". You have some indication of where to start (for example: timeout issue orcpu at 100%) when see how SQL is reacting but let's work with the "SQL is just slow" approach.
At first, I like to check the whole server as well as SQL. For that, we use: Performance Monitor.
Note: Please note that I will not go into depth of how to analyze performance monitor: I will however provide the basics + a little more in-depth information if possible.
PERFORMANCE MONITOR:
Memory
Looking at the performance object MEMORY, i would check the following counter:
-
Available Mbytes:
This shows us how much memory on the server is not being used. This lets us know if Kernel is struggling for memory (MB < 300 - average) or if we have more memory that can be given to SQL Server should it require it.
Once that is jotted down, let’s look at SQL Server's memory:
<SQL Instance>BufferManager:
Lazy Writes / Sec:
Number of buffers written by buffer manager's lazy writer: The lazy writer is a system process whose main task is to flush out batches of dirty, aged buffers (buffers that contain changes that must be written back to disk before the buffer can be reused for a different page) and make them available to user processes. The value should be low indicating that there are enough buffers to withstand SQL Server's needs. If this value is quite high on a constant basis then we may have a memory bottleneck.
Page Life Expectancy:
Number if seconds a page will stay in the buffer pool without references: The higher the better. This will allow SQL Server to find more data in the buffer pool. This
counter "normally" goes hand in hand with Lazy Writer: Low PLE = High LW/S : ideally > 300 secs
Page Lookups / Sec:
Number of requests to find a page in the buffer pool: This value, as the above in regards to finding data in the buffer pool, should be a high value.
Page Reads / Sec
Number of physical database page reads issued: This value, depending on the work done by SQL, could be high, but shouldn’t be a very high value. This
value should be below (to not say way below) Page Lookups / Sec (indicates we did not find the page in
the BPool)
Page Wrties / Sec
Number of physical database page writes issued: This give you an idea of the amount of writing to database pages that is happening on SQL (to be used
later).
<SQL Insatance>MemoryManager:
Memory Grants Outstanding:
Total number of processes per second that have successfully acquired a workspace memory grant: if this value is great (> 4/5) than we know that we have large queries being executed that require lots of memory: most likley queries doing hashes or sorts.
Memory Grants Pending:
Total number of processes per second waiting for a workspace memory grant: this number should be 0. if it is not, this means that there isn't sufficient memory to allow all heavy queries to go through and some are stuck waiting. If this number is anything besides 0 then memory bottleneck is an issue: most likley poorly written queries.
Total Server Memory:
Total amount of dynamic memory that the server is using currently (for the BPool): this shows us how much memory SQL server is currently using. This helps to find issues in which we believe SQL Server is using (example) 3 GB and in fact it is not because a parameter was misconfigured.
Targer Server Memory:
Total amount of dynamic memory the server can consume: this is the amount of memory SQL Server can use. This should match the max server memory value in sp_configure. If this value starts to decrease, then we know that an external memory pressure is occurring forcing SQL to abdicate its "allowed" memory. This counter is useful to pinpoint outer or internal memory pressure should memory be a bottleneck.
Disks:
This is quite a tricky subject to analyze as many things need to be taken into consideration, but for a general view, the 2 performance objects worth looking at are:
Logical Disk|Physical Disk\%Idle Time: Reports the percentage of the time that the disk subsytem was not processing requests and no work was queued. If the
average (or the graph design) indicates low values, this means that the disk is being used... A LOT.
This is turn can affect SQL Server performance.
Logical Disk|Physical Disk\Average Disk Sec / Trf: Indicates how fast data is being moved (in seconds). Measures the average time of each data transfer, regardless of
the number of bytes read or written. A high value for this counter might mean that the system is retrying requests
due to lengthy queuing.
Here is a link that shows the recommended value for the above counter:
https://technet.microsoft.com/en-gb/library/cc966540.aspx --> applies for SQL 2000 as well.
Basically: if your avg disk sev / trf is higher than 30 milliseconds, then you may have a disk issue. Please refer to further posts in regards to SQLIOSIM / SQLIO:
Note:
If memory issues are indicated in the performance monitor, or even disk issues, doesn’t mean that the SQL Server needs more memory or better disks. It could mean that the queries executed are heavy and require too much memory, or are reading too much data from disk. Either way, all data should be consolidated and conclusions then be made.
Processors:
%Idle time:
Percentage of time the processors is idle during a sample interval. If this value is low (average), then we have high cpu which indicated a CPU bottleneck. The cause however will be determined using other data however from this we can tell that CPU is overloaded.
There is a whole lot more to performance monitor than just these counters however to follow a certain pattern, I will discuss the counters as the "analysis" progresses. The
information above is to get a general view of Disks / CPU / Memory of the server. Once we filter down, I will enter into more specific counters which shall be used confirm certain diagnostics.
After looking at the files mentioned above, we should have jotted down the following key information:
a. How much memory is allocated to Kernel and to Applications
b. Is the Kernel showing signs of a struggle for memory
c. Is SQL Server experiencing a disk bottleneck
d. Is SQL Server experiencing a memory bottleneck
e. Is SQL Server experiencing a cpu bottleneck
From the answers above, we are funnelling our way into a more in-depth analysis to WHY are the resources mentioned above being caught in a bottleneck. In other words, what is causing memory / cpu / disk struggles which in turn degrade SQL Server performance.
Next Blog Post: Part IV - Blocker Script ....