As a SQL Server DBA one of the most important part of our job is to do performance tuning. But, as soon as some says performance tuning, most DBAs are clueless on how to start troubleshooting.
There are many articles, blogs which focuses on key aspects of performance tuning. However, in this series, we will focus on the approach of troubleshooting.
Each article will give you a step by step checklist of key components that you should check. This helps you infer from the data that you capture leading you to your next step.
Typical Scenario:
A client mails you that his SQL server is working very slowly. Please look into it.
Starting your troubleshooting:
Once you receive such a notification. First question that comes into your mind should be what is currently running on the server. So you go and connect to SQL instance using SSMS and check the current running query.
One of the best way to check that is to run sp_whoisactive which is written by Adam Machanic. It will give you what query is running, for how long it has been running, wait statistics, CPU consumption, memory, estimated query plan among few other important things.
Let’s assume that there is no blocking on any of the query and may be none of the query is culprit for slowness, next step is to check wait statistics. We will take case of blocking and query slowness in a separate post.
There are two kind of wait stats that you need to check, first one is current waiting tasks, which will show you on what resources, wait is currently there on SQL instance. Second is the aggregate waits which will show you total wait stats gathered since the SQL instance was started or since the wait stats were cleared.
This is the first real step that you should be checking. You can go and run the query and save the output in an excel file to save this data for future analysis.
There are many many waits types and Microsoft keeps on adding more wait types with each new version. However, few of the important wait types which you will generally see in your environment are CXPACKET, CXCONSUMER, SOS_SCHEDULER_YIELD, LCK*, PAGEIOLATCH_*, PAGELATCH_*, RESOURCE_SEMAPHORE, ASYNC_NETWORK_IO, BACKUP*, WRITELOG, OLEDB etc.
Each individual wait type is for different issue, for e.g., CXPACKET, CXCONSUMER, SOS_SCHEDULER_YIELD normally point us to parallelism. Similarly, PAGELATCH_* refer to wait caused due to pages lying in buffer. PAGEIOLATCH_* waits reflects waits caused by waiting for pages to come from disk. RESOURCE_SEMAPHORE may reflects problem in memory etc.
You can check multiple blogs to see what each of these waits are meant for.
But, a million dollar question now is what and how to infer what all these wait types are telling us.
For e.g., high CXPACKET wait along with high PAGEIOLATCH_SH wait may point us to an index problem. As soon as you understand on what these total waits are pointing towards. The sooner you will be able to go to next step.
In the next post, we will go through some of the examples of wait statistics which I have captured from multiple servers and would try to gain some meaningful information from them together.
Happy Learning!