Advanced Performance Tuning – 1 :: Importance of data-types
Why did we develop DBMS systems? Not just for storing data – but also to be able to search for and retrieve stored data, and we define what our search criteria should be.
This was probably true a couple of decades back, but we have come a long way now. With the huge advancement of technology and the computing powers being doubled probably a few hundred times, we are now not satisfied with just being able to retrieve stored data; we want to retrieve data faster, we want our operations to complete in the least amount of time possible.
All of us are aware of the fact that queries must be tuned to achieve optimal performance, some of the common ways of tuning database systems include creating appropriate indexes, updating statistics, rebuilding indexes to reduce fragmentation. However, sometimes we tend to forget that each database system is just another piece of code running on a bunch of dumb hardware; unless we design our databases properly, unless we write efficient code, we will never be able to achieve the speeds we desire. In this and subsequent posts, I will try to explain the most common mistakes that I have experienced, and how these can affect system performance.
In this post, I will try to explain the importance of appropriate data-types. To start off, let’s create a new table and populate it with data:
CREATE TABLE [dbo].[Employee](
[EmpID] [int] IDENTITY(1,1) NOT NULL,
[EmpFName] [varchar](50) NOT NULL,
[EmpMName] [char](2) NULL,
[EmpLName] [varchar](50) NULL,
[EmpSal] [numeric](18, 2) NOT NULL,
[EmpAddress] [varchar](250) NULL,
[EmpCountry] [char](2) NOT NULL,
CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED
([EmpID] ASC))
GO
We now, insert 1,000,000 rows of data into the table. We have the following Stored Procedure:
CREATE PROCEDURE sp_GetAvgSal
@Country NCHAR(2)
AS
BEGIN
SELECT AVG(EmpSal) FROM Employee WHERE EmpCountry = @Country
END
We will execute this Stored Procedure and see how much time it takes:
SET STATISTICS IO ON
SET STATISTICS TIME ON
SET STATISTICS PROFILE ON
GO
EXEC dbo.sp_GetAvgSal @Country = 'IN'
GO
It takes ages for the query to return back the results. Finally, when it does give us back the results, here is what we get:
SQL Server Execution Times:
CPU time = 10093 ms, elapsed time = 282344 ms.
We can clearly see that there are lots of Reads, and we know why – there is no Index on the EmpCountry column. Hence, the Clustered Index is being Scanned for getting us back the results. Great! Since we know the source of the problem, let’s create the index to speed things up…
CREATE NONCLUSTERED INDEX [IX_Employee_Country] ON [dbo].[Employee]
([EmpCountry] ASC) INCLUDE ([EmpSal])
We execute the same Stored Procedure with the same parameters again. We get better results now; but not quite to our expectations.
SQL Server Execution Times:
CPU time = 202 ms, elapsed time = 216 ms.
What’s going on? We have a proper Index, still we are not able to achieve the performance we deserve. Let’s have a look at the execution plan. A section of the plan is:
StmtText
----------------------------------------------------------------------------------------
SELECT AVG([EmpSal]) FROM [Employee] WHERE [EmpCountry]=@1
|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1005]=(0) THEN NULL ELSE [Expr1006
|--Stream Aggregate(DEFINE:([Expr1005]=Count(*), [Expr1006]=SUM([Test].[dbo].[Emp
|--Index Scan(OBJECT:([Test].[dbo].[Employee].[IX_Employee_Country])
An Index Scan? Why is the Index [IX_Employee_Country] being scanned? Shouldn’t we see an Index Seek? The answer lies in the same execution plan. Let’s have a look at that branch:
Index Scan(OBJECT:([Test].[dbo].[Employee].[IX_Employee_Country]),
WHERE:(CONVERT_IMPLICIT(nchar(2),[Test].[dbo].[Employee].[EmpCountry],0)=[@1]))
The EmpCountry column in the Employee table is CHAR(2) which cannot be directly compared with NCHAR(2); hence, SQL Server is forced to CONVERT the EmpCountry column values to NCHAR(2) before it can perform the comparison. Again, SQL Server does not know what the column values will look like after converting the Non-Unicode character values to Unicode. Hence, it converts each column value to Unicode and then compares the result with the value passed. Hence, the Index Scan.
How can we eliminate the Index Scan? Easy, isn’t it? Rewrite the Stored Procedure so that the @Country parameter data-type matches the data-type of the underlying table. We did this, and do you know what the result was?
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.StmtText
----------------------------------------------------------------------------------------
SELECT AVG([EmpSal]) FROM [Employee] WHERE [EmpCountry]=@1
|--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1004]=(0) THEN NULL ELSE [Expr1005
|--Stream Aggregate(DEFINE:([Expr1004]=Count(*), [Expr1005]=SUM([Test].[dbo].[Emp
|--Index Seek(OBJECT:([Test].[dbo].[Employee].[IX_Employee_Country])
So what did we learn? While doing comparison, you must make sure that the data-types used in the underlying tables match the data-types of the data we are comparing to. This will make sure we eliminate the IMPLICIT CONVERSIONS and thus, we can utilize the Indexes in a much better way.
Disclaimer: All information provided here is my personal opinion and is neither verified nor approved by Microsoft before it is published. All information, and code samples, if any, is provided "AS IS" with no warranties and confers no rights.
Comments
Anonymous
September 13, 2010
Nice post and simple explanation are usual ingredients of your posts, keep'em up :-) Looking fwd to more post on 'Query Performace' -VarundAnonymous
September 13, 2010
Great post Suhas, especially as it emphasizes the importance of these suggestions for future enhancement: Please, add support for complex data types in T-SQL, especially ability to derive types from existing database objects (tables, views and columns) (connect.microsoft.com/.../124506) Implement equivalent of Oracle's %ROWTYPE attribute (connect.microsoft.com/.../520497)Anonymous
September 13, 2010
Hi Jamiet Thanks for your feedback. The SQL Development Team has been working continuously to improve T-SQL programmability and in fact, T-SQL standards have been improved from SQL Server 2000 to SQL Server 2005 to SQL Server 2008 and now to SQL Server 2008 R2. Please be rest assured that this effort will continue into future releases. We will definitely pass on your feedback to the Development Team, and we do hope that these get integrated into the next major release. Thanks again, Suhas DeAnonymous
September 20, 2011
Great stuff, very simple explanation Suhas. I have added your blog to my favorites and planning to read it one by one.Anonymous
May 05, 2014
hi!!! to all statistian help me this qns 'why we understand the types of data are very important?' '