Wednesday, March 17, 2010

Performance Tuning 101 – What You Will Not Learn In The Class Room (Part 2)

In my last post, http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will.html, I talked about performance tuning queries that appear to be well tuned. There are many optimization techniques, unknown to most developers, that do not require new indexes or radical code changes. These are the optimizations I will be talking about in this post. There is no way I could cover every technique available, but I will present as much content as I can here today and cover other techniques in future posts.

I will start by talking about a challenge that Ramesh Meyyappan presented in his webcast, http://www.sqlworkshops.com/. Ramesh’s challenge was to solve the TOP 101 phenomenon using SQL Server 2005. First, I will create a sample table and load it with data.

USE [tempdb]
GO

SET NOCOUNT ON;
GO

IF EXISTS(SELECT 1 FROM sys.tables WHERE NAME = 'TestData')
BEGIN
    DROP TABLE dbo.[TestData];
END
GO

CREATE TABLE dbo.TestData(
RowNum INT PRIMARY KEY,
SomeId INT,
SomeCode CHAR(2000)
);
GO

;WITH
   L0 AS (SELECT 1 AS C UNION ALL SELECT 1)       --2 rows
  ,L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B)    --4 rows (2x2)
  ,L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B)    --16 rows (4x4)
  ,L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B)    --256 rows (16x16)
  ,L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B)    --65536 rows (256x256)
  ,L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B)    --4,294,967,296 rows (65536x65536)
  ,Number AS (SELECT row_number() OVER (ORDER BY (SELECT 0)) AS N FROM L5)
INSERT INTO dbo.TestData
SELECT
    N AS RowNumber,
    ABS(CHECKSUM(NEWID()))%1000000+1 AS SomeId ,
    REPLICATE('a',2000) AS SomeCode
FROM Number
WHERE [N] <= 50000
GO

UPDATE STATISTICS dbo.[TestData] WITH FULLSCAN;
GO

Next, I will run a query that uses TOP and an ORDER BY to return 100 rows.

--Fast
SELECT TOP 100 [RowNum],[SomeId],[SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION (MAXDOP 1)
/*
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 1 ms.

 SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 102 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Now watch what happens when I change the TOP value to 101. Note that I did not change anything else in the query other than increasing the number of rows returned by one.

--Slow
SELECT TOP 101 [RowNum],[SomeId],[SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

/*
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 312 ms,  elapsed time = 1690 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Wow… TOP 101 is nearly 17 times slower than TOP 100, and all I changed was the number of rows in the TOP clause! So why does the optimizer take so much longer to optimize and execute the query with TOP 101, as opposed to TOP 100? The short answer is memory requirements. The TOP 101 query requires a lot more query memory than TOP 100, which translates into a sort in tempdb. As you may recall, I addressed some techniques to solve the tempdb sort problem in my last post, http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will.html. If you are using SQL Server 2008, you can use the same optimization techniques presented there, but SQL Server 2005 is a completely different animal. To make the TOP 101 query faster, we first need to understand why it is slower. Let’s take a look at what is different when we run the TOP 100 and TOP 101 queries.

Let’s start by looking at the memory SQL Server grants to each query. Open two query windows and execute each TOP query inside a WHILE loop. We can then use sys.dm_exec_query_memory_grants to see the memory granted to each query.

Here is a sample of how to run the TOP query in a while loop.

WHILE 1=1
SELECT TOP 100 [RowNum],[SomeId],[SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

In a new query window, run the following query to get the memory grant figures. Replace the session_id with the SPID of the window running the loop (run SELECT @@SPID in that window to find it; mine was 58).

SELECT [granted_memory_kb],[required_memory_kb],[max_used_memory_kb]
FROM sys.dm_exec_query_memory_grants
WHERE [session_id] = 58 --the SPID of the looping window

Here are my results:

     granted_memory_kb    required_memory_kb   max_used_memory_kb
---- -------------------- -------------------- --------------------
FAST 1024                 216                  216

     granted_memory_kb    required_memory_kb   max_used_memory_kb
---- -------------------- -------------------- --------------------
SLOW 6040                 512                  6040

The results are simply astonishing. The memory actually used increases nearly 28 times when I use TOP 101 instead of TOP 100. I do not have a formal explanation of why TOP 101 consumes more memory than TOP 100. Brad Schulz, http://bradsruminations.blogspot.com/, has contacted Conor Cunningham about this issue and believes that 101 is an arbitrary threshold. Brad is working on an in-depth post on the TOP operator; keep an eye out for it, as it should be really good. In any case, once the 101 threshold is crossed, the optimizer uses different calculations to optimize the query, which can bloat its memory requirements. This memory bloat forces the sort operation to spill into tempdb, and that is where the TOP 101 bottleneck lies. To verify the problem, open Profiler and select the Sort Warnings event: the slow query generates a sort warning, while the fast query does not.
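Rather than hard-coding a session_id, active memory grants can be joined back to their statement text to spot queries whose grants look bloated. A minimal sketch, assuming SQL Server 2005 or later (column names per sys.dm_exec_query_memory_grants):

```sql
-- List active memory grants alongside the statement that requested them.
-- Run this while the looping TOP queries are executing in other windows.
SELECT
    mg.session_id,
    mg.granted_memory_kb,
    mg.required_memory_kb,
    mg.max_used_memory_kb,
    st.text AS statement_text
FROM sys.dm_exec_query_memory_grants AS mg
CROSS APPLY sys.dm_exec_sql_text(mg.sql_handle) AS st
ORDER BY mg.granted_memory_kb DESC;
```

A large gap between granted_memory_kb and required_memory_kb, combined with Sort Warnings in Profiler, is a good hint that a sort is spilling to tempdb.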

Now that I have identified the problem, how do I solve it? I will start by attempting the methods that I used in the previous article, http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will.html.

SELECT TOP 101 [RowNum],[SomeId],CAST([SomeCode] AS VARCHAR(4200))
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

/*
 SQL Server Execution Times:
   CPU time = 344 ms,  elapsed time = 3385 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Bloating the estimated row size did not help this time. Next, I will try shrinking the row size.

SELECT TOP 101 [RowNum],[SomeId],RTRIM(CAST([SomeCode] AS VARCHAR(2000)))
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

/*
 SQL Server Execution Times:
   CPU time = 344 ms,  elapsed time = 2461 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Hmm. Still no luck… So how can I reduce the size of the rows fed into the sort operator? When you sit back and think about the problem, the answer is really simple: reduce the columns involved in the sort. I like to put the TOP inside a derived table, making sure to select only the RowNum and SomeId columns. We can then join back to the TestData table to retrieve SomeCode. This gives us a fast, narrow sort and an ultra-fast index seek for the 101 rows we are returning.

SELECT t.[RowNum],t.[SomeId], t.[SomeCode]
FROM dbo.[TestData] t
INNER JOIN(
    SELECT TOP 101 [RowNum],[SomeId]
    FROM dbo.[TestData]
    WHERE [RowNum] < 30000
    ORDER BY [SomeId]
) AS t2
    ON T.RowNum = t2.RowNum
ORDER BY t.[SomeId]
OPTION (MAXDOP 1)

/*
 SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 104 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Alternatively, we can use correlated subqueries or the cross apply operator.

SELECT TOP 101
    t.[RowNum],
    (SELECT t2.[SomeId] FROM dbo.[TestData] t2 WHERE t2.[RowNum] = t.[RowNum]) AS SomeId,
    (SELECT t2.[SomeCode] FROM dbo.[TestData] t2 WHERE t2.[RowNum] = t.[RowNum]) AS SomeCode
FROM dbo.[TestData] t
WHERE t.[RowNum] < 30000
ORDER BY t.[SomeId]
OPTION (MAXDOP 1)

SELECT TOP 101
    t.[RowNum],
    t2.SomeId,
    t2.SomeCode
FROM dbo.[TestData] T
CROSS APPLY(SELECT t2.SomeId, t2.SomeCode FROM dbo.[TestData] t2 WHERE t2.[RowNum] = t.[RowNum]) AS t2
WHERE t.[RowNum] < 30000
ORDER BY t.[SomeId]
OPTION (MAXDOP 1)

It should be noted that the correlated subquery method produces more IO because each of the two subqueries seeks into the table separately. As you can see, the solution to this challenge is quite simple, but it requires an understanding of what is occurring under the hood of SQL Server.

Note: It is still possible that some of the sort will spill to tempdb, but you should see an elapsed time that rivals TOP 100.
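The IO difference between the rewrites is easy to measure with SET STATISTICS IO. A minimal sketch, reusing the derived-table version from above; running the correlated-subquery version the same way should report roughly twice the logical reads against TestData:

```sql
-- Compare logical reads: check the Messages tab after each run.
SET STATISTICS IO ON;

SELECT t.[RowNum], t.[SomeId], t.[SomeCode]
FROM dbo.[TestData] t
INNER JOIN (
    SELECT TOP 101 [RowNum], [SomeId]
    FROM dbo.[TestData]
    WHERE [RowNum] < 30000
    ORDER BY [SomeId]
) AS t2
    ON t.[RowNum] = t2.[RowNum]
ORDER BY t.[SomeId]
OPTION (MAXDOP 1);

SET STATISTICS IO OFF;
```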

The next optimization technique I will demonstrate addresses a predicate pushing problem. Unbeknownst to most developers, SQL Server 2005 has trouble pushing predicates into views in some cases. Many of these issues were resolved in SQL Server 2008, but they are still worth knowing about. I will demonstrate a very simple example using a ranking function. Ranking functions are relatively new to SQL Server, having been introduced in 2005. I am sure there are other scenarios that cause predicate pushing problems, but I will only address the ranking problem in this post.

Let’s start by creating a small sample table.

USE [tempdb]
GO

CREATE TABLE Test(
ID INT IDENTITY(1,1) PRIMARY KEY,
FName VARCHAR(50),
LName VARCHAR(50)
);

INSERT INTO dbo.Test VALUES ('Adam','Haines');
INSERT INTO dbo.Test VALUES ('John','Smith');
INSERT INTO dbo.Test VALUES ('Jane','Doe');
GO

CREATE NONCLUSTERED INDEX ncl_idx_LName ON dbo.Test(LName) INCLUDE(FName);
GO

As you can see, the table is relatively simple. The idea is to present an easy to understand example that demonstrates potential performance problems with views.

Here is my simple query that shows an index seek on LName.

SELECT Id,FName,LName,ROW_NUMBER() OVER(PARTITION BY LName ORDER BY Id) AS seq
FROM dbo.Test
WHERE LName = 'Smith'
GO

[Execution plan: index seek on ncl_idx_LName]

Let’s see what happens when I put the same logic into a view with no predicate. The predicate will be applied from outside the view and should be pushed down into it, because views are expanded into the underlying query during optimization.

CREATE VIEW dbo.vw_Test
AS
SELECT Id,FName,LName,ROW_NUMBER() OVER(PARTITION BY LName ORDER BY Id) AS seq
FROM dbo.Test
GO

I will now query the view using the same predicate as the original query.

SELECT Id,FName,LName,seq
FROM dbo.vw_Test
WHERE LName = 'Smith'
GO

[Execution plan: the table is scanned and the LName filter is applied after the scan]

The problem here is that the optimizer filters the results AFTER the table “Test” has been scanned. One would expect a seek on the LName column, because the optimizer should push the predicate down; however, SQL Server 2005 does not do a great job of this. SQL Server 2008 will push the predicate deep into the plan and get the index seek. So how do we solve this problem in 2005? Unfortunately, there is not a whole lot you can do to make the view itself more efficient. The best option, in my opinion, is to use an inline table-valued function (TVF) to parameterize the query.

CREATE FUNCTION dbo.fn_Test(@LName VARCHAR(50))
RETURNS TABLE
RETURN(
    SELECT Id,FName,LName,ROW_NUMBER() OVER(PARTITION BY LName ORDER BY Id) AS seq
    FROM dbo.Test
    WHERE LName = @LName
)
GO

Now execute a select against the TVF using the same predicate.

SELECT Id,FName,LName,seq
FROM dbo.fn_Test('Smith')

[Execution plan: index seek on ncl_idx_LName]

There you have it. I have demonstrated a few optimization techniques that I have used to solve performance problems, and I have only scratched the surface; there are many more techniques available. Stay tuned for future posts, where I will explore even more of them, including a shocking example demonstrating how an index rebuild can introduce fragmentation and how to avoid it.

Until next time, happy coding.

13 comments:

Brad Schulz said...

Great stuff, as always, Adam!

I especially liked the predicate pushing issue... that was one I wasn't aware of.

BTW, Conor Cunningham confirmed that 100 is definitely a "tipping point" in a TOP query. It was just an arbitrary (and appropriate) number they chose... One algorithm is used for TOPs less than 101 and a different algorithm is used for TOPs of 101 or greater.

Piotr Rodak said...

That's great post Adam, thanks!
I am recently deeply involved in performance tuning and I will look for suspicious cases you covered here.

Anonymous said...

"The TOP 101 queries requires a lot more query memory than TOP 101" - I think second "101" should be replaced with 100

Adam Haines said...

Thanks. You are correct the statement should read 100. I will make the correction.

-Adam

Anonymous said...

I suspected that ROW_NUMBER() issue for a long time. Thanks for that long waited confirmation.
