Demystifying SQL Server: Correlated sub queries for good or evil?

Today on the MSDN forums I engaged a good discussion, on the innards of a correlated sub query http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/0a4e384c-f8c6-45b7-a4db-c14827500fe3/. I will start off by saying that all correlated sub queries are not evil, just most :-). All kidding aside it is not that correlated sub queries are bad, it is just there are better methods to hanlde these types of queries. Here is the BOL link to correlated sub queries, http://msdn.microsoft.com/en-us/library/ms187638.aspx. Let’s get started.

You are probably asking yourself, “How can a correlated sub query be bad?” The answer is because of a nested loop join. A nested loop join takes all the results from the outer query and searches for the a resulting value in the inner query. This type of join can be very expensive, especially for queries with large outer results. For example say that the outer query returns 1 million rows and the inner query has 500 hundred. In this example, each row returned by the outer table has to queried against the inner table, which means that inner table is queried 1 million times! Now that is a serious performance problem. Let’s have a look at some sample code to see how this really works.

First we will create the sample table: (Sample data compliments of Michael Asher)

use [tempdb]

GO

IF OBJECT_ID('dbo.item') IS NOT NULL

BEGIN

    DROP TABLE dbo.item;

END

GO

create table dbo.Item (

 [ItemID] int not null,

 [name] varchar(50) not null);

IF OBJECT_ID('dbo.Child') IS NOT NULL

BEGIN

    DROP TABLE dbo.Child;

END

GO

create table dbo.Child (

 Child1ID int identity(1,1) not null,

 ItemID int not null,

 txt varchar(100) not null );

INSERT INTO Item

SELECT TOP(1000000) rn, 'Row: ' + CONVERT(VARCHAR,rn)

FROM (

SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rn

FROM sys.columns a

CROSS JOIN sys.columns b

CROSS JOIN sys.columns c

) AS A

CREATE CLUSTERED INDEX idx_Item ON Item(ItemID)

Correlated sub query example:

SELECT

    i.[name],

    (SELECT c.[txt] FROM child c WHERE c.ItemID = i.ItemID)

FROM dbo.Item i

WHERE

    i.ItemId < 1000

Here is the execution plan and IO statistics:

Table 'Child'. Scan count 999, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Item'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

As you can see a nested loop join was used to get the data via the correlated sub query and the table “child” was scanned 999 times. Obviously, the increased IO and CPU translates into performance degradation.

Note: You can mouse over the item table show plan operator to view the actual number of rows.

Now let’s change the query to use a join.

SELECT

    i.[name],

    c.[txt]

FROM dbo.Item i

LEFT JOIN child c

    ON c.ItemID = i.ItemID

WHERE

    i.ItemId < 1000

Execution plan and IO statistics:

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Item'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Child'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The difference between the queries is night and day. The number of scans decreased to 1, which is a heck of a lot better than 999. Moving a correlated sub query into the join makes the query much more performant and was an easy to implement solution. This is one of the most common performance problems that I have encountered over the years. The solution is easy to fix, but occurs a lot because developers do not understand the impact this type of query can have. The bottom line is RDMS are optimized for joins and best practice suggests the use of joins, to relate data and not correlated sub queries. This does not mean that there are not special cases where using a correlated sub query may be faster , but these cases are the exception and not the rule.

Extra Content:

Note: There are many instances where the query optimizer can transform the correlated sub query into a join. The catch is this usually occurs when a correlated sub query is used in conjunction with Exists/In, in the where clause. A correlated sub query in the select list will more than likely use a nested loop join. Regardless of how the optimizer transforms the query text, I would rather code the TSQL correctly the first time and not have to worry about the optimizer fixing the query for me.

Here is a sample of how the optimizer changes the correlated sub query to an inner join.

SELECT

    i.[name]

FROM dbo.Item i

WHERE

    i.ItemId < 1000

    AND EXISTS(

        (SELECT c.[txt] FROM child c WHERE c.ItemID = i.ItemID)

6 comments:

Anonymous said...: Sorry I gave up on the thread; It was interesing but it became a bit high strung. Thanks for blogging on this one.

Kent Waldrop; June 23, 2009 at 1:30 PM
Adam Haines said...: Yeah, I kinda did too. The post started off well, but ended up so far off tangent that it became frustrating. I posted the blog because I know that these types of queries impact performance. I see it day in and day out, which may not be known to everyone.; June 23, 2009 at 1:43 PM
Deepak said...: Adam,

Thank you for writing this post. Very interesting.; September 27, 2009 at 11:53 PM
Anonymous said...: Excellent post! very helpful; June 21, 2011 at 5:32 AM
Anonymous said...: Thank you for the clear explanation, i was looking for this kind of explanation and it was perfect for my situation; January 9, 2013 at 7:43 AM
Anonymous said...: How you will handle a situation where you want only one record from another table as joins replicates the row.; February 6, 2015 at 4:27 AM

Demystifying SQL Server

Monday, June 22, 2009

Correlated sub queries for good or evil?

6 comments:

Subscribe

Search This Blog

About Me

Twitter

Twitter

Blogs I follow

Followers

Blog Archive

Keywords