Wednesday, January 20, 2010

SQL Server Myths Debunked (Part 2)

question-mark Last time I went over some of the most misconstrued myths to hit SQL Server, http://jahaines.blogspot.com/2010/01/sql-server-myths-debunked-part-1.html.  In this post I will be finishing the remaining items and I encourage you to post any comments about any myths you have seen or you have questions about. 

 

 

 

 

Below is the list of Items I will be addressing in this post:

  • Table data can be stored separately of the clustered index
  • Columns just need to exist in the index to be used
  • While loops are faster/better than cursors
  • Sp_executesql is always better than exec
  • Tables have a guaranteed sort order

Table data can be stored separately of the clustered index

The first myth I will be talking about is the myth that table data can exist separately of the clustered index.  This myth is absolutely false.  Table data exists in the clustered index.  This means the table does not store any data.  If the table is on a differing filegroup than the clustered index, the filegroup containing the table will be empty because the “table” is actually moved with the clustered index.  It should be noted that any non clustered indexes will remain on the same filegroup.  Let’s see this in action.

SET NOCOUNT ON
GO

USE master
GO

IF EXISTS(SELECT 1 FROM sys.databases WHERE NAME='CL_Idx')
BEGIN
    DROP DATABASE [CL_Idx];
END
GO

CREATE DATABASE [CL_Idx] ON  PRIMARY 
( NAME = N'CL_Idx', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10.DEV\MSSQL\DATA\CL_Idx.mdf' , SIZE = 4096KB , FILEGROWTH = 0 ), 
 FILEGROUP [FG1] 
( NAME = N'CL_Idx2', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10.DEV\MSSQL\DATA\CL_Idx2.ndf' , SIZE = 4096KB , FILEGROWTH = 0 ), 
 FILEGROUP [FG2] 
( NAME = N'CL_Idx3', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10.DEV\MSSQL\DATA\CL_Idx3.ndf' , SIZE = 4096KB , FILEGROWTH = 0 )
 LOG ON 
( NAME = N'CL_Idx_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10.DEV\MSSQL\DATA\CL_Idx_log.ldf' , SIZE = 4096KB , FILEGROWTH = 10%)
GO

IF NOT EXISTS (SELECT name FROM sys.filegroups WHERE is_default=1 AND name = N'FG1') 
ALTER DATABASE [CL_Idx] MODIFY FILEGROUP [FG1] DEFAULT
GO

USE [CL_Idx]
GO

IF OBJECT_ID('[CL_Idx].dbo.TestData') IS NOT NULL
BEGIN
    DROP TABLE dbo.TestData;
END
GO

CREATE TABLE dbo.TestData(
RowNum INT,
SomeId INT,
SomeCode CHAR(2)
) ON [FG1];
GO

INSERT INTO dbo.TestData
SELECT TOP 100000 
    ROW_NUMBER() OVER (ORDER BY t1.NAME) AS RowNumber,
    ABS(CHECKSUM(NEWID()))%2500+1 AS SomeId, 
    CHAR(ABS(CHECKSUM(NEWID()))%26+65)
    + CHAR(ABS(CHECKSUM(NEWID()))%26+65) AS SomeCode
FROM 
    Master.dbo.SysColumns t1,
    Master.dbo.SysColumns t2
GO

SELECT 
    CONVERT(varchar(10),s.NAME) AS FlName, 
    CASE WHEN s.name = 'CL_Idx2' THEN 'FG1' ELSE 'FG2' END AS FG,
    (FILEPROPERTY(s.name,'spaceused') * 8)/1024. AS [FG_Size(MB)]
FROM sys.master_files s
WHERE 
    database_id = DB_ID()
    AND type = 0
GO

CREATE UNIQUE CLUSTERED INDEX unq_cl_idx_Row_Num ON dbo.TestData(RowNum) ON FG2;
GO

SELECT 
    CONVERT(varchar(10),s.NAME) AS FlName, 
    CASE WHEN s.name = 'CL_Idx2' THEN 'FG1' ELSE 'FG2' END AS FG,
    (FILEPROPERTY(s.name,'spaceused') * 8)/1024. AS [FG_Size(MB)]
FROM sys.master_files s
WHERE 
    database_id = DB_ID()
    AND type = 0
GO

Results:

image

Columns just need to exist in the index to be used

The next myth is one of the primary causes of poor performance.  Some believe that simply having a column in the index means the optimizer can use that index to seek rows.  This believe is very far from the truth.  In reality, the index has to be the leftmost column in the index key to be used to seek the row.  This means the query predicate has to contain the leftmost index key; otherwise, you will get an index scan. Before I show you an example, here is a link that describes index basics, http://jahaines.blogspot.com/2009/07/real-sql-pages-beginners-guide-to.html.  Let’s get to the example.

SET NOCOUNT ON;
GO

IF OBJECT_ID('tempdb.dbo.#Customers') IS NOT NULL
BEGIN
    DROP TABLE #Customers;
END
GO

CREATE TABLE #Customers(
Id INT PRIMARY KEY CLUSTERED,
FName VARCHAR(25),
LName VARCHAR(25)
);
GO

INSERT INTO #Customers VALUES (1,'Adam','Haines');
INSERT INTO #Customers VALUES (2,'John','Smith');
GO

CREATE NONCLUSTERED INDEX ncl_idx_FName ON [#Customers](FName,LName);
GO

SET STATISTICS PROFILE ON;
GO

SELECT FName,LName
FROM [#Customers]
WHERE LName = 'Haines'
GO

SET STATISTICS PROFILE OFF;
GO

DROP INDEX ncl_idx_FName ON [#Customers];
GO

CREATE NONCLUSTERED INDEX ncl_idx_FName ON [#Customers](LName,FName);
GO

SET STATISTICS PROFILE ON;
GO

SELECT FName,LName
FROM [#Customers]
WHERE LName = 'Haines'
GO

SET STATISTICS PROFILE OFF;
GO

Results:

image

Now by switching the order of the index columns we will get an index seek.

While loops are faster/better than cursors

The next myth is while loops are faster/better than cursors.  This is not true in all scenarios, but it is occasionally true.  The performance solely depends on what you are doing and how you are doing it.  I wont post an example for this because there are too many variances that can occur and I would have to dedicate an entire post on the differences. I will say that you should avoid both of these options, if possible.  These solutions are iterative/recursive in nature, which goes against how SQL Server operates.  90 percent of the time, you can replace iterative logic with set based logic and increase performance 10 fold.  The take away here is while loops are not ALWAYS better….. however there are usually better SET based solutions out there.

Sp_executesql is always better than exec

The next myth has to deal with dynamic SQL and the best way to execute it.  A lot of the time, developers will say that they are using sp_executesql, so the query is optimized.  These believes show a lack of understanding of how sp_executesql really works and where it really benefits.  The short answer is sp_executesql is not always better.  Under some circumstances sp_executesql and exec will produce the same query plan and the query will not be able to benefit from query plan reuse, unless an exact binary match already exists in cache.  This behavior occurs because the developer uses the parameters outside the scope of the dynamic SQL.  Only inline parameters will be used in the parameterization process, using sp_executesql.  Let’s have a look at this.

IF object_id('tempdb.dbo.#t') IS NOT NULL
BEGIN
    DROP TABLE #t;
END
GO

CREATE TABLE #t(
id INT,
col CHAR(1)
);
GO

INSERT INTO #t VALUES (1,'a');
INSERT INTO #t VALUES (2,'b');
INSERT INTO #t VALUES (3,'c');
GO

DECLARE @sql NVARCHAR(MAX),
        @param VARCHAR(25)

SET @param = 'b'        
SET @sql = N'SELECT Id,col FROM #t WHERE col=' + QUOTENAME(@param,'''')
EXEC sp_executesql @sql
GO

DECLARE @sql NVARCHAR(MAX),
        @param VARCHAR(25)

SET @param = 'c'        
SET @sql = 'SELECT Id,col FROM #t WHERE col=' + QUOTENAME(@param,'''')
EXEC(@sql)
GO

DECLARE @sql NVARCHAR(MAX),
        @param VARCHAR(25)

SET @param = 'd'        
SET @sql = N'SELECT Id,col FROM #t WHERE col=' + QUOTENAME(@param,'''')
EXEC sp_executesql @sql
GO


DECLARE @sql NVARCHAR(MAX),
        @param VARCHAR(25)

SET @param = 'e'    
SET @sql = N'SELECT Id,col FROM #t WHERE col= @dyn_param'
EXEC sp_executesql @sql, N'@dyn_param CHAR(1)',@dyn_param=@param
GO

DECLARE @sql NVARCHAR(MAX),
        @param VARCHAR(25)

SET @param = 'f'        
SET @sql = N'SELECT Id,col FROM #t WHERE col= @dyn_param'
EXEC sp_executesql @sql, N'@dyn_param CHAR(1)',@dyn_param=@param
GO

SELECT text, [execution_count] 
FROM sys.dm_exec_query_stats  qs
CROSS APPLY sys.[dm_exec_sql_text](qs.[sql_handle])
WHERE text LIKE '%SELECT Id,col FROM #t%' AND text NOT LIKE '%cross apply%'
ORDER BY [last_execution_time] DESC
GO

Results:

image

As you can see, the only query that took advantage of query plan reuse is the inline parameterized query.  When dealing with dynamic SQL you need to use sp_executesql and inline parameters to take full advantage of query plan reuse.  Also, fully parameterizing dynamic SQL reduces the risk of injection attack.

Tables have a guaranteed sort order

The final myth I will be talking deals with tables and their sort order.  By definition  tables do not have an order.  Tables are an unordered set of rows.  Confusion occurs because clustered indexes are supposed to order the rows.  While clustered indexes do physically sort the rows when they are built, there are no guarantees that clustered indexes will be used, thus you may not get clustered index order.  There are many execution plan operators that can influence the order in which data is returned. Operators such as stream aggregates, partitions, parallelism and even the use of nonclustered indexes can change the order of returned results.  The only guarantee that exists is that there is no guaranteed order, without the use of an ORDER BY clause.  Lets see this in action.

IF object_id('tempdb.dbo.#t') IS NOT NULL
BEGIN
    DROP TABLE #t;
END
GO

CREATE TABLE #t(
Id INT IDENTITY PRIMARY KEY CLUSTERED,
Col CHAR(1)
);
GO

INSERT INTO #t VALUES ('a');
INSERT INTO #t VALUES ('b');
INSERT INTO #t VALUES ('c');
GO

--Clustered Index scan yields clustered index order
SELECT * FROM #t
GO

CREATE UNIQUE NONCLUSTERED INDEX ncl_idx_Id_Col ON [#t](Id DESC,Col);
GO

--NONClustered Index scan yields NONClustered index order
SELECT * FROM [#t]

Results:

image

As you can see, there are a lot of influences to the order of the resultset.  If you need the dataset returned in a specific order you MUST specify an ORDER BY clause; otherwise, you risk using code that will sometimes work.

Wrap-up

That’s it. I have gone through the myths that I have encountered over the years.  If you have any more ideas, or any myths you want to debunk, post comments here.  I hope that you all have learned something new and hopefully the spread of these myths will stop here.

Until next time, happy coding.

3 comments:

BloggerAutoFollow.com said...

nice blog!

i found your blog through keyword "auto follow".

im following you.. pls follow me back.

Joshua Smith said...

Thanks a lot for talking about useful info. It is required to understand that mobile apps development could help in your industry by installing custom development software solutions. Also you may turn your attention on outsource enterprise content management software.

Joshua Smith said...

I'm glad to know more and read this great review. I'm happy to share with instant insurance rates that is provided by major insurance companies. You may save on cheap life insurance quotes which allow customers to get cheap affordable policy.