Last time I went over some of the most misconstrued myths to hit SQL Server, http://jahaines.blogspot.com/2010/01/sql-server-myths-debunked-part-1.html. In this post I will be finishing the remaining items and I encourage you to post any comments about any myths you have seen or you have questions about.
Below is the list of Items I will be addressing in this post:
- Table data can be stored separately of the clustered index
- Columns just need to exist in the index to be used
- While loops are faster/better than cursors
- Sp_executesql is always better than exec
- Tables have a guaranteed sort order
Table data can be stored separately of the clustered index
The first myth I will be talking about is the myth that table data can exist separately of the clustered index. This myth is absolutely false. Table data exists in the clustered index. This means the table does not store any data. If the table is on a differing filegroup than the clustered index, the filegroup containing the table will be empty because the “table” is actually moved with the clustered index. It should be noted that any non clustered indexes will remain on the same filegroup. Let’s see this in action.
SET NOCOUNT ON GO USE master GO IF EXISTS(SELECT 1 FROM sys.databases WHERE NAME='CL_Idx') BEGIN DROP DATABASE [CL_Idx]; END GO CREATE DATABASE [CL_Idx] ON PRIMARY ( NAME = N'CL_Idx', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10.DEV\MSSQL\DATA\CL_Idx.mdf' , SIZE = 4096KB , FILEGROWTH = 0 ), FILEGROUP [FG1] ( NAME = N'CL_Idx2', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10.DEV\MSSQL\DATA\CL_Idx2.ndf' , SIZE = 4096KB , FILEGROWTH = 0 ), FILEGROUP [FG2] ( NAME = N'CL_Idx3', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10.DEV\MSSQL\DATA\CL_Idx3.ndf' , SIZE = 4096KB , FILEGROWTH = 0 ) LOG ON ( NAME = N'CL_Idx_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10.DEV\MSSQL\DATA\CL_Idx_log.ldf' , SIZE = 4096KB , FILEGROWTH = 10%) GO IF NOT EXISTS (SELECT name FROM sys.filegroups WHERE is_default=1 AND name = N'FG1') ALTER DATABASE [CL_Idx] MODIFY FILEGROUP [FG1] DEFAULT GO USE [CL_Idx] GO IF OBJECT_ID('[CL_Idx].dbo.TestData') IS NOT NULL BEGIN DROP TABLE dbo.TestData; END GO CREATE TABLE dbo.TestData( RowNum INT, SomeId INT, SomeCode CHAR(2) ) ON [FG1]; GO INSERT INTO dbo.TestData SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY t1.NAME) AS RowNumber, ABS(CHECKSUM(NEWID()))%2500+1 AS SomeId, CHAR(ABS(CHECKSUM(NEWID()))%26+65) + CHAR(ABS(CHECKSUM(NEWID()))%26+65) AS SomeCode FROM Master.dbo.SysColumns t1, Master.dbo.SysColumns t2 GO SELECT CONVERT(varchar(10),s.NAME) AS FlName, CASE WHEN s.name = 'CL_Idx2' THEN 'FG1' ELSE 'FG2' END AS FG, (FILEPROPERTY(s.name,'spaceused') * 8)/1024. AS [FG_Size(MB)] FROM sys.master_files s WHERE database_id = DB_ID() AND type = 0 GO CREATE UNIQUE CLUSTERED INDEX unq_cl_idx_Row_Num ON dbo.TestData(RowNum) ON FG2; GO SELECT CONVERT(varchar(10),s.NAME) AS FlName, CASE WHEN s.name = 'CL_Idx2' THEN 'FG1' ELSE 'FG2' END AS FG, (FILEPROPERTY(s.name,'spaceused') * 8)/1024. AS [FG_Size(MB)] FROM sys.master_files s WHERE database_id = DB_ID() AND type = 0 GO
Results:
Columns just need to exist in the index to be used
The next myth is one of the primary causes of poor performance. Some believe that simply having a column in the index means the optimizer can use that index to seek rows. This believe is very far from the truth. In reality, the index has to be the leftmost column in the index key to be used to seek the row. This means the query predicate has to contain the leftmost index key; otherwise, you will get an index scan. Before I show you an example, here is a link that describes index basics, http://jahaines.blogspot.com/2009/07/real-sql-pages-beginners-guide-to.html. Let’s get to the example.
SET NOCOUNT ON; GO IF OBJECT_ID('tempdb.dbo.#Customers') IS NOT NULL BEGIN DROP TABLE #Customers; END GO CREATE TABLE #Customers( Id INT PRIMARY KEY CLUSTERED, FName VARCHAR(25), LName VARCHAR(25) ); GO INSERT INTO #Customers VALUES (1,'Adam','Haines'); INSERT INTO #Customers VALUES (2,'John','Smith'); GO CREATE NONCLUSTERED INDEX ncl_idx_FName ON [#Customers](FName,LName); GO SET STATISTICS PROFILE ON; GO SELECT FName,LName FROM [#Customers] WHERE LName = 'Haines' GO SET STATISTICS PROFILE OFF; GO DROP INDEX ncl_idx_FName ON [#Customers]; GO CREATE NONCLUSTERED INDEX ncl_idx_FName ON [#Customers](LName,FName); GO SET STATISTICS PROFILE ON; GO SELECT FName,LName FROM [#Customers] WHERE LName = 'Haines' GO SET STATISTICS PROFILE OFF; GO
Results:
Now by switching the order of the index columns we will get an index seek.
While loops are faster/better than cursors
The next myth is while loops are faster/better than cursors. This is not true in all scenarios, but it is occasionally true. The performance solely depends on what you are doing and how you are doing it. I wont post an example for this because there are too many variances that can occur and I would have to dedicate an entire post on the differences. I will say that you should avoid both of these options, if possible. These solutions are iterative/recursive in nature, which goes against how SQL Server operates. 90 percent of the time, you can replace iterative logic with set based logic and increase performance 10 fold. The take away here is while loops are not ALWAYS better….. however there are usually better SET based solutions out there.
Sp_executesql is always better than exec
The next myth has to deal with dynamic SQL and the best way to execute it. A lot of the time, developers will say that they are using sp_executesql, so the query is optimized. These believes show a lack of understanding of how sp_executesql really works and where it really benefits. The short answer is sp_executesql is not always better. Under some circumstances sp_executesql and exec will produce the same query plan and the query will not be able to benefit from query plan reuse, unless an exact binary match already exists in cache. This behavior occurs because the developer uses the parameters outside the scope of the dynamic SQL. Only inline parameters will be used in the parameterization process, using sp_executesql. Let’s have a look at this.
IF object_id('tempdb.dbo.#t') IS NOT NULL BEGIN DROP TABLE #t; END GO CREATE TABLE #t( id INT, col CHAR(1) ); GO INSERT INTO #t VALUES (1,'a'); INSERT INTO #t VALUES (2,'b'); INSERT INTO #t VALUES (3,'c'); GO DECLARE @sql NVARCHAR(MAX), @param VARCHAR(25) SET @param = 'b' SET @sql = N'SELECT Id,col FROM #t WHERE col=' + QUOTENAME(@param,'''') EXEC sp_executesql @sql GO DECLARE @sql NVARCHAR(MAX), @param VARCHAR(25) SET @param = 'c' SET @sql = 'SELECT Id,col FROM #t WHERE col=' + QUOTENAME(@param,'''') EXEC(@sql) GO DECLARE @sql NVARCHAR(MAX), @param VARCHAR(25) SET @param = 'd' SET @sql = N'SELECT Id,col FROM #t WHERE col=' + QUOTENAME(@param,'''') EXEC sp_executesql @sql GO DECLARE @sql NVARCHAR(MAX), @param VARCHAR(25) SET @param = 'e' SET @sql = N'SELECT Id,col FROM #t WHERE col= @dyn_param' EXEC sp_executesql @sql, N'@dyn_param CHAR(1)',@dyn_param=@param GO DECLARE @sql NVARCHAR(MAX), @param VARCHAR(25) SET @param = 'f' SET @sql = N'SELECT Id,col FROM #t WHERE col= @dyn_param' EXEC sp_executesql @sql, N'@dyn_param CHAR(1)',@dyn_param=@param GO SELECT text, [execution_count] FROM sys.dm_exec_query_stats qs CROSS APPLY sys.[dm_exec_sql_text](qs.[sql_handle]) WHERE text LIKE '%SELECT Id,col FROM #t%' AND text NOT LIKE '%cross apply%' ORDER BY [last_execution_time] DESC GO
Results:
As you can see, the only query that took advantage of query plan reuse is the inline parameterized query. When dealing with dynamic SQL you need to use sp_executesql and inline parameters to take full advantage of query plan reuse. Also, fully parameterizing dynamic SQL reduces the risk of injection attack.
Tables have a guaranteed sort order
The final myth I will be talking deals with tables and their sort order. By definition tables do not have an order. Tables are an unordered set of rows. Confusion occurs because clustered indexes are supposed to order the rows. While clustered indexes do physically sort the rows when they are built, there are no guarantees that clustered indexes will be used, thus you may not get clustered index order. There are many execution plan operators that can influence the order in which data is returned. Operators such as stream aggregates, partitions, parallelism and even the use of nonclustered indexes can change the order of returned results. The only guarantee that exists is that there is no guaranteed order, without the use of an ORDER BY clause. Lets see this in action.
IF object_id('tempdb.dbo.#t') IS NOT NULL BEGIN DROP TABLE #t; END GO CREATE TABLE #t( Id INT IDENTITY PRIMARY KEY CLUSTERED, Col CHAR(1) ); GO INSERT INTO #t VALUES ('a'); INSERT INTO #t VALUES ('b'); INSERT INTO #t VALUES ('c'); GO --Clustered Index scan yields clustered index order SELECT * FROM #t GO CREATE UNIQUE NONCLUSTERED INDEX ncl_idx_Id_Col ON [#t](Id DESC,Col); GO --NONClustered Index scan yields NONClustered index order SELECT * FROM [#t]
Results:
As you can see, there are a lot of influences to the order of the resultset. If you need the dataset returned in a specific order you MUST specify an ORDER BY clause; otherwise, you risk using code that will sometimes work.
Wrap-up
That’s it. I have gone through the myths that I have encountered over the years. If you have any more ideas, or any myths you want to debunk, post comments here. I hope that you all have learned something new and hopefully the spread of these myths will stop here.
Until next time, happy coding.
3 comments:
nice blog!
i found your blog through keyword "auto follow".
im following you.. pls follow me back.
Thanks a lot for talking about useful info. It is required to understand that mobile apps development could help in your industry by installing custom development software solutions. Also you may turn your attention on outsource enterprise content management software.
I'm glad to know more and read this great review. I'm happy to share with instant insurance rates that is provided by major insurance companies. You may save on cheap life insurance quotes which allow customers to get cheap affordable policy.
Post a Comment