Tuesday, April 13, 2010

T-SQL Tuesday #005 – Creating & Emailing HTML Reports

This post is my contribution to the popular T-SQL Tuesday event, http://sqlvariant.com/wordpress/index.php/2010/04/t-sql-tuesday-005-reporting/.   The creator of this amazing event is Adam Machanic.   What I love most about this event is how it brings the SQL Server community together. The “theme” for this T-SQL Tuesday is reporting.  As you are aware, reporting is a very broad topic.  I will be focusing on creating and emailing HTML reports.  This process is no substitute for an SSRS report or a cube report.  What I am about to show you is a very sleek way of presenting data to managers at a very high level.  You do not want to send an entire report as HTML, so this process should be limited to dashboards or reports that are small in nature.  If the user needs more detail, or is simply requesting too much data, you may want to add a detail link to the HTML body, which gives the user the ability to drill through for more detail. 
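To give a flavor of that idea, a drill-through link is just extra markup concatenated onto the HTML body before mailing. A minimal sketch, assuming a hypothetical intranet report URL (the URL and variable are illustrative only, not part of the procedure shown later):

```sql
-- Sketch only: append a drill-through link to an HTML email body.
-- The URL is a hypothetical placeholder; point it at your own detail report.
DECLARE @html_email NVARCHAR(MAX) = N'<html><body><table><!-- summary rows --></table>';

SET @html_email = @html_email
    + N'<br/><a href="http://intranet/Reports/SalesDetail">View full detail</a>'
    + N'</body></html>';
```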

Let’s get started by creating a sample table and a couple of views.  It should be noted that this process primarily uses views to expose data.  The code can be further expanded to filter for specific columns, but as it stands now, this process returns all columns in the view or table.

USE [tempdb]
GO

SET NOCOUNT ON;
GO

IF OBJECT_ID('tempdb.dbo.Sales') IS NOT NULL
BEGIN
    DROP TABLE dbo.Sales;
END
GO

CREATE TABLE dbo.Sales(
SalesId INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
EmployeeId INT,
Amt NUMERIC(9,2),
LocationCd INT
);
GO

INSERT INTO dbo.Sales VALUES (1,12.50,1);
INSERT INTO dbo.Sales VALUES (1,99.99,4);
INSERT INTO dbo.Sales VALUES (2,45.64,1);
INSERT INTO dbo.Sales VALUES (3,44.65,2);
INSERT INTO dbo.Sales VALUES (3,52.89,4);
INSERT INTO dbo.Sales VALUES (4,250.54,3);
INSERT INTO dbo.Sales VALUES (5,150.00,5);
GO

IF OBJECT_ID('tempdb.dbo.vw_SalesVolumnByLocation') IS NOT NULL
BEGIN
    DROP VIEW dbo.vw_SalesVolumnByLocation;
END
GO

CREATE VIEW dbo.vw_SalesVolumnByLocation
AS
SELECT LocationCd, SUM(Amt) AS SalesVolume
FROM dbo.Sales
GROUP BY LocationCd
GO

IF OBJECT_ID('tempdb.dbo.vw_SalesBySalesCounselor') IS NOT NULL
BEGIN
    DROP VIEW dbo.vw_SalesBySalesCounselor;
END
GO

CREATE VIEW dbo.vw_SalesBySalesCounselor
AS
SELECT [EmployeeId],[LocationCd],[Amt]
FROM dbo.Sales
GO

Next, the stored procedure.  First and foremost, this code looks a lot worse than it really is.  I had to use dynamic SQL because I did not want to create this stored procedure in every database.

The parameter list is pretty massive, but many of the parameters have default values, which means you do not have to specify all of them.  The parameters are pretty self-explanatory.
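One detail worth calling out before reading the procedure: the stylesheet is built with QUOTENAME using a double-quote delimiter to wrap the CSS color values. A quick illustration of that behavior:

```sql
-- QUOTENAME with the '"' delimiter wraps a string in double quotes,
-- which is how the procedure quotes CSS color values in its stylesheet.
SELECT QUOTENAME('#87AFC7', '"');                   -- "#87AFC7"
SELECT QUOTENAME(COALESCE(NULL, '#FFFFFF'), '"');   -- "#FFFFFF" (fallback color)
```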

USE [master]
GO

CREATE PROCEDURE usp_Email_HTML_Rpt
    @DB VARCHAR(255) = NULL,
    @Object VARCHAR(255),
    @Schema VARCHAR(255),
    @Rec NVARCHAR(255),
    @CC NVARCHAR(255) = NULL,
    @rpt_Header VARCHAR(50),
    @rpt_Header_BGColor VARCHAR(10) = '#FFFFFF',
    @TblHdr_BGColor VARCHAR(10) = '#FFFFFF',
    @Condition1_Col VARCHAR(255) = NULL,
    @Condition1_Expression VARCHAR(500) = NULL,
    @Condition1_BGColor VARCHAR(10) = NULL,
    @Condition2_Col VARCHAR(255) = NULL,
    @Condition2_Expression VARCHAR(500) = NULL,
    @Condition2_BGColor VARCHAR(10) = NULL,
    @AltRowBGColor VARCHAR(10) = NULL,
    @Pred_Filter1_Col VARCHAR(255) = NULL,
    @Pred_Filter1_Expression VARCHAR(500) = NULL,
    @Pred_Filter2_Col VARCHAR(255) = NULL,
    @Pred_Filter2_Expression VARCHAR(500) = NULL,
    @OrderBy VARCHAR(500) = NULL,
    @StyleSheet VARCHAR(MAX) = NULL
AS
BEGIN

SET NOCOUNT ON;

DECLARE @sql NVARCHAR(MAX),
        @RtnSQL NVARCHAR(MAX),
        @html_email NVARCHAR(MAX)
                
DECLARE @HTML TABLE(seq TINYINT, Tag VARCHAR(MAX));

--Create a new style sheet if none was passed in
IF @StyleSheet IS NULL
BEGIN
--Set the Procedure Stylesheet.  You can also supply this as a variable
SET @StyleSheet = 
'<head>
 <style type="text/css">
     th {width:150px;color:"#FFFFFF";font-weight:bold;background-color: ' + QUOTENAME(COALESCE(@TblHdr_BGColor,'#FFFFFF'),'"') +';border:1;border-width:thin; border-style:solid; align:center}
    td {width:150px;background-color: "#FFFFFF"; border: 1; border-style:solid;border-width:thin; text-align: "left"}
    td.Cond1Met {width:150px;background-color: ' + QUOTENAME(COALESCE(@Condition1_BGColor,'#FFFFFF'),'"') +'; border-style:solid;border-width:thin; text-align: "left"}
    td.Cond2Met {width:150px;background-color: ' + QUOTENAME(COALESCE(@Condition2_BGColor,'#FFFFFF'),'"') +'; border-style:solid;border-width:thin; text-align: "left"}
     td.AltRowColor {width:150px;background-color: ' + QUOTENAME(COALESCE(@AltRowBGColor,'#FFFFFF'),'"') +'; border: 1; border-style:solid;border-width:thin; text-align: "left"}
    td.LegendCond1Met {width:200px;background-color: ' + QUOTENAME(COALESCE(@Condition1_BGColor,'#FFFFFF'),'"') +'; border-style:solid;border-width:thin; text-align: "center"}
    td.LegendCond2Met {width:200px;background-color: ' + QUOTENAME(COALESCE(@Condition2_BGColor,'#FFFFFF'),'"') +'; border-style:solid;border-width:thin; text-align: "center"}
     th.LegendHdr {width:200px;color:"#FFFFFF"; font-weight:bold; background-color: ' + QUOTENAME(COALESCE(@rpt_Header_BGColor,'#FFFFFF'),'"') + ';border: 1;border-width:thin; border-style:solid;text-align: "center"}
    td.Legend {width:200px;background-color: "#FFFFFF"; border: 1; border-width:thin; border-style:solid; text-align: "center"} 
    th.LegendTitle {width:200px;color:black;background-color: "#C0C0C0"; border: 1; border-width:thin; border-style:solid; text-align: "center"} 
 </style>
<title>' + COALESCE(@rpt_Header,'Report Header') + '</title>
</head>
'
END 

--Build basic html structure
INSERT INTO @HTML (seq,Tag) 
VALUES (1,'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">' + CHAR(13) + '<html>' + COALESCE(@StyleSheet,'') + '<body>');

--If optional conditions exist, build a legend
IF @Condition1_Col IS NOT NULL OR @Condition2_Col IS NOT NULL
BEGIN

INSERT INTO @HTML (seq,Tag) 
SELECT 2, '<table border="1" align="LEFT">' UNION ALL
SELECT 3, '<tr><th class="LegendTitle" COLSPAN=3>Legend</th></tr>' UNION ALL
SELECT 4, '<tr><th class="LegendHdr">Variable</th><th class="LegendHdr">Condition Column</th><th class="LegendHdr">Condition Expression</th></tr>' UNION ALL
SELECT 5, '<tr><td class="Legend">@Condition1</td><td class="Legend">' + COALESCE(@Condition1_Col,'n/a') + '</td><td class="LegendCond1Met"> ' + COALESCE(@Condition1_Expression,'n/a') + '</td></tr>' UNION ALL
SELECT 6, '<tr><td class="Legend">@Condition2</td><td class="Legend">' + COALESCE(@Condition2_Col,'n/a') + '</td><td class="LegendCond2Met"> ' + COALESCE(@Condition2_Expression,'n/a') + '</td></tr>' UNION ALL
SELECT 7, '</table><br><br><br><br><br><br><br>' + '<h1>' + COALESCE(@rpt_Header,'Report Header') + '</h1>' + '<table border="1" align="left" width="25%">'

END
ELSE
BEGIN --No legend is needed, start building the table
    INSERT INTO @HTML (seq,Tag) 
    SELECT 8, '<br>' + '<h1>' + COALESCE(@rpt_Header,'Report Header') + '</h1>' + '<table border="1" align="left" width="25%">'
END

--Create Table Header
SET @sql = N'
SELECT 9,CAST(
    (
    SELECT CAST(''<th>'' + COALESCE(c.COLUMN_NAME,'''') + ''</th>'' AS XML)
    FROM ' + COALESCE(QUOTENAME(@DB) + '.','') + '[INFORMATION_SCHEMA].[COLUMNS] c
    WHERE c.[TABLE_NAME] = @dynObject AND c.[TABLE_SCHEMA] = @dynSchema
    FOR XML PATH(''''),ELEMENTS,ROOT(''tr''),TYPE
    ) AS VARCHAR(MAX))';
    
INSERT INTO @HTML (seq,Tag) 
EXEC sp_executesql @sql, N'@dynObject VARCHAR(255),@dynSchema VARCHAR(128)',@dynObject = @Object, @dynSchema=@Schema

    
--Create SQL Statement to return actual values
SET @sql = N'
SELECT 
    @dynRtnSQL = ''SELECT 10,''''<tr>''''+'' + STUFF(
    (
        SELECT 
            ''+  CASE '' +
                 COALESCE(''WHEN '' + QUOTENAME(@dynCondition1_Col) +  @dynCondition1_Expression
            + '' THEN  ''''<td class="Cond1Met">'''' + CAST('' + QUOTENAME(c.COLUMN_NAME) + '' AS VARCHAR(MAX))'','''')
            +     COALESCE('' WHEN '' + QUOTENAME(@dynCondition2_Col) +  @dynCondition2_Expression
            + '' THEN  ''''<td class="Cond2Met">'''' + CAST('' + QUOTENAME(c.COLUMN_NAME) + '' AS VARCHAR(MAX))'','''')
            + '' WHEN ''''1''''= CASE WHEN ROW_NUMBER() OVER(ORDER BY ' + COALESCE(@OrderBy,'(SELECT NULL)') + ') % 2 = 0 THEN 1 ELSE 0 END''
            + '' THEN  ''''<td class="AltRowColor">'''' + CAST('' + QUOTENAME(c.COLUMN_NAME) + '' AS VARCHAR(MAX))''
            + '' ELSE ''''<td>'''' + CAST('' + QUOTENAME(c.COLUMN_NAME) + '' AS VARCHAR(MAX))''
            + '' END''
            + '' + ''''</td>''''''
        FROM ' + COALESCE(QUOTENAME(@DB) + '.','') + '[INFORMATION_SCHEMA].[Columns] c
        WHERE c.[TABLE_NAME] = @dynObject AND c.[TABLE_SCHEMA] = @dynSchema
        FOR XML PATH(''''),TYPE
    ).value(''.'',''VARCHAR(MAX)'')
    ,1,1,'''') + ''+''''</tr>'''' FROM '  + COALESCE(QUOTENAME(@DB) + '.','') +  ''' + QUOTENAME(@dynSchema) + ''.'' + QUOTENAME(@dynObject) +
    ''WHERE 1=1 ' + COALESCE(' AND' + QUOTENAME(@Pred_Filter1_Col) + SPACE(1) + @Pred_Filter1_Expression,'') + ''
    + COALESCE(' AND' + QUOTENAME(@Pred_Filter2_Col) + SPACE(1) + @Pred_Filter2_Expression,'') 
    + COALESCE(' ORDER BY ' + @OrderBy,'') + ''''

--Create a variable to hold the newly created dynamic sql statement
--PRINT @sql
EXEC sp_executesql 
    @sql, 
    N'@dynCondition1_Col VARCHAR(255), @dynCondition1_Expression VARCHAR(500), @dynCondition2_Col VARCHAR(255), @dynCondition2_Expression VARCHAR(500), @dynSchema VARCHAR(255), @dynObject VARCHAR(255), @dynRtnSQL NVARCHAR(MAX) OUTPUT',
    @dynCondition1_Col = @Condition1_Col,
    @dynCondition1_Expression = @Condition1_Expression,
    @dynCondition2_Col = @Condition2_Col,
    @dynCondition2_Expression = @Condition2_Expression,
    @dynSchema = @Schema,
    @dynObject = @Object,
    @dynRtnSQL = @RtnSQL OUTPUT

--PRINT @RtnSQL

--Execute the newly created dynamic TSQL statement.
INSERT INTO @HTML (seq,Tag)
EXEC sp_executesql @RtnSQL

--Close all report HTML tags
INSERT INTO @HTML (seq,Tag)
SELECT 11, '</table></body></html>'

--SELECT Tag FROM @HTML ORDER BY seq -- return HTML in the correct order

SELECT @HTML_Email = COALESCE(@HTML_Email,'') + Tag FROM @HTML ORDER BY seq -- return HTML in the correct order

--PRINT @HTML_Email
EXEC msdb.dbo.sp_send_dbmail
    @recipients = @rec,
    @copy_recipients = @CC,
    @subject = @rpt_Header,
    @body = @HTML_Email,
    @body_format = 'HTML',
    @importance = 'Normal'    

END 
GO

Now let’s see this stored procedure in action.  The code is very flexible and gives you a variety of methods to slice and dice data.  I have provided two conditional filters that will highlight data meeting the criteria in a specified color.  I have also included sort and filtering parameters to help reduce the amount of data being returned.  As I stated before, not all of the parameters are required.  One of my favorite parameters is @AltRowBGColor, which accepts an HTML color and alternates the row color of the HTML table.
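The alternating-row effect boils down to a ROW_NUMBER() modulo test that tags every other row with a different CSS class. A simplified sketch of that logic (not the procedure’s exact code), run against the sample view:

```sql
-- Simplified sketch of the @AltRowBGColor logic: even-numbered rows get the
-- AltRowColor CSS class, odd-numbered rows get a plain <td>.
SELECT CASE WHEN ROW_NUMBER() OVER (ORDER BY [SalesVolume] DESC) % 2 = 0
            THEN '<td class="AltRowColor">'
            ELSE '<td>'
       END + CAST([LocationCd] AS VARCHAR(10)) + '</td>'
FROM dbo.vw_SalesVolumnByLocation;
```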

Execute the following code: (AltRowBGColor is commented out for this demo)

EXECUTE [dbo].[usp_Email_HTML_Rpt] 
   @DB ='tempdb'
  ,@Rec = 'ahaines@stei.com' --Change to your email address
  ,@Object = 'vw_SalesVolumnByLocation'
  ,@Schema = 'dbo'
  ,@rpt_Header = 'Sales Volumn By Location'
  ,@rpt_Header_BGColor = '#87AFC7'
  ,@TblHdr_BGColor = '#87AFC7'
  ,@Condition1_Col = 'SalesVolume'
  ,@Condition1_Expression = '<100'
  ,@Condition1_BGColor = '#E55451'
  ,@Condition2_Col = 'SalesVolume'
  ,@Condition2_Expression = '>200'
  ,@Condition2_BGColor = '#00FF00'
  --,@AltRowBGColor = '#A0CFEC'
  ,@OrderBy = '[SalesVolume] DESC'

You will get an email similar to the one below.  Note that you must have Database Mail enabled for this code to work.  Because a conditional filter was supplied, a legend was generated.  The legend contains the details of the supplied parameters.  In the case below, locations with a sales volume < 100 are considered subpar, hence the red color, and locations with a sales volume > 200 are green.  As you can see, this is a great way to visualize your data.  I use these types of reports in my environment to monitor backups, jobs, and their corresponding metrics. 

image

Now, I will execute the stored procedure with fewer parameters and use the @AltRowBGColor parameter.  You will note that no legend is generated because no conditional formatting was supplied.

EXECUTE [master].[dbo].[usp_Email_HTML_Rpt] 
   @DB ='tempdb'
  ,@Rec = 'ahaines@stei.com'
  ,@Object = 'vw_SalesBySalesCounselor'
  ,@Schema = 'dbo'
  ,@rpt_Header = 'Sales Volume By Sales Counselor'
  ,@rpt_Header_BGColor = '#87AFC7'
  ,@TblHdr_BGColor = '#87AFC7'
  ,@AltRowBGColor = '#A0CFEC'
  ,@OrderBy = '[Amt] DESC'

image

This type of reporting is very good for quick and dirty analysis, like dashboarding.  It is also very easy to implement and gives developers/DBAs a quick turnaround for reporting.  The alternative would be to open BIDS (or another reporting tool) and build a report, which takes a lot more time than executing a stored procedure.  If you need automation, you can schedule this procedure to execute via a SQL Agent job.  There are a lot of modifications this stored procedure can undergo.  It is by no means perfect, but it does get the job done.  I am planning on enhancing a lot of the features provided here, but for the time being I am satisfied. I personally believe that variable checks need to be put into place and a show/hide legend bit should be introduced.  Someone more versed in HTML might find it better to import a style sheet.  When I get a little more time, I will formally update this post with more complete code.  The idea here was to present a concept and show you the power of T-SQL and Database Mail.

I hope that you find this stored procedure useful and I invite you to modify the code to work for your environment.  If you have ideas on how to optimize the code or make a cool add-on, please keep me informed, so I can update this post.

Until next time, happy coding.

Wednesday, March 24, 2010

SQL Server Management Studio Tips And Tricks

This week I decided I wanted to take a step back from my performance tuning series and present material that DBAs and developers may not know.  I will be focusing on SQL Server Management Studio (SSMS) tips and tricks. 

Keyboard Shortcuts:

SQL Server Management Studio allows keyboard shortcuts that can help you save time and increase efficiency.  Keyboard shortcuts allow you to execute TSQL commands at the push of a button. The best part about keyboard shortcuts is the shortcut can be used to pass highlighted text as parameters.  I will start with some of the built-in keyboard shortcuts.

To open the keyboard shortcuts menu, click Tools –> Options –> expand Environment –> click Keyboard.  You will note that some keyboard shortcuts are loaded by default, such as sp_help (Alt+F1) and sp_who (Ctrl+2).  You can implement just about any code you want in a keyboard shortcut.  For example, you can put the query “SELECT * FROM sys.dm_exec_requests” directly into the text box.  The beauty of this is you can open a new query window and just hit the shortcut keys to execute the query.  This saves time and makes life a bit easier.  Below is a screenshot of my shortcut list, including the query I posted above.

image

Well, that is all good and nice, but what else can shortcuts do?  The best part about shortcuts, in my opinion, is that they can actually be used to execute stored procedures and supply highlighted values as parameters.

Let’s run sp_help, but we will never type the text sp_help.  Open a new query window and create a new table.

CREATE TABLE dbo.MyTable(ID INT);

Now type MyTable in the query window and highlight the text.  Once the text is highlighted, hold Alt and press F1.  You should see the results of sp_help displayed for the highlighted object, “MyTable.”

image

Note: If you need to specify an object in a different schema, you have to use the two-part name “schema.Object” and put it in single quotes, for example, ‘dbo.MyTable’.

Now that is pretty cool, right?  What I will show you next is even cooler!  I will execute a user-defined stored procedure with parameters.  Open the keyboard shortcut menu and assign the below stored procedure to a shortcut key. 

image

Open a new query window and create the below stored procedure.

CREATE PROCEDURE dbo.KeyBoardShortcut(@Id INT,@Col CHAR(1))
AS
BEGIN
    SELECT @Id,@Col
END
GO

Now type the text below and highlight it.  Once highlighted, hit Ctrl+F1. (Note: If you assigned the procedure to a different shortcut, you will need to use that shortcut)

1,'a'

image

As you can see, the stored procedure was executed with our highlighted parameters!  This is an awesome feature that can save a lot of time, especially when you have custom code that you need to access frequently.  It increases efficiency and puts more information at your fingertips.  One of my favorite shortcuts is sp_helptext.  You can create a shortcut for sp_helptext, highlight any stored procedure, function, view, or trigger, and get the create script of that object.

Listing All Table Columns

This is a request that I see very often.  It usually comes from developers who have a lot of columns in their table and have to select from most of them.  As you can imagine, typing each column name can be a very tedious process. There are two methods that I use to accomplish this task when I am feeling a little lazy… BUT the key is to work smarter, not harder, right :^) !

The first method is to right-click the table –> Script Table As –> SELECT To –> New Query Window.  Voila! We now have a select query that lists all the columns in the table.  You can perform the same steps to get a list of all columns in a view.

The next method is to expand the table –> drag the Columns folder into the query window.  This will generate a comma-delimited list of the column names.
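If you prefer a pure T-SQL route, a metadata query can build the same comma-delimited list. A sketch (my own addition, not a built-in SSMS feature), assuming the dbo.MyTable created earlier:

```sql
-- Sketch: build a comma-delimited column list from INFORMATION_SCHEMA metadata.
SELECT STUFF((
        SELECT ',' + QUOTENAME(c.COLUMN_NAME)
        FROM INFORMATION_SCHEMA.COLUMNS c
        WHERE c.TABLE_SCHEMA = 'dbo'
          AND c.TABLE_NAME = 'MyTable'
        ORDER BY c.ORDINAL_POSITION
        FOR XML PATH('')
    ), 1, 1, '') AS ColumnList;
```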

Scripting Multiple Objects

I am sure many of you have tried to script multiple objects from Management Studio, but have had little luck.  Now, we could choose to right-click the database –> Tasks –> Generate Scripts, but where is the fun in that?  Here is an easier way that does not make you jump through hoops or follow a wizard.  The trick is to click the folder containing the objects you want to script and then look at the Object Explorer Details pane.  Inside the Object Explorer Details pane you can Ctrl- or Shift-click multiple objects.  Once they are selected, you can right-click –> Script <Object Type> As –> Create –> To New Window.

image

Creating a TSQL Toolkit

One of the most underrated and unknown features of SQL Server Management Studio is Template Explorer.  You may be thinking that Template Explorer is used strictly for T-SQL templates, but I will show you a couple of ways to make Template Explorer function as a T-SQL toolkit.  Press Ctrl+Alt+T or go to View –> Template Explorer to open the window.

image

Now, this does not look that impressive from the get-go, but trust me, it can and does get better. Right-click SQL Server Templates, choose New –> Folder, and give the folder a name. I will be using the name Test_Scripts.  Next, create a new template in the Test_Scripts folder.  I named the new template sys.dm_exec_requests.  Your Template Explorer should now look like this.

image

Okay, well that is great… how does this help me?  Let’s add some T-SQL code to that new template.  Once we add code to the template, we can drag the template into any query window and SSMS will automatically paste the code into the open query window.  Basically, we can use Template Explorer as our personal toolkit.  We can create any folder structure we want, and SSMS does a great job of keeping all our scripts organized.  Let’s see this in action.

Right-click the sys.dm_exec_requests template and choose edit.  Paste the code below and click save.

SELECT * FROM sys.dm_exec_requests

Open a new query window and drag the sys.dm_exec_requests template into the query window. Voila! The T-SQL is automatically scripted into our new query window.  As you can see, we can use Template Explorer to save our scripts and make them easily accessible.  This alleviates the need to open Windows Explorer or browse the file system for our scripts.  Plus, all the scripts are saved and managed in one place.  If you need to copy your scripts out of Template Explorer, they actually already exist on the file system, within your documents directory.  On my SQL Server 2008 instance, my templates are stored here: C:\Documents and Settings\ahaines\Application Data\Microsoft\Microsoft SQL Server\100\Tools\Shell\Templates\Sql.  I don’t know about you, but I for one love this feature.  It gives me the ability to quickly get data without having to leave SSMS, and I do not have to waste time searching the file system.  Give it a try and see what you think.

That is all the tips I have for now.  I hope you have learned something new and that these tips help you.  Until next time, happy coding.

Wednesday, March 17, 2010

Performance Tuning 101 – What You Will Not Learn In The Class Room (Part 2)

In my last post, http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will.html, I talked about performance tuning queries that appear to be well tuned. There are a lot of optimization techniques, unknown to most developers, that do not require indexes or radical code changes. These are the optimizations I will be talking about in this post. There is absolutely no way I could go over every possible optimization technique, but I will do my best to present as much as I can here today, and will cover other techniques in future posts.

I will start things off by talking about a challenge that Ramesh Meyyappan presented in his webcast, http://www.sqlworkshops.com/. Ramesh’s challenge was to solve the TOP 101 phenomenon using SQL Server 2005. First, I will create a sample table with data.

USE [tempdb]
GO

SET NOCOUNT ON;
GO

IF EXISTS(SELECT 1 FROM sys.tables WHERE NAME = 'TestData')
BEGIN
    DROP TABLE dbo.[TestData];
END
GO

CREATE TABLE dbo.TestData(
RowNum INT PRIMARY KEY,
SomeId INT,
SomeCode CHAR(2000)
);
GO

;WITH
   L0 AS (SELECT 1 AS C UNION ALL SELECT 1)       --2 rows
  ,L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B)    --4 rows (2x2)
  ,L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B)    --16 rows (4x4)
  ,L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B)    --256 rows (16x16)
  ,L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B)    --65536 rows (256x256)
  ,L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B)    --4,294,967,296 rows (65536x65536)
  ,Number AS (SELECT row_number() OVER (ORDER BY (SELECT 0)) AS N FROM L5)
INSERT INTO dbo.TestData
SELECT
    N AS RowNumber,
    ABS(CHECKSUM(NEWID()))%1000000+1 AS SomeId ,
    REPLICATE('a',2000) AS SomeCode
FROM Number
WHERE [N] <= 50000
GO

UPDATE STATISTICS dbo.[TestData] WITH FULLSCAN;
GO

Next, I will create a query that uses TOP and an order by to return 100 rows.

--Fast
SELECT TOP 100 [RowNum],[SomeId],[SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION (MAXDOP 1)
/*
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 1 ms.

 SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 102 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Now watch what happens when I change the TOP operator to 101. You will notice that I did not change anything else in the query other than increasing the number of rows returned by 1.

--Slow
SELECT TOP 101 [RowNum],[SomeId],[SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

/*
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 312 ms,  elapsed time = 1690 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Wow… TOP 101 is nearly 17 times slower than TOP 100, and all I changed was the number of rows in the TOP operator! So why does it take so much longer to optimize and execute a query using TOP 101 as opposed to TOP 100? The short answer is memory requirements. The TOP 101 query requires a lot more query memory than TOP 100, which translates into tempdb sorting. As you may recall, I addressed some techniques to solve the tempdb sorting problem in my last post, http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will.html. If you are using SQL Server 2008, you can use the same optimization techniques presented in that post, but SQL Server 2005 is a completely different animal. To make the TOP 101 query faster, we first need to understand why it is slower. Let’s take a look at what is different when we run the TOP 100 and TOP 101 queries.

Let’s start by looking at the memory SQL Server grants to each query. Open two different query windows and execute each TOP query within a WHILE loop. We can then use sys.dm_exec_query_memory_grants to get the required memory.

Here is a sample of how to run the TOP query in a while loop.

WHILE 1=1
SELECT TOP 100 [RowNum],[SomeId],[SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

In a new query window, run the following query to get the memory specifications.

SELECT [granted_memory_kb],[required_memory_kb],[max_used_memory_kb]
FROM sys.dm_exec_query_memory_grants
WHERE [session_id] = 58 --replace 58 with the session_id of the looping query (SELECT @@SPID in that window)

Here are my results:

     granted_memory_kb    required_memory_kb   max_used_memory_kb
---- -------------------- -------------------- --------------------
FAST 1024                 216                  216

     granted_memory_kb    required_memory_kb   max_used_memory_kb
---- -------------------- -------------------- --------------------
SLOW 6040                 512                  6040

The results are simply astonishing. The memory used increases nearly 28 times when I use TOP 101 instead of TOP 100. I do not have a formal explanation of why the TOP 101 operator consumes more memory than TOP 100. Brad Schulz, http://bradsruminations.blogspot.com/, has contacted Conor Cunningham about this issue and believes that 101 is an arbitrary threshold. Brad is working on an in-depth post on the TOP operator; keep an eye out for it, as it should be really good. Anyway, once the 101 threshold is breached, the optimizer uses different calculations to optimize the query, which can effectively bloat its memory requirements. This memory bloat forces the sort operation to spill into tempdb, and that is where the TOP 101 bottleneck exists. To verify the problem, open Profiler and select the Sort Warnings event; you will see that the slow query produces a sort warning, while the fast query does not.

Now that I have identified the problem, how do I solve it? I will start by attempting the methods that I used in the previous article, http://jahaines.blogspot.com/2010/03/performance-tuning-101-what-you-will.html.

SELECT TOP 101 [RowNum],[SomeId],CAST([SomeCode] AS VARCHAR(4200))
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

/*
 SQL Server Execution Times:
   CPU time = 344 ms,  elapsed time = 3385 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Bloating the estimated row size still did not help our situation. Next I will try shrinking the row size.

SELECT TOP 101 [RowNum],[SomeId],RTRIM(CAST([SomeCode] AS VARCHAR(2000)))
FROM dbo.[TestData]
WHERE [RowNum] < 30000
ORDER BY [SomeId]
OPTION(MAXDOP 1)

/*
 SQL Server Execution Times:
   CPU time = 344 ms,  elapsed time = 2461 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Hmm, still no luck… How can I reduce the size of the rows passed into the sort operator? When you really sit back and think about the problem, the answer is simple. To reduce the row size, all you have to do is reduce the columns involved in the sort. I like to put the TOP inside a derived table, making sure to use only the RowNum and SomeId columns, and then join back onto the TestData table. This gives us a fast sort and an ultra-fast index seek on the 101 rows we are returning.

SELECT t.[RowNum],t.[SomeId], t.[SomeCode]
FROM dbo.[TestData] t
INNER JOIN(
    SELECT TOP 101 [RowNum],[SomeId]
    FROM dbo.[TestData]
    WHERE [RowNum] < 30000
    ORDER BY [SomeId]
) AS t2
    ON T.RowNum = t2.RowNum
ORDER BY t.[SomeId]
OPTION (MAXDOP 1)

/*
 SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 104 ms.
SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 0 ms.
*/

Alternatively, we can use correlated subqueries or the cross apply operator.

SELECT TOP 101
    t.[RowNum],
    (SELECT t2.[SomeId] FROM dbo.[TestData] t2 WHERE t2.[RowNum] = t.[RowNum]) AS SomeId,
    (SELECT t2.[SomeCode] FROM dbo.[TestData] t2 WHERE t2.[RowNum] = t.[RowNum]) AS SomeCode
FROM dbo.[TestData] t
WHERE t.[RowNum] < 30000
ORDER BY t.[SomeId]
OPTION (MAXDOP 1)

SELECT TOP 101
    t.[RowNum],
    t2.SomeId,
    t2.SomeCode
FROM dbo.[TestData] T
CROSS APPLY(SELECT t2.SomeId, t2.SomeCode FROM dbo.[TestData] t2 WHERE t2.[RowNum] = t.[RowNum]) AS t2
WHERE t.[RowNum] < 30000
ORDER BY t.[SomeId]
OPTION (MAXDOP 1)

It should be noted that the correlated subquery method will produce more IO because it uses two subqueries. As you can see, the solution to this challenge is quite simple, but it requires an understanding of what is occurring under the hood of SQL Server.
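You can measure that IO difference yourself with SET STATISTICS IO. A sketch, wrapping the CROSS APPLY version from above (run each variant the same way and compare the output):

```sql
-- Sketch: compare logical reads of the rewrites via SET STATISTICS IO.
SET STATISTICS IO ON;

SELECT TOP 101 t.[RowNum], t2.SomeId, t2.SomeCode
FROM dbo.[TestData] t
CROSS APPLY (SELECT i.SomeId, i.SomeCode
             FROM dbo.[TestData] i
             WHERE i.[RowNum] = t.[RowNum]) AS t2
WHERE t.[RowNum] < 30000
ORDER BY t.[SomeId]
OPTION (MAXDOP 1);

SET STATISTICS IO OFF;
-- Compare the 'logical reads' figures in the Messages tab for each variant.
```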

Note: It is still possible that some of the sorting will be sent to tempdb, but you should see an elapsed time that rivals TOP 100.

The next optimization technique I will demonstrate involves a predicate-pushing problem. Unbeknownst to most developers, SQL Server 2005 has problems pushing predicates into views. Many of these issues have been resolved in SQL Server 2008, but the behavior is worth knowing about. I will demonstrate a very simple example using a ranking function. Ranking functions are relatively new to SQL Server, having been introduced in 2005. I am sure there are other scenarios that cause predicate-pushing problems, but I will only address the ranking problem in this post.

Let’s start by creating a small sample table.

USE [tempdb]
GO

CREATE TABLE Test(
ID INT IDENTITY(1,1) PRIMARY KEY,
FName VARCHAR(50),
LName VARCHAR(50)
);

INSERT INTO dbo.Test VALUES ('Adam','Haines');
INSERT INTO dbo.Test VALUES ('John','Smith');
INSERT INTO dbo.Test VALUES ('Jane','Doe');
GO

CREATE NONCLUSTERED INDEX ncl_idx_LName ON dbo.Test(LName) INCLUDE(FName);
GO

As you can see, the table is relatively simple. The idea is to present an easy to understand example that demonstrates potential performance problems with views.

Here is my simple query that shows an index seek on LName.

SELECT Id,FName,LName,ROW_NUMBER() OVER(PARTITION BY LName ORDER BY Id) AS seq
FROM dbo.Test
WHERE LName = 'Smith'
GO

image

Let’s see what happens when I put the logic into a view with no predicate. The predicate will be applied from outside the view and should be pushed down into it, as views are expanded into their underlying objects at runtime.

CREATE VIEW dbo.vw_Test
AS
SELECT Id,FName,LName,ROW_NUMBER() OVER(PARTITION BY LName ORDER BY Id) AS seq
FROM dbo.Test
GO

I will now query the view using the same predicate as the original query.

SELECT Id,FName,LName,seq
FROM dbo.vw_Test
WHERE LName = 'Smith'
GO

image

The problem here is that the optimizer decided to filter the results of the query AFTER the table “Test” was scanned. One would expect the optimizer to seek on the LName column by pushing the predicate; however, SQL Server 2005 does not do a great job of this. SQL Server 2008 will appropriately push the predicate deep into the plan to get the index seek. How do we solve this problem? Unfortunately, there is not a whole lot you can do to make the plan more efficient. The best option, in my opinion, is to use an inline TVF to parameterize the query.

CREATE FUNCTION dbo.fn_Test(@LName VARCHAR(50))
RETURNS TABLE
AS
RETURN(
    SELECT Id,FName,LName,ROW_NUMBER() OVER(PARTITION BY LName ORDER BY Id) AS seq
    FROM dbo.Test
    WHERE LName = @LName
)
GO

Now execute a select against the TVF using the same predicate.

SELECT Id,FName,LName,seq
FROM dbo.fn_Test('Smith')

[image: execution plan showing an index seek on ncl_idx_LName]
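The inline TVF also composes with other queries. As a small sketch (assuming the dbo.Test table and dbo.fn_Test function created above), CROSS APPLY invokes the function once per outer row, and the predicate stays parameterized inside it:

```sql
-- Drive the parameterized TVF from an outer row set with CROSS APPLY
SELECT t.LName, f.FName, f.seq
FROM (SELECT DISTINCT LName FROM dbo.Test) AS t
CROSS APPLY dbo.fn_Test(t.LName) AS f;
GO
```

Because the function is inline, the optimizer expands it into the outer plan just like a view, but with the predicate already in place.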

There you have it. I have demonstrated a few optimization techniques that I have used to solve performance problems, but I have only scratched the surface; there are many more techniques available. Stay tuned for future posts, where I will explore even more optimization techniques, including a surprising example of how an index rebuild can introduce fragmentation and how to avoid it.

Until next time, happy coding.

Sunday, March 7, 2010

Performance Tuning 101 – What You Will Not Learn In The Classroom

I was reading an article last week and saw a snippet of code that claimed to make fully optimized code more efficient.  The article itself was not written by the original content creator.  With a little investigative work, I came across the website and author who first developed the presented performance tuning techniques.  The website that hosts the content is http://www.sqlworkshops.com/ and the original author is Ramesh Meyyappan, rmeyyappan@sqlworkshops.com.  Ramesh is a SQL Server consultant and professional trainer who has previously worked with Microsoft and has real-world knowledge of performance tuning techniques.  On the SQLWorkshops website, Ramesh offers 3 free webcasts that demonstrate how to performance tune queries.  He focuses on query memory, parallelism, MAXDOP (max degree of parallelism), and wait stats.  I recommend that all database professionals view these webcasts; there is a lot of great content in them that you cannot find in a classroom.  In this post, I will take a few of the methods provided and explain them.  I will also provide additional methods that were not covered in the webcast.  On a side note, if you have not figured out the challenge Ramesh presented, I will show you a few of the solutions I came up with in my next installment.

The first optimization technique I will explore deals with data type mismanagement.  The two data types I will be evaluating in this post are VARCHAR and CHAR.  There are a lot of dos and don’ts regarding these two data types on the Internet, and I will not engage the pros and cons in this post.  The main point is that a performance penalty can occur if the wrong data type is chosen.  You should always choose a data type that most closely resembles the size and structure of your data.  Experienced database professionals know this; however, inexperienced database professionals often fail to see the long-term impact of poor design choices.  Unfortunately, it is quite common for database professionals to use a catch-all data type like VARCHAR(8000) or CHAR(2000) without fully understanding the data itself.  In most cases, choosing the wrong data type results in wasted storage and slower query performance, as I will show in this post.  I will get this party started by creating my sample table and data.

USE [tempdb]
GO

SET NOCOUNT ON
GO

IF EXISTS(SELECT 1 FROM sys.tables WHERE NAME = 'TestData')
BEGIN
    DROP TABLE dbo.[TestData];
END
GO

CREATE TABLE dbo.TestData(
RowNum INT PRIMARY KEY,
SomeId INT,
SomeCode CHAR(2000)
);
GO

;WITH 
   L0 AS (SELECT 1 AS C UNION ALL SELECT 1)       --2 rows
  ,L1 AS (SELECT 1 AS C FROM L0 AS A, L0 AS B)    --4 rows (2x2)
  ,L2 AS (SELECT 1 AS C FROM L1 AS A, L1 AS B)    --16 rows (4x4)
  ,L3 AS (SELECT 1 AS C FROM L2 AS A, L2 AS B)    --256 rows (16x16)
  ,L4 AS (SELECT 1 AS C FROM L3 AS A, L3 AS B)    --65536 rows (256x256)
  ,L5 AS (SELECT 1 AS C FROM L4 AS A, L4 AS B)    --4,294,967,296 rows (65536x65536)
  ,Number AS (SELECT row_number() OVER (ORDER BY (SELECT 0)) AS N FROM L5) 
INSERT INTO dbo.TestData
SELECT
    N AS RowNumber,
    ABS(CHECKSUM(NEWID()))%1000000+1 AS SomeId ,
    REPLICATE('a',ABS(CHECKSUM(NEWID())) % 2000 + 1) AS SomeCode
FROM Number
WHERE [N] <= 50000
GO

UPDATE STATISTICS dbo.[TestData] WITH FULLSCAN;
GO

The first thing I want to illustrate is the average size of our SomeCode column.  We are using a CHAR(2000) data type; however, we are storing much less data than this on average. 

SELECT AVG(LEN(SomeCode)) AS AvgLen FROM dbo.TestData WHERE [RowNum] < 35000 
/*
AvgLen
-----------
997
*/

As you can see from the query above, the average SomeCode length is 997 bytes; however, we are storing 2000 bytes per row, so roughly half of that column’s storage is wasted padding.  Reading a fixed-width CHAR column is typically slightly faster than reading a VARCHAR column; however, the storage cost of CHAR usually offsets that gain.  If I were to choose a VARCHAR(2000) data type, I would consume less storage, which translates into faster table scans, smaller tables, faster backups, faster restores, etc.  Performance increases because less storage directly correlates to fewer pages.  The bottom line is that less is faster.
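You can see the padding directly by comparing LEN, which ignores trailing blanks, with DATALENGTH, which reports the bytes actually stored. A small sketch against the dbo.TestData table above:

```sql
-- CHAR pads every value to the declared width;
-- DATALENGTH exposes the full 2000 bytes per row
SELECT TOP (5)
    LEN(SomeCode)        AS LogicalLen,   -- trailing spaces excluded
    DATALENGTH(SomeCode) AS StoredBytes   -- always 2000 for CHAR(2000)
FROM dbo.TestData;
GO
```

LogicalLen will vary by row while StoredBytes stays fixed at 2000, which is exactly the waste the average-length query measured.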

The query I will be demonstrating returns a range of rows, where the sort order is different from the clustered primary key order.  The IO stats will be the same across all variants of this query, so that counter will not illustrate a proper delta.  Instead, I will use elapsed time to illustrate the delta between queries with varying predicates.

Let’s turn on STATISTICS TIME and NOCOUNT.

SET NOCOUNT ON;
GO
SET STATISTICS TIME ON;
GO

Now run the following query.  Make sure to set query results to text. (Note: I assign the columns to local variables to avoid the results being printed.)  Run the query a couple of times to warm the cache and get a baseline CPU and elapsed time.

DECLARE @RowNum INT,@SomeInt INT, @SomeCode CHAR(2000)
SELECT @RowNum = [RowNum], @SomeInt = [SomeId], @SomeCode = [SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 3000 --3000 is in memory and 3500 sort in tempdb
ORDER BY SomeId

[image: STATISTICS TIME output showing the elapsed time of the fast query]

The query is very fast and performs within our SLA of 2 seconds.  Let’s see what happens when I increase the upper bound of the predicate to 3500.

DECLARE @RowNum INT,@SomeInt INT, @SomeCode CHAR(2000)
SELECT @RowNum = [RowNum], @SomeInt = [SomeId], @SomeCode = [SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 3500 --3000 is in memory and 3500 sort in tempdb
ORDER BY SomeId

[image: STATISTICS TIME output showing the elapsed time of the slow query]

Wow… by returning just 500 more rows, the query is over 14 times slower!  You may be thinking that the difference is negligible, but on production systems the elapsed time delta can be quite extensive, at which point performance is a real problem.  Let’s check the execution plan for each query side by side.

[image: the two execution plans side by side, visually identical]

Well, that is not much help… both query plans look good.  There is really not much you can do to make the query perform better; the index is doing a great job of returning the data, but that pesky sort is degrading performance.  The first thing I need to investigate is the detail of the query plan.  The first plan attribute I will look at is the clustered index seek’s estimated row size.  The estimated row size is derived from statistics and some internal calculations.
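The estimate can also be pulled from the cached plan XML rather than the graphical plan. This is only a sketch: the LIKE filter on the statement text is an assumption about what else happens to be in your plan cache, so adjust it to match your query.

```sql
-- AvgRowSize (bytes) as recorded on the Sort operator in the cached plan
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (1)
    qp.query_plan.value('(//RelOp[@PhysicalOp="Sort"]/@AvgRowSize)[1]', 'int') AS SortAvgRowSize
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.[text] LIKE '%TestData%ORDER BY SomeId%';
GO
```

The same XPath works for any operator in the plan; swap the PhysicalOp value to inspect the clustered index seek instead.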

[image: clustered index seek properties showing an estimated row size of 2015 bytes]

The row size is estimated to be 2015 bytes for each row in the clustered index seek.  The row size is derived from the native data types.  For example, we are using a CHAR(2000) and two integer columns, which makes our row size 2000 (1 byte per char) + 4 (int) + 4 (int) = 2008 bytes.  As you can see, 2008 is less than 2015; the additional 7 bytes are row overhead and the NULL bitmap.  The sort operator also estimates a row size of 2015.  These estimates dictate how much memory is granted to the query.  Next, I am going to put the query into a while loop so I can watch the memory grants.  Open two new query windows and paste the code below into each.  Make sure to change the RowNum predicate to 3000 in one of the windows, and take note of the session id of each window.

WHILE 1 = 1
BEGIN
DECLARE @RowNum INT,@SomeInt INT, @SomeCode CHAR(2000)
SELECT @RowNum = [RowNum], @SomeInt = [SomeId], @SomeCode = [SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 3500 --CHANGE TO 3000 IN THE OTHER WINDOW
ORDER BY SomeId
END

In a completely new window, paste the DMV query below, which returns the memory grants for each session.  The key columns are granted_memory_kb and max_used_memory_kb.  If the granted and max-used values are the same, it is likely that the query is consuming all of the RAM granted to it and has to rely on tempdb to complete some operations.

-- Replace 56 and 60 with the session ids of your two query windows
SELECT [granted_memory_kb],[required_memory_kb],[max_used_memory_kb] FROM sys.dm_exec_query_memory_grants WHERE [session_id] = 56
SELECT [granted_memory_kb],[required_memory_kb],[max_used_memory_kb] FROM sys.dm_exec_query_memory_grants WHERE [session_id] = 60

All the execution plan details look fine, so I will dig a bit deeper into tempdb to unearth the real problem.  Let’s have a look at the tempdb IO stats to see if either query is using tempdb to perform any operations.

I will start by executing the “fast” query.

SELECT * FROM sys.[dm_io_virtual_file_stats](2,1) -- database_id 2 = tempdb, file_id 1 = data file

--FAST
DECLARE @RowNum INT,@SomeInt INT, @SomeCode CHAR(2000)
SELECT @RowNum = [RowNum], @SomeInt = [SomeId], @SomeCode = [SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 3000 --3000 is in memory and 3500 sort in tempdb
ORDER BY SomeId

SELECT * FROM sys.[dm_io_virtual_file_stats](2,1)
GO
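Rather than eyeballing the two before-and-after result sets, you can snapshot the counters into variables and compute the delta directly. A sketch using the same DMV and sample query (swap the predicate to 3500 to see the spill):

```sql
-- Snapshot the tempdb data-file counters, run the query, then diff
DECLARE @reads BIGINT, @writes BIGINT;

SELECT @reads = num_of_reads, @writes = num_of_writes
FROM sys.dm_io_virtual_file_stats(2, 1);  -- 2 = tempdb, 1 = data file

DECLARE @RowNum INT, @SomeInt INT, @SomeCode CHAR(2000);
SELECT @RowNum = [RowNum], @SomeInt = [SomeId], @SomeCode = [SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 3000  -- change to 3500 to force the tempdb sort
ORDER BY SomeId;

SELECT num_of_reads  - @reads  AS delta_reads,
       num_of_writes - @writes AS delta_writes
FROM sys.dm_io_virtual_file_stats(2, 1);
GO
```

A zero delta means the sort stayed in memory; any non-zero delta is tempdb traffic generated by the statement in between.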

[image: tempdb file stats before and after the fast query — reads and writes unchanged]

The important columns to look at are num_of_reads, num_of_bytes_read, num_of_writes, num_of_bytes_written.  The number of reads and writes before and after the query are exactly the same.  This means the optimizer did not have to use tempdb operations.  Next, I will execute the slow query, which uses a predicate of less than 3500.

SELECT * FROM sys.[dm_io_virtual_file_stats](2,1)

--SLOW
DECLARE @RowNum INT,@SomeInt INT, @SomeCode CHAR(2000)
SELECT @RowNum = [RowNum], @SomeInt = [SomeId], @SomeCode = [SomeCode]
FROM dbo.[TestData]
WHERE [RowNum] < 3500 --3000 is in memory and 3500 sort in tempdb
ORDER BY SomeId

SELECT * FROM sys.[dm_io_virtual_file_stats](2,1)
GO

[image: tempdb file stats before and after the slow query — reads and writes increased]

By George…… I think we have something!!!  Why does the second query have more reads and writes in tempdb?  The answer is that the sort operation is done in memory for the “fast” query, but in tempdb for the “slow” query.  Another method you can use to identify tempdb sort operations is the Sort Warnings counter; when tempdb is used to perform a sort, the counter records it.  Below is a screenshot of the counter and its description.

[image: the Sort Warnings performance counter and its description]
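If you cannot watch Performance Monitor interactively, the same counter is exposed through a DMV. A sketch (the object_name prefix varies between default and named instances, hence the LIKE):

```sql
-- Sort Warnings counts sorts that spilled to tempdb since instance startup
SELECT [object_name], counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Access Methods%'
  AND counter_name LIKE 'Sort Warnings%';
GO
```

Sample the cntr_value before and after running the slow query; an increase confirms the spill without leaving Management Studio.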

Now that I have identified the problem, how do I solve it?  There are a couple of ways.  One solution is to increase the estimated row size, so the optimizer grants more memory to the query, allowing the sort to fit in memory.  The other is to decrease the actual row size, so the sort fits within the existing grant.  Let’s see this in action.

DECLARE @RowNum INT,@SomeInt INT, @SomeCode VARCHAR(2000)
SELECT @RowNum = [RowNum], @SomeInt = [SomeId], @SomeCode = CAST([SomeCode] AS VARCHAR(2000))
FROM dbo.[TestData]
WHERE [RowNum] < 3500 --3000 is in memory and 3500 sort in tempdb
ORDER BY SomeId

This method uses CAST to convert the CHAR(2000) column to a VARCHAR(2000), which helps to reduce the estimated row size.  The estimated row size of a VARCHAR column is calculated as 50% of its declared size; this is a safe estimate because the optimizer does not know how full each value actually is, and variable-length columns rarely use 100% of their declared length anyway.  In our case, a VARCHAR(2000) is estimated at approximately 1000 bytes.  Let’s have a look at the execution plan.

[image: execution plan for the CAST query — the sort still spills to tempdb]

Unfortunately, the conversion itself is not enough to optimize this query.  Let’s try trimming the CHAR column prior to the cast, so the actual row size is decreased.  The interesting thing here is that the optimizer still reports the same estimated row size; however, the sort now occurs in memory.  It fits because the actual row size is significantly less than the estimated 1019 bytes per row, so the optimizer is unintentionally granting extra memory to the query.  Let’s see this in action.

DECLARE @RowNum INT,@SomeInt INT, @SomeCode VARCHAR(2000)
SELECT @RowNum = [RowNum], @SomeInt = [SomeId], @SomeCode = RTRIM(CAST([SomeCode] AS VARCHAR(2000)))
FROM dbo.[TestData]
WHERE [RowNum] < 3500 --3000 is in memory and 3500 sort in tempdb
ORDER BY SomeId

[image: execution plan for the RTRIM query — the sort now occurs in memory]

That’s more like it.  If you run the query with sys.dm_io_virtual_file_stats, you will see that it no longer uses tempdb to sort the rows.  This method works well for queries where the predicate is < 3500; if the returned range is significantly larger, it is likely the sort will still spill into tempdb.  The idea is to cater to everyday usage patterns and let the exceptions take longer.  It should also be noted that if the CHAR(2000) column is nearly full in every row, the RTRIM method will not have the same impact.  When RTRIM is not cutting the mustard, it is time to go with another option… such as bloating the estimated row size.

This is the solution presented in Ramesh’s webcast.  We will use the same CAST conversion to introduce a compute scalar into the query plan, but this time we will cast to a larger size.  When choosing a larger size, it is important not to go too large, as that could have an adverse effect on this query, or on other queries running on the server.  More information can be obtained via the webcast.

DECLARE @RowNum INT,@SomeInt INT, @SomeCode VARCHAR(4200)
SELECT @RowNum = [RowNum], @SomeInt = [SomeId], @SomeCode = CAST([SomeCode] AS VARCHAR(4200))
FROM dbo.[TestData]
WHERE [RowNum] < 3500 --3000 is in memory and 3500 sort in tempdb
ORDER BY SomeId
[image: execution plan for the VARCHAR(4200) query — the sort occurs in memory]

Note: I am using SQL Server 2008 to perform my tests, but I have tested both of these solutions on SQL Server 2005 and obtained similar results.

There you have it. I have given an introduction to performance tuning queries that seemingly have an ideal execution plan.  A lot of the concepts presented here were taken from http://sqlworkshops.com, which provides 3 webcasts containing a wealth of performance tuning tips that will benefit all levels of database professionals.  Stay tuned for my next post, where I will dissect more query optimization techniques, including two solutions to the TOP 101 challenge.

Until next time, happy coding.