In my last post http://jahaines.blogspot.com/2009/10/io-stats-what-are-you-missing.html, I talked about the performance problems associated with scalar user defined functions and how SSMS may report invalid IO statistics, for scalar UDFs. In this post, I will focusing on how to transform those pesky scalar UDFs into more scalable function. When developing user defined functions, you have to keep a few things in mind. Firstly, scalar UDFs are evaluated for each row returned by the query. Additionally, SQL Server is not able to maintain statistics and optimize any function, except an inline table valued function. Lastly, most code logic does not necessarily need to be encapsulated, in a function. You may get better performance if you choose to use a derived table instead of a function; however, the biggest problem with a derived table is it cant be encapsulated and reused across an application. An inline TVF is really useful if you need to encapsulate business logic and reuse it throughout an application. Let’s start by creating the sample table DDL.
USE [tempdb]
GO
IF EXISTS(SELECT 1 FROM sys.tables WHERE NAME = 'TestData')
BEGIN
DROP TABLE dbo.[TestData];
END
GO
--Create Sample Table
CREATE TABLE dbo.TestData(
RowNum INT PRIMARY KEY,
SomeId INT,
SomeDate DATETIME
);
GO
INSERT INTO dbo.TestData
SELECT TOP 1000
ROW_NUMBER() OVER (ORDER BY t1.NAME) AS RowNumber,
ABS(CHECKSUM(NEWID()))%250 AS SomeId,
DATEADD(DAY,ABS(CHECKSUM(NEWID()))%1000,'2008-01-01') AS SomeDate
FROM
Master.dbo.SysColumns t1
GO
Next I am going to create two functions. One function will be a scalar UDF and the other will be an inline table valued function. If you do not know what an inline TVF is, an inline TVF is like a parameterized view and is subject to the same restrictions as a view. For more information you can read the following post in BOL, http://msdn.microsoft.com/en-us/library/ms189294.aspx.
--Create Scalar Function
CREATE FUNCTION dbo.fn_SomeFunction(@SomeId INT)
RETURNS DATETIME
AS
BEGIN
DECLARE @dt DATETIME
SELECT @dt = MAX(SomeDate)
FROM dbo.TestData
WHERE SomeId = @SomeId
RETURN @dt
END
GO
--Create Inline Table Valued Function
CREATE FUNCTION dbo.fn_SomeInlineFunction()
RETURNS TABLE
RETURN(
SELECT SomeId, MAX(SomeDate) AS SomeDate
FROM dbo.TestData
GROUP BY SomeId
)
GO
All of our DDL is in place. All that is left is to test the performance. If you read my last post, you should be expecting the inline TVF to out perform the scalar UDF. Let’s see what actually transpires. Discard the query results to the grid by clicking Tools –> Options –> Query Results –> SQL Server –> Results To Grid –> Discard results after execution. Next open SQL Server profiler and use the standard template. Run the code below to capture the performance counters in profiler.
SELECT SomeId,dbo.fn_SomeFunction(SomeId)
FROM dbo.[TestData]
GO
SELECT TestData.SomeId,max_dt.SomeDate
FROM dbo.[TestData]
INNER JOIN dbo.fn_SomeInlineFunction() AS max_dt
ON max_dt.SomeId = TestData.SomeId
GO
The results of the queries should look similar to my results below. The things of note are the reads and the CPU required to satisfy the each query. The number of reads and CPU required to satisfy the scalar UDF is astronomically greater than the inline TVF.
If the above screenshot is not enough to discourage you from using scalar UDFs, I do not know what can. The point being that there are all kinds of great alternatives to encapsulating code logic, without the use of scalar functions. Inline TVFs offer a SET based approach for encapsulating business logic; plus the optimizer is able to use existing statistics and indexes to optimize inline TVFs. It is my recommendation that you should try to convert all scalar UDFs to inline TVFs. I know this is not always possible, but it is a good start. I typically try to stay away from scalar and multi-line UDFs, unless absolutely necessary. I hope that you have learned something new and that you can use this example to get the needed signoff to change those problematic scalar UDFs, into inline TVFs.
Until next time, happy coding.
2 comments:
Do you mean I can assume that the following are recommended usages :
1 indicate least recommended and 3 indicates most recommended UDFs
1. Scalar UDF
1. Multilne UDF
2. Table UDF
3. Inline Table UDF
3. Inline UDF
I have read your blog its very attractive and impressive. I like your blog MSBI online training
Post a Comment