Friday, October 23, 2009

Converting A Scalar User Defined Function To A Inline Table Valued Function

In my last post http://jahaines.blogspot.com/2009/10/io-stats-what-are-you-missing.html, I talked about the performance problems associated with scalar user defined functions and how SSMS may report invalid IO statistics, for scalar UDFs.  In this post, I will focusing on how to transform those pesky scalar UDFs into more scalable function.  When developing user defined functions, you have to keep a few things in mind.  Firstly, scalar UDFs are evaluated for each row returned by the query.  Additionally, SQL Server is not able to maintain statistics and optimize any function, except an inline table valued function.  Lastly, most code logic does not necessarily need to be encapsulated, in a function.  You may get better performance if you choose to use a derived table instead of a function; however, the biggest problem with a derived table is it cant be encapsulated and reused across an application.  An inline TVF is really useful if you need to encapsulate business logic and reuse it throughout an application.  Let’s start by creating the sample table DDL.

USE [tempdb]
GO
 
IF EXISTS(SELECT 1 FROM sys.tables WHERE NAME = 'TestData')
BEGIN
    DROP TABLE dbo.[TestData];
END
GO
 
--Create Sample Table
CREATE TABLE dbo.TestData(
RowNum INT PRIMARY KEY,
SomeId INT,
SomeDate DATETIME
);
GO
 
INSERT INTO dbo.TestData 
SELECT TOP 1000 
    ROW_NUMBER() OVER (ORDER BY t1.NAME) AS RowNumber,
    ABS(CHECKSUM(NEWID()))%250 AS SomeId, 
    DATEADD(DAY,ABS(CHECKSUM(NEWID()))%1000,'2008-01-01') AS SomeDate
FROM 
    Master.dbo.SysColumns t1
GO

Next I am going to create two functions.  One function will be a scalar UDF and the other will be an inline table valued function.  If you do not know what an inline TVF is, an inline TVF is like a parameterized view and is subject to the same restrictions as a view.  For more information you can read the following post in BOL, http://msdn.microsoft.com/en-us/library/ms189294.aspx.

--Create Scalar Function
CREATE FUNCTION dbo.fn_SomeFunction(@SomeId INT)
RETURNS DATETIME
AS
BEGIN
    DECLARE @dt DATETIME
    SELECT @dt = MAX(SomeDate)
    FROM dbo.TestData
    WHERE SomeId = @SomeId
    
    RETURN @dt
END
GO
 
--Create Inline Table Valued Function
CREATE FUNCTION dbo.fn_SomeInlineFunction()
RETURNS TABLE
RETURN(
    SELECT SomeId, MAX(SomeDate) AS SomeDate
    FROM dbo.TestData
    GROUP BY SomeId
)
GO

All of our DDL is in place.  All that is left is to test the performance.  If you read my last post, you should be expecting the inline TVF to out perform the scalar UDF.  Let’s see what actually transpires.  Discard the query results to the grid by clicking Tools –> Options –> Query Results –> SQL Server –> Results To Grid –> Discard results after execution.  Next open SQL Server profiler and use the standard template.  Run the code below to capture the performance counters in profiler.

SELECT SomeId,dbo.fn_SomeFunction(SomeId)
FROM dbo.[TestData]
GO
 
SELECT TestData.SomeId,max_dt.SomeDate
FROM dbo.[TestData]
INNER JOIN dbo.fn_SomeInlineFunction() AS max_dt
    ON max_dt.SomeId = TestData.SomeId
GO

The results of the queries should look similar to my results below.    The things of note are the reads and the CPU required to satisfy the each query.  The number of reads and CPU required to satisfy the scalar UDF  is astronomically greater than the inline TVF. 

image

If the above screenshot is not enough to discourage you from using scalar UDFs, I do not know what can.  The point being that there are all kinds of great alternatives to encapsulating code logic, without the use of scalar functions.  Inline TVFs offer a SET based  approach for encapsulating business logic; plus the optimizer is able to use existing statistics and indexes to optimize inline TVFs.  It is my recommendation that you should try to convert all scalar UDFs to inline TVFs.  I know this is not always possible, but it is a good start.  I typically try to stay away from scalar and multi-line UDFs, unless absolutely necessary.  I hope that you have learned something new and that you can use this example to get the needed signoff to change those problematic scalar UDFs, into inline TVFs.

Until next time, happy coding.

1 comment:

Rajesh Chandras said...

Do you mean I can assume that the following are recommended usages :
1 indicate least recommended and 3 indicates most recommended UDFs


1. Scalar UDF
1. Multilne UDF
2. Table UDF
3. Inline Table UDF
3. Inline UDF