Monday, June 29, 2009

Concatenating Column Values (Part 1)

As you may know, SQL Server does not have native support for handing arrays.  This post is part one of a two part series, where I will look at the most widely used methods to concatenate column values.  In this post, I will focus on creating delimited column values and I will explore the performance implications of each method, in the second part of this series.

The two methods that are most often used, in SQL Server 2005+, to concatenate column values are, the variable assignment method and the FOR XML PATH method.  The variable assignment method is most commonly used in scalar UDF's, so there are usually some performance implications.  The FOR XML PATH method is more commonly used in a correlated sub query to concatenate column values inline.  This method also has performance implications.

Let’s create a sample table with data.

USE [tempdb]
GO
 
IF EXISTS(SELECT 1 FROM sys.tables WHERE  NAME = 't')
BEGIN
    DROP TABLE dbo.t;
END
GO
 
CREATE TABLE t(
id INT,
col CHAR(1)
);
 
INSERT INTO t VALUES (1,'a');
INSERT INTO t VALUES (1,'b');
INSERT INTO t VALUES (1,'c');
INSERT INTO t VALUES (1,'d');
INSERT INTO t VALUES (2,'e');
INSERT INTO t VALUES (2,'f');
INSERT INTO t VALUES (3,'g');
INSERT INTO t VALUES (4,'h');

We have our data so let’s start with the scalar UDF code.  Essentially we need to create a scalar UDF that takes a surrogate key parameter and will use variable assignment to concatenate all required values.

Here is the Scalar UDF code:

USE [tempdb]
GO
 
CREATE FUNCTION dbo.ConcatenateCols(@Id INT)
RETURNS VARCHAR(MAX)
AS
BEGIN
 
DECLARE @RtnStr VARCHAR(MAX)
 
SELECT @RtnStr = COALESCE(@RtnStr + ',','') + col
FROM dbo.t
WHERE id = @Id AND col > ''
 
RETURN @RtnStr
 
END
GO

Now that we have our function we can test the scalar UDF code.

SELECT 
    id,
    dbo.ConcatenateCols(id) AS Cols
FROM t
GROUP BY
    Id

Here are the results:

image

Now let’s take a look at the FOR XML PATH method. 

SELECT 
    Id,
    STUFF(
            (
            SELECT ',' + CONVERT(VARCHAR(5),col)
            FROM t t2
            WHERE
                t1.id = t2.id AND t2.col > ''
            FOR XML PATH('') 
            )
    ,1,1,'') AS Cols
FROM [t] t1
GROUP BY
    Id

Results:

image

Note: You can also use CROSS APPLY and achieve the same result, but it makes the code a little easier to read, in my opinion, as shown below:

SELECT 
    t1.id,
    MAX(STUFF(t2.x_id,1,1,'')) AS Cols
FROM t t1
CROSS apply(
    SELECT ',' + t2.col
    FROM t t2
    WHERE t2.id = t1.id AND t2.col > ''
    FOR xml PATH('')
) AS t2 (x_id)
GROUP BY
    t1.id
GO

There you have it.  I have shown you the two most widely used methods to concatenate strings in SQL Server 2005+.  They are pretty simple to implement and maintain, but one should know how each method performs, so stay tuned…..in the coming post, I will look at the performance implications associated with the scalar UDF and FOR XML PATH methods.

****************** UPDATE ******************

There was a bug in the code that would cause the concatenated value in the variable assignment method to become NULL.  When I was creating the sample code, I really didn’t think about using COALESCE and a filter predicate because I was using NON NULL values.  Best practice is to use coalesce or a proper predicate filter to prohibit useless data from being concatenated.  I have opted to go with a predicate filter of > ‘’.   Thanks to all for pointing this out.

11 comments:

Anonymous said...

Thank you for the post!

In the SELECT-statement where you test the UDF, you use the wrong UDF-name: it should be 'ConcatenateCols', not 'ConcatenateColumn'.

Naomi said...

Please check this thread http://tek-tips.com/viewthread.cfm?qid=1599177&page=1

There is a bug in the UDF code - you need to add a check for NULL in Col.

Adam Haines said...

Hi Naomi,

I agree that the code should have a coalesce on the column being concatenated. In my sample here it didnt need the COALESCE function because I was concatenating non NULL values. I will modify the post to include coalesce.

Thanks,
Adam

Term Papers said...

I have been visiting various blogs for my term papers writing research. I have found your blog to be quite useful. Keep updating your blog with valuable information... Regards

daspeac said...

I believe you have also heard about the recover illustrator file utility

Anonymous said...

Outstanding work! The data given was very beneficial. I hope you maintain the great work done.
hialeah locksmith
Wellington FL Locksmith
Locksmith Saratoga CA
Locksmith Newark CA
RedwoodCity locksmith
Locksmith Palo Alto
Locksmith Mesquite tx
plano locksmiths
locksmith in hialeah
plano locksmith
hialeah locksmith
plano locksmiths
fort worth locksmith
pembroke pines locksmiths
Locksmith Mesquite
Locksmith Mesquite tx
hialeah locksmiths
hialeah locksmith
locksmith fort worth texas
irvine locksmith
miami locksmiths
locksmith hialeah
locksmiths fort worth
locksmith miami beach

Unknown said...

I know this is fairly old, but I thought I'd just point out that the XML version has a side effect that may cause issues: any invalid XML entities are automatically XML encoded. The one I came across is that CR is converted to &#x0D.

Unknown said...
This comment has been removed by the author.
Adam Haines said...

Navdeep Bains,

In the case of two tables you would want to do an inner join (or whatever the relationship is) inside of the correlated subquery doing the XML concatenation.

Unknown said...

This is very helpful. Thank you.

Sherry Samples said...

I have a similar situation. I start out with a table that has data input into a column from another source. This data is comma delimited coming in. I need to manipulate the data to remove a section at the end of each. So I split the data and remove the end with the code below. (I added the ID column later to be able to sort. I also added WITH SCHEMABINDING later to add an XML index but nothing works. I can remove this ... and the ID column, but I do not see any difference one way or the other):

ALTER VIEW [dbo].[vw_Routing]
WITH SCHEMABINDING
AS
SELECT TOP 99.9999 PERCENT

ROW_NUMBER() OVER (ORDER BY CableID)-1 as ID,
CableID AS [CableID],
SUBSTRING(m.n.value('.[1]','varchar(8000)'),1,13) AS Routing
FROM
(
SELECT CableID,CAST('' + REPLACE([RouteNodeList],',','') + '' AS XML) AS x
FROM [dbo].[Cables]
)t
CROSS APPLY x.nodes('/XMLRoot/RowData')m(n)

ORDER BY ID

Now I need to Concatenate data from the "Routing" column's rows into one row grouped by another column into a column again. I have the code working except that it is reordering my data; I must have the data in the order it is input into the table as it is Cable Routing information. I must also remove duplicates. I use the following code. The SELECT DISTINCT removes the duplicates, but reorders the data. The SELECT (without DISTINCT) keeps the correct data order, but does NOT remove the duplicates:

Substring(
(
SELECT DISTINCT ','+ x3.Routing AS [text()] --This DISTINCT reorders the routes once concatenated.
--SELECT ','+ x3.Routing AS [text()] --This without the DISTINCT does not remove duplicates.
From vw_Routing x3
Where x3.CableID = c.CableId
For XML PATH ('')
), 2, 1000) [Routing],

I tried the code you gave above and it provided the same results with the DISTINCT reordering the data but without DISTINCT not removing the duplicates.

PLEASE HELP!!!

Thanks,
Sherry S.