Demystifying SQL Server: Pivoting Data Using SQL 2005 (Part 1)

SQL Server 2005 introduced a new operator that makes pivoting data really easy. For those of you who may be unfamiliar with the term pivot, pivoting is the process of transforming row values into columns, using an aggregate function to summarize data. The biggest problem with the pivot operator is that the values have to be known, which severely limits the flexibility of the pivot. Hopefully, in future versions Microsoft will give us a more dynamic Pivot operator. In the interim, the only way to dynamically pivot data is to use dynamic SQL, which I will address in part 2 of this series. For now let’s focus a on creating a simple Pivot query. I will start by creating a sample table.

IF OBJECT_ID('tempdb..#Sales') IS NOT NULL

BEGIN

    DROP TABLE #Sales;

END

GO

CREATE TABLE #Sales(

Sales_Id INT IDENTITY(1,1) PRIMARY KEY,

Emp_Id INT,

Sales_Dt DATETIME,

Sales_Amt NUMERIC(9,2)

);

GO

INSERT INTO #Sales VALUES (1,'2009-01-01',100.25);

INSERT INTO #Sales VALUES (1,'2009-02-01',999.99);

INSERT INTO #Sales VALUES (1,'2009-03-01',499.99);

INSERT INTO #Sales VALUES (1,'2009-04-15',654.99);

INSERT INTO #Sales VALUES (1,'2009-07-06',197.99);

INSERT INTO #Sales VALUES (2,'2009-01-01',99.99);

INSERT INTO #Sales VALUES (2,'2009-01-03',107.20);

INSERT INTO #Sales VALUES (3,'2009-01-03',459.36);

INSERT INTO #Sales VALUES (3,'2009-05-01',19.99);

INSERT INTO #Sales VALUES (3,'2009-06-01',29.99);

INSERT INTO #Sales VALUES (3,'2009-07-01',1250.98);

INSERT INTO #Sales VALUES (3,'2009-04-01',488.23);

INSERT INTO #Sales VALUES (3,'2009-02-28',1999.99);

GO

Now that we have our table, we can create our pivot query. The Pivot operator has three crucial components, the aggregate function, the column to pivot, and the row values to aggregate into columns. In the example below, we are using the aggregate function SUM, pivoting the column Qtr, (which is expressed as the quarter datapart of the Sales_Dt), and aggregating the Sales_Amt for sales_dates that fall within Quarters “1,2,3,4”.

Note: I limited the data to the current year. If multiple years are returned the values may not aggregate correctly because quarters will overlap between years. We will address this problem in part 2, of this series by appending the year to the pivot values.

SELECT

    Emp_Id,

    COALESCE([1],0) AS [Q1],

    COALESCE([2],0) AS [Q2],

    COALESCE([3],0) AS [Q3],

    COALESCE([4],0) AS [Q4]

FROM(

    SELECT

        Emp_id,

        DATEPART(QUARTER,Sales_Dt) AS [Qtr],

        Sales_Amt

    FROM #Sales

    WHERE

        Sales_Dt >= '2009-01-01'

        AND Sales_Dt < '2010-01-01'

) AS dataToPivotTable

PIVOT(

    SUM(Sales_Amt) --Aggregate Function

    FOR [Qtr]-- Column To Pivot

    IN ([1],[2],[3],[4]) --Qtr Values to Pivot

) AS whatToPivotOnTable;

I would like to take a minute to describe what the pivot operator is doing, in the background. I believe the easiest way to understand what is occurring is to break apart each key component. For example, the pivot column, or the column used in the FOR clause, is the column that you want to group by, the the aggregate function aggregates and summarizes data (No surprises there :-) , and the row values used in the IN clause are used to create a case expression. Under the hood the pivot operator can be expressed as:

SELECT

Aggregate_Function(CASE WHEN MyCol = MyValue1 THEN MyCol ELSE NULL END),

Aggregate_Function(CASE WHEN MyCol = MyValue2 THEN MyCol ELSE NULL END),

...

FROM MyTable GROUP BY GroupByColumn

You can look at the execution plan of the Pivot query to validate that this is occurs, in the stream aggregate show plan operator. The same construct can be used to create a pivot or a “Cross-Tab” query, in prior versions on SQL Server. We can evaluate the very same pivot statement, as shown below.

SELECT

    [Emp_Id],

    SUM(CASE WHEN DATEPART(QUARTER,Sales_Dt) = 1 THEN Sales_Amt ELSE 0 END) AS [Q1],

    SUM(CASE WHEN DATEPART(QUARTER,Sales_Dt) = 2 THEN Sales_Amt ELSE 0 END) AS [Q2],

    SUM(CASE WHEN DATEPART(QUARTER,Sales_Dt) = 3 THEN Sales_Amt ELSE 0 END) AS [Q3],

    SUM(CASE WHEN DATEPART(QUARTER,Sales_Dt) = 4 THEN Sales_Amt ELSE 0 END) AS [Q4]

FROM #Sales

WHERE

    Sales_Dt >= '2009-01-01'

    AND Sales_Dt < '2010-01-01'

GROUP BY [Emp_Id];

There you have it. A simplistic method to pivot rows of data into columns. In part 2 of this series, I will focus on creating a dynamic pivot statement, where the columns and values are unknown.

Happy coding.

Demystifying SQL Server

Sunday, August 16, 2009

Pivoting Data Using SQL 2005 (Part 1)

1 comment:

Subscribe

Search This Blog

About Me

Twitter

Twitter

Blogs I follow

Followers

Blog Archive

Keywords