Sunday, August 16, 2009

Pivoting Data Using SQL 2005 (Part 1)

SQL Server 2005 introduced a new operator that makes pivoting data really easy.  For those of you who may be unfamiliar with the term pivot, pivoting is the process of transforming row values into columns, using an aggregate function to summarize data.  The biggest problem with the pivot operator is that the values have to be known, which severely limits the flexibility of the pivot. Hopefully, in future versions Microsoft will give us a more dynamic Pivot operator. In the interim, the only way to dynamically pivot data is to use dynamic SQL, which I will address in part 2 of this series.  For now let’s focus a on creating a simple Pivot query.  I will start by creating a sample table.

IF OBJECT_ID('tempdb..#Sales') IS NOT NULL
BEGIN
    DROP TABLE #Sales;
END
GO
 
CREATE TABLE #Sales(
Sales_Id INT IDENTITY(1,1) PRIMARY KEY,
Emp_Id INT,
Sales_Dt DATETIME,
Sales_Amt NUMERIC(9,2)
);
GO
 
INSERT INTO #Sales VALUES (1,'2009-01-01',100.25);
INSERT INTO #Sales VALUES (1,'2009-02-01',999.99);
INSERT INTO #Sales VALUES (1,'2009-03-01',499.99);
INSERT INTO #Sales VALUES (1,'2009-04-15',654.99);
INSERT INTO #Sales VALUES (1,'2009-07-06',197.99);
INSERT INTO #Sales VALUES (2,'2009-01-01',99.99);
INSERT INTO #Sales VALUES (2,'2009-01-03',107.20);
INSERT INTO #Sales VALUES (3,'2009-01-03',459.36);
INSERT INTO #Sales VALUES (3,'2009-05-01',19.99);
INSERT INTO #Sales VALUES (3,'2009-06-01',29.99);
INSERT INTO #Sales VALUES (3,'2009-07-01',1250.98);
INSERT INTO #Sales VALUES (3,'2009-04-01',488.23);
INSERT INTO #Sales VALUES (3,'2009-02-28',1999.99);
GO

Now that we have our table, we can create our pivot query.  The Pivot operator has three crucial components, the aggregate function, the column to pivot, and the row values to aggregate into columns.  In the example below, we are using the aggregate function SUM, pivoting the column Qtr, (which is expressed as the quarter datapart of the Sales_Dt), and aggregating the Sales_Amt for sales_dates that fall within Quarters “1,2,3,4”. 

Note: I limited the data to the current year.  If multiple years are returned the values may not aggregate correctly because quarters will overlap between years.  We will address this problem in part 2, of this series by appending the year to the pivot values.

SELECT 
    Emp_Id,
    COALESCE([1],0) AS [Q1],
    COALESCE([2],0) AS [Q2],
    COALESCE([3],0) AS [Q3],
    COALESCE([4],0) AS [Q4]
FROM(
    SELECT 
        Emp_id,
        DATEPART(QUARTER,Sales_Dt) AS [Qtr],
        Sales_Amt    
    FROM #Sales
    WHERE
        Sales_Dt >= '2009-01-01'
        AND Sales_Dt < '2010-01-01'
) AS dataToPivotTable
PIVOT(
    SUM(Sales_Amt) --Aggregate Function
    FOR [Qtr]-- Column To Pivot
    IN ([1],[2],[3],[4]) --Qtr Values to Pivot
) AS whatToPivotOnTable;

image

I would like to take a minute to describe what the pivot operator is doing, in the background.  I believe the easiest way to understand what is occurring  is to break apart each key component.  For example, the pivot column, or the column used in the FOR clause,  is the column that you want to group by, the the aggregate function aggregates and summarizes data  (No surprises there :-) , and the row values used in the IN clause are used to create a case expression. Under the hood the pivot operator can be expressed as:

 
SELECT 
Aggregate_Function(CASE WHEN MyCol = MyValue1 THEN MyCol ELSE NULL END),
Aggregate_Function(CASE WHEN MyCol = MyValue2 THEN MyCol ELSE NULL END),
...
FROM MyTable GROUP BY GroupByColumn

You can look at the execution plan of the Pivot query to validate that this is occurs, in the stream aggregate show plan operator.  The same construct can be used to create a pivot or a  “Cross-Tab” query, in prior versions on SQL Server.  We can evaluate the very same pivot statement, as shown below.

SELECT 
    [Emp_Id],
    SUM(CASE WHEN DATEPART(QUARTER,Sales_Dt) = 1 THEN Sales_Amt ELSE 0 END) AS [Q1],
    SUM(CASE WHEN DATEPART(QUARTER,Sales_Dt) = 2 THEN Sales_Amt ELSE 0 END) AS [Q2],
    SUM(CASE WHEN DATEPART(QUARTER,Sales_Dt) = 3 THEN Sales_Amt ELSE 0 END) AS [Q3],
    SUM(CASE WHEN DATEPART(QUARTER,Sales_Dt) = 4 THEN Sales_Amt ELSE 0 END) AS [Q4]
FROM #Sales
WHERE
    Sales_Dt >= '2009-01-01'
    AND Sales_Dt < '2010-01-01'
GROUP BY [Emp_Id];

There you have it.  A simplistic method to pivot rows of data into columns.  In part 2 of this series, I will focus on creating a dynamic pivot statement, where the columns and values are unknown.

Happy coding.

1 comment:

daspeac said...

I believe you have already heard about the way of pdf file is not readable