Friday, July 10, 2009

Should I Intersect or Except?

Two very useful functions were introduced, in SQL Server 2005, Intersect and Except.  These two functions can greatly limit the amount of coding required to find the differences or similarities between two result sets.  Let’s start with the Except operator.  Except returns the distinct values from the left query that do not exist in the right query.  When I say left and right query I am referring to the position of the query in relation to EXCEPT.  Take a look at the example below.  In this example, the table 1 query is to the left and table 2 is to the right.  This means the result will include all the rows from table1 than do not exist in table2.

SELECT * FROM Table1 EXCEPT SELECT * FROM Table2

You are probably thinking okay…, but how can this help me?  I believe the best way to demonstrate the behavior is through example.  Let’s start by creating test tables.

DECLARE @t1 TABLE(
id INT,
col CHAR(1)
)
 
INSERT INTO @t1 VALUES (1,'a');
INSERT INTO @t1 VALUES (2,'b');
INSERT INTO @t1 VALUES (3,'c');
INSERT INTO @t1 VALUES (4,'d');
 
DECLARE @t2 TABLE(
id INT,
col CHAR(1)
)
 
INSERT INTO @t2 VALUES (1,'a');
INSERT INTO @t2 VALUES (2,'b');
INSERT INTO @t2 VALUES (4,'d');
INSERT INTO @t2 VALUES (5,'e');

Now, we can test.  In this example we will use the same logic that we presented earlier. 

--use EXCEPT
SELECT 
    *
FROM @t1
 
EXCEPT
 
SELECT *
FROM @t2

The results are as follows:

image

The results is only id =3, col=’c’ because this row is the only row that existed in @t1 that did not exist in @t2.  This example may not seem like it saves you a lot of coding,  but imagine how many statements you would need to check each column, especially when the table has lots of columns. Now let’s expand the example to show distinct values, in both tables.  So how do we get the distinct rows, for each table?  If you said, use two except operators, you are correct.  We can simply reverse the left and right queries to change the logic such that we get the distinct rows from @t2.  We will use the Union all operator to append the results together, so that we get one final result set of rows and the table they exist in.

--use EXCEPT
SELECT *
FROM(
    SELECT 
        't1' AS tbl,id,col
    FROM @t1
 
    EXCEPT --orginal except
 
    SELECT 
        't1' AS tbl,id,col
    FROM @t2
 
    UNION ALL
    
    SELECT 
        't2' AS tbl,id,col
    FROM @t2
 
    EXCEPT –reverse the except logic
 
    SELECT 
        't2' AS tbl,id,col
    FROM @t1
) AS a    

Results:

image

Now lets move on to the Intersect operator.  Intersect does what the name implies. Intersect returns all the rows where the query results match between the left and right queries.

Let’s see it in action:

--use intersect
SELECT 
    *
FROM @t1
 
INTERSECT
 
SELECT *
FROM @t2

image

The rows returned show all the values that exist in both query result sets.  This is a very quick and easy way to get like values.  The alternative to using the intersect method is to use a predicate that contains every column in the table, as shown below.

--use join
SELECT 
    t1.*
FROM @t1 t1
INNER JOIN @t2 t2
    ON  t1.id = t2.id 
    AND t1.col = t2.col

You may be thinking that this does not look so bad, which I can agree with; however, what happens when your queries has 10 or more columns?  Can you imagine having to scale this code to work with 20 filters.  I just think it is a lot easier to use intersect for this, oppose to hand coding the predicate. One could argue that checksum is a valid alternative.  Checksum can be a good alternative; however, it prevents an index from being used, so I would  be a little apprehensive to use it, especially if you are implementing the solution in a production environment.

As you can see, these new function removes a lot of  coding complexity.  The performance of each of these operators varies depending on the size and complexity of the results returned by each query and the indexing in place.  I typically use these operators to quickly identify RI violators and to quickly identify data that exists in one table and not another.  I hope that this tip saves you time and that you learned something new. 

Happy coding!

5 comments:

Brad Schulz said...

Kind of cool that we have all possibilities now: UNION, UNION ALL, INTERSECT, and EXCEPT.

By the way, your profile mentions another baby coming in May2009. I assume you're the father of two now? (I have three... ages 13, 16, and 20).

Adam Haines said...

Brad the new bundle of joy did in fact arrive. Actually he came a little earlier than we expected. I have been meaning to update my profile, but havent gotten around to it yet. I have a 2 year old daughter and my son is now 12 weeks old.

Brian Tkatch said...

Thanx for going over the basics, and congrats on the new kid. :)

Brad Schulz said...

Congratulations! Busy times ahead...

daspeac said...

I believe you have already heard about the analyze and repair pdf file download