Demystifying SQL Server: Should I Intersect or Except?

Friday, July 10, 2009

Should I Intersect or Except?

Two very useful functions were introduced, in SQL Server 2005, Intersect and Except. These two functions can greatly limit the amount of coding required to find the differences or similarities between two result sets. Let’s start with the Except operator. Except returns the distinct values from the left query that do not exist in the right query. When I say left and right query I am referring to the position of the query in relation to EXCEPT. Take a look at the example below. In this example, the table 1 query is to the left and table 2 is to the right. This means the result will include all the rows from table1 than do not exist in table2.

SELECT * FROM Table1 EXCEPT SELECT * FROM Table2

You are probably thinking okay…, but how can this help me? I believe the best way to demonstrate the behavior is through example. Let’s start by creating test tables.

DECLARE @t1 TABLE(

id INT,

col CHAR(1)

INSERT INTO @t1 VALUES (1,'a');

INSERT INTO @t1 VALUES (2,'b');

INSERT INTO @t1 VALUES (3,'c');

INSERT INTO @t1 VALUES (4,'d');

DECLARE @t2 TABLE(

id INT,

col CHAR(1)

INSERT INTO @t2 VALUES (1,'a');

INSERT INTO @t2 VALUES (2,'b');

INSERT INTO @t2 VALUES (4,'d');

INSERT INTO @t2 VALUES (5,'e');

Now, we can test. In this example we will use the same logic that we presented earlier.

--use EXCEPT

SELECT

FROM @t1

EXCEPT

SELECT *

FROM @t2

The results are as follows:

The results is only id =3, col=’c’ because this row is the only row that existed in @t1 that did not exist in @t2. This example may not seem like it saves you a lot of coding, but imagine how many statements you would need to check each column, especially when the table has lots of columns. Now let’s expand the example to show distinct values, in both tables. So how do we get the distinct rows, for each table? If you said, use two except operators, you are correct. We can simply reverse the left and right queries to change the logic such that we get the distinct rows from @t2. We will use the Union all operator to append the results together, so that we get one final result set of rows and the table they exist in.

--use EXCEPT

SELECT *

FROM(

    SELECT

        't1' AS tbl,id,col

    FROM @t1

    EXCEPT --orginal except

    SELECT

        't1' AS tbl,id,col

    FROM @t2

    UNION ALL

    SELECT

        't2' AS tbl,id,col

    FROM @t2

    EXCEPT –reverse the except logic

    SELECT

        't2' AS tbl,id,col

    FROM @t1

) AS a

Results:

Now lets move on to the Intersect operator. Intersect does what the name implies. Intersect returns all the rows where the query results match between the left and right queries.

Let’s see it in action:

--use intersect

SELECT

FROM @t1

INTERSECT

SELECT *

FROM @t2

The rows returned show all the values that exist in both query result sets. This is a very quick and easy way to get like values. The alternative to using the intersect method is to use a predicate that contains every column in the table, as shown below.

--use join

SELECT

    t1.*

FROM @t1 t1

INNER JOIN @t2 t2

    ON  t1.id = t2.id

    AND t1.col = t2.col

You may be thinking that this does not look so bad, which I can agree with; however, what happens when your queries has 10 or more columns? Can you imagine having to scale this code to work with 20 filters. I just think it is a lot easier to use intersect for this, oppose to hand coding the predicate. One could argue that checksum is a valid alternative. Checksum can be a good alternative; however, it prevents an index from being used, so I would be a little apprehensive to use it, especially if you are implementing the solution in a production environment.

As you can see, these new function removes a lot of coding complexity. The performance of each of these operators varies depending on the size and complexity of the results returned by each query and the indexing in place. I typically use these operators to quickly identify RI violators and to quickly identify data that exists in one table and not another. I hope that this tip saves you time and that you learned something new.

Happy coding!

6 comments:

Brad Schulz said...: Kind of cool that we have all possibilities now: UNION, UNION ALL, INTERSECT, and EXCEPT.

By the way, your profile mentions another baby coming in May2009. I assume you're the father of two now? (I have three... ages 13, 16, and 20).; July 11, 2009 at 7:26 PM
Adam Haines said...: Brad the new bundle of joy did in fact arrive. Actually he came a little earlier than we expected. I have been meaning to update my profile, but havent gotten around to it yet. I have a 2 year old daughter and my son is now 12 weeks old.; July 11, 2009 at 10:09 PM
Brian Tkatch said...: Thanx for going over the basics, and congrats on the new kid. :); July 13, 2009 at 5:05 AM
Brad Schulz said...: Congratulations! Busy times ahead...; July 13, 2009 at 10:21 PM
daspeac said...: I believe you have already heard about the analyze and repair pdf file download; December 3, 2010 at 9:14 AM
Anonymous said...: Still helpful and it is 2017. Thank you.; December 14, 2017 at 11:59 AM

Demystifying SQL Server

Friday, July 10, 2009

Should I Intersect or Except?

6 comments:

Subscribe

Search This Blog

About Me

Twitter

Twitter

Blogs I follow

Followers

Blog Archive

Keywords