Sunday, September 27, 2009

SSIS – Only Inserting Rows That Do Not Exist

I have seen an overwhelming trend that suggests that today’s TSQL/BI developers are very interested in using Microsoft’s BI (Business Intelligence) product, SQL Server Business Intelligence Development Studio aka BIDS.  For those of you who may not know, BIDS is installed when SQL Server is installed.  BIDS is nothing more than a Visual Studios add-in that uses the .Net Framework.   BIDS allows you to do Analysis Services (Cube Design and management),  Reporting Services, and Integration Services.  The BI platform that I will be focusing on today is SSIS (SQL Server Integration Services).  SSIS is used for ETL, which is a acronym for  Extract- Transform-Load.  Essentially SSIS is used to migrate data from many different sources,  including MS Access, Excel, Text Files, Oracle, SQL, AS400 etc, while allowing data transformations between source and destination.  The problem that most import processes have is flexibility.  Some bulk import APIs do not have an easy method to import new data, or data that does not currently exist in a table.  In these cases, the import process has to import all source data, then another command has to be issued to filter and insert the data.  A great example of this is BCP or the Bulk Insert command.  SSIS gives developers an easy and efficient method to insert new data.  This is where I will be focusing my efforts today, in a future post, I will demonstrate how to do an UPSERT (Update existing and Insert new) operation using SSIS.  Okay let’s start by creating a sample table.

--Switch DB context to Tempdb
USE [tempdb]
--Drop table if exists
    DROP TABLE dbo.SSIS_Import;
--Create table
SomeInt INT,
SomeChar CHAR(1)
--Insert some test data
INSERT INTO dbo.SSIS_Import ([SomeID],[SomeInt],[SomeChar]) VALUES (1,1,'A');
INSERT INTO dbo.SSIS_Import ([SomeID],[SomeInt],[SomeChar]) VALUES (2,5,'B');
INSERT INTO dbo.SSIS_Import ([SomeID],[SomeInt],[SomeChar]) VALUES (3,8,'C');
INSERT INTO dbo.SSIS_Import ([SomeID],[SomeInt],[SomeChar]) VALUES (4,2,'D');

Now that our table exists, I am going to shift gears and start creating the SSIS package. Before creating the package, you should create a Pipe delimited text file on the C: drive, called SSIS_Import.txt.  Below is a sample of my file.


Okay now we can start creating our SSIS package.  You should now be looking at a blank canvas, the first objective is to drag a Data Flow Task onto the canvas.  Your canvas should look like below.


Note: I have renamed my data flow task to represent what it is doing.

I will now configure the Data Flow Task.  Double-click the data flow task to open the data flow task designer canvas. Drag a “Flat File Source” to the canvas.  Double-click the Flat File Source and create a new connection.  Your connection should be configured as shown below.


You will see an error message at the bottom on the window, stating the columns are not defined.  All you need to do is click the Advanced tab and make sure the data types and length are correct.  Column 0 and column 1 should be of type “four byte signed integer” and column 2 is a string with a length of 1.   In addition to configuring the data types, you should rename the columns.  You can change the name of a column by typing a new name in the Name property field.  The names should align with the table, so the mappings should look like this:

  • Column 0 –> SomeId
  • Column 1 –> SomeInt
  • Column 2 –> SomeChar

After all configurations, your configuration should look like below.


Once the Flat File Source is configured, click “Ok” and “Ok” again.  The next step is to drag a “Lookup” transformation to the canvas and connect the Source connection to the Lookup transformation via the green precedence constraint.  Double-Click the Lookup transformation to configure it.  The first thing I have to do is change how errors are handled.  By default, if a single value is not found in the source, the entire Lookup component fails.  We will need to make sure non matching rows are redirected to the error output, as shown below.


Next, I will configure the connection.  Click the connection tab and the “New” button again.  The OLE DB connection should point to the database where the SSIS_Import table is.  Your screen should look like below.


Next we have to configure the key columns.  For the sample table the key column is SomeID, which maps to the Column 0, or SomeID if you renamed the column in your flat file connection manager, as shown below.

Note: To create the relationship you will have to drag the source column to the destination column.


Once you have finished configuration of this component, click “Ok” to save the changes.  Your canvas should be look like the screenshot below.


Next drag a OLE DB Destination to the canvas.  Make sure you drag the red arrow (precedence constraint) to the OLE DB Destination, which should cause a dialog box to popup. When it does just click “Ok.”  The destination controls where the source data will be inserted.  Double-click the OLE  DB Destination and make the destination point to the SSIS_Import table, as shown below.


Now click the mappings tab to map the source columns to the destination columns.  I have renamed my source columns, so SSIS will automatically map the columns for me. Click “Ok”, once you have mapped all the columns.


That is it!!! The final canvas will look like the screenshot below. 


All that is left is testing.  Click the debug button (looks like a “Play” button). If you have correctly configured all the steps,  the components will light up green.  If any turn red, something is not configured properly. Once all components light up green, run a simple select statement over the table to make sure it did what is supposed to do.

There you have it a method that takes less than 5 minutes to configure and will migrate new data only.  I hope that you have enjoyed reading this post and can make use of this in your environment. Stay tuned…… my next step is to Update pre-existing rows, using SSIS.  This is commonly known as an UPSERT.  I will show you a few ways to accomplish this task and the performance considerations, for each method.

Until next time… Happy Coding.


Anonymous said...

Can't able to see the screenshots

Adam Haines said...

What browser are you using and version?

daspeac said...

that’s good, but you can try another way of open dbx file

drw_online said...

Great Post of a great trick!

Steve Cooper said...

Its good things, keep share
Business Intelligence Product

DataArchitect said...

Note: does not handle dynamic cache lookup - Once the target is updated, the cache still reflects the original state of table, and therefore, the SSIS component will write a duplicate update.... (One fix is to populate the source with "select distint" or equavilant to prevent this. However, still a good post but this condition will be quite common on most business apps updating lookup type tables.

Anonymous said...

thanks adam this helped me out a lot.

Anonymous said...

I know it is an old post but i cannot find a solution for a similar problem without a script task.

As fas as I know the Lookup transformation performs a join and so the error output will have the rows where it cannot join the tables. But in my specific case I need to compare date columns and when setting it up like you suggested, the Lookup component states it cannot perform a join on dates.

Do you have any idea how to fix this particular problem?

Adam Haines said...

You can convert the date column to a string of just the date in ISO format YYYYDDMM.

Another method is to convert the date into an integer. You can use the the datediff between the representative date and a fixed date ie 19000101, as the integer value. You can then do a lookup against the computed integer column.

Anonymous said...

Is this possible with the lookup element? With plain SQL it is not a problem at all but I guess I cannot change the process in the lookup transformation.

Ervin vd merwe said...

Thanks for the help!