Wednesday, September 30, 2009

SSIS – Performing An UPSERT

For those of you who may be wondering what the heck an UPSERT is, an UPSERT is an UPDATE and INSERT into a given table. Typically, an UPSERT operation requires two separate transactions. Usually, the first transaction issued is the UPDATE and then the INSERT is issued shortly after. In this post I will be extending the package I created, in my previous post, http://jahaines.blogspot.com/2009/09/ssis-only-inserting-rows-that-do-not.html. Once your SSIS package resembles the package in the link, you can proceed with the next step. If you want to skip the prior post, you can download the package at the bottom on this post.

The first item of interest is the Lookup transformation. First I will drag an OLE DB Command onto the canvas and connect it to the Lookup transformation. A dialog box should appear. When the dialog box appears choose the Lookup Match Output, in the Output dropdown list. An OLE DB Command, is a parameterized TSQL statement that can issued against a given data set. Before I start configuring the transformation, I would like to point out that this method is extremely easy to setup, but has a major drawback. The OLE DB Command has to be issued for each row returned by the Input, so this type of process works recursively or iteratively. I will demonstrate another method later in this post. With that out of the way, I will start configuring the transformation. First, I will build an UPDATE command using parameters for each column and the predicate.

My canvas currently looks like the screenshot below.

image

The next step is to configure the OLE DB Command transformation. Double-click the OLE DB Command transformation. In the Connection Manager dropdown list, choose the database where you created dbo.SSIS_Import. Click the “Component Properties” tab. Within the “Component Properties” tab, you will need to click the ellipses next to the “SQL Command” property. Paste the code presented below into the box and click ok.

UPDATE dbo.SSIS_Import
SET 
     SomeInt = ?,
     SomeChar = ?
FROM dbo.SSIS_Import import
WHERE SomeId = ?

The code above is a pretty simplistic UPDATE statement. The key thing to note is the “?”. Each of the ‘?” will be given a parameter value in SSIS. The parameters are dynamically named in the order they appear in the command. Next, I will be mapping these parameters to my input columns. Click the “Column Mappings” tab. Align the input columns to each parameter. My column mapping is shown below. Once complete, click “Ok.”

image

That it!!! We have successfully created an UPSERT operation using SSIS… but wait what is the catch? The catch is the UPDATE will operate like a cursor processing one UPDATE command at a time, which can be a nightmare from a performance standpoint. You may be wondering how we can make this process set based. There is not much you can do in SSIS alone, so I will need to find other means. The best way to make this process more scalable is to leverage SQL Server and SSIS.

The first step is to drop the OLE DB command I just created. I will then drag a OLE DB Destination to the canvas and connect it to the Lookup Match Output. My canvas looks like below.

image

Next, I will switch gears and write some TSQL code. I will need to open SSMS and connect to the database where dbo.SSIS_Import exists. Firstly, I have to create a view. I will not use the view to select data, but will use the view as a intermediate object to insert data.

CREATE VIEW vw_SSIS_Import
AS
SELECT [SomeId],[SomeInt],[SomeChar]
FROM dbo.[SSIS_Import]
GO

Now that my view is created, I am going to create an INSTEAD of trigger, on my view. The instead of trigger will allow me to use my OLE DB Destination to bulk insert from SSIS. The bulk insert from SSIS gives me a mechanism to pass the rows for UPDATE, as a set of data. Once I have the rows in a set, the trigger can efficiently UPDATE the data.

CREATE TRIGGER trg_UPdate_SSIS_Import
ON dbo.vw_SSIS_Import
INSTEAD OF INSERT
AS 
BEGIN
 
    UPDATE import
    SET 
        SomeInt = i.SomeInt,
        SomeChar = i.SomeChar
    FROM dbo.SSIS_Import import
    INNER JOIN inserted i
        ON i.SomeId = import.SomeId
 
END
GO

In the code above, I am using efficient TSQL to update all rows that are inserted into the view, from the SSIS package. I am now ready to configure the OLE DB Destination. Double-click the OLE DB Destination to launch the configure dialog box. Make sure your package is using the right connection manager and choose the view. Click the “Column Mappings” tab and then click '”Ok.” Right-click the OLE DB Destination and choose properties. In the “Fast Load Options”, type “FIRE_TRIGGERS.” This allows the SSIS insert to fire the INSTEAD OF trigger, on the view. That’s it we are done!!! The view/INSTEAD OF TRIGGER method is extremely easy to configure and implement; however, it does require that additional database objects to be created.

There you have it. I have demonstrated two methods to UPSERT data, using SSIS. The first method relies on the OLE DB Command, which has the limitation of having to run for each and every row. The latter option is a more scalable solution that requires the creation of a view and trigger. Both methods have pros and cons and I leave it to you to determine which is best for your environment.

Until next time, Happy Coding.

PS: If you want to do all of the work in SQL, you can allow the trigger to perform the UPDATE and INSERT. This method reduces the complexity of the SSIS package because you only need a single source and destination, with no Lookup.

Download SSIS Package:

Note: The package was created in BIDS 2008, which is not compatible with BIDS 2005.

19 comments:

Anonymous said...

Why not just do a merge statement?

Adam Haines said...

Merge would be a great alternative to using the the update statment in the instead of trigger; however, this only works in SQL Server 2008. I wanted to post a solution that would reach a broader audience. By using the merge statement, in the instead of trigger, one could remove the lookup transformation makes the SSIS package much simplier.

I dont know how much of a performance gain the merge statement would offer, if any. I would guess that the performance between the two methods is likely to be similar. Perhaps, I will cover the performance benefits of merge vs the posted solution, in another post.

Thanks for the great feedback.

Piotr Rodak said...

Hey Adam, the trick with the trigger is really cool solution. I guess I'll have to revisit some of my packs..
Thanks! :)

Anonymous said...

What is the inserted part of the CREATE TRIGGER with the alias i, i.e The

inserted i

sqldeveloperramblings said...

Awesome post. I really like the use of both SSIS and SQL Server to come up with an ingenious solution. This is great for a SQL Server 2005 implimentation. I have to confess that I faced the same problems in 2008 and was able to leverage new functionality of table data types and the MERGE statement to accomplish the same task. Check out my post when you get the time:

http://sqldeveloperramblings.blogspot.com/2010/02/technorati-tags-dave-john-paul-this-is.html

Adam Haines said...

@sqldeveloperramblings,

Thanks for the great feedback!

I read through your post and will say touche. Thanks for presenting a 2008 solution. Your article is very concise and has great content. The merge statement is very powerful and makes process like this one very manageable and understandable, which makes it a viable option in SQL Server 2008 environments.

Thanks again for you feedback.

Pablo Peralta said...

I took your ideas and could make a simple SSIS package that when makes the insert, triggers a MERGE on the view an work like a charm!
thank you both!
PP

Anonymous said...

Thank you SOOOOO Much! I've been looking for an efficient SSIS 2005 solution to Update/Insert and your second option (UPSERT trigger with OLEDB Destination) was the key!
DeeDee Kruger

daspeac said...

I believe you have also heard about the fix microsoft exchange program

Anonymous said...

I am doing the method with the trigger, but the insert is blocking the update and the process never finish.

Adam Haines said...

What transaction isolation level is your database using? What locks is sql server using? Take a look at sys.dm_tran_locks during the blocking. If there is blocking, one of the transactions should finish. See if the transactons are in a runnable status. Another thing to look at is, the number of indexes on the table. Maybe the insert is taking extrememly long because of index maintenance or page splits?

Anonymous said...

I am seeing X locks. This is created by the Bulk Inert. This is strange. I might have messed-up the somewhere in the package.

Anonymous said...

I forgot to add, when there is only a few rows to add and lots to update, it works fine. The problem is when there are lots of row to update and lots to insert. The insert seems very very slow and never end.

Adam Haines said...

Sounds like lock escalation is taking a table exclusive lock because the thresholds are being breached.

Have a look at this, http://msdn.microsoft.com/en-us/library/ms184286.aspx

You can try disabling lock escalation on this particular table (SQL 2008 only) or disable lock escalation via trace flag (1211 or 1224). Please note that disabling lock escalation at the table level is not allowed, in SQL 2005.

This is probably the best way to determine, if your problem is lock escalation.

Sample to disable LE at the table level, in 2008:
ALTER TABLE dbo.MyTable SET (LOCK_ESCALATION = DISABLED);

Personally, i dont like the idea of disabling lock escalation, but you may be limited in this situation. If you are using 2008, you can use merge instead and this should eliminate the blocking issue.

Adam Haines said...

Obviously, you should do this on a test/dev instance and not in the production environment ;)

Anonymous said...

Hi Adam, I am using your technique but in my scenario my source is an oledb source which is sql server database and my id's are identity keys which are also primary keys. your package runs accurate when i use it with OLEDB Command, but it is not working with OLEDB Destination. It only adds new data to my destination db but doest not update the records.

Anonymous said...

I am getting this error message:

An OLE DB record is available. Source: "Microsoft SQL Server Native Client 10.0" Hresult: 0x80004005 Description: "'dbo.vw_MyTable' is not a user table. Cannot perform SET operation."

vw_MyTable is a view created on the table

Anonymous said...

I'm having a weird problem. In the SQL Command transformation for the Match output, there are no columns for me to map to in the Available Destination Column. It looks to me as if I have followed your directions exactly. The destination table has over 100 columns, if that matters? It's probably something simple that I missed, but I am stumped!

Anonymous said...

I simplified things by doing the lookup in the trigger itself. Works great. Will probably compare with a version using MERGE. Thanks for this awesome post.