Wednesday, October 14, 2009

SSIS – Importing Binary Files Into A VARBINARY(MAX) Column

Have you every had the need to import images, Word documents, Excel documents, or any other type of file into a SQL Server table?  If yes, you are in luck.  SQL Server 2005 gives us a powerful and scalable method to import binary files or documents, into a relational table.  I will be using the term binary file and document interchangeably throughout this post to describe a file on the file system.  I will be utilizing SSIS and the VARBINAARY(MAX) data type to import the document. Let us start by creating the sample DDL.  In this example, I will need a staging table and a table to house our binary data. 

USE [tempdb]
GO
 
IF OBJECT_ID('tempdb.dbo.[Doc_Stage]') IS NOT NULL
BEGIN
    DROP TABLE [dbo].[Doc_Stage];
END
GO
 
CREATE TABLE [dbo].[Doc_Stage](
DocId INT IDENTITY(1,1) PRIMARY KEY,
DocName VARCHAR(50) NOT NULL,
DocPath VARCHAR(1000) NOT NULL,
DocType VARCHAR(4) NOT NULL
);
GO
 
IF OBJECT_ID('tempdb.dbo.[Documents]') IS NOT NULL
BEGIN
    DROP TABLE [dbo].[Documents];
END
GO
 
CREATE TABLE [dbo].[Documents](
DocId INT IDENTITY(1,1) PRIMARY KEY,
DocName VARCHAR(50) NOT NULL,
Doc VARBINARY(MAX) NULL,
DocType VARCHAR(4)
);
GO

Next, I need to create the SSIS package.  The first step is to add an Execute SQL Task to the designer canvas.  Configure the Execute SQL Task, as shown below.

Note: You will need to create the source connection to the database where you created the DDL

image 

The SQL Statement that I used is below.

TRUNCATE TABLE [dbo].[Doc_Stage] ;

Next I will create a variable called FilePath that is of the string data type.  Now I am ready to add a ForEach Loop Container to the canvas.  Add the container to the canvas and configure it as shown below.

Note: I am grabbing all file types. If you only want a specific type, change the File to include the extension that you want.  E.g. *.jpg

image image

Next, I will have to create another Execute SQL Task, but this time, I have to drag the task into the ForEach Loop Container.  Configure the task as shown below.

image

The SQL Statement is presented below:

INSERT INTO [dbo].[Doc_Stage](DocName,DocPath,DocType) VALUES ('DocName','DocPath','jpg');

Now for the tricky part.  In this step, I have to build an expression to dynamically build an insert statement.  The insert will capture the document name, path, and type.  Click the expressions tab and create an expression, on the property SQLStatementSource (near the bottom).  Below is the code for the expression.

"INSERT INTO [dbo].[Doc_Stage](DocName,DocPath,DocType) VALUES ('" +
REPLACE(RIGHT(@[User::FilePath],FINDSTRING(Reverse(@[User::FilePath] ) ,"\\", 1)-1),RIGHT(@[User::FilePath],4),"")
+ "','" + @[User::FilePath] 
+ "','" + RIGHT( @[User::FilePath] ,FINDSTRING( REVERSE(@[User::FilePath] ),".",1)-1)
+ "');"
 

With that out of the way, we can press on.  The last task I will need is a Data Flow Task.  Drag the data flow task to the canvas.  Here is what my canvas looks like at present:

image

Open the Data Flow task and drag an OLE DB Source to the canvas and configure it as shown below.

image 

Make sure to click the Columns tab to set the column mappings.  Next, I will drag a Import Column Transformation to the canvas and configure it as shown below.

image image

Note: Make sure to take note of the output column’s LineageID.  You will need to take this ID and add it to the Input Column’s FileDataColumnID.

image image

We are almost there!!! The last step is to add the OLE DB Destination.  I will add the OLE DB Destination and configure it as shown below. Do not forget to click the Mappings tab to map the columns.

image

That’s it!!!! Click the debug button and all of the components should light up green. I have successfully implemented a document library solution that allows the insertion of any document type into SQL Server 2005.  Stay tuned because I will show you how to export the images to the file system using BCP and TSQL.

Until next time happy coding.

13 comments:

Gertman said...

Hi.

Excellent article! Helped alot.
You write that another article will show how to export the images again. Is this article written? Looking foreward to reading it.

Adam Haines said...

Gertman,

I have written a follow article on exporting binary files,http://jahaines.blogspot.com/2009/10/exporting-binary-files-to-file-system.html ,; however, I focused on using BCP, not SSIS. You can do this in SSIS using the same concept, as shown in this article. I found that it easier and more flexible to extract specific files, within the confines of SQL Server.

Sundar said...

Nice Blog........

Can you help me on "How to write & read back X++ container in SQL"?

Daniel Ballinger said...

Great article, thanks.

I found another post using OPENROWSET BULK to achieve a similar result. Insert binary data like images into SQL Server without front-end application

The same sample is also on MSDN:OPENROWSET (Transact-SQL) (See
D. Using OPENROWSET to bulk insert file data into a varbinary(max) column)

Vintage Freak said...

Sorry if I'm doing anything stupid. I'm copying and pasting your expression for inserting rows in the staging table. I keep on getting error that says my expression cannot be evaluated.

Can you help?

P.S. The OPENROWSET approach is easier but for reasons I cannot get into right now, it is not an option for me. So your approach is my only hope at this point. If I can get it to work that is

Vintage Freak said...

Allright, I finally got the single quotes and double quotes the way they should be. The expression to insert rows in the staging table for each filename discovered at the "FilePath" evaluates.

At runtime I get the following error.

"Must declare the scalar variable "@".".

For some reason, I'm thinking @[User::FilePath] is not being substituted correctly at run time by the ForEachLoop Container. Any help much appreciated.

Adam Haines said...

Hi Vintage Freak,

Make sure your expression is correct in the actual property; otherwise, the package will bomb on you. I like to give the actual property either an empty value or a literal dummy value. At runtime the expression will kick in and override the actual property value.

Vintage Freak said...

Yes, thanks. I figured that out :-). Having a dummy value certainly helped. My real problem is I cannot get the actual value get populated. @FilePath is just not getting initialized. I keep on getting error that it is not initialized.

I followed the instructions to the letter and checked and rechecked until my eyes are sore. IF I cannot iterate my files in the directory I supplied, then that's a dead end for me.

Just have to bribe someone so I can use OPENROWSET(). It is indeed so much easier and I have used it in the past. I'm in a unique situation here at a client site where they don't "trust me" to use "BULK Inserts". So at the moment the suggested approach using SSIS is my best bet, but unfortunately a dead end.

Thank you.

Adam Haines said...

@Vintage Freak,

The only thing that makes sense is the variable was created at the wrong scope. Make sure the variable is set at the package scope and the you have the correct property set. Something has to be configured differently.

Adam Haines said...

I saw that a lot of folks were having issues following this blog post and getting the binary import working. I have uploaded a working sample to my SkyDrive. All that is required to change is the path to the binary files.

https://skydrive.live.com/?cid=6f041c9a994564d8&sc=documents&uc=1&id=6F041C9A994564D8%21273

Anonymous said...

If I were to use the Export Column to export to a Binary file, what file extension should I give the Filepath

Nirav said...

I want to do the reverse. I have word and pdf documents inside sql table already in varbinary(max) column, I need to get all those pdf,doc files stored outside. Any hint?

Anonymous said...

Great article! I used your package successfully. My dba won't give us rights to do bulk import, but I got around it with the SSIS package!