Programming Samples

Click here to go to:



Excel VBA

Word VBA

MS Access

T-SQL

SSIS

SSRS

Power BI

Crystal Reports

SSAS

SQL Replication

C# Code

ASP .NET Code

Oracle PL/SQL

Database Diagramming


Back to Home Page


SSIS Merge Join Task

SSIS Merge Join to Compare 2 Excel Files

This article details how to compare 2 Excel Files - one with the data from a previous date and a current file with data that has been updated, deleted, or added to the file. This started because one of my bosses was bugging me about providing a report just showing what had changed between Excel File #1 and Excel File #2. Unfortunately, Excel does not have the very handy Compare option that Word does. I had imported both files into SQL Server or MS Access and compared the rows to provide a new Excel file and done manual comparison in Excel of the differences between cells using IF in a formula in another Sheet, but this was extra work that did not need to be done. This solution in SSIS leaves the files in their native format and with just a few steps through a Merge Join and a Conditional Split, I can produce the needed Excel file that summarizes the changes made between the original file and the new file very quickly.

Review of the Excel Files to be Compared

For this example, I have created 2 Excel files. One has the "original" data with an ID column of all of the Birds by Type and Count in it. The 2nd file contains the altered data - same ID column with changes to some of the counts, removal of some types of birds, and additions of new bird types and their counts.

Original File

Updated File

Sql Server Data Tools Tasks - SSIS

Open up SQL Server Data Tools (the version used here is for SQL server 2012 \ Visual Studio 2010). Add a Data Flow Task to the Package on the Control Flow Tab.

SSIS Control Flow Data Flow Task

Setting up the Source Connections

Set up the Connections to the Excel Files: In the Connection Managers section, Right click and select Add new connection.

New Connection

Select Excel from the list Browse to the location of the file. Repeat for ExcelFile2.xlsx

Connection Manager

In the data flow tab, add 2 Excel Sources - 1 for File 1, another for File 2. Right click on the 1st one and select Edit.

SSIS 2 Excel Data Sources

Browse to the location of the file and select the Sheet.

SSIS Excel Connection Manager

Select the file from the Connection Manager drop down and select the Sheet Name.

SSIS Connection Manager Drop Down

Adding the Merge Join

Add a Merge Join task to the data Flow tab. Right click on the Excel Source and select Show Advanced Editor. Click on the last tab and highlight the Excel Source Output.

SSIS Merge Join Task

In the Common Properties box, click next to IsSorted and change it to True.

SSIS Common Properties

Highlight the Output Column "id" and change SortKeyPosition to 1. Repeat this for Excel Source 1. The Data Source must have a Sorted column for the Merge Join task or you will get an Error Message.

SSIS Output Column Merge Join

Drag the blue arrow of the Excel Source 1 to the Merge Join.Select Full Left Join from the drop down box. Drag the blue arrow from Excel Source to the Merge Join and then Right Click on the Merge Join Task and select Edit.

SSIS Merge Join Data Flow

Select the columns from both Sources and change the drop down at the top to Left outer join. If you forgot which Excel source had the most recent data and you had the original file as the main file for your Left Outer Join, you can click on the Swap Inputs button to change the Sources.

Adding the Conditional Split

Drag and drop a Conditional Split on the Data Flow tab and drag the blue arrow from the Merge Join to it. Right click on the Conditional Split and select Edit.

SSIS Conditional Split

Add an Output Name to test for changes in the Bird Count, if anything has been removed (the Birds column would be Null), or if anything has been added (the IdOrig column would be Null).

SSIS Output Test

Make sure to set the Error on the Changes item to Ignore Error when configuring Error Output. The expression for Changes will evaluate to NULL for items that do not meet any of the criteria. This will result in an error when debugging or running the Package.

SSIS Configure Error Output

Adding a Derived Column

Add a Derived Column item to the Data Flow.

SSIS Derived Column in Data Flow

When you drag and drop the blue arrow onto the Derived Column task, a pop up will appear to ask you which output from the Conditional Split to use. The Changes Output is the information that we need.

SSIS input Output Selector

Add 3 new Derived Columns, one for the Adds, one for Changes, and one for Deletes. The syntax in the Expression box is a Ternary Operator where IF the condition stated is True, then do whatever is listed immediately following the "?". IF the condition is False, then do whatever is listed on the opposite side of the colon ":".

SSIS Multiple Derived Columns

Adding the Final "Change" Capture Excel File

Add another Connection for an Excel file called Changes. This file will have columns from both Sources and the 3 Derived columns. Add an Excel Destination to the Data Flow and set it to this new Connection. Set the mappings up between the Derived column item and the Excel Destination.

SSIS Column Mapping to Excel with Changes

Run the SSIS job to confirm that it works.

SSIS Run Job

By opening the Changes file, we can see what the differences are between the 2 files.