TSQL: MERGE With EXCEPT to Handle Type 2 Loading

Type 2 Loading is a pattern that is often used to incrementally load dimensions in a data warehouse. But you can also use it for loading satellites in a data vault or loading tables in a historical/persistent staging area. It consist of three parts:

  1. Identify new rows and add them to the target database.
  2. Identify changed rows and add them to the target database.
  3. End date the old rows in the target database that have been replaced with those of step 2.

There are several ways to implement a Type 2 loading pattern. In this blog post I’ll describe using the TSQL statement: MERGE. I have come to like it and compared to other methods it seems to be a good performing option for handling slowly changing dimension type solutions. (See this study from Microsoft’s Matt Mason)

Warren Thornthwaite from the Kimball Group did a good explanation on using the SQL MERGE Statement for Slowly Changing Dimension in  Design Tip #107 Using the SQL MERGE Statement for Slowly Changing Dimension Processing.

In his example for processing type 2 dimension he uses a sample that is similar to the one below:

INSERT INTO dimCustomer (AuditID,StartDate,EndDate, CustomerID, CustomerName, Adress..)
SELECT @AuditID, GETDATE(),NULL,CustomerID, CustomerName, Adress,..

( MERGE DWH.dbo.dimCustomer tgt
USING STG.dbo.Customer src
ON src.CustomerID = tgt.CustomerID

INSERT VALUES (@AuditID, GETDATE(), NULL, CustomerID, CustomerName, Adress ..)

AND tgt.__EndDate IS NULL
AND (tgt.CustomerName <> src.CustomerName
AND (tgt.Adress <> src.Adress)


OUTPUT $Action Action_Out, src.CustomerID, CustomerName, Adress ..

The outcome of the MERGE statement is used as source for the INSERT INTO statement at the top.
(Afore mentioned step 2: Identify changed rows and add them to the target database)
In the Merge statement there are two actions:
1.   Insert any new rows
3.   End date the rows in the source that have changed.


A problem with this example is the implicit assumption that there are no NULL values stored in the columns. Because otherwise the highlighted expressions with nulls would yield to true (NULL<>NULL) and rows would unnecessary be added to the model.

There are however two SQL expressions that consider NULL Values as equal: INTERSECT and EXCEPT. So we can easily circumvent this problem by changing the highlighted rows into:

AND EXISTS (SELECT  tgt.CustomerName, tgt.Adress …
EXCEPT SELECT  src.CustomerName, src.Adress …)