Building a data warehouse while the source systems are still in development

Some years ago the client of a contractor I worked for made some major investments in their IT-landscape. They replaced their ERP – and CRM system and of course BI/Reporting was part of the acceptance criteria for the new system. It was a big project and a disaster! ERP, CRM and BI consultants from the contractor where running in and out. Discussing requirement details with the client and adapting the systems to these wishes. For the BI team it was especially hard. When we build something on Monday chances were slim that it still worked on Thursday. We depended upon the ERP and CRM team to communicate their changes and provide us with correct test data. And there was no love wasted between the teams. I was very glad when the opportunity arose to leave this war hole and move on. And I did.


Nevertheless it seems to become a trend to build/adapt the BI system before a new source system has been launched. And make it part of the acceptance criteria for the new system. This of course offers the client an easy way to verify the new system by comparing the reports they work with. In my previous three projects (part) of the source system was still in development and I would probably have gone crazy keeping up with all the changes if I hadn’t been able to automate the work.

Agility is the name of the game


In these situations I have found that you (as in an agility drill) need three things:

  • Light feet:
    An alert/reporting system that gives you the insights in the changes that are made in the source systems. I’ll typically import metadata of the source systems tables, views and columns in the database on a daily basis.
  • Run on balls of feet:
    data warehouse automation framework that automates the creation of the data warehouse objects (Tables, Views, Primary keys, Foreign key relations and Indexes) and the ETL procedures (Stored procedures and SSIS packages) to load the data into the data warehouse.
    I have written extensively about this earlier.
  • Use Arms (to keep your balance):
    testing framework that daily checks and alerts you about the results, compared to the results the business users expect.

Testing Framework

A testing framework has a complicated ring to it. But if you keep it simple and focus on the results the users expect to see it will be easy and the advantages are great:

  • You’ll notice any errors early in the process.
  • You can correct the errors before the users’ starts testing.
  • You can pinpoint any changes made in the source system that ruin your reports.
  • You’ll gain confident in your solution and sleep better.

I’ll describe a method that has worked for me. Key to success herein is starting early!

First get hard copies and digital versions of reports the users are using now. Ask them to highlight the important measures. Start entering the tests in an Excel spreadsheet and use the digital versions of the reports to copy any relevant data) I use the following format where I keep values in the column testName unique


Now the difficult/laborious part: start entering SQL statement in the expression column that will return the expected value. Use [criterion1] and/or [criterion2] as placeholders in these statements. The values in the criterion1 and criterion2 column will replace these placeholders at execution time. You will now be able to copy the expression to similar test with different values for the criteria. Example expression:
SELECT CAST(SUM(f.sales) AS int)
FROM factSales f
INNER JOIN dimDate d
ON f.DateKey = d,DateKey
INNER JOIN dimChannel ch
ON f.ChannelKey = ch.ChannelKey
WHERE d.YearMonth = [criterion1]
AND ch.ChannelName = ‘[criterion2]’
/* SQL code of example test expression */

Testing Framework: Tables

Import your Excel spreadsheet into a SQL Server table with the following definition:
testName nvarchar(64)
, testType nvarchar(20)
, criterion1 nvarchar(128)
, criterion2 nvarchar(128)
, expectedValue int
, expression nvarchar(1024)
/* SQL code to create the DataTests table */

Create an additional table to store the result of the tests that were performed. Use the following definition:
CREATE TABLE DataTestResults (
testName nvarchar(64)
, testDate datetime default getdate()
, actualValue int
, expectedValue int
/* SQL code to create the DataTestResults table */

Testing Framework: SSIS Package

Now create an SSIS Package that will perform the tests. It uses:

  • An Execute SQL Task to get a list of tests
  • A For Each Loop Container that is used to loop through the list with in it:
  • An Execute SQL Task to execute every test
  • An Execute SQL Task to log the result of every test

This package can be described with the following BIML (Business Intelligence Markup Language) snippet:

<Package Name="PKG TestData" ConstraintMode="Linear">
<Variable Name="TestList" DataType="Object" />
<Variable Name="testQuery" DataType="String" />
<Variable Name="testName" DataType="String" />
<Variable Name="expectedValue" DataType="Int32"> 0 </Variable>
<Variable Name="actualValue" DataType="Int32"> 0 </Variable>
<!--Get list with tests to be performed—>
<ExecuteSQL Name="SQL GetTestList" ConnectionName="META" ResultSet="Full">
SELECT testName
, testQuery = REPLACE(REPLACE(expression,'[criterion1]',criterion1),'[criterion2]',criterion2) , expectedValue
FROM DataTests
WHERE testType='DWH'
<Result Name="0" VariableName="User.TestList" />
<!--Loop through tests and perform them—>
<ForEachAdoLoop Name="FELC Test" SourceVariableName="User.TestList" ConstraintMode="Linear" >
<VariableMapping Name="0" VariableName="User.testName" />
<VariableMapping Name="1" VariableName="User.testQuery" />
<VariableMapping Name="2" VariableName="User.expectedValue" />
<!--Perform test—>
<ExecuteSQL Name="SQL Execute Test" ConnectionName="DWH" ResultSet="SingleRow" >
<VariableInput VariableName="User.testQuery" />
<Result Name="0" VariableName="User.actualValue" />
<!--Log test result—>
<ExecuteSQL Name="SQL Log Test Result" ConnectionName="META" ResultSet="None">
INSERT INTO DataTestResults (testName,actualValue,expectedValue) VALUES (?,?,?)
<Parameter Name="0" Direction="Input" DataType="String" VariableName="User.testName" />
<Parameter Name="1" Direction="Input" DataType="Int32" VariableName="User.actualValue" />
<Parameter Name="2" Direction="Input" DataType="Int32" VariableName="User.expectedValue" />
*/ BIML code to create the testing ssis package */


In this blog post I discussed the components of an agile system that are necessary when you build a data warehouse where the source systems are still in development:

  • An Alert System to inform you of changes in the source systems.
  • A Data Warehouse Automation Framework.
  • A Testing Framework.

I ended with a simple implementation of a testing framework that worked for me. Does this help you or did you implement something similar? … I love to hear from you!