IN2BI, Microsoft Business Intelligence
data, stories & insights
rss | email | twitter

Integration in the data vault

And how we handle it in the DWH Deck

One of the key characteristics of using the Data Vault modeling and methodology is the shift of a large part of the ETL work upstream in the data flow. In pictures:

simple etl schema traditioneel datawarehouse

Fig 1. Simple ETL schema in a traditional data warehouse

 

simple etl schema datawarehouse with datavault

Fig 2. Simple ETL schema in a data warehouse with the data vault architecture

This shift comes with several advantages like:

  • auditable/traceable: we can recreate the source on a given day/time
  • flexibility: supporting (possible) multiple versions of the truth (e.g. when business rule change)

Dan Lindstedt in his article: Data Vault series 5 - Loading practices even argues that you shouldn't touch or change the data on the way into the Data Vault, but manipulate it according to business rules on the way out - to the Data Marts. However there is still some ETL from the sources to the data vault:

  • The typical conversion into the data vault structure with hubs, links and satellites.
  • Some acceptable minor changes and default values (article 2.1.3 and 2.1.4 of the Data Vault loading specification)
  • Integration based on business keys.

This integration based on business keys is a very powerful  aspect and sometimes overlooked. Consider the example of a large wholesaler with warehouses around the globe. By using a uniform business key for 'product' a lot of the integration already happens in a smooth way in the data vault. Creating the data marts and answering questions like what's the total stock for product xyz gets a lot easier this way.
For more information on this subject, see the blogpost of Ronald Damhof and the discussions in the comment area.

 

So how do we handle this integration in the DWH Deck?

  1. You can identify the business keys as described in Part 6 of the series describing DWH Deck's functionality.
  2. And make sure you use the same DWHName in de table section of the main application screen for all tables that describe the same object (e.g. Product)

When you do this the SQL statement to create the data vault table will only be generated once. The SQL statements to create the view and ETL stored procedure to import the data will be generated for each table.

» Similar Posts

  1. DWH DECK Part 6: Data Vault
  2. Dimensional modeling and Data Vault – a happy marriage?
  3. Four Approaches to Data Warehousing

» Trackbacks & Pingbacks

  1. The requirements of a Data Warehouse solution continually change during its life time. For some organizations you’ll typically expect more changes than others. Think of maturity, history of organizational changes: mergers, acquisitions etc. You should

    Four Approaches to Data Warehousing — mei 14, 2012 6:08

» Comments

    There are no comments.

Comments are closed