Truncate all Tables


There are many reasons for me to blog. One of the less vain is to recover and reuse code I wrote earlier. In my current project we need to test, test and retest the initial load of the data warehouse.

For this purpose I wrote this T-SQL script that:

  • Drops all foreign key constraints;
  • Truncates all the tables;
  • Recreates the foreign key constraints I dropped earlier.
/*
Truncate all tables in a database, by:
- dropping all foreign key constraints
- truncating all the tables
- recreating the dropped foreign key constraints
NB: the recreate step assumes single-column foreign keys;
a composite key would generate one ADD CONSTRAINT per column.
*/
DECLARE @DropConstraints nvarchar(max)
DECLARE @TruncateTables nvarchar(max)
DECLARE @RecreateConstraints nvarchar(max)
SELECT
-- DROP CONSTRAINTS (one statement per constraint)
@DropConstraints = STUFF(
(
SELECT char(10) + 'ALTER TABLE '
+ QUOTENAME(OBJECT_SCHEMA_NAME(f.parent_object_id))
+ '.' + QUOTENAME(OBJECT_NAME(f.parent_object_id))
+ ' DROP CONSTRAINT ' + QUOTENAME(f.name)
FROM sys.foreign_keys f
FOR XML PATH('')
)
,1,1,'')
-- TRUNCATE TABLES
, @TruncateTables = STUFF(
(
SELECT char(10) + 'TRUNCATE TABLE '
+ QUOTENAME(OBJECT_SCHEMA_NAME(object_id))
+ '.' + QUOTENAME(OBJECT_NAME(object_id))
FROM sys.tables FOR XML PATH('')
)
,1,1,'')
-- RECREATE CONSTRAINTS
, @RecreateConstraints = STUFF(
(
SELECT char(10) + 'ALTER TABLE '
+ QUOTENAME(OBJECT_SCHEMA_NAME(f.parent_object_id))
+ '.' + QUOTENAME(OBJECT_NAME(f.parent_object_id))
+ ' ADD CONSTRAINT ' + QUOTENAME(f.name)
+ ' FOREIGN KEY ( ' + COL_NAME(fc.parent_object_id, fc.parent_column_id) + ' )'
+ ' REFERENCES ' + QUOTENAME(OBJECT_SCHEMA_NAME(f.referenced_object_id))
+ '.' + QUOTENAME(OBJECT_NAME(f.referenced_object_id))
+ ' ( ' + COL_NAME(fc.referenced_object_id, fc.referenced_column_id) + ' )'
FROM sys.foreign_keys f
JOIN sys.foreign_key_columns fc
ON f.object_id = fc.constraint_object_id FOR XML PATH('')
)
,1,1,'')
-- NB: PRINT truncates nvarchar output beyond 4,000 characters;
-- the EXEC statements below still run the complete batches.
PRINT @DropConstraints
PRINT '--------------------------------------------------'
PRINT @TruncateTables
PRINT '--------------------------------------------------'
PRINT @RecreateConstraints
EXEC sp_executesql @DropConstraints;
EXEC sp_executesql @TruncateTables;
EXEC sp_executesql @RecreateConstraints;
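
A failure between the drop and the recreate would leave the database without its foreign keys, so a more defensive variant (a sketch; THROW requires SQL Server 2012) wraps the three batches in a transaction:

BEGIN TRY
    BEGIN TRANSACTION;
    EXEC sp_executesql @DropConstraints;
    EXEC sp_executesql @TruncateTables;
    EXEC sp_executesql @RecreateConstraints;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- TRUNCATE and DDL are transactional in SQL Server, so this undoes everything
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
    THROW;
END CATCH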

As an alternative, based on a comment by Roland Bouman, you can also use:

EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL'
EXEC sp_MSforeachtable 'TRUNCATE TABLE ?'
EXEC sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL'
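
Be aware that TRUNCATE TABLE fails on any table that is referenced by a foreign key, even a disabled one, so this shortcut only works when no table is referenced by another. A variant that trades speed for robustness (a sketch; DELETE is fully logged and, unlike TRUNCATE, does not reset identity columns) is:

EXEC sp_MSforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL'
EXEC sp_MSforeachtable 'DELETE FROM ?'
EXEC sp_MSforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL'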

SQL2012 Windowing Functions In The Data Warehouse–2. Reporting

This is the second post of a diptych on the magical windowing functions in data warehouse scenarios. With these functions you can greatly simplify the TSQL you write: many complex queries with CTEs, temp tables and sub queries can be rewritten as simpler, more maintainable and better performing queries. In this post I'll dive into some possibilities for reporting.

For the examples in this post I'll use the Contoso Retail Data Warehouse database, a sample data warehouse database provided by Microsoft.

Year To Date (YTD) Calculations

On the Internet you'll find a lot of examples that use the running total technique to calculate year-to-date values. In this example I need the monthly sales and the YTD sales for every store.

SELECT CalendarMonth
  , StoreName
  , PeriodSales
  , SalesYTD = SUM(PeriodSales) OVER 
     (PARTITION BY StoreName, CalendarYear ORDER BY CalendarMonth)
FROM
  (
  SELECT CalendarYear
    , CalendarMonth
    , StoreName
    , PeriodSales = SUM(sal.SalesAmount)
  FROM FactSales sal
  JOIN DimDate dat ON sal.DateKey = dat.Datekey
  JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
  GROUP BY CalendarYear, CalendarMonth, StoreName
  ) SalesByMonth
ORDER BY StoreName, CalendarMonth

The sub query "SalesByMonth" aggregates the sales amount for every store per month. The windowing function SUM() OVER() then calculates the YTD sales, which results in the required dataset.

The SUM(SUM()) OVER() Construction

Since you can use windowing functions on top of an aggregate, we don't need the sub query and can simplify this query to:

SELECT CalendarMonth
  , StoreName
  , PeriodSales = SUM(SalesAmount)
  , SalesYTD = SUM(SUM(SalesAmount)) OVER 
     (PARTITION BY StoreName, CalendarYear ORDER BY CalendarMonth)
FROM FactSales sal
JOIN DimDate dat ON sal.DateKey = dat.Datekey
JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
GROUP BY CalendarYear, CalendarMonth, StoreName
ORDER BY StoreName, CalendarMonth

The inner SUM in "SUM(SUM()) OVER()" works together with the GROUP BY clause and calculates the monthly sales first.
The outer SUM is then evaluated with the OVER clause to calculate the YTD sales over those monthly totals.
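
A side note on the frame: when the OVER clause contains an ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Because the GROUP BY leaves exactly one row per store per month, spelling out a ROWS frame, as in this sketch, returns the same YTD figures and typically performs better since it avoids the on-disk window spool:

  , SalesYTD = SUM(SUM(SalesAmount)) OVER
     (PARTITION BY StoreName, CalendarYear ORDER BY CalendarMonth
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)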

Comparing to the Previous Year

Adding the figures of the previous year as a comparison is a common reporting requirement. You can easily realize this with the LAG function, reaching back 12 months in time. Note that LAG looks back a fixed number of rows, so this assumes every store has a row for each month. Building upon our earlier query:

SELECT *
  , PeriodSalesPrevYear = LAG(PeriodSales,12,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
  , YTDSalesPrevYear = LAG(SalesYTD,12,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
FROM
(
  SELECT CalendarMonth
    , StoreName
    , PeriodSales = SUM(SalesAmount)
    , SalesYTD = SUM(SUM(SalesAmount))
        OVER (PARTITION BY StoreName, CalendarYear ORDER BY CalendarMonth)
  FROM FactSales sal
  JOIN DimDate dat ON sal.DateKey = dat.Datekey
  JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
  GROUP BY CalendarYear, CalendarMonth, StoreName
) Base
ORDER BY StoreName, CalendarMonth

This extends the result set with the previous year's monthly and YTD sales for every store.

How Do We Do Compared to the Other Stores?

In this example I use the RANK() function to determine each store's rank in the total monthly sales, and the store's sales as a percentage of the total monthly sales:

SELECT CalendarMonth
  , StoreName
  , PeriodSales = SUM(SalesAmount)
  , StoreRank = RANK() OVER
      (PARTITION BY CalendarMonth ORDER BY SUM(SalesAmount) DESC)
  , StoreShare = 100*SUM(SalesAmount)/
      SUM(SUM(SalesAmount)) OVER (PARTITION BY CalendarMonth)
FROM FactSales sal
JOIN DimDate dat ON sal.DateKey = dat.Datekey
JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
GROUP BY CalendarMonth, StoreName

Compare to (Average of) Previous Periods

In a recent client engagement, a report used to audit the monthly invoice process gave us a lot of trouble. The SQL query behind it was very difficult to comprehend and consisted of several sub queries. By using windowing functions our team was able to greatly simplify it. Restated in terms of our example, the requirement is: give us the current month's sales, the previous three months and the average of those previous three. This is the resulting query:

SELECT CalendarMonth
  , StoreName
  , PeriodSales = SUM(SalesAmount)
  , SalesPrevPeriod1 = LAG(SUM(SalesAmount),1,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
  , SalesPrevPeriod2 = LAG(SUM(SalesAmount),2,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
  , SalesPrevPeriod3 = LAG(SUM(SalesAmount),3,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
  , AveragePrevPeriods = AVG(SUM(SalesAmount))
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth
      ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
FROM FactSales sal
JOIN DimDate dat ON sal.DateKey = dat.Datekey
JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
GROUP BY CalendarYear, CalendarMonth, StoreName
ORDER BY StoreName, CalendarMonth


I especially like the way you can use the window frame clause to limit the average to the three preceding periods: ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING.


SQL2012 Windowing Functions In The Data Warehouse–1. Dimensions

Windowing functions, introduced in SQL Server 2005 and greatly enhanced in SQL Server 2012, have something magical: within the context of one row you have access to the contents of the other rows of the result set.

With these functions you can greatly simplify the TSQL you write. Many complex queries with CTEs, temp tables and sub queries can be rewritten as simpler, more maintainable and better performing queries. In this post I'll dive into some possibilities for dimensions.

In SQL 2008 and earlier

In SQL 2008 and earlier I would typically build a dimension table based on the Type 2 Slowly Changing Dimension pattern, and then use a view with a self join to present the user with the historical and/or current attribute values. I will use this small example of a customer dimension (values made up for illustration):

Id  Number  Name   City       MaritalStatus  StartDate   EndDate
1   101     Marco  Amsterdam  Single         1980-01-01  1988-07-20
2   102     Jose   Paris      Single         1982-05-01  1988-07-20
3   101     Marco  Paris      Married        1988-07-20  9999-12-31
4   102     Jose   Paris      Married        1988-07-20  9999-12-31

When Marco moved to Paris on July 20th, 1988 and married Jose, a new row was added for each of them with the new attribute values, and the EndDate of their old rows was set to the date of the change. This is how attribute changes are handled in a Type 2 Slowly Changing Dimension.

To consume the dimension information I will typically use a view in the model schema as a source for Analysis Services / PowerPivot / Reports:

CREATE VIEW model.Customer AS
SELECT his.Id
  , his.Number
  , his.Name
  , his.City
  , CurrentCity = cur.City 
  , his.MaritalStatus
  , CurrentMaritalStatus = cur.MaritalStatus
FROM dim.Customer his
JOIN dim.Customer cur
  ON his.Number = cur.Number
WHERE cur.EndDate = '9999-12-31'

This returns one row per historical version, with the current city and marital status repeated on every row.

In SQL 2012: Using the LAST_VALUE Function

Unless you're working for the Oracle at Delphi, the last value will typically be the current value of an attribute. So in SQL 2012 this view can be replaced with:

CREATE VIEW model.Customer AS
SELECT Id
  , Number
  , Name
  , City
  , CurrentCity = LAST_VALUE(City)
    OVER(PARTITION BY Number ORDER BY StartDate
     ROWS BETWEEN UNBOUNDED PRECEDING
     AND UNBOUNDED FOLLOWING)
  , MaritalStatus
  , CurrentMaritalStatus = LAST_VALUE(MaritalStatus)
    OVER(PARTITION BY Number ORDER BY StartDate
     ROWS BETWEEN UNBOUNDED PRECEDING
     AND UNBOUNDED FOLLOWING)
FROM dim.Customer

Although the LAST_VALUE function seems a bit awkward to write, due to the long window frame clause "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING", it has some nice advantages:

  • No need to use a self join, which will enhance performance.
  • No need to use the EndDate column.
  • Much easier to maintain, because the purpose of the LAST_VALUE function will be more obvious to your successors.
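
As a side note (my shorthand, not required): because FIRST_VALUE only looks at the start of the frame, which by default is the first row of the partition, you can get the same current value by reversing the sort order and avoid the long frame clause altogether:

CurrentCity = FIRST_VALUE(City)
  OVER(PARTITION BY Number ORDER BY StartDate DESC)
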
SCD Type 0 with the FIRST_VALUE Function

Occasionally you may stumble upon a request to show the original value of an attribute (e.g. the sale that landed the customer). In that case you can simply add a column using the FIRST_VALUE function:

FirstCity = FIRST_VALUE(City) 
  OVER(PARTITION BY Number ORDER BY StartDate)

Mapping Queries Using Windowing Functions

When you load fact tables you will want to look up the surrogate keys of the dimensions. In the simplest variant you would use (in SQL 2008):

SELECT Number, Id FROM dim.Customer
  WHERE EndDate = '9999-12-31'

In SQL 2012, assuming you will not store the EndDate in your ETL process, you can use:

SELECT Number, Id FROM
(
  SELECT Number, Id
    , RowNumber = ROW_NUMBER() OVER(PARTITION BY Number ORDER BY StartDate DESC)
  FROM dim.Customer
) Sub
WHERE RowNumber = 1

Unfortunately you have to use the sub query construct here, because it's not yet possible to use windowing functions in the WHERE clause.
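
For context, a minimal sketch of how such a mapping query plugs into a fact load (the staging table stg.Sales and its CustomerNumber column are assumed for illustration):

SELECT map.Id AS CustomerKey
  , stg.SalesAmount
FROM stg.Sales stg
JOIN
(
  SELECT Number, Id
    , RowNumber = ROW_NUMBER() OVER(PARTITION BY Number ORDER BY StartDate DESC)
  FROM dim.Customer
) map
  ON map.Number = stg.CustomerNumber
  AND map.RowNumber = 1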

But Why Would You Not Add an EndDate in the ETL Process?

If you don't end date rows, the ETL process gets much easier, faster and less error-prone: you don't have to distinguish between new and changed rows, you just add both in the same way to the dimension table. And you don't have to identify and update the 'old' rows.

And of course, if you really need the EndDate you can just get it with the new LEAD function:

EndDate = LEAD(StartDate, 1, '9999-12-31')
    OVER(PARTITION BY Number ORDER BY StartDate)

More Information:

MSDN (Books Online) about the OVER clause

TSQL: MERGE With EXCEPT to Handle Type 2 Loading

Type 2 loading is a pattern that is often used to incrementally load dimensions in a data warehouse, but you can also use it for loading satellites in a data vault or tables in a historical/persistent staging area. It consists of three parts:

  1. Identify new rows and add them to the target database.
  2. Identify changed rows and add them to the target database.
  3. End date the old rows in the target database that have been replaced with those of step 2.
MERGE

There are several ways to implement a Type 2 loading pattern. In this blog post I'll describe one using the TSQL MERGE statement. I have come to like it: compared to other methods it seems to be a well performing option for handling slowly changing dimension type solutions (see this study from Microsoft's Matt Masson).

Warren Thornthwaite from the Kimball Group gave a good explanation of this approach in Design Tip #107: Using the SQL MERGE Statement for Slowly Changing Dimension Processing.

In his example for processing a type 2 dimension he uses a sample similar to the one below:

INSERT INTO dimCustomer (AuditID, StartDate, EndDate, CustomerID, CustomerName, Address, ..)
SELECT @AuditID, GETDATE(), NULL, CustomerID, CustomerName, Address, ..
FROM
( MERGE DWH.dbo.dimCustomer tgt
USING STG.dbo.Customer src
ON src.CustomerID = tgt.CustomerID

WHEN NOT MATCHED THEN
INSERT VALUES (@AuditID, GETDATE(), NULL, src.CustomerID, src.CustomerName, src.Address, ..)

WHEN MATCHED
AND tgt.EndDate IS NULL
AND (tgt.CustomerName <> src.CustomerName
OR tgt.Address <> src.Address
..)
THEN
UPDATE SET tgt.EndDate = GETDATE()

OUTPUT $action Action_Out, src.CustomerID, src.CustomerName, src.Address, ..
) AS MERGE_OUT
WHERE MERGE_OUT.Action_Out = 'UPDATE';

The outcome of the MERGE statement is used as the source for the INSERT INTO statement at the top (the afore-mentioned step 2: identify changed rows and add them to the target database). The MERGE statement itself performs the other two actions: it inserts any new rows (step 1) and end dates the rows in the target that have changed (step 3).

NULL <> NULL

A problem with this example is the implicit assumption that there are no NULL values stored in the compared columns. A comparison involving NULL (such as NULL <> NULL or NULL <> 'X') evaluates to UNKNOWN rather than TRUE, so changes from or to NULL would silently be missed and the new versions would never be added to the model.

There are, however, two SQL expressions that treat NULL values as equal: INTERSECT and EXCEPT. So we can easily circumvent this problem by replacing the column comparisons with:

AND EXISTS (SELECT tgt.CustomerName, tgt.Address, …
EXCEPT SELECT src.CustomerName, src.Address, …)
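
Put together, the change-detection clause of the MERGE above becomes (a sketch; the column lists stay elided as in the original):

WHEN MATCHED
AND tgt.EndDate IS NULL
AND EXISTS (SELECT tgt.CustomerName, tgt.Address, …
            EXCEPT SELECT src.CustomerName, src.Address, …)
THEN
UPDATE SET tgt.EndDate = GETDATE()

EXCEPT returns the rows of the first SELECT that do not appear in the second and treats NULLs as equal, so the EXISTS is true exactly when at least one column value really changed.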