SQL2012 Windowing Functions In The Data Warehouse–2. Reporting

This is the second post of a diptych on the magical windowing functions in data warehouse scenarios. With these functions you can greatly simplify the TSQL you write. Many complex queries with CTE’s, temp tables and sub queries can be rewritten to simpler, better maintainable and better performing queries.  In this post I’ll dive into some possibilities for reporting.

For the examples in this post I’ll use the Contoso Retail Data Warehouse database. A sample database for data warehouses provided by Microsoft.

Year To Date (YTD) Calculations

On the Internet you’ll find a lot of examples on using the running total technique to calculate year to date values. In this example I need the monthly sales and the YTD sales for every store.

SELECT CalendarMonth
  , StoreName
  , PeriodSales
  , SalesYTD = SUM(PeriodSales) OVER 
     (PARTITION BY StoreName, CalendarYear ORDER BY CalendarMonth) FROM
  (
  SELECT CalendarYear
    , CalendarMonth
    , StoreName
    , PeriodSales = SUM(sal.SalesAmount)
  FROM FactSales sal
  JOIN DimDate dat ON sal.DateKey = dat.Datekey
  JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
  GROUP BY CalendarYear, CalendarMonth, StoreName
  ) SalesByMonth
ORDER BY StoreName, CalendarMonth

The sub query “SalesByMonth” aggregates the sales amount for every store per month. The windowing function SUM() OVER() calculates the YTD sales. Which will result in the required dataset:

image
The SUM(SUM()) OVER() Construction

Since you can use Windowing Functions over an aggregated we don’t need the sub query and we can simplify this query to:

SELECT CalendarMonth
  , StoreName
  , PeriodSales = SUM(SalesAmount)
  , SalesYTD = SUM(SUM(SalesAmount)) OVER 
     (PARTITION BY StoreName, CalendarYear ORDER BY CalendarMonth) FROM FactSales sal
JOIN DimDate dat ON sal.DateKey = dat.Datekey
JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
GROUP BY CalendarYear, CalendarMonth, StoreName
ORDER BY StoreName, CalendarMonth

The second SUM in: “SUM(SUM()) OVER() GROUP BY “ is used in conjunction with the GROUP BY clause to calculate the monthly sales first.
The first SUM in: “SUM(SUM()) OVER() GROUP BY  is then used in conjunction with the OVER clause to calculate the YTD sales.

Comparing to previous year

Adding the figures of the previous year as a comparison is a common reporting requirement. You can easily realize this by using the LAG function returning the results 12 months back in time. Building upon our earlier query:

SELECT *
  , PeriodSalesPrevYear = LAG(PeriodSales,12,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
  , YTDSalesPrevYear = LAG(SalesYTD,12,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
FROM
(
  SELECT CalendarMonth
    , StoreName
    , PeriodSales = SUM(SalesAmount)
    , SalesYTD = SUM(SUM(SalesAmount))
        OVER (PARTITION BY StoreName, CalendarYear ORDER BY CalendarMonth)
  FROM FactSales sal
  JOIN DimDate dat ON sal.DateKey = dat.Datekey
  JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
  GROUP BY CalendarYear, CalendarMonth, StoreName
) Base
ORDER BY StoreName, CalendarMonth

Which results into:

image
How Do We Do Compared to the other stores?

In this example I use the RANK() Function to determine the store’s rank in the total monthly sales and the store’s sales as a percentage of the total monthly sales:

SELECT CalendarMonth
  , StoreName
  , PeriodSales = SUM(SalesAmount)
  , StoreRank = RANK() OVER
      (PARTITION BY CalendarMonth ORDER BY SUM(SalesAmount) DESC)
  , StoreShare = 100*SUM(SalesAmount)/
      SUM(SUM(SalesAmount)) OVER (PARTITION BY CalendarMonth)
FROM FactSales sal
JOIN DimDate dat ON sal.DateKey = dat.Datekey
JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
GROUP BY CalendarMonth, StoreName

Compare to (Average of) Previous Periods

In a recent client engagement a report which was used to audit the monthly invoice process gave a lot of troubles. The SQL query behind it was very difficult to comprehend and consisted of several sub queries. By using windowing functions our team was able to greatly simplify the query. The requirement can be restated/simplified to our example as: Give us the current month sales, the previous 3 and the average of those previous 3. This is the resulting query:

SELECT CalendarMonth
  , StoreName
  , PeriodSales = SUM(SalesAmount)
  , SalesPrevPeriod1 = LAG(SUM(SalesAmount),1,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
  , SalesPrevPeriod2 = LAG(SUM(SalesAmount),2,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
  , SalesPrevPeriod3 = LAG(SUM(SalesAmount),3,0)
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
  , AveragePrevPeriods = AVG(SUM(SalesAmount))
      OVER (PARTITION BY StoreName ORDER BY CalendarMonth
      ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
FROM FactSales sal
JOIN DimDate dat ON sal.DateKey = dat.Datekey
JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
GROUP BY CalendarYear, CalendarMonth, StoreName
ORDER BY StoreName, CalendarMonth

image

I especially like the way you can use the window frame clause to limit the average to 3 periods: ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING

More Information: