IN2BI, Microsoft Business Intelligence
data, stories & insights

Category: 'Microsoft Business Intelligence'

• Mon 28 Jul 2014

    Truncate all Tables

There are many reasons for me to blog. One of the less vain ones is to recover and reuse code I wrote earlier. In my current project we have a need to test, test and retest the initial load of the data warehouse.

For this purpose I wrote a T-SQL script that does the following (a sketch follows the list):

    • Drops all foreign key constraints;
    • Truncates all the tables;
• Recreates all the foreign key constraints I dropped earlier.
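A minimal sketch of how such a script could look (my own illustration, simplified to single-column foreign keys and built on the sys.foreign_keys, sys.foreign_key_columns and sys.tables catalog views):

    -- Sketch: generate DROP/CREATE statements for all foreign keys,
    -- truncate every user table, then recreate the constraints.
    DECLARE @Drop nvarchar(max) = N'', @Create nvarchar(max) = N'', @Truncate nvarchar(max) = N'';

    SELECT
        @Drop   += N'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id)) + N'.'
                 + QUOTENAME(OBJECT_NAME(fk.parent_object_id))
                 + N' DROP CONSTRAINT ' + QUOTENAME(fk.name) + N';' + CHAR(10),
        @Create += N'ALTER TABLE ' + QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id)) + N'.'
                 + QUOTENAME(OBJECT_NAME(fk.parent_object_id))
                 + N' ADD CONSTRAINT ' + QUOTENAME(fk.name) + N' FOREIGN KEY ('
                 + QUOTENAME(COL_NAME(fkc.parent_object_id, fkc.parent_column_id)) + N') REFERENCES '
                 + QUOTENAME(OBJECT_SCHEMA_NAME(fk.referenced_object_id)) + N'.'
                 + QUOTENAME(OBJECT_NAME(fk.referenced_object_id)) + N' ('
                 + QUOTENAME(COL_NAME(fkc.referenced_object_id, fkc.referenced_column_id)) + N');' + CHAR(10)
    FROM sys.foreign_keys fk
    JOIN sys.foreign_key_columns fkc ON fkc.constraint_object_id = fk.object_id;

    SELECT @Truncate += N'TRUNCATE TABLE ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.'
                      + QUOTENAME(t.name) + N';' + CHAR(10)
    FROM sys.tables t;

    EXEC sp_executesql @Drop;      -- drop all foreign key constraints
    EXEC sp_executesql @Truncate;  -- truncate all tables
    EXEC sp_executesql @Create;    -- recreate the constraints
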
  • do
    13
    jun 13

    SQL2012 Windowing Functions In The Data Warehouse–2. Reporting

This is the second post of a diptych on the magical windowing functions in data warehouse scenarios. With these functions you can greatly simplify the T-SQL you write. Many complex queries with CTEs, temp tables and sub queries can be rewritten into simpler, more maintainable and better performing queries. In this post I’ll dive into some possibilities for reporting.

For the examples in this post I’ll use the Contoso Retail Data Warehouse database, a sample data warehouse database provided by Microsoft.

    Year To Date (YTD) Calculations

On the Internet you’ll find a lot of examples of using the running total technique to calculate year-to-date values. In this example I need the monthly sales and the YTD sales for every store.

SELECT CalendarMonth
  , StoreName
  , PeriodSales
  , SalesYTD = SUM(PeriodSales) OVER
      (PARTITION BY StoreName, CalendarYear ORDER BY CalendarMonth)
FROM
  (
  SELECT CalendarYear
    , CalendarMonth
    , StoreName
    , PeriodSales = SUM(sal.SalesAmount)
  FROM FactSales sal
  JOIN DimDate dat ON sal.DateKey = dat.Datekey
  JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
  GROUP BY CalendarYear, CalendarMonth, StoreName
  ) SalesByMonth
ORDER BY StoreName, CalendarMonth

The sub query “SalesByMonth” aggregates the sales amount for every store per month. The windowing function SUM() OVER() calculates the YTD sales, which results in the required dataset:

[Image: monthly and YTD sales per store]

    The SUM(SUM()) OVER() Construction

Since you can use windowing functions over an aggregate, we don’t need the sub query and we can simplify this query to:

SELECT CalendarMonth
  , StoreName
  , PeriodSales = SUM(SalesAmount)
  , SalesYTD = SUM(SUM(SalesAmount)) OVER
      (PARTITION BY StoreName, CalendarYear ORDER BY CalendarMonth)
FROM FactSales sal
JOIN DimDate dat ON sal.DateKey = dat.Datekey
JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
GROUP BY CalendarYear, CalendarMonth, StoreName
ORDER BY StoreName, CalendarMonth

The inner SUM in “SUM(SUM()) OVER() … GROUP BY” is used in conjunction with the GROUP BY clause to calculate the monthly sales first.
The outer SUM is then used in conjunction with the OVER clause to calculate the YTD sales.

     

    Comparing to previous year

Adding the figures of the previous year as a comparison is a common reporting requirement. You can easily realize this by using the LAG function, returning the results from 12 months back in time. Building upon our earlier query:

    SELECT *
      , PeriodSalesPrevYear = LAG(PeriodSales,12,0)
          OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
      , YTDSalesPrevYear = LAG(SalesYTD,12,0)
          OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
    FROM
    (
      SELECT CalendarMonth
        , StoreName
        , PeriodSales = SUM(SalesAmount)
        , SalesYTD = SUM(SUM(SalesAmount))
            OVER (PARTITION BY StoreName, CalendarYear ORDER BY CalendarMonth)
      FROM FactSales sal
      JOIN DimDate dat ON sal.DateKey = dat.Datekey
      JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
      GROUP BY CalendarYear, CalendarMonth, StoreName
    ) Base
    ORDER BY StoreName, CalendarMonth

Which results in:

[Image: sales compared to the previous year]

How Do We Do Compared to the Other Stores?

    In this example I use the RANK() Function to determine the store’s rank in the total monthly sales and the store’s sales as a percentage of the total monthly sales:

SELECT CalendarMonth
  , StoreName
  , PeriodSales = SUM(SalesAmount)
  , StoreRank = RANK() OVER
      (PARTITION BY CalendarMonth ORDER BY SUM(SalesAmount) DESC)
  , StoreShare = 100*SUM(SalesAmount)/
      SUM(SUM(SalesAmount)) OVER (PARTITION BY CalendarMonth)
FROM FactSales sal
JOIN DimDate dat ON sal.DateKey = dat.Datekey
JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
GROUP BY CalendarMonth, StoreName

[Image: store rank and share of total monthly sales]

    Compare to (Average of) Previous Periods

In a recent client engagement a report that was used to audit the monthly invoice process caused a lot of trouble. The SQL query behind it was very difficult to comprehend and consisted of several sub queries. By using windowing functions our team was able to greatly simplify the query. The requirement, restated in terms of our example, is: give us the current month’s sales, the sales of the previous three months and the average of those previous three. This is the resulting query:

    SELECT CalendarMonth
      , StoreName
      , PeriodSales = SUM(SalesAmount)
      , SalesPrevPeriod1 = LAG(SUM(SalesAmount),1,0)
          OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
      , SalesPrevPeriod2 = LAG(SUM(SalesAmount),2,0)
          OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
      , SalesPrevPeriod3 = LAG(SUM(SalesAmount),3,0)
          OVER (PARTITION BY StoreName ORDER BY CalendarMonth)
      , AveragePrevPeriods = AVG(SUM(SalesAmount))
          OVER (PARTITION BY StoreName ORDER BY CalendarMonth
          ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)
    FROM FactSales sal
    JOIN DimDate dat ON sal.DateKey = dat.Datekey
    JOIN DimStore sto ON sal.StoreKey = sto.StoreKey
    GROUP BY CalendarYear, CalendarMonth, StoreName
    ORDER BY StoreName, CalendarMonth

[Image: current month sales, previous periods and their average]

    I especially like the way you can use the window frame clause to limit the average to 3 periods: ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING

    More Information:

    My previous post on using Windowing Functions focused on dimensions 

    MSDN (Books on Line) about the OVER Clause

    Introduction blog series on Windowing Functions by Fabiano Amorim on simple talk

Blog post with an instructional video on the SQL 2012 windowing functions: Leaving the Windows Open by Jeremiah Peschka

    Book: Microsoft® SQL Server® 2012 High-Performance T-SQL Using Window Functions by Itzik Ben-Gan

• Fri 24 May 2013

    SQL2012 Windowing Functions In The Data Warehouse–1. Dimensions

    Windowing functions, introduced in SQL Server 2005 and greatly enhanced in SQL Server 2012, have something magical: Within the context of one row in the result set you have access to the contents of the other rows of the result set.

With these functions you can greatly simplify the T-SQL you write. Many complex queries with CTEs, temp tables and sub queries can be rewritten into simpler, more maintainable and better performing queries. In this post I’ll dive into some possibilities for dimensions.

    In SQL 2008 and earlier

In SQL 2008 and earlier I will typically build a dimension table based upon the Type 2 Slowly Changing Dimension system, and then use a view with a self join to present the user with the historical and/or current attribute values. I will use this small example of a customer dimension:

[Image: sample rows of the customer dimension]

When Marco moved to Paris on July 20th 1988 and married Jose, a new row was added for each of them with the new attribute values and the EndDate of the old row was changed to the date of the change. This is how attribute changes are handled for Type 2 Slowly Changing Dimensions.
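A minimal sketch of the dimension table I have in mind (an assumption based on the columns used in the queries below; column sizes are illustrative):

    CREATE TABLE dim.Customer (
        Id            int IDENTITY(1,1) NOT NULL PRIMARY KEY  -- surrogate key
      , Number        int NOT NULL                            -- business key
      , Name          varchar(50) NOT NULL
      , City          varchar(50) NULL
      , MaritalStatus varchar(20) NULL
      , StartDate     date NOT NULL                           -- start of validity of this version
      , EndDate       date NOT NULL                           -- '9999-12-31' for the current version
    )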

To consume the dimension information I will typically use a view in the model schema as a source for Analysis Services / PowerPivot / Reports:

    CREATE VIEW model.Customer AS
    SELECT his.Id
      , his.Number
      , his.Name
      , his.City
      , CurrentCity = cur.City 
      , his.MaritalStatus
      , CurrentMaritalStatus = cur.MaritalStatus
    FROM dim.Customer his
    JOIN dim.Customer cur
      ON his.Number = cur.Number
    WHERE cur.EndDate ='9999-12-31'

Which results in:

[Image: historical and current attribute values per customer]

In SQL 2012: Using the LAST_VALUE Function

If you’re not working for the Oracle at Delphi, the last value will typically be the current value of an attribute. So in SQL 2012 this view can be replaced with:

    CREATE VIEW model.Customer AS
    SELECT Id
      , Number
      , Name
      , City
      , CurrentCity = LAST_VALUE(City)
        OVER(PARTITION BY Number ORDER BY StartDate
         ROWS BETWEEN UNBOUNDED PRECEDING
         AND UNBOUNDED FOLLOWING)
      , MaritalStatus
      , CurrentMaritalStatus = LAST_VALUE(MaritalStatus)
        OVER(PARTITION BY Number ORDER BY StartDate
         ROWS BETWEEN UNBOUNDED PRECEDING
         AND UNBOUNDED FOLLOWING)
    FROM dim.Customer

Although the LAST_VALUE function seems a bit awkward to write due to the long window frame clause “ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING”, it has some nice advantages:

    • No need to use a self join, which will enhance performance.
    • No need to use the EndDate column.
• Much easier to maintain, because the purpose of the LAST_VALUE function will be more obvious to your successors.

     

    SCD Type 0 with the FIRST_VALUE Function

Occasionally you may stumble upon a request/requirement to show the original value of an attribute (e.g. the sales that landed the customer). In that case you can simply add a column using the FIRST_VALUE function:

    FirstCity = FIRST_VALUE(City) 
      OVER(PARTITION BY Number ORDER BY StartDate)

     

    Mapping Queries Using Windowing Functions

When you load fact tables you will want to look up the surrogate keys of the dimensions. In the simplest variant you would use (in SQL 2008):

    SELECT Number, Id FROM dim.Customer
      WHERE EndDate ='9999-12-31'

    In SQL 2012, assuming you will not store the EndDate in your ETL process, you can use:

SELECT Number, Id
FROM
  (SELECT Number, Id
     , RowNumber = ROW_NUMBER() OVER(PARTITION BY Number ORDER BY StartDate DESC)
   FROM dim.Customer) Sub
WHERE RowNumber = 1

    Unfortunately you will have to use the sub query construct here because it’s not yet possible to use Windowing Functions in the WHERE clause.

     

But Why Would You Not Add an EndDate in the ETL Process?

If you don’t end date rows, the ETL process gets much easier, faster and less error-prone: you don’t have to distinguish between new and changed rows, you just add both in the same way to the dimension table. And you don’t have to identify and update the ‘old’ rows.

And of course, if you really need the EndDate you can just get it with the new LEAD function:

    EndDate = LEAD(StartDate, 1, '9999-12-31')
        OVER(PARTITION BY Number ORDER BY StartDate)

     

    More Information:

    MSDN (Books on Line) about the OVER Clause

    Jamie Thomson Debunking Kimball Effective Dates part 2 – Windowing Functions

    Introduction blog series on Windowing Functions by Fabiano Amorim on simple talk

Blog post with an instructional video on the SQL 2012 windowing functions: Leaving the Windows Open by Jeremiah Peschka

    Book: Microsoft® SQL Server® 2012 High-Performance T-SQL Using Window Functions by Itzik Ben-Gan

• Thu 03 Jan 2013

    Logging in SSIS with BIML 1 – Logging Task Execution

The project deployment model in SSIS (2012) offers great default logging functionality. However, you may find yourself in need of additional functionality, especially if you use the package deployment model or an earlier version of SQL Server. In this series on logging I dive into three methods:

    1. Logging Task Execution using the OnPostExecute Event
      (this post)
    2. Logging Package Execution in the control flow.
    3. Logging Errors using the OnError Event.

In this post I’ll describe the first method: logging task execution using the OnPostExecute event. This event is fired after the execution of each task and package, and thus provides a great way to log relevant information like package duration and row counts in the data flow.

    The SsisEventLog Table

    To store these events we need a table:

    CREATE TABLE  meta.SsisEventLog (
          EventID int IDENTITY(1,1) NOT NULL
        , ExecutionID uniqueidentifier 
        , EventLogDate datetime 
        , PackageStartDate datetime 
        , PackageDuration AS datediff(second, PackageStartDate
            , EventLogDate) PERSISTED
        , PackageGUID uniqueidentifier 
        , PackageVersionGUID   uniqueidentifier 
        , PackageName  varchar(50)
        , TaskGUID   uniqueidentifier 
        , TaskName   varchar(50)
        , ExtractRowCount int 
        , InsertRowCount int 
        , UpdateRowCount int 
        , CONSTRAINT  PK_SsisEventLog 
            PRIMARY KEY CLUSTERED (EventID)
    )

    Some remarks:

    • The PackageDuration column is calculated using the datediff function. It calculates the difference in seconds between the time the event was fired and the start date of the package.
• If the TaskGUID equals the PackageGUID, the event was fired directly after the package was executed.
    • The row count columns will be filled by variables we define in the package and we’ll have to find a way to reset them after logging the event.

To log this information we’ll use a stored procedure that writes the information to the SsisEventLog table and returns a row with 3 columns, all with the value 0, to reset the variables after logging.

    CREATE PROCEDURE [meta].[uspLogEvent]      
          @ExecutionID [uniqueidentifier]
        , @PackageStartDate [datetime]
        , @PackageGUID [uniqueidentifier]
        , @PackageVersionGUID [uniqueidentifier]
        , @PackageName [varchar](50)
        , @TaskGUID [uniqueidentifier]
        , @TaskName [varchar](50)
        , @ExtractRowCount [int]
        , @InsertRowCount [int]
        , @UpdateRowCount [int]
    AS       

    INSERT INTO [meta].[SsisEventLog](
          ExecutionID
        , EventLogDate
        , PackageStartDate
        , PackageGUID
        , PackageVersionGUID
        , PackageName
        , TaskGUID
        , TaskName
        , ExtractRowCount
        , InsertRowCount
        , UpdateRowCount
    )
    VALUES(
          @ExecutionID
        , SYSDATETIME()   
        , @PackageStartDate
        , @PackageGUID
        , @PackageVersionGUID
        , @PackageName
        , @TaskGUID
        , @TaskName
        , @ExtractRowCount
        , @InsertRowCount
        , @UpdateRowCount
    );
    SELECT 0 AS ExtractRowCount
        , 0 AS InsertRowCount
        , 0 AS UpdateRowCount
    ;

    Creating the log event handler in the package

Before you can create this event handler you have to create three variables of type Int32 with value 0, as in this image:

[Image: the three Int32 package variables RcExtract, RcInsert and RcUpdate]

To create the event handler in a package, open the Event Handlers tab, select the package in the Executable dropdown box and select OnPostExecute in the Event handler dropdown box.

Next drop an Execute SQL Task from the toolbox on the canvas and use the following settings in the General tab:

    • Name: SQL Log Event
    • ResultSet: Single row
    • ConnectionType: OLE DB
    • Connection: The name of the connection with your log table
• SQLSourceType: Direct input
    • SQLStatement:
      EXEC [meta].[uspLogEvent] ?,?,?,?,?,?,?,?,?,?

     

[Image: General tab of the SQL Log Event task]

In the Parameter Mapping tab add 10 mappings using the following variable names:

• System::ExecutionInstanceGUID
• System::StartTime
• System::PackageID
• System::VersionGUID
• System::PackageName
• System::SourceID
• System::SourceName
• User::RcExtract
• User::RcInsert
• User::RcUpdate

[Image: Parameter Mapping tab with the ten mappings]

    In the Result Set tab add 3 results as in the image below.

[Image: Result Set tab with results 0, 1 and 2 mapped to User::RcExtract, User::RcInsert and User::RcUpdate]

With that, the implementation of the event handler in the SSIS package is ready.

    The Results

After executing a package with one Execute SQL Task and one Data Flow Task you will see three rows in the SsisEventLog table: two for the tasks and one for the package.

[Images: resulting rows in the SsisEventLog table]

Using this method you can now easily troubleshoot any (performance) issues with the package, for example with a query like the sketch below.
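A minimal sketch (my own illustration, not from the original package) that approximates task durations from consecutive OnPostExecute events in the SsisEventLog table defined above:

    -- Sketch: approximate task durations by taking the difference between
    -- consecutive OnPostExecute events within the same package execution.
    -- Note: parallel tasks in one package will skew these numbers.
    SELECT PackageName
         , TaskName
         , EventLogDate
         , TaskDurationSec = DATEDIFF(second,
               LAG(EventLogDate, 1, PackageStartDate)
                   OVER (PARTITION BY ExecutionID ORDER BY EventLogDate),
               EventLogDate)
         , ExtractRowCount, InsertRowCount, UpdateRowCount
    FROM meta.SsisEventLog
    ORDER BY PackageName, EventLogDate;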

    Using BIML to implement the event handler

    To implement this event handler with BIML use the following code snippets in the BIMLScript that creates the package:

    <!--Variables-->
    <Variables>
        <Variable Name="RcExtract"
            DataType="Int32"
            Namespace="User" >0</Variable>
        <Variable Name="RcInsert"
            DataType="Int32"
            Namespace="User" >0</Variable>
        <Variable Name="RcUpdate"
            DataType="Int32"
            Namespace="User" >0</Variable>
    </Variables>

    <!--Events-->
    <Events>
    <Event EventType="OnPostExecute"
           Name="OnPostExecute"
           ConstraintMode="Linear">
    <Tasks>
        <ExecuteSQL Name="SQL Log Event"
            ConnectionName="NorthwindSales"
            ResultSet="SingleRow">
            <DirectInput>
EXEC [meta].[uspLogEvent] ?,?,?,?,?,?,?,?,?,?
            </DirectInput>
            <Parameters>
                <Parameter Name="0" Direction="Input"
                    DataType="Guid"
                    VariableName="System.ExecutionInstanceGUID" />
                <Parameter Name="1" Direction="Input"
                    DataType="DateTime"
                    VariableName="System.StartTime" />
                <Parameter Name="2" Direction="Input"
                    DataType="Guid"
                    VariableName="System.PackageID" />
                <Parameter Name="3" Direction="Input"
                    DataType="Guid"
                    VariableName="System.VersionGUID" />
                <Parameter Name="4" Direction="Input"
                    DataType="String"
                    VariableName="System.PackageName" />
                <Parameter Name="5" Direction="Input"
                    DataType="Guid"
                    VariableName="System.SourceID" />
                <Parameter Name="6" Direction="Input"
                    DataType="String"
                    VariableName="System.SourceName" />
                <Parameter Name="7" Direction="Input"
                    DataType="Int32"
                    VariableName="User.RcExtract" />
                <Parameter Name="8" Direction="Input"
                    DataType="Int32"
                    VariableName="User.RcInsert" />
                <Parameter Name="9" Direction="Input"
                    DataType="Int32"
                    VariableName="User.RcUpdate" />
            </Parameters>
            <Results>
                <Result Name="0"
                    VariableName="User.RcExtract" />
                <Result Name="0"
                    VariableName="User.RcInsert" />
                <Result Name="0"
                    VariableName="User.RcUpdate" />
            </Results>
        </ExecuteSQL>
    </Tasks>
    </Event>
    </Events>

    ABOUT BIML

You can leverage Business Intelligence Markup Language – or BIML – to automate the creation of Microsoft SQL Server Integration Services (SSIS) packages. BIML is a creation of Varigence, Inc. It is available in proprietary products and open source projects, and has been published as an open language specification. The popular open source BIDSHelper project includes Biml functionality, enabling anyone to write and execute Biml code for free.

Do you need a head start with the automation of SSIS packages? Consider my BIML Workshop.

• Tue 18 Dec 2012

    SQL Server, SSIS, SSAS and SSRS on ONE Server

    Best practice dictates that we use a separate server for each of these SQL Server Services. And this seems logical because otherwise these services will compete over server resources. In IT we call this competition: contention.

    However there are some great reasons to put these services on one box:

• Licensing: SQL Server licensing can be expensive, and you need licenses for every server on which a service runs.
• Better resource utilization: fewer servers, less power usage, and lower maintenance and monitoring costs.
• Sometimes the network is the problem, as a lot of data moves from the SQL engine to SSIS or SSAS, resulting in network congestion. If services run on the same machine, SQL Server uses the Shared Memory protocol, which is faster and leads to less network congestion.

    In a scenario with all SQL/BI Services on one server we need some strategies to diminish the described contention. This is of course a big subject and I can only touch the basics in this post and give you some tips.

    TIP 1: Limit the software and services on the server

This seems logical, but I have been to several sites where developers remote desktop into the production server to do maintenance and other jobs. This is certainly not a best practice. Better is to not install any client tools on the server and use a separate development/test server to do maintenance.

    TIP 2: Get an overview of your typical daily workload

[Image: typical daily workload mapped to the services needed]

    In this image I have mapped a typical daily workload to the services needed to perform the workload:

• From 07:00 to 23:00 is the extended working day window. Some users start early, others end late: all are extensively opening SSRS reports and querying SSAS cubes.
• From 23:00 to 01:00 backups are running for all (source) systems. This is the period we do our own SSAS and SQL Server backups.
• From 01:00 to 07:00 is our process window.

    In this period:

    • we use SSIS to land data from our source systems into the staging database
    • we use SSIS to load data from our staging database into the data warehouse
• we process the dimensions and cubes of our SSAS databases
    • we warm the cache of our SSAS database and start distributing reports using SSRS subscriptions and SSIS for large Excel reports.

With this knowledge we have a clear understanding of potential contention issues. Furthermore we can schedule to stop and start services on a need-to-use basis. We can use the operating system commands NET START and NET STOP for this purpose.

    TIP 3 Set Maximum Memory Setting for SQL Server

With the max server memory setting you can place a limit on the amount of buffer pool memory used by SQL Server. Which leads to the question: to what amount should we limit the buffer pool memory?

During our extended working day window we will probably not benefit from a large buffer pool: most queries will be answered by Analysis Services. However, in the process window we will benefit from a larger buffer pool. We can change this property during the day and schedule these changes by using the sp_configure system stored procedure (see the sketch after the list below). Based on 24 GB of RAM, reserve 4 GB for the operating system and:

• During the working day window set max server memory to 4 GB, reserving 8 GB for SSAS and 8 GB for SSRS.
• During the process window set max server memory to 10 GB, reserving 10 GB for SSIS and 8 GB for SSAS.
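A minimal sketch of such a scheduled change (my own illustration; you would wrap each variant in a SQL Agent job step that runs at the start of the corresponding window):

    -- Sketch: lower the buffer pool at the start of the working day window.
    -- Schedule the reverse change (10240 MB) at the start of the process window.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 4096;   -- 4 GB during the working day
    RECONFIGURE;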

    TIP 4 Set SSAS Memory Settings

The SSAS memory settings are available in the msmdsrv.ini file and in the properties window of the server. If a value is between 0 and 100 it is interpreted as a percentage of total available physical memory; above 100 it is interpreted as bytes. Change these settings to:

    • Memory\HardMemoryLimit: from the default of 0 to 32
    • Memory\TotalMemoryLimit: from the default of 80 to 28
• Memory\LowMemoryLimit: from the default of 65 to 24

This means that Analysis Services will start freeing up memory once it has reached the LowMemoryLimit threshold of 24% of physical memory. This process will get more aggressive as it reaches the other thresholds.
While you’re at it, change some other memory properties (an illustrative msmdsrv.ini fragment follows the list):

    • OLAP\Process\BufferMemoryLimit from 60 to 20
    • OLAP\Process\AggregationMemoryLimitMax from 80 to 14
    • OLAP\Process\AggregationMemoryLimitMin from 10 to 4
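For illustration, the relevant fragment of msmdsrv.ini might then look roughly like this (element nesting abbreviated and assumed, other elements omitted; always back up the file before editing):

    <!-- Illustrative msmdsrv.ini fragment after the changes above -->
    <ConfigurationSettings>
      <Memory>
        <LowMemoryLimit>24</LowMemoryLimit>
        <TotalMemoryLimit>28</TotalMemoryLimit>
        <HardMemoryLimit>32</HardMemoryLimit>
      </Memory>
      <OLAP>
        <Process>
          <BufferMemoryLimit>20</BufferMemoryLimit>
          <AggregationMemoryLimitMax>14</AggregationMemoryLimitMax>
          <AggregationMemoryLimitMin>4</AggregationMemoryLimitMin>
        </Process>
      </OLAP>
    </ConfigurationSettings>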

    TIP 5 Set SSRS Memory Settings

In the RSReportServer.config file add the WorkingSetMaximum property and set it to 8 GB (the value is in kilobytes; for our 24 GB example):
<WorkingSetMaximum>8000000</WorkingSetMaximum>
Restart the Reporting Services service, so that other memory-related properties that are based on WorkingSetMaximum get set.

    TIP 6 Monitor and adapt the memory setting

Use the settings in tips 3, 4 and 5 as a starting point, but monitor memory usage and change these properties to map them to your workload.

    TIP 7 Use Windows System Resource Manager to restrict CPU utilization

With Windows System Resource Manager (WSRM) you can create custom resource allocation policies to restrict the resources (CPU, RAM) available to an application. You can map your workload windows to policies and define these policies, e.g.:

• a policy ProcessData, which runs when Analysis Services is processing data: set the CPU percentage for SQL Server to 45% and for SSAS to 45%
• a policy ProcessIndex, which runs when Analysis Services is processing aggregations: set the CPU percentage for SQL Server to 10% and for SSAS to 80%

[Image: WSRM resource allocation policy]

Next you can set up calendar rules to apply the different policies to the correct time frames.

However, this has the drawback that you have two calendar schemes, one in SQL Agent and one in WSRM, that you have to keep in sync. Furthermore, it’s possible that there aren’t any CPU contention issues at all. My advice would be to monitor CPU usage and plan resource allocation policies, if necessary, based on the monitoring results.

    TIP 8 Don’t write to the same disk you’re reading from

In the daily process window we will typically load large amounts of data into our data warehouse:

    • We load data from the source system in a staging database.
    • From the staging database we load the data in the data warehouse.
    • From the data warehouse we load the Analysis Services cubes.

In this scenario put the data warehouse database files on a different drive than the staging database and the Analysis Services cubes.

    Conclusion

In this post I focused on diminishing contention issues when using one server for your BI solution running SQL Server, Analysis Services, Integration Services and Reporting Services. The key aspect is understanding your workload and the role the different applications play, as well as the properties you can tweak to diminish contention.


• Mon 04 Jun 2012

    TSQL: MERGE With EXCEPT to Handle Type 2 Loading

Type 2 loading is a pattern that is often used to incrementally load dimensions in a data warehouse. But you can also use it for loading satellites in a data vault or loading tables in a historical/persistent staging area. It consists of three parts:

    1. Identify new rows and add them to the target database.
    2. Identify changed rows and add them to the target database.
    3. End date the old rows in the target database that have been replaced with those of step 2.

     

    MERGE

There are several ways to implement a Type 2 loading pattern. In this blog post I’ll describe using the T-SQL MERGE statement. I’ve written rather critically about the SQL MERGE statement before (in Dutch). Since then I have come to like it, and compared to other methods it seems to be a well performing option for handling slowly changing dimension type solutions. (See this study by Microsoft’s Matt Masson.)

Warren Thornthwaite from the Kimball Group gave a good explanation of using the SQL MERGE statement for slowly changing dimensions in Design Tip #107: Using the SQL MERGE Statement for Slowly Changing Dimension Processing.

In his example for processing a type 2 dimension he uses a sample that is similar to the one below:

INSERT INTO dimCustomer (AuditID, StartDate, EndDate, CustomerID, CustomerName, Adress, ..)
SELECT @AuditID, GETDATE(), NULL, CustomerID, CustomerName, Adress, ..
FROM
( MERGE DWH.dbo.dimCustomer tgt
  USING STG.dbo.Customer src
  ON src.CustomerID = tgt.CustomerID

  WHEN NOT MATCHED THEN
    INSERT VALUES (@AuditID, GETDATE(), NULL, src.CustomerID, src.CustomerName, src.Adress, ..)

  WHEN MATCHED
    AND tgt.EndDate IS NULL
    AND (tgt.CustomerName <> src.CustomerName
      OR tgt.Adress <> src.Adress
      ..)
  THEN
    UPDATE SET tgt.EndDate = GETDATE()

  OUTPUT $Action Action_Out, src.CustomerID, src.CustomerName, src.Adress ..
) AS MERGE_OUT
WHERE MERGE_OUT.Action_Out = 'UPDATE';

The outcome of the MERGE statement is used as the source for the INSERT INTO statement at the top (the aforementioned step 2: identify changed rows and add them to the target database).
In the MERGE statement there are two actions:
1. Insert any new rows (step 1).
2. End date the rows in the target that have changed (step 3).

    NULL <> NULL

A problem with this example is the implicit assumption that there are no NULL values stored in the columns. The comparisons evaluate to UNKNOWN when one of the values is NULL (NULL <> NULL is not true), so changes to or from NULL would be missed and the affected rows would not be end dated or added to the model.

There are however two SQL operators that consider NULL values as equal: INTERSECT and EXCEPT. So we can easily circumvent this problem by changing the column comparisons into:

    AND EXISTS (SELECT  tgt.CustomerName, tgt.Adress …
    EXCEPT SELECT  src.CustomerName, src.Adress …)
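Putting it together, the WHEN MATCHED clause of the sketch above would then read roughly (columns still abbreviated, as in the original sample):

      WHEN MATCHED
        AND tgt.EndDate IS NULL
        AND EXISTS (SELECT tgt.CustomerName, tgt.Adress ..
                    EXCEPT
                    SELECT src.CustomerName, src.Adress ..)
      THEN
        UPDATE SET tgt.EndDate = GETDATE()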

• Wed 02 May 2012

    Batch File To Start/Stop SQL Server

I figured this out a few years ago but forgot most of it. With a blazing fast new laptop and only one instance of SQL Server, running these services automatically at start-up wasn’t a big problem.

But now, after installing SQL 2012, I felt the need to set these services to manual start-up and wrote four batch files:

    • Start_SQL2008R2.bat
    • Stop_SQL2008R2.bat
    • Start_SQL2012.bat
    • Stop_SQL2012.bat

Whenever I need one of these instances I right-click the relevant bat file and run it as administrator.

This is the contents of Start_SQL2008R2.bat, which starts my SQL Server instance named SQL2008R2:

    net start MSSQL$SQL2008R2
    net start SQLAgent$SQL2008R2
    net start MSOLAP$SQL2008R2
    net start MsDtsServer100
    net start ReportServer$SQL2008R2
    pause

    And this is the contents of my Stop_SQL2008R2.bat file.

    net stop SQLAgent$SQL2008R2
    net stop MSSQL$SQL2008R2
    net stop MSOLAP$SQL2008R2
    net stop MsDtsServer100
    net stop ReportServer$SQL2008R2
    pause

Notice the order (stop SQL Agent before SQL Server).

• Fri 13 Apr 2012

SSIS Pattern: Drop and Rebuild Indexes Dynamically

Fact and dimension tables in our data warehouses are queried often. The queries used in our reports and the queries used for processing SSAS cubes create a large workload of select queries, and these queries return results faster with appropriate indexes.

However, a lot of indexes are a pain when you load the data into your data warehouse. A good practice is therefore to drop your NONCLUSTERED indexes at the start of loading your dimension and fact tables, and rebuild them after loading has finished.

    The pain of your DBA

Your DBA will typically change these indexes periodically, based on his performance tuning efforts for the latest workload. He will be very disappointed when, after changing these indexes, he is confronted with the old indexes after a rebuild.

So we need some way to:

    • script the currently used indexes
    • store them
    • drop them
    • load the dimensions
    • load our fact tables
    • rebuild the indexes again.

In this blog post I walk you through the creation of the package with this control flow:

[Image: package control flow]

But first let’s focus on the SQL script we need to capture the current indexes. Unfortunately you cannot use the SQL function OBJECT_DEFINITION to generate the code necessary to create an index, since indexes aren’t supported by that function. So we have to get creative.

The next SQL statement is based on this blog post by Simon Sabin. It will return a dataset with 5 columns:

    • SchemaName
    • ObjectName
    • IndexName
    • DropIndexStatement
    • CreateIndexStatement

[Image: result set with the five columns]

    The SQL Statement to return this list of indexes:


SELECT SchemaName = SCHEMA_NAME(o.schema_id)
  , ObjectName = o.name
  , IndexName = i.name
  , DropIndexStatement = 'DROP INDEX [' + i.name + '] ON ['
      + SCHEMA_NAME(o.schema_id) + '].[' + o.name + ']'
  , CreateIndexStatement = 'CREATE '
      + (CASE i.is_unique WHEN 1 THEN 'UNIQUE ' ELSE '' END)
      + 'NONCLUSTERED INDEX [' + i.name + '] ON ['
      + SCHEMA_NAME(o.schema_id) + '].[' + o.name + '] ( '
      + LEFT(list, ISNULL(splitter - 1, LEN(list))) + ' ) '
      + ISNULL('INCLUDE ( ' + SUBSTRING(list, indCol.splitter + 1, 100) + ' ) ', '')
      + ISNULL('WHERE ' + i.filter_definition, '')
FROM sys.indexes i
JOIN sys.objects o ON i.object_id = o.object_id
CROSS APPLY
  (SELECT splitter = NULLIF(CHARINDEX('|', indexCols.list), 0)
     , list
   FROM (SELECT list = CAST((
            SELECT CASE WHEN sc.is_included_column = 1 AND sc.ColPos = 1 THEN '|' ELSE '' END
                 + CASE WHEN sc.ColPos > 1 THEN ', ' ELSE '' END
                 + name
                 + CASE WHEN sc.is_descending_key = 1 THEN ' DESC' ELSE '' END
            FROM (SELECT sc.is_descending_key
                    , sc.is_included_column
                    , index_column_id
                    , name = '[' + name + ']'
                    , ColPos = ROW_NUMBER() OVER (PARTITION BY sc.is_included_column
                                                  ORDER BY sc.index_column_id)
                  FROM sys.index_columns sc
                  JOIN sys.columns c ON sc.object_id = c.object_id
                                    AND sc.column_id = c.column_id
                  WHERE sc.index_id = i.index_id
                    AND sc.object_id = i.object_id) sc
            ORDER BY sc.is_included_column, ColPos
            FOR XML PATH(''), TYPE) AS VARCHAR(MAX))
        ) indexCols
  ) indCol
WHERE i.is_primary_key = 0
  AND i.is_unique_constraint = 0
  AND i.type_desc = N'NONCLUSTERED'
  AND o.type = N'U'
ORDER BY SchemaName, ObjectName, IndexName

    The Steps to change/create the master package:

    1. Open the master package that you use to load the dimensions and facts.

    2. Add Variables:

    • IndexList with Data Type=Object
    • DropIndexStatement with Data Type =String
    • CreateIndexStatement with Data Type =String


    3. Add an Execute SQL Task to the Control Flow

    • Change Name to Get IndexList
    • Change ResultSet to Full result set
    • Select the target connection under Connection
    • Copy the SQL statement (see above) as the SQLStatement

     

    • Select the tab Result Set
• Choose Add, use 0 as the Result Name and select User::IndexList as the Variable Name


    4. Add a Foreach Loop Container to the Control Flow and double click on it to open the Foreach Loop Editor

    • Under the tab General change Name and Description to Drop each index
• Under the tab Collection change Enumerator to Foreach ADO Enumerator and select User::IndexList from the dropdown box as the ADO Object source variable

     

• Under the tab Variable Mappings select User::DropIndexStatement and type 3 as Index


    5. Drop an Execute SQL Task in the Drop each index container and double click on it to open the Execute SQL Task Editor.

    • Change Name and Description to Drop Index
    • Select the target connection under Connection
    • Select Variable as the SQLSourceType
• Select User::DropIndexStatement as the SourceVariable


    6. Add a Foreach Loop Container to the Control Flow and double click on it to open the Foreach Loop Editor

    • Use Create each index as name for the Foreach loop container
    • In the Collection tab of the Foreach Loop Editor select Foreach ADO Enumerator and use the User::IndexList as ADO Object source variable
    • In the Variable Mappings tab Select User::CreateIndexStatement as variable and 4 as index


7. Drop an Execute SQL Task in the Create each index container and double click on it to open the Execute SQL Task Editor.

    • Change Name and Description to Create Index
    • Select your Connection under Connection
    • Select Variable as the SQLSourceType
• Select User::CreateIndexStatement as the SourceVariable


8. Change the precedence constraints to match the aforementioned control flow.

• Thu 03 Nov 2011

    SQL Server 2012 Editions and Licensing


Microsoft published the licensing options and prices for SQL Server 2012. Two aspects stand out:

• A new edition: Business Intelligence (between the Standard and Enterprise editions)
• A move to core-based licensing (from processor-based)

     

    The main editions and the differences in functionality:


    Standard Edition:

    • Basic OLTP
    • Basic Reporting & Analytics
    • Programmability
    • Manageability
    • Basic High Availability


    Business Intelligence Edition:

    Standard Edition +:

    • Data Quality Services
    • Master Data Services
    • Power View
    • PowerPivot for Sharepoint Portal Server
    • Semantic model, advanced analytics


    Enterprise Edition

    Business Intelligence Edition +:

    • Advanced Security
    • Advanced High availability
    • ColumnStore indexing
    • Compression
    • Partitioning

     

    US Pricing:

    This is an overview of the expected prices:

[Image: expected US pricing]

• Mon 17 Oct 2011

    TSQL Metadata Queries

    When you want to query SQL Server for the metadata you basically have two options:

    • Use the Catalog views
    • Use the Information Schema Views

The latter are based on catalog view definitions in the ISO standard. You typically use them if you need to support multiple database systems.

The former are recommended by Microsoft.

In this blog post I recapitulate some queries that I have used in the past and will probably use in the future.

    1. Get column information from all tables

    -----------------------------------------------------------
    -- Query: GetColumnInfoFromAllTables
    -- By Marco Schreuder 27-09-2011
    -----------------------------------------------------------
SELECT SchemaName = SCHEMA_NAME(T.schema_id)
  , TableName = T.name
  , ColumnName = C.name
  , ColumnID = C.column_id
  , DataType = ty.name
  , MaxLength = C.max_length
  , Precision = C.precision
  , Scale = C.scale
  , CondensedDataType = (CASE
        WHEN ty.name IN ('char','nchar','varchar','nvarchar') AND C.max_length = -1
          THEN ty.name + '(MAX)'
        WHEN ty.name IN ('char','nchar','varchar','nvarchar') AND C.max_length > 0
          THEN ty.name + '(' + CONVERT(VARCHAR(20), C.max_length) + ')'
        WHEN ty.name IN ('decimal','numeric')
          THEN ty.name + '(' + CONVERT(VARCHAR(20), C.precision) + ','
             + CONVERT(VARCHAR(20), C.scale) + ')'
        WHEN ty.name IN ('datetime2','datetimeoffset')
          THEN ty.name + '(' + CONVERT(VARCHAR(20), C.scale) + ')'
        ELSE ty.name
      END)
  , IsIdentity = C.is_identity
  , IsNullable = C.is_nullable
  , ModifiedDate = T.modify_date
FROM sys.tables T
INNER JOIN sys.columns C ON C.object_id = T.object_id
INNER JOIN sys.types ty ON C.system_type_id = ty.user_type_id
ORDER BY SCHEMA_NAME(T.schema_id), T.name, C.column_id

    This will return this result set:

[Image: column information for all tables]

Notice the join between sys.columns and sys.types on ty.user_type_id. Otherwise you would get duplicate rows if there are user-defined types.

    2. Get column information from all views

Almost the same as query 1, replacing sys.tables with sys.views:

    -----------------------------------------------------------
    -- Query: GetColumnInfoFromAllViews
    -- By Marco Schreuder 27-09-2011
    -----------------------------------------------------------
SELECT SchemaName = SCHEMA_NAME(V.schema_id)
  , ViewName = V.name
  , ColumnName = C.name
  , ColumnID = C.column_id
  , DataType = ty.name
  , MaxLength = C.max_length
  , Precision = C.precision
  , Scale = C.scale
  , CondensedDataType = (CASE
        WHEN ty.name IN ('char','nchar','varchar','nvarchar') AND C.max_length = -1
          THEN ty.name + '(MAX)'
        WHEN ty.name IN ('char','nchar','varchar','nvarchar') AND C.max_length > 0
          THEN ty.name + '(' + CONVERT(VARCHAR(20), C.max_length) + ')'
        WHEN ty.name IN ('decimal','numeric')
          THEN ty.name + '(' + CONVERT(VARCHAR(20), C.precision) + ','
             + CONVERT(VARCHAR(20), C.scale) + ')'
        WHEN ty.name IN ('datetime2','datetimeoffset')
          THEN ty.name + '(' + CONVERT(VARCHAR(20), C.scale) + ')'
        ELSE ty.name
      END)
  , IsIdentity = C.is_identity
  , IsNullable = C.is_nullable
  , ModifiedDate = V.modify_date
FROM sys.views V
INNER JOIN sys.columns C ON C.object_id = V.object_id
INNER JOIN sys.types ty ON C.system_type_id = ty.user_type_id
ORDER BY V.name, C.column_id

    Resulting in:

[Image: column information for all views]

    3. Get details of foreign key constraints

    -----------------------------------------------------------
    -- Query: GetFKDetails
    -- By Marco Schreuder 27-09-2011
    -----------------------------------------------------------
SELECT FKName = f.name
  , FKSchema = OBJECT_SCHEMA_NAME(f.parent_object_id)
  , FKTable = OBJECT_NAME(f.parent_object_id)
  , FKColumn = COL_NAME(f.parent_object_id, fc.parent_column_id)
  , RefSchema = OBJECT_SCHEMA_NAME(f.referenced_object_id)
  , RefTable = OBJECT_NAME(f.referenced_object_id)
  , RefColumn = COL_NAME(fc.referenced_object_id, fc.referenced_column_id)
  , ConstraintColumnID = fc.constraint_column_id
  , ModifiedDate = f.modify_date
FROM sys.foreign_keys f
INNER JOIN sys.foreign_key_columns fc
  ON f.object_id = fc.constraint_object_id
    returns:

[Image: foreign key details]

    4. Get details of the indices created on user tables

    -----------------------------------------------------------
    -- Query: GetIndexDetails
    -- By Marco Schreuder 27-09-2011
    -----------------------------------------------------------
SELECT IndexName = i.name
  , SchemaName = OBJECT_SCHEMA_NAME(i.object_id)
  , ObjectName = OBJECT_NAME(i.object_id)
  , ColumnName = COL_NAME(i.object_id, ic.column_id)
  , IndexType = i.type_desc
  , IndexID = i.index_id
  , IndexColumnID = ic.index_column_id
  , KeyOrdinal = ic.key_ordinal
  , IsIncludedColumn = ic.is_included_column
  , IsUnique = i.is_unique
  , IsPrimaryKey = i.is_primary_key
FROM sys.indexes i
INNER JOIN sys.index_columns ic
  ON i.object_id = ic.object_id AND i.index_id = ic.index_id
--limit to indexes on user tables
INNER JOIN sys.tables t
  ON i.object_id = t.object_id
    which returns:

[Image: index details]