IN2BI, Microsoft Business Intelligence
data, stories & insights
rss | email | twitter

Category: 'Microsoft Business Intelligence'

  • Thu
    03
    Jan 13

    Logging in SSIS with BIML 1 – Logging Task Execution

    The project deployment model in SSIS (2012) offers great default logging functionality. However you may find yourself in need of additional functionality, especially if you use the package deployment model or an earlier version of SQL Server. In this series on logging I dive into 3 methods:

    1. Logging Task Execution using the OnPostExecute Event
      (this post)
    2. Logging Package Execution in the control flow.
    3. Logging Errors using the OnError Event.

    In this post I’ll describe the first method: Logging Task Execution using the OnPostExecute Event. This event is fired after the execution of each task and package. And thus provides a great way to log relevant information like package duration and row counts in the data flow.

    The SsisEventLog Table

    To store these events we need a table:

    CREATE TABLE  meta.SsisEventLog (
          EventID int IDENTITY(1,1) NOT NULL
        , ExecutionID uniqueidentifier 
        , EventLogDate datetime 
        , PackageStartDate datetime 
        , PackageDuration AS datediff(second, PackageStartDate
            , EventLogDate) PERSISTED
        , PackageGUID uniqueidentifier 
        , PackageVersionGUID   uniqueidentifier 
        , PackageName  varchar(50)
        , TaskGUID   uniqueidentifier 
        , TaskName   varchar(50)
        , ExtractRowCount int 
        , InsertRowCount int 
        , UpdateRowCount int 
        , CONSTRAINT  PK_SsisEventLog 
            PRIMARY KEY CLUSTERED (EventID)
    )

    Some remarks:

    • The PackageDuration column is calculated using the datediff function. It calculates the difference in seconds between the time the event was fired and the start date of the package.
    • If the TaskGUID equals to the PackageGUID the event is fired directly after the package was executed.
    • The row count columns will be filled by variables we define in the package and we’ll have to find a way to reset them after logging the event.

    To log this information we’ll use a stored procedures that writes the information in the SsisEventLog table and returns a row with 3 columns, all with the value of 0 to reset the variables after logging.

    CREATE PROCEDURE [meta].[uspLogEvent]      
          @ExecutionID [uniqueidentifier]
        , @PackageStartDate [datetime]
        , @PackageGUID [uniqueidentifier]
        , @PackageVersionGUID [uniqueidentifier]
        , @PackageName [varchar](50)
        , @TaskGUID [uniqueidentifier]
        , @TaskName [varchar](50)
        , @ExtractRowCount [int]
        , @InsertRowCount [int]
        , @UpdateRowCount [int]
    AS       

    INSERT INTO [meta].[SsisEventLog](
          ExecutionID
        , EventLogDate
        , PackageStartDate
        , PackageGUID
        , PackageVersionGUID
        , PackageName
        , TaskGUID
        , TaskName
        , ExtractRowCount
        , InsertRowCount
        , UpdateRowCount
    )
    VALUES(
          @ExecutionID
        , SYSDATETIME()   
        , @PackageStartDate
        , @PackageGUID
        , @PackageVersionGUID
        , @PackageName
        , @TaskGUID
        , @TaskName
        , @ExtractRowCount
        , @InsertRowCount
        , @UpdateRowCount
    );
    SELECT 0 AS ExtractRowCount
        , 0 AS InsertRowCount
        , 0 AS UpdateRowCount
    ;

    Creating the log event handler in the package

    Before we can create this event handler you have to create three variables of type Int32 with value 0 as in this image:

    OnPostExecuteVariables

    To create the event handler in a package open the Event Handlers tab and select the package in the Executable dropdown box and select the OnPostExecute in the Event handler dropdown box.

    Next drop a Execute SQL Task from the toolbox on the canvas and use the following setting in the General tab:

    • Name: SQL Log Event
    • ResultSet: Single row
    • ConnectionType: OLE DB
    • Connection: The name of the connection with your log table
    • SQLSourceType: Direct imput
    • SQLStatement:
      EXEC [meta].[uspLogEvent] ?,?,?,?,?,?,?,?,?,?

     

    OnPostExecute1

    In the Parameter Mapping tab add 9 mappings using the following variable names:

    • System:: ExecutionInstanceGUID
    • System:: StartTime
    • System:: PackageID
    • System:: VersionGUID
    • System:: PackageName
    • System:: SourceID
    • System:: SourceName
    • User:: RcExtract
    • User:: RcInsert
    • User:: RcUpdate

    OnPostExecute2

    In the Result Set tab add 3 results as in the image below.

    OnPostExecute3

    And we are ready with the implementation of the event handler in the SSIS Package.

    The Results

    After executing a package with one Execute SQL Task and one Data Flow Task you will see three rows in the SsisEventLog table. Two for the tasks and one for the package:

    OnPostExecuteResult

    OnPostExecuteResult2

    Using this method you can now easily troubleshoot any (performance) issues with the package.

    Using BIML to implement the event handler

    To implement this event handler with BIML use the following code snippets in the BIMLScript that creates the package:

    <!--Variables-->
    <Variables>
        <Variable Name="RcExtract"
            DataType="Int32"
            Namespace="User" >0</Variable>
        <Variable Name="RcInsert"
            DataType="Int32"
            Namespace="User" >0</Variable>
        <Variable Name="RcUpdate"
            DataType="Int32"
            Namespace="User" >0</Variable>
    </Variables>

    <!--Events-->
    <Events>
    <Event EventType="OnPostExecute"
           Name="OnPostExecute"
           ConstraintMode="Linear">
    <Tasks>
        <ExecuteSQL Name="SQL Log Event"
            ConnectionName="NorthwindSales"
            ResultSet="SingleRow">
            <DirectInput>
    EXEC [meta].[uspLogEvent] 'OnPostExecute',?,?,?,?,?,?,?,?,?,?
            </DirectInput>
            <Parameters>
                <Parameter Name="0" Direction="Input"
                    DataType="Guid"
                    VariableName="System.ExecutionInstanceGUID" />
                <Parameter Name="1" Direction="Input"
                    DataType="DateTime"
                    VariableName="System.StartTime" />
                <Parameter Name="2" Direction="Input"
                    DataType="Guid"
                    VariableName="System.PackageID" />
                <Parameter Name="3" Direction="Input"
                    DataType="Guid"
                    VariableName="System.VersionGUID" />
                <Parameter Name="4" Direction="Input"
                    DataType="String"
                    VariableName="System.PackageName" />
                <Parameter Name="5" Direction="Input"
                    DataType="Guid"
                    VariableName="System.SourceID" />
                <Parameter Name="6" Direction="Input"
                    DataType="String"
                    VariableName="System.SourceName" />
                <Parameter Name="7" Direction="Input"
                    DataType="Int32"
                    VariableName="User.RcExtract" />
                <Parameter Name="8" Direction="Input"
                    DataType="Int32"
                    VariableName="User.RcInsert" />
                <Parameter Name="9" Direction="Input"
                    DataType="Int32"
                    VariableName="User.RcUpdate" />
            </Parameters>
            <Results>
                <Result Name="0"
                    VariableName="User.RcExtract" />
                <Result Name="0"
                    VariableName="User.RcInsert" />
                <Result Name="0"
                    VariableName="User.RcUpdate" />
            </Results>
        </ExecuteSQL>
    </Tasks>
    </Event>
    </Events>

    ABOUT BIML

    You can leverage Business Intelligence Markup Language – or BIML – to automate the creation of Microsoft SQL Server Integration Services (SSIS) Packages. BIML is a creation of Varigence, Inc. and is available in proprietary products, open source projects, and has been published as an open language specification. The popular open source BIDSHelper project includes Biml functionality, enabling anyone to write and execute Biml code for free.

    Do you need a head start with the automation of SSIS packages: consider my BIML Workshop.

  • Tue
    18
    Dec 12

    SQL Server, SSIS, SSAS and SSRS on ONE Server

    Best practice dictates that we use a separate server for each of these SQL Server Services. And this seems logical because otherwise these services will compete over server resources. In IT we call this competition: contention.

    However there are some great reasons to put these services on one box:

    • Licensing: SQL Server licensing can be expensive. And you need licenses for every server on which a services runs.
    • Better resource utilization: Less servers, less power usage, less maintenance and -monitoring cost.
    • Sometimes the network is the problem as a lot of data moves from the SQL Engine to SSIS or SSAS resulting in network congesting. If services run on the same machine, SQL Server uses the Shared Memory Protocol which is faster and leads to less network congestion.

    In a scenario with all SQL/BI Services on one server we need some strategies to diminish the described contention. This is of course a big subject and I can only touch the basics in this post and give you some tips.

    TIP 1: Limit the software and services on the server

    This seems logical but I have been to several sites where developers remote desktop into the production server to do maintenance- and other jobs. This is certainly not a best practice. Better is to not install any client tools on the server and use a separate developer/test server to do maintenance.

    TIP 2: Get an overview of your typical daily workload

    image

    In this image I have mapped a typical daily workload to the services needed to perform the workload:

    • From 07:00 AM to 23:00 is the extended working day window. Some users start early, others end late: all are extensively opening SSRS reports and querying SSAS cubes.
    • From 23:00 to 01:00 AM backups are running for all (source) systems. This is the period we do our own SSAS and SQL Server backups.
    • From 01:00 AM tot 07:00 AM is our process window

    In this period:

    • we use SSIS to land data from our source systems into the staging database
    • we use SSIS to load data from our staging database into the data warehouse
    • we’ll process the dimensions and cubes of our SSAS databases.
    • we warm the cache of our SSAS database and start distributing reports using SSRS subscriptions and SSIS for large Excel reports.

    With this knowledge we have a clear understanding of potential contention issues. Furthermore we can schedule to stop and start services on a need to use basis. We can use the operating system commands NET START and NET STOP for this purpose.

    TIP 3 Set Maximum Memory Setting for SQL Server

    With the max server memory setting you can place a limit to the amount of buffer pool memory used by SQL Server. Which leads to the question to what amount should we limit the buffer pool memory?

    During our extended working day window we will probably not benefit from a large buffer pool memory: Most queries will probably be answered by Analysis Services. However in the process windows we will benefit from a larger buffer pool memory. We can change this property during the day and schedule these changes by using the sp_configure system stored procedure. Based on 24GB of RAM, reserve 4 GB for the operating system and:

    • During working day window set max server memory to 4 GB, reserve 8 GB for SSAS and 8 GB for SSRS.
    • During the process window set max server memory to 10 GB, reserving 10GB for SSIS and 8GB for SSAS.

    TIP 4 Set SSAS Memory Settings

    The SSAS memory settings are available in the msmdsrv.ini file and in properties window of the server. If the value is between 0 and 100 than this means a percentage of total available physical memory. Above 100 means bytes. Change these settings to:

    • Memory\HardMemoryLimit: from the default of 0 to 32
    • Memory\TotalMemoryLimit: from the default of 80 to 28
    • Memory\LowMemoryLimit:from the default of 65 to 24

    This means that Analysis Services will start freeing up memory once its has reached the LowMemoryLimit threshold of 24% of physical memory. This process will get more aggressive if it reaches the other thresholds.
    While you’re at it change some other memory properties:

    • OLAP\Process\BufferMemoryLimit from 60 to 20
    • OLAP\Process\AggregationMemoryLimitMax from 80 to 14
    • OLAP\Process\AggregationMemoryLimitMin from 10 to 4

    TIP 5 Set SSRS Memory Settings

    In the RSReportServer.config file add the WorkingSetMaximum property and set it to 8GB (for our 24GB example):
    <WorkingSetMaximum>8000000</WorkingSetMaximum>
    Restart the reporting services service, so other memory related properties that are based on WorkingSetMaximum get set.

    TIP 6 Monitor and adapt the memory setting

    Use the settings in tip 3, 4 en 5 as a starting point but monitor memory usage and change these properties to map them to your workload.

    TIP 7 Use Windows System Resource Manager to restrict CPU utilization

    With Windows System Resource Manager (WSRM) you can create custom resource allocation policies to restrict resources (CPU, RAM) to an application. You can map your workload windows to policies and define these policies e.g.:

    • policy ProcessData which runs when Analysis Services is processing data, set the CPU percentage for SQL to 45% and 45% for SSAS
    • policy ProcessIndex which runs when Analysis Services is processing aggregations, set the CPU percentage for SQL to 10% and 80% for SSAS

    wsrm_resource_allocation

    Next you can setup calendar rules to apply the different policies to the correct time frames.

    However this has the drawback that you have two calendar schemes: one in SQL agent and one in WSRM that you have to keep in sync. Furthermore it’s possible that there aren’t any CPU contention issues. My advice would be to monitor CPU usage and plan resource allocation policies if necessary based on the monitoring results.

    TIP 8 Don’t write to the same disk you’re reading from

    In the daily process window we will typically load large amount of data in our data warehouse:

    • We load data from the source system in a staging database.
    • From the staging database we load the data in the data warehouse.
    • From the data warehouse we load the Analysis Services cubes.

    In this scenario put the data warehouse database files on another drive as the staging database and the Analysis Services cubes.

    Conclusion

    In this post I focused on diminishing contention issues when using one server for your BI Solution running SQL Server, Analysis Services, Integration Services and Reporting Services. Key aspect is understanding your workload and the role the different applications play as well as the properties you can tweak to diminish contention.

    References

  • Mon
    04
    Jun 12

    TSQL: MERGE With EXCEPT to Handle Type 2 Loading

    Type 2 Loading is a pattern that is often used to incrementally load dimensions in a data warehouse. But you can also use it for loading satellites in a data vault or loading tables in a historical/persistent staging area. It consist of three parts:

    1. Identify new rows and add them to the target database.
    2. Identify changed rows and add them to the target database.
    3. End date the old rows in the target database that have been replaced with those of step 2.

     

    MERGE

    There are several ways to implement a Type 2 loading pattern. In this blog post I’ll describe using the TSQL statement: MERGE. I’ve written earlier rather critical about using the SQL MERGE Statement (in Dutch) Since then I have come to like it and compared to other methods it seems to be a good performing option for handling slowly changing dimension type solutions. (See this study from Microsoft’s Matt Mason)

    Warren Thornthwaite from the Kimball Group did a good explanation on using the SQL MERGE Statement for Slowly Changing Dimension in  Design Tip #107 Using the SQL MERGE Statement for Slowly Changing Dimension Processing.

    In his example for processing type 2 dimension he uses a sample that is similar to the one below:

    INSERT INTO dimCustomer (AuditID,StartDate,EndDate, CustomerID, CustomerName, Adress..)
    SELECT @AuditID, GETDATE(),NULL,CustomerID, CustomerName, Adress,..
    FROM

    ( MERGE DWH.dbo.dimCustomer tgt
    USING STG.dbo.Customer src
    ON src.CustomerID = tgt.CustomerID

    WHEN NOT MATCHED THEN
    INSERT VALUES (@AuditID, GETDATE(), NULL, CustomerID, CustomerName, Adress ..)

    WHEN MATCHED
    AND tgt.__EndDate IS NULL
    AND (tgt.CustomerName <> src.CustomerName
    AND (tgt.Adress <> src.Adress)
    ..
    THEN
    UPDATE SET tgt.EndDate = GETDATE()

    OUTPUT $Action Action_Out, src.CustomerID, CustomerName, Adress ..
    ) AS MERGE_OUT
    WHERE MERGE_OUT.Action_Out = 'UPDATE';

    The outcome of the MERGE statement is used as source for the INSERT INTO statement at the top.
    (Afore mentioned step 2: Identify changed rows and add them to the target database)
    In the Merge statement there are two actions:
    1.   Insert any new rows
    3.   End date the rows in the source that have changed.

    NULL <> NULL

    A problem with this example is the implicit assumption that there are no NULL values stored in the columns. Because otherwise the highlighted expressions with nulls would yield to true (NULL<>NULL) and rows would unnecessary be added to the model.

    There are however two SQL expressions that consider NULL Values as equal: INTERSECT and EXCEPT. So we can easily circumvent this problem by changing the highlighted rows into:

    AND EXISTS (SELECT  tgt.CustomerName, tgt.Adress …
    EXCEPT SELECT  src.CustomerName, src.Adress …)

  • Wed
    02
    May 12

    Batch File To Start/Stop SQL Server

    I figured this out a few years ago but forgot most of it. With a new blazing fast lap-top and only one instance of SQL Server running these services automatically at start-up wasn’t a big problem.

    But now after installing SQL2012 I felt the need to set these services to manually start-up and wrote four batch-files:

    • Start_SQL2008R2.bat
    • Stop_SQL2008R2.bat
    • Start_SQL2012.bat
    • Stop_SQL2012.bat

    Whenever I need one of these instances I right-click the relevant bat file and Run them as administrator.

    This is the contents of Start_SQL2008R2.bat to start my SQL Server instanced with the name SQL2008R2.:

    net start MSSQL$SQL2008R2
    net start SQLAgent$SQL2008R2
    net start MSOLAP$SQL2008R2
    net start MsDtsServer100
    net start ReportServer$SQL2008R2
    pause

    And this is the contents of my Stop_SQL2008R2.bat file.

    net stop SQLAgent$SQL2008R2
    net stop MSSQL$SQL2008R2
    net stop MSOLAP$SQL2008R2
    net stop MsDtsServer100
    net stop ReportServer$SQL2008R2
    pause

    Notice the order (Stop SQL Agent now before SQL Server)

  • Fri
    13
    Apr 12

    SSIS Pattern: Drop- and rebuild indexes dynamically

    Fact- and dimension tables in our data warehouses are queried often. The queries used in our reports and the queries used for processing SSAS cubes create a large workload of select queries. These “select” queries benefit from indexes and will return results faster with appropriate indexes.

    However a lot of indexes are a pain when you load the date into your data warehouse. A good practices is therefore to drop your NONCLUSTERED indexes at the start of loading your dimensions and fact tables and rebuilt them after loading has finished.

    The pain of your DBA

    Your DBA will typically change these indexes periodically based on his performance tuning efforts of the latest workload. He will be very disappointed when after changing these indexes he is confronted with the old indexes after a rebuild of indexes.

    clip_image018So we need some way to:

    • script the currently used indexes
    • store them
    • drop them
    • load the dimensions
    • load our fact tables
    • rebuild the indexes again.

    In this blog post I walk you to the creation of the package with this control flow:

    But first let’s focus on the SQL script we need to capture the current indexes. Unfortunately you can not use the SQL Function OBJECT_DEFINITION to generate the code necessary to create an index since that object isn’t supported. So we have to get creative.

    The next SQL Statement is based on this blogpost by Simon Sabin. It will return a dataset with 5 columns:

    • SchemaName
    • ObjectName
    • IndexName
    • DropIndexStatement
    • CreateIndexStatement

    clip_image002

    The SQL Statement to return this list of indexes:


    SELECT SchemaName = SCHEMA_NAME (o.SCHEMA_ID)
       ,
    ObjectName = o.name
      
    ,IndexName = i.name
      
    ,DropIndexStatement = 'DROP INDEX [' + i.Name + '] ON [' + SCHEMA_NAME (o.SCHEMA_ID)
       +
    '].[' + o.name + ']'
      
    ,CreateIndexStatement = 'CREATE ' + (CASE i.is_unique WHEN 1 THEN 'UNIQUE ' ELSE '' END)
       +
    'NONCLUSTERED INDEX ['
      
    + i.name + '] ON [' + SCHEMA_NAME (o.SCHEMA_ID) + '].[' + o.name + '] ( '
      
    + LEFT(list, ISNULL(splitter-1,LEN(list))) + ' ) '
      
    + ISNULL('INCLUDE ( ' + SUBSTRING(list, indCol.splitter +1, 100) + ' ) ','')
       +
    ISNULL('WHERE ' + i.filter_definition,'')
    FROM sys.indexes i
    JOIN sys.objects o ON i.OBJECT_ID = o.OBJECT_ID
    CROSS APPLY
    (SELECT splitter = NULLIF(CHARINDEX('|',indexCols.list),0)
    ,
    list
    FROM (SELECT list=CAST((
      
    SELECT CASE WHEN sc.is_included_column = 1 AND sc.ColPos = 1 THEN '|' ELSE '' END +
        
    CASE WHEN sc.ColPos  > 1 THEN ', ' ELSE '' END + name +
               
    CASE WHEN sc.is_descending_key=1 THEN ' DESC' ELSE '' END
                FROM
    (SELECT sc.is_descending_key
          
    , sc.is_included_column
          
    , index_column_id
          
    , name = '[' + name + ']'
          
    , ColPos = ROW_NUMBER() OVER (PARTITION BY sc.is_included_column
              
    ORDER BY sc.index_column_id)
            
    FROM sys.index_columns  sc
            
    JOIN sys.columns c ON sc.OBJECT_ID = c.OBJECT_ID
              
    AND sc.column_id = c.column_id
            
    WHERE sc.index_id = i.index_id
                       
    AND sc.OBJECT_ID = i.OBJECT_ID ) sc
                       
    ORDER BY sc.is_included_column, ColPos
                         
    FOR XML PATH (''), TYPE) AS VARCHAR(MAX))
       )
    indexCols
    ) indCol
    WHERE i.is_primary_key=0
    AND i.is_unique_constraint=0
    AND i.type_desc=N'NONCLUSTERED'
    AND o.TYPE=N'U'
    ORDER BY SchemaName, ObjectName, IndexName

    The Steps to change/create the master package:

    1. Open the master package that you use to load the dimensions and facts.

    2. Add Variables:

    • IndexList with Data Type=Object
    • DropIndexStatement with Data Type =String
    • CreateIndexStatement with Data Type =String
      clip_image004


    3. Add an Execute SQL Task to the Control Flow

    • Change Name to Get IndexList
    • Change ResultSet to Full result set
    • Select the target connection under Connection
    • Copy the SQL statement (see above) as the SQLStatement
      clip_image006

     

    • Select the tab Result Set
    • Choose Add and use 0 as the Result Name and Select User::IndexList as the Variable Name 
      clip_image008


    4. Add a Foreach Loop Container to the Control Flow and double click on it to open the Foreach Loop Editor

    • Under the tab General change Name and Description to Drop each index
    • Under the tab Collection change Enumerator to Foreach ADO Enumerator and select User::IndexList from the dropdown box as the ADO Object source variable clip_image010

     

    • Under the tab Variable Mappings select User::DropIndexStatement and type 3 as Indexclip_image012


    5. Drop an Execute SQL Task in the Drop each index container and double click on it to open the Execute SQL Task Editor.

    • Change Name and Description to Drop Index
    • Select the target connection under Connection
    • Select Variable as the SQLSourceType
    • Select User::DropIndexStatement as the SourceVariable clip_image014


    6. Add a Foreach Loop Container to the Control Flow and double click on it to open the Foreach Loop Editor

    • Use Create each index as name for the Foreach loop container
    • In the Collection tab of the Foreach Loop Editor select Foreach ADO Enumerator and use the User::IndexList as ADO Object source variable
    • In the Variable Mappings tab Select User::CreateIndexStatement as variable and 4 as index


    7. Drop an Execute SQL Task in the Drop each index container and double click on it to open the Execute SQL Task Editor.

    • Change Name and Description to Create Index
    • Select your Connection under Connection
    • Select Variable as the SQLSourceType
    • Select User::CreateIndexStatement as the SourceVariableclip_image016


    8. Change the Precedence Constraints to map the example of the afore mentioned control flow.

  • Thu
    03
    Nov 11

    SQL Server 2012 Editions and Licensing

    image

    Microsoft published the licensing options and prices. Two aspects stand out:

    • A new edition: Business Intelligence (between Standard- and Enterprise Edition)
    • A move to core based licensing (from processor based)

     

    The main editions and the differences in functionality:


    Standard Edition:

    • Basic OLTP
    • Basic Reporting & Analytics
    • Programmability
    • Manageability
    • Basic High Availability


    Business Intelligence Edition:

    Standard Edition +:

    • Data Quality Services
    • Master Data Services
    • Power View
    • PowerPivot for Sharepoint Portal Server
    • Semantic model, advanced analytics


    Enterprise Edition

    Business Intelligence Edition +:

    • Advanced Security
    • Advanced High availability
    • ColumnStore indexing
    • Compression
    • Partitioning

     

    US Pricing:

    This is an overview of the expected prices:

    image

  • Mon
    17
    Oct 11

    TSQL Metadata Queries

    When you want to query SQL Server for the metadata you basically have two options:

    • Use the Catalog views
    • Use the Information Schema Views

    The last one are based on catalog view definitions in the ISO standard. You typically use them if you need to support more database systems.

    The first is one is recommended by Microsoft.

    In this blogpost I recapitulate some queries that I have used in the past and will probably use in the future.

    1. Get column information from all tables

    -----------------------------------------------------------
    -- Query: GetColumnInfoFromAllTables
    -- By Marco Schreuder 27-09-2011
    -----------------------------------------------------------
    SELECT SchemaName = SCHEMA_NAME(T.schema_id)
       ,
    TableName = T.name
      
    ,ColumnName = C.name
      
    ,ColumnID = C.column_id
      
    ,DataType = ty.name
      
    ,MaxLength = C.max_length
      
    ,PRECISION = C.PRECISION
      
    ,Scale = C.scale
      
    ,CondensedDataType=(CASE
          
    WHEN (ty.name IN ('char','nchar','varchar','nvarchar') AND C.max_length=-1)
                  THEN ty.name + '(MAX)'
          
    WHEN (ty.name IN ('char','nchar','varchar','nvarchar') AND C.max_length>0)
                 THEN ty.name + '(' + CONVERT(VARCHAR(20),C.max_length) + ')'
          
    WHEN ty.name IN ('decimal','numeric')
                 THEN ty.name + '(' + CONVERT(VARCHAR(20),C.PRECISION) + ',' +
                      
    CONVERT(VARCHAR(20),C.scale) + ')'
          
    WHEN ty.name IN ('datetime2','datetimeoffset')
                
    THEN ty.name + '(' + CONVERT(VARCHAR(20),C.scale) + ')'
          
    ELSE ty.name
          
    END )
       ,
    IsIdentity = c.is_identity
      
    ,IsNullable = c.is_nullable
      
    ,ModifiedDate = T.modify_date
    FROM sys.tables T
    INNER JOIN sys.columns C ON C.OBJECT_ID=T.OBJECT_ID
    INNER JOIN sys.types ty ON C.system_type_id=ty.user_type_id
    ORDER BY SCHEMA_NAME(T.schema_id),T.Name,C.column_id

    This will return this result set:

    image

    Notice the join between sys.types and sys.columns on type.user_type_id. Otherwise you would get duplicate rows if there are user types defined.

    2. Get column information from all views

    Almost the same as 1. Replacing sys.tables with sys.views:

    -----------------------------------------------------------
    -- Query: GetColumnInfoFromAllViews
    -- By Marco Schreuder 27-09-2011
    -----------------------------------------------------------
    SELECT SchemaName = SCHEMA_NAME(V.schema_id)
       ,
    ViewName = V.name
      
    ,ColumnName = C.name
      
    ,ColumnID = C.column_id
      
    ,DataType = ty.name
      
    ,MaxLength = C.max_length
      
    ,PRECISION = C.PRECISION
      
    ,Scale = C.scale
      
    ,CondensedDataType=(CASE
          
    WHEN (ty.name IN ('char','nchar','varchar','nvarchar') AND C.max_length=-1)
                
    THEN ty.name + '(MAX)'
          
    WHEN (ty.name IN ('char','nchar','varchar','nvarchar') AND C.max_length>0)
                
    THEN ty.name + '(' + CONVERT(VARCHAR(20),C.max_length) + ')'
          
    WHEN ty.name IN ('decimal','numeric') THEN ty.name + '(' + CONVERT(
                
    VARCHAR(20),C.PRECISION) + ',' + CONVERT(VARCHAR(20),C.scale) + ')'
          
    WHEN ty.name IN ('datetime2','datetimeoffset')
                
    THEN ty.name + '(' + CONVERT(VARCHAR(20),C.scale) + ')'
          
    ELSE ty.name
          
    END )
       ,
    IsIdentity = c.is_identity
      
    ,IsNullable = c.is_nullable
      
    ,ModifiedDate = V.modify_date
    FROM sys.views V
    INNER JOIN sys.columns C ON C.OBJECT_ID=V.OBJECT_ID
    INNER JOIN sys.types ty ON C.system_type_id=ty.user_type_id
    ORDER BY V.Name,C.column_id

    Resulting in:

    image

    3. Get details of foreign key constraints

    -----------------------------------------------------------
    -- Query: GetFKDetails
    -- By Marco Schreuder 27-09-2011
    -----------------------------------------------------------
    SELECT FKName = f.name
      
    ,FKSchema = OBJECT_SCHEMA_NAME(f.parent_object_id)
       ,
    FKTable = OBJECT_NAME(f.parent_object_id)
       ,
    FKColumn = COL_NAME(f.parent_object_id,fc.parent_column_id)
       ,
    RefSchema = OBJECT_SCHEMA_NAME(f.referenced_object_id)
       ,
    RefTable = OBJECT_NAME(f.referenced_object_id)
       ,
    RefColumn = COL_NAME(fc.referenced_object_id,fc.referenced_column_id)
       ,
    ConstraintColumnID = fc.constraint_column_id
      
    ,ModifiedDate = f.modify_date
    FROM sys.foreign_keys f
    INNER JOIN sys.foreign_key_columns fc
      
    ON f.OBJECT_ID = fc.constraint_object_id

    returns:

    image

    4. Get details of the indices created on user tables

    -----------------------------------------------------------
    -- Query: GetIndexDetails
    -- By Marco Schreuder 27-09-2011
    -----------------------------------------------------------
    SELECT IndexName = i.name
      
    ,SchemaName = OBJECT_SCHEMA_NAME(i.OBJECT_ID)
       ,
    ObjectName = OBJECT_NAME(i.OBJECT_ID)
       ,
    ColumnName = COL_NAME(i.OBJECT_ID,ic.column_id)
       ,
    IndexType = i.type_desc
      
    ,IndexID = i.index_id
      
    ,IndexColumnID = ic.index_column_id
      
    ,KeyOrdinal = ic.key_ordinal
      
    ,IsIncludedColumn = ic.is_included_column
      
    ,IsUnique = i.is_unique
      
    ,IsPrimaryKey = i.is_primary_key
    FROM sys.indexes i
    INNER JOIN sys.index_columns ic
      
    ON i.OBJECT_ID = ic.OBJECT_ID AND i.index_id=ic.index_id
    --limit to indexes on user tables
    INNER JOIN sys.tables t
      
    ON i.OBJECT_ID = t.OBJECT_ID

    which returns:

    image

  • Tue
    28
    Jun 11

    SQLBits - Query across the Mersey

    I am very pleased my session "Creating a Meta Data Driven SSIS Solution with Biml" has been selected for SQLBits this year. And I liked to thank the voters and the organization to make this possible.

    SQLBits will be the largest SQL Server Conference in Europe this year with a lot of great technical sessions lined up on Friday September 30 and Saturday October 1. So if you didn't already do so register now and meet me in Liverpool.

    My talk on ETL Automation using Biml will expand on my earlier blog series on Creating a Meta Driven SSIS Solution with Biml.

     

    Liverpool_Waterfront_by_Night

  • Mon
    20
    Jun 11

    Creating a Meta Data Driven SSIS Solution with Biml - 2. Setting the stage

    This is the second blogpost on the creation of a Meta Data Driven SSIS Solution using Biml. In this post we are settings the stage. We are going to create a simple Kimball based data warehouse with 3 dimensions:

    • dimCustomer
    • dimEmployee
    • dimProduct

    and one fact table: factOrder.

    Furthermore we'll implement a staging area and a meta database.

    1. Create the databases:

    USE [master]
    GO
    CREATE DATABASE [MyDwh]
    GO
    CREATE DATABASE [MyDwh_st]
    GO
    CREATE DATABASE [MyDwh_meta]
    GO

    2. Create the staging tables:

    USE [MyDwh_st]
    GO
    CREATE TABLE [dbo].[Northwind_Suppliers](
      
    [SupplierID] [int] NOT NULL,
      
    [CompanyName] [nvarchar](40) NOT NULL,
      
    [InsertAuditID] [int] NULL,
      
    [UpdateAuditID] [int] NULL,
      
    [RowStartDate] [datetime] NULL,
      
    [RowEndDate] [datetime] NULL);
    CREATE TABLE [dbo].[Northwind_Products](
      
    [ProductID] [int] NOT NULL,
      
    [ProductName] [nvarchar](40) NOT NULL,
      
    [SupplierID] [int] NULL,
      
    [CategoryID] [int] NULL,
      
    [InsertAuditID] [int] NULL,
      
    [UpdateAuditID] [int] NULL,
      
    [RowStartDate] [datetime] NULL,
      
    [RowEndDate] [datetime] NULL);
    CREATE TABLE [dbo].[Northwind_Orders](
      
    [OrderID] [int] NOT NULL,
      
    [CustomerID] [nchar](5) NULL,
      
    [EmployeeID] [int] NULL,
      
    [OrderDate] [datetime] NULL,
      
    [InsertAuditID] [int] NULL,
      
    [UpdateAuditID] [int] NULL,
      
    [RowStartDate] [datetime] NULL,
      
    [RowEndDate] [datetime] NULL);
    CREATE TABLE [dbo].[Northwind_OrderDetails](
      
    [OrderID] [int] NOT NULL,
      
    [ProductID] [int] NOT NULL,
      
    [UnitPrice] [money] NOT NULL,
      
    [Quantity] [smallint] NOT NULL,
      
    [Discount] [real] NOT NULL,
      
    [InsertAuditID] [int] NULL,
      
    [UpdateAuditID] [int] NULL,
      
    [RowStartDate] [datetime] NULL,
      
    [RowEndDate] [datetime] NULL);
    CREATE TABLE [dbo].[Northwind_Employees](
      
    [EmployeeID] [int] NOT NULL,
      
    [LastName] [nvarchar](20) NOT NULL,
      
    [FirstName] [nvarchar](10) NOT NULL,
      
    [Title] [nvarchar](30) NULL,
      
    [ReportsTo] [int] NULL,
      
    [InsertAuditID] [int] NULL,
      
    [UpdateAuditID] [int] NULL,
      
    [RowStartDate] [datetime] NULL,
      
    [RowEndDate] [datetime] NULL);
    CREATE TABLE [dbo].[Northwind_Customers](
      
    [CustomerID] [nchar](5) NOT NULL,
      
    [CompanyName] [nvarchar](40) NOT NULL,
      
    [City] [nvarchar](15) NULL,
      
    [Country] [nvarchar](15) NULL,
      
    [InsertAuditID] [int] NULL,
      
    [UpdateAuditID] [int] NULL,
      
    [RowStartDate] [datetime] NULL,
      
    [RowEndDate] [datetime] NULL);
    CREATE TABLE [dbo].[Northwind_Categories](
      
    [CategoryID] [int] NOT NULL,
      
    [CategoryName] [nvarchar](15) NOT NULL,
      
    [InsertAuditID] [int] NULL,
      
    [UpdateAuditID] [int] NULL,
      
    [RowStartDate] [datetime] NULL,
      
    [RowEndDate] [datetime] NULL);

    3. Create the dimension and fact tables

    USE [MyDwh]
    GO

    CREATE TABLE dimCustomer (
      
    CustomerKey INT IDENTITY(1,1) NOT NULL
       ,
    CustomerID NCHAR(5) NOT NULL
       ,
    CustomerName NVARCHAR(40)
       ,
    City NVARCHAR(15)
       ,
    Country NVARCHAR(15)
       ,
    InsertAuditID INT
      
    ,UpdateAuditID INT
      
    ,RowStartDate DATETIME
      
    ,RowEndDate DATETIME);
    GO
    CREATE TABLE dimEmployee (
      
    EmployeeKey INT IDENTITY(1,1) NOT NULL
       ,
    EmployeeID INT NOT NULL
       ,
    EmployeeName NVARCHAR(50)
       ,
    Title NVARCHAR(30)
       ,
    Manager NVARCHAR(50)
       ,
    InsertAuditID INT
      
    ,UpdateAuditID INT
      
    ,RowStartDate DATETIME
      
    ,RowEndDate DATETIME);
    GO
    CREATE TABLE dimProduct (
      
    ProductKey INT IDENTITY(1,1) NOT NULL
       ,
    ProductID INT NOT NULL
       ,
    ProductName NVARCHAR(40)
       ,
    Category NVARCHAR(15)
       ,
    Supplier NVARCHAR(40)
       ,
    InsertAuditID INT
      
    ,UpdateAuditID INT
      
    ,RowStartDate DATETIME
      
    ,RowEndDate DATETIME);
    GO
    CREATE TABLE factOrder (
      
    OrderDateKey INT NOT NULL
       ,
    CustomerKey INT NOT NULL
       ,
    EmployeeKey INT NOT NULL
       ,
    ProductKey INT NOT NULL
       ,
    OrderQuantity INT
      
    ,OrderAmount money
      
    ,DiscountAmount money
      
    ,InsertAuditID INT
      
    ,UpdateAuditID INT
      
    ,RowStartDate DATETIME
      
    ,RowEndDate DATETIME);
    GO

    4. Create the views for the dimensions and fact tables

    USE [MyDwh]
    GO

    CREATE VIEW [dbo].[vwDimCustomer] AS
    SELECT
    [CustomerID]
         
    ,[CompanyName] AS CustomerName
         
    ,[City]
         
    ,[Country]
    FROM  MyDwh_st.dbo.Northwind_Customers;
    GO

    CREATE VIEW vwDimProduct AS
    SELECT
    pr.ProductID
      
    , pr.ProductName
      
    , ca.CategoryName AS Category
      
    , su.CompanyName AS Supplier
    FROM MyDwh_st.dbo.Northwind_Categories AS ca
    INNER JOIN MyDwh_st.dbo.Northwind_Products AS pr ON ca.CategoryID = pr.CategoryID
    INNER JOIN MyDwh_st.dbo.Northwind_Suppliers AS su ON pr.SupplierID = su.SupplierID;
    GO

    CREATE VIEW vwDimEmployee AS
    SELECT
    em.[EmployeeID]
         
    ,em.[FirstName] + ' ' + em.[LastName] AS EmployeeName
         
    ,em.Title
         
    ,ma.[FirstName] + ' ' + ma.[LastName] AS Manager
    FROM MyDwh_st.dbo.Northwind_Employees em
    LEFT OUTER JOIN  MyDwh_st.dbo.Northwind_Employees ma ON ma.[EmployeeID]=em.[ReportsTo];
    GO

    CREATE VIEW vwFactOrders AS
    SELECT
    OrderDateKey=YEAR(o.OrderDate)*10000+MONTH(o.OrderDate)*100+DAY(o.OrderDate)
       ,
    CustomerKey=ISNULL(cu.CustomerKey,-1)
       ,
    EmployeeKey=ISNULL(em.EmployeeKey,-1)
       ,
    ProductKey=ISNULL(pr.ProductKey,-1)
       ,
    OrderQuantity=od.Quantity
      
    , OrderAmount=od.Quantity * od.UnitPrice
      
    , DiscountAmount=CONVERT(money, od.Quantity * od.UnitPrice * od.Discount)
    FROM  MyDwh_st.dbo.[Northwind_OrderDetails] od
    INNER JOIN MyDwh_st.dbo.Northwind_Orders o ON o.OrderID = od.OrderID
    LEFT OUTER JOIN dbo.dimCustomer cu ON cu.CustomerID=o.CustomerID AND cu.RowEndDate>o.OrderDate
    LEFT OUTER JOIN dbo.dimEmployee em ON em.EmployeeID=o.EmployeeID AND em.RowEndDate>o.OrderDate
    LEFT OUTER JOIN dbo.dimProduct pr ON pr.ProductID=od.ProductID AND pr.RowEndDate>o.OrderDate;
    GO

    5. Create the objects for the meta database

    USE [MyDwh_meta]
    GO

    CREATE TABLE [dbo].[SsisPackages](
      
    [PackageID] [int] IDENTITY(1,1) NOT NULL,
      
    [PackageType] [int] NULL,
      
    [PackageName] [nvarchar](50) NULL,
      
    [PackageGroup] [nvarchar](50) NULL,
      
    [SourceInput] [nvarchar](MAX) NULL,
      
    [SourceConnection] [nvarchar](50) NULL,
      
    [DestinationName] [nvarchar](50) NULL,
      
    [DestinationConnection] [nvarchar](50) NULL,
      
    [ErrorFile] [nvarchar](MAX) NULL,
      
    [TruncateDestination] [bit] NULL,
    CONSTRAINT [PK_ssisPackages] PRIMARY KEY CLUSTERED
    [PackageID] ASC));
    GO
    CREATE TABLE [dbo].[SsisConfiguration](
      
    [ConfigurationFilter] [nvarchar](255) NOT NULL,
      
    [ConfiguredValue] [nvarchar](1000) NULL,
      
    [PackagePath] [nvarchar](255) NOT NULL,
      
    [ConfiguredValueType] [nvarchar](20) NOT NULL);
    GO
    CREATE TABLE [dbo].[SsisAudit](
      
    [AuditID] [int] IDENTITY(1,1) NOT NULL,
      
    [PackageID] [int] NOT NULL,
      
    [PackageGUID] [uniqueidentifier] NULL,
      
    [PackageVersionGUID] [uniqueidentifier] NULL,
      
    [ExecutionID] [uniqueidentifier] NULL,
      
    [PackageStartDate] [datetime] NULL,
      
    [PackageEnddate] [datetime] NULL,
      
    [ExtractRowCount] [int] NULL,
      
    [InsertRowCount] [int] NULL,
      
    [UpdateRowCount] [int] NULL,
      
    [ErrorRowCount] [int] NULL,
      
    [TableInitialRowCount] [int] NULL,
      
    [TableFinalRowCount] [int] NULL,
      
    [IsProcessed] [bit] NULL,
    CONSTRAINT [PK_SsisAudit] PRIMARY KEY CLUSTERED
    [AuditID] ASC));
    GO

    CREATE PROCEDURE [dbo].[uspUpdateAuditRow]
       
    @AuditID INT
       
    ,@ExtractRowCount INT
       
    ,@InsertRowCount INT
       
    ,@UpdateRowCount INT
       
    ,@ErrorRowCount INT
       
    ,@TableFinalRowCount INT
    AS
    UPDATE
    [dbo].[SsisAudit]
    SET [PackageEnddate]=SYSDATETIME()
        ,
    [ExtractRowCount]=@ExtractRowCount
       
    ,[InsertRowCount]=@InsertRowCount
       
    ,[UpdateRowCount]=@UpdateRowCount
       
    ,[ErrorRowCount]=@ErrorRowCount
       
    ,[TableFinalRowCount]=@TableFinalRowCount
       
    ,[IsProcessed]=1
    WHERE [AuditID]=@AuditID;
    GO

    CREATE PROCEDURE [dbo].[uspNewAuditRow]
       
    @PackageID INT
       
    ,@PackageGUID UNIQUEIDENTIFIER
       
    ,@PackageVersionGUID UNIQUEIDENTIFIER
       
    ,@ExecutionID UNIQUEIDENTIFIER
       
    ,@PackageStartDate DATETIME
       
    ,@TableInitialRowCount INT
    AS
    INSERT INTO
    [dbo].[SsisAudit]
              
    ([PackageID]
              
    ,[PackageGUID]
              
    ,[PackageVersionGUID]
              
    ,[ExecutionID]
              
    ,[PackageStartDate]
              
    ,[TableInitialRowCount]
              
    ,[IsProcessed])
    VALUES
              
    (@PackageID
              
    ,@PackageGUID
              
    ,@PackageVersionGUID
              
    ,@ExecutionID
              
    ,@PackageStartDate
              
    ,@TableInitialRowCount
              
    ,0);
         
    SELECT CAST(SCOPE_IDENTITY() AS INT) AS AuditID
    GO

    6. Create an environment variable to store the connection

    I'll use package configurations to make it easier to deploy the solution to different servers (production, test, development) These package configuration will hold the information for the several connections. The connection to the meta database is stored in an environment variable. The other package configurations will be stored in the SsisConfigurations table in the meta database.

    So create an environment variable with the name Northwind_Config and value Data Source=.;Initial Catalog=MyDwh_meta;Provider=SQLNCLI10.1;Integrated Security=SSPI;

    Windows Explorer - right click the (my)computer node - select Properties - under tab Advanced click the buttton Environment Variables - under System variables click New.

  • Fri
    22
    Apr 11

    IntelliSense in SQL Server Management Studio 2008

     

    Microsoft introduced its implementation of autocomplete (IntelliSense) in SQL Server Management Studio (SSMS) with version 2008. Great of course but it has some irritating quirks. Whenever you create new objects IntelliSense doesn't recognize them and keeps giving you those red curly lines. Yesterday I finally figured it out:

    Choose menu Edit - IntelliSense - Refresh Local Cache or short-cut: Ctrl+Shift+R

    SSMS 2008 Intellisense