SSIS Pattern: Drop- and rebuild indexes dynamically
Fact- and dimension tables in our data warehouses are queried often. The queries used in our reports and the queries used for processing SSAS cubes create a large workload of select queries. These “select” queries benefit from indexes and will return results faster with appropriate indexes.
However a lot of indexes are a pain when you load the date into your data warehouse. A good practices is therefore to drop your NONCLUSTERED indexes at the start of loading your dimensions and fact tables and rebuilt them after loading has finished.
The pain of your DBA
Your DBA will typically change these indexes periodically based on his performance tuning efforts of the latest workload. He will be very disappointed when after changing these indexes he is confronted with the old indexes after a rebuild of indexes.
- script the currently used indexes
- store them
- drop them
- load the dimensions
- load our fact tables
- rebuild the indexes again.
In this blog post I walk you to the creation of the package with this control flow:
But first let’s focus on the SQL script we need to capture the current indexes. Unfortunately you can not use the SQL Function OBJECT_DEFINITION to generate the code necessary to create an index since that object isn’t supported. So we have to get creative.
The next SQL Statement is based on this blogpost by Simon Sabin. It will return a dataset with 5 columns:
- SchemaName
- ObjectName
- IndexName
- DropIndexStatement
- CreateIndexStatement
The SQL Statement to return this list of indexes:
SELECT SchemaName = SCHEMA_NAME (o.SCHEMA_ID)
,ObjectName = o.name
,IndexName = i.name
,DropIndexStatement = 'DROP INDEX [' + i.Name + '] ON [' + SCHEMA_NAME (o.SCHEMA_ID)
+ '].[' + o.name + ']'
,CreateIndexStatement = 'CREATE ' + (CASE i.is_unique WHEN 1 THEN 'UNIQUE ' ELSE '' END)
+ 'NONCLUSTERED INDEX ['
+ i.name + '] ON [' + SCHEMA_NAME (o.SCHEMA_ID) + '].[' + o.name + '] ( '
+ LEFT(list, ISNULL(splitter-1,LEN(list))) + ' ) '
+ ISNULL('INCLUDE ( ' + SUBSTRING(list, indCol.splitter +1, 100) + ' ) ','')
+ ISNULL('WHERE ' + i.filter_definition,'')
FROM sys.indexes i
JOIN sys.objects o ON i.OBJECT_ID = o.OBJECT_ID
CROSS APPLY
(SELECT splitter = NULLIF(CHARINDEX('|',indexCols.list),0)
, list
FROM (SELECT list=CAST((
SELECT CASE WHEN sc.is_included_column = 1 AND sc.ColPos = 1 THEN '|' ELSE '' END +
CASE WHEN sc.ColPos > 1 THEN ', ' ELSE '' END + name +
CASE WHEN sc.is_descending_key=1 THEN ' DESC' ELSE '' END
FROM (SELECT sc.is_descending_key
, sc.is_included_column
, index_column_id
, name = '[' + name + ']'
, ColPos = ROW_NUMBER() OVER (PARTITION BY sc.is_included_column
ORDER BY sc.index_column_id)
FROM sys.index_columns sc
JOIN sys.columns c ON sc.OBJECT_ID = c.OBJECT_ID
AND sc.column_id = c.column_id
WHERE sc.index_id = i.index_id
AND sc.OBJECT_ID = i.OBJECT_ID ) sc
ORDER BY sc.is_included_column, ColPos
FOR XML PATH (''), TYPE) AS VARCHAR(MAX))
)indexCols
) indCol
WHERE i.is_primary_key=0
AND i.is_unique_constraint=0
AND i.type_desc=N'NONCLUSTERED'
AND o.TYPE=N'U'
ORDER BY SchemaName, ObjectName, IndexName
The Steps to change/create the master package:
1. Open the master package that you use to load the dimensions and facts.
2. Add Variables:
- IndexList with Data Type=Object
- DropIndexStatement with Data Type =String
- CreateIndexStatement with Data Type =String
3. Add an Execute SQL Task to the Control Flow
- Change Name to Get IndexList
- Change ResultSet to Full result set
- Select the target connection under Connection
- Copy the SQL statement (see above) as the SQLStatement
- Select the tab Result Set
- Choose Add and use 0 as the Result Name and Select User::IndexList as the Variable Name
4. Add a Foreach Loop Container to the Control Flow and double click on it to open the Foreach Loop Editor
- Under the tab General change Name and Description to Drop each index
- Under the tab Collection change Enumerator to Foreach ADO Enumerator and select User::IndexList from the dropdown box as the ADO Object source variable
5. Drop an Execute SQL Task in the Drop each index container and double click on it to open the Execute SQL Task Editor.
- Change Name and Description to Drop Index
- Select the target connection under Connection
- Select Variable as the SQLSourceType
- Select User::DropIndexStatement as the SourceVariable
6. Add a Foreach Loop Container to the Control Flow and double click on it to open the Foreach Loop Editor
- Use Create each index as name for the Foreach loop container
- In the Collection tab of the Foreach Loop Editor select Foreach ADO Enumerator and use the User::IndexList as ADO Object source variable
- In the Variable Mappings tab Select User::CreateIndexStatement as variable and 4 as index
7. Drop an Execute SQL Task in the Drop each index container and double click on it to open the Execute SQL Task Editor.
- Change Name and Description to Create Index
- Select your Connection under Connection
- Select Variable as the SQLSourceType
- Select User::CreateIndexStatement as the SourceVariable
8. Change the Precedence Constraints to map the example of the afore mentioned control flow.