Using SQL Server 2012 Window Functions to Solve Common T-SQL Challenges

September 10, 2012 at 1:55 AM by Steven Wang

Thanks to all who attended my session at TechEd New Zealand 2012. I hope you enjoyed it. I have attached the presentation slide deck and demo code to this post.

Session Abstract:

SQL Server 2012 has introduced several new window functions, enhanced support for window aggregate functions by introducing window order and frame clauses, and added support for offset functions. In this session, the presenter applies these new functions to solve some of the most frequently asked questions in the MSDN T-SQL forum. If you have ever faced the challenge of calculating a moving average, identifying gaps between records, combining consecutive and/or overlapping records, or calculating running totals, then this session is for you.
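
For readers who did not attend, here is a minimal sketch of the kind of problem the session covers: a running total computed with the new window order and frame clauses (the table and column names are illustrative, not from the demo database):

Select	Account_ID
		, Tran_Date
		, Amount
		, Sum(Amount) Over	(	Partition by Account_ID
								Order by Tran_Date
								Rows Unbounded Preceding	--Frame clause: running total
							) As Running_Total
From	dbo.Account_Transactions;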

 

Session Video:

Using SQL Server 2012 Window Functions 

Session Slides:

DBI309_Using_Window_Functions.pptx (1.79 mb)

Demo Code and DB:

TechEdNZ2012.bak (5.27 mb)

TechEdNZ_2012_Demo.zip (258.64 kb)

 

 

An Odd Behaviour When Using SQL Server 2012 Profiler for Analysis Services from SSMS

July 19, 2012 at 7:15 PM by Steven Wang

Environment:

1. Windows Server 2008 R2 Enterprise Edition

2. A new server with multiple SQL Server 2012 Analysis Services instances installed.

3. SSMS is installed.

4. The SQL Server 2012 database engine is not installed on the server.

5. SQL Server 2012 CU2 is applied.

Problem:

When I tried to use SQL Server Profiler from SSMS on the new server to run a trace on any SQL Server 2012 SSAS instance, as below:

I got the error message below.

Indeed, this error message is not new to SQL Server 2012. Nikesh Mhatre wrote an article on the MSDN blog about this issue. Please see this link: Unable to read trace definition file Microsoft SQL Server TraceDefinition n.n1.n2.xml

However, the problem I had was different even though the error message is similar.

I then used another server, which has both the SQL Server database engine and SSAS installed, and tried to connect to the server mentioned above in the same way. I still got the same error message.

 

Solution:

The solution to this issue is indeed very simple. In fact, there are two solutions:

Solution 1: Instead of launching SQL Server Profiler from SSMS, launch it from Start > All Programs > Microsoft SQL Server 2012 > Performance Tools > SQL Server Profiler, as depicted below. By launching Profiler from here, I had no problem connecting to the SSAS instance.

 

Solution 2: After launching Profiler from SSMS, I first used it to connect to any SQL Server, either SQL Server 2008 R2 or SQL Server 2012, using Database Engine as the server type, as below. That works without a problem.

After I had connected the database engine trace, I was able to launch a new trace on the SSAS instance with no problem. It seems very odd.

 

I’m not sure whether this is a bug in launching SQL Server Profiler from SSMS 2012 for SSAS instances, or whether a specific permission needs to be set.

I hope this post helps if you have the same problem. If you know the root cause, please let me know. Thanks.

4 Different Ways to Monitor Database Log Size and Its Usage

April 28, 2012 at 3:02 AM by Steven Wang

In SQL Server, there are often different ways to achieve the same goal. This is also true for monitoring the database log size and its usage.

1. DBCC SQLPERF(LOGSPACE)

Typically, we use DBCC SQLPERF(LOGSPACE) to monitor the log size and the percentage of log space used for all databases. This command is very simple and straightforward to use. However, if you want to use its output programmatically, there is a little bit of work to do. The script below is one example that captures the log space information programmatically from DBCC SQLPERF(LOGSPACE):

Declare @LogUsage Table (
							Database_Name sysname
							, Log_Size_MB Float
							, Log_Used_Percent Decimal(8,5)
							, Status tinyint
						);

Insert Into @LogUsage
Exec ('DBCC SQLPERF (LOGSPACE);');

Select	Database_Name
		, Log_Size_MB
		, Log_Size_MB * Log_Used_Percent / 100 As Log_Used_Size_MB
		, Log_Used_Percent
From	@LogUsage

2. sys.dm_os_performance_counters  

The second, less commonly used technique for monitoring log space usage is the sys.dm_os_performance_counters dynamic management view. This view records the log file size (and also the data file size) as counters for all databases. As I am only concerned with the log file size here, I use the script below to monitor log space usage:

Select		Database_Name
			, [Log File(s) Size (KB)] / 1024.0 As Log_Size_MB
			, [Log File(s) Used Size (KB)] / 1024.0 As Log_Used_Size_MB
			,  ([Log File(s) Used Size (KB)] * 1.0 / [Log File(s) Size (KB)]) * 100 As Log_Used_Percent
From
(
select	instance_Name as Database_Name, counter_name, cntr_value as Log_Size
From	sys.dm_os_performance_counters
Where	counter_name in		(
								'Log File(s) Size (KB)'
								, 'Log File(s) Used Size (KB)'
							)     
			And
		instance_name not in	(
									'mssqlsystemresource'
									, '_Total'
								)  
) X
Pivot	(	Max(Log_Size)
			For Counter_Name 
			In
				(
					[Log File(s) Size (KB)]
					, [Log File(s) Used Size (KB)]
				)  
		) myPiv

If you are only concerned with one database, you can filter on the database name in instance_name. Although I calculated the log usage percentage myself, sys.dm_os_performance_counters does have a counter named 'Percent Log Used'; however, its value carries no decimal places.
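
If integer precision is good enough, a minimal sketch that uses that counter directly looks like this (the database name is only an example):

Select	instance_name As Database_Name
		, cntr_value As Log_Used_Percent	--Integer value, no decimal places
From	sys.dm_os_performance_counters
Where	counter_name = 'Percent Log Used'
		And instance_name = 'AdventureWorksDW2012'	--Example database name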

3. sys.database_files

The sys.database_files catalog view contains information about the data files and log file of the current database. This differs from the preceding 2 methods, which return log information for all databases. As there is no space-used information in this view, we need to use the FILEPROPERTY function to get the log space used value, as below:

Select	DB_Name()
		, size / 128.0 As Log_Size_MB
		, FILEPROPERTY(name, 'spaceused') / 128.0 As Log_Used_Size_MB
		, (FILEPROPERTY(name, 'spaceused') * 1.0 / size) * 100 As Log_Used_Percent
From	sys.database_files
Where	type_desc = 'LOG'

Although this script covers log space usage only, it can also be used to monitor data file sizes by removing the WHERE clause, as sketched below.
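
For example, a sketch of the same query with the filter removed, covering both data and log files of the current database:

Select	DB_Name() As Database_Name
		, name As Logical_File_Name
		, type_desc
		, size / 128.0 As File_Size_MB
		, FILEPROPERTY(name, 'spaceused') / 128.0 As Used_Size_MB
From	sys.database_files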

4. sys.dm_db_log_space_usage (SQL 2012 only)

This dynamic management view is not even documented in BOL, but it is a very simple view. As it returns the log size in bytes, it is more useful to convert the bytes to MB, as below:

Select	DB_Name(Database_id)
		, total_log_size_in_bytes / 1024 / 1024.0 As Log_Size_MB
		, used_log_space_in_bytes / 1024 / 1024.0 As Log_Used_Size_MB
		, used_log_space_in_percent AS Log_Used_Percent
From		sys.dm_db_log_space_usage

Given how simple this view is, I believe it will be easily adopted. However, it returns log space statistics for the current database only.

In SQL Server 2012, there is another DMV added for data file space as well: sys.dm_db_file_space_usage. A sketch of its use is below.
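
A minimal sketch mirroring the log query above (the view reports sizes in 8-KB pages, so divide by 128 for MB):

Select	DB_Name(database_id) As Database_Name
		, total_page_count / 128.0 As Data_Size_MB
		, allocated_extent_page_count / 128.0 As Allocated_Size_MB
		, unallocated_extent_page_count / 128.0 As Unallocated_Size_MB
From	sys.dm_db_file_space_usage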

 

 

SQL Server 2012 Columnstore Index In Depth Part 3: the Myth of an Extra Column in sys.column_store_segments

April 22, 2012 at 4:47 AM by Steven Wang

There are 2 new system catalog views added in SQL Server 2012 for columnstore indexes:

1) sys.column_store_segments

2) sys.column_store_dictionaries

(Note: there was another view, sys.column_store_index_stats, before the RTM version, but it is no longer available in the final release. I am not sure why it was dropped.)

These 2 system views are very useful for finding out more about a columnstore index in your database, such as its segments, data size, etc.

For both system catalog views, BOL states 'Contains a row for each column in a columnstore index', which means each column in a columnstore index has a row in these system views. While this statement is true, the reverse is not always: sometimes you will notice an extra column in the sys.column_store_segments view. What is this extra column? Let's find out.

As the original FactResellerSales table is small, we can use it to perform this test. I will create a nonclustered columnstore index directly on this table covering all 27 columns:

Use [AdventureWorksDW2012]
Go

If Exists (	Select *
			From sys.indexes
			Where	type = 6
					And Object_id = Object_id('dbo.FactResellerSales')
			)
	Begin
		Declare @IndexName sysname
				, @SQL nvarchar(max)

		Select	@IndexName = Name
		From	sys.indexes
		Where	type = 6
				And Object_id = Object_id('dbo.FactResellerSales')
		Set @SQL = 'DROP INDEX ' + quotename(@IndexName) + ' ON [dbo].[FactResellerSales];'
		
		Exec (@SQL);

		--DROP INDEX [idx_FactResellerSales_CI] ON [dbo].[FactResellerSales];
	End
--columns

CREATE NONCLUSTERED COLUMNSTORE INDEX [idx_FactResellerSales_CI] 
ON [dbo].[FactResellerSales]
(
	[ProductKey],
	[OrderDateKey],
	[DueDateKey],
	[ShipDateKey],
	[ResellerKey],
	[EmployeeKey],
	[PromotionKey],
	[CurrencyKey],
	[SalesTerritoryKey],
	[SalesOrderNumber],
	[SalesOrderLineNumber],
	[RevisionNumber],
	[OrderQuantity],
	[UnitPrice],
	[ExtendedAmount],
	[UnitPriceDiscountPct],
	[DiscountAmount],
	[ProductStandardCost],
	[TotalProductCost],
	[SalesAmount],
	[TaxAmt],
	[Freight],
	[CarrierTrackingNumber],
	[CustomerPONumber],
	[OrderDate],
	[DueDate],
	[ShipDate]
);
GO

Select	S.*
From	sys.column_store_segments S
			Inner Join
		sys.Partitions P
			On	S.hobt_id = P.hobt_id
			Inner Join
		sys.indexes I
			On	i.object_id = P.object_id
				And I.index_id = P.index_id
Where	I.Object_id = Object_Id('FactResellerSales')
		And I.Type = 6

From the results we can see that there are exactly 27 rows in sys.column_store_segments, one per column. So far so good. Now let's create a table with exactly the same structure as FactResellerSales but without any B-Tree indexes, and create a nonclustered columnstore index on it:

Use [AdventureWorksDW2012]
Go
Select	*
Into	FactResellerSales_Heap
From	[dbo].[FactResellerSales];

CREATE NONCLUSTERED COLUMNSTORE INDEX [idx_FactResellerSales_Heap_CI] 
ON [dbo].[FactResellerSales_Heap]
(
	[ProductKey],
	[OrderDateKey],
	[DueDateKey],
	[ShipDateKey],
	[ResellerKey],
	[EmployeeKey],
	[PromotionKey],
	[CurrencyKey],
	[SalesTerritoryKey],
	[SalesOrderNumber],
	[SalesOrderLineNumber],
	[RevisionNumber],
	[OrderQuantity],
	[UnitPrice],
	[ExtendedAmount],
	[UnitPriceDiscountPct],
	[DiscountAmount],
	[ProductStandardCost],
	[TotalProductCost],
	[SalesAmount],
	[TaxAmt],
	[Freight],
	[CarrierTrackingNumber],
	[CustomerPONumber],
	[OrderDate],
	[DueDate],
	[ShipDate]
);
GO

Select	S.*
From	sys.column_store_segments S
			Inner Join
		sys.Partitions P
			On	S.hobt_id = P.hobt_id
			Inner Join
		sys.indexes I
			On	i.object_id = P.object_id
				And I.index_id = P.index_id
Where	I.Object_id = Object_Id('FactResellerSales_Heap')
		And I.Type = 6

This query returns 28 rows, and the maximum column_id value in sys.column_store_segments is 28. This 28th column is an extra column beyond the base table. Before we try to conclude what this column is, I'm going to do another test. I will create another table with exactly the same structure as FactResellerSales, create a non-unique clustered index on the newly created table, and then create a nonclustered columnstore index on it:

Use [AdventureWorksDW2012]
Go
Select	*
Into	FactResellerSales_Clustered_nonunique
From	[dbo].[FactResellerSales];

Create clustered index idx_FactResellerSales_Clustered_nonunique_1 on  [dbo].[FactResellerSales_Clustered_nonunique] ([OrderDateKey]);

CREATE NONCLUSTERED COLUMNSTORE INDEX [idx_FactResellerSales_Clustered_nonunique_CI] 
ON [dbo].[FactResellerSales_Clustered_nonunique]
(
	[ProductKey],
	[OrderDateKey],
	[DueDateKey],
	[ShipDateKey],
	[ResellerKey],
	[EmployeeKey],
	[PromotionKey],
	[CurrencyKey],
	[SalesTerritoryKey],
	[SalesOrderNumber],
	[SalesOrderLineNumber],
	[RevisionNumber],
	[OrderQuantity],
	[UnitPrice],
	[ExtendedAmount],
	[UnitPriceDiscountPct],
	[DiscountAmount],
	[ProductStandardCost],
	[TotalProductCost],
	[SalesAmount],
	[TaxAmt],
	[Freight],
	[CarrierTrackingNumber],
	[CustomerPONumber],
	[OrderDate],
	[DueDate],
	[ShipDate]
);
GO

Select	S.*
From	sys.column_store_segments S
			Inner Join
		sys.Partitions P
			On	S.hobt_id = P.hobt_id
			Inner Join
		sys.indexes I
			On	i.object_id = P.object_id
				And I.index_id = P.index_id
Where	I.Object_id = Object_Id('FactResellerSales_Clustered_nonunique')
		And I.Type = 6

Again the result returns 28 rows, and the 28th column in sys.column_store_segments is an extra column beyond the base table.

Now we have a clear picture:

So why does a nonclustered columnstore index on a heap table have an extra column in sys.column_store_segments?

It seems that a nonclustered columnstore index, like a B-Tree nonclustered index, needs a RID (Row ID) to refer back to the base heap table.
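
A quick way to confirm the extra column, as a sketch, is to compare the base table's column count with the number of distinct column_id values in the segment view:

Select	(	Select	Count(*)
			From	sys.columns
			Where	object_id = Object_id('FactResellerSales_Heap')
		) As Base_Table_Columns		--27
		, (	Select	Count(Distinct S.column_id)
			From	sys.column_store_segments S
						Inner Join
					sys.partitions P
						On S.hobt_id = P.hobt_id
			Where	P.object_id = Object_id('FactResellerSales_Heap')
		) As Columnstore_Columns	--28: the extra column is the RID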

For a table with a non-unique clustered index, a 4-byte integer uniquifier is added to the table. We can examine this in three different ways.

First, if you use the old sysindexes table, there is a keycnt column that tells you exactly how many fields are used to form the key of the index.

Select	'FactResellerSales_Clustered_Nonunique' As Table_Name
		, A.keycnt As clustered_index_keys
From	sysindexes A
			Inner Join
		sys.indexes B
			On	A.id = B.object_id
				And A.indid = B.index_id
Where	A.id = Object_id('FactResellerSales_Clustered_Nonunique')
		And B.Type = 6

This returns 2. As we only used [OrderDateKey] to create the idx_FactResellerSales_Clustered_nonunique_1 clustered index, this indicates that a uniquifier was added by the system. sysindexes is a SQL Server 2000 system table, included as a view for backward compatibility only. Although it is still available, it is better to avoid it; besides, the view merely gives the count of key columns with no further information.

Second, there is an undocumented system view, sys.system_internals_partition_columns, which has much more detailed information about the columns in a table partition. From this view, I can clearly see the additional uniquifier column added for this non-unique clustered index. Run the script below:

Select	Object_Name(B.object_id) As Table_Name
		, partition_column_id
		, Col_Name(B.object_id, partition_column_id) As Column_Name
		, C.Name AS Data_Type
		, Key_ordinal
		, Count(Key_Ordinal) Over() As Key_Count --Total number of keys for the clustered index
		, is_uniqueifier
From	sys.system_internals_partition_columns A
			Inner Join
		sys.partitions B
			On	A.partition_id = B.partition_id
			Inner Join
		sys.types C
			On	A.system_type_id = C.system_type_id
Where	B.object_id = object_id('FactResellerSales_Clustered_nonunique')
		And B.index_id = 1		--Clustered index
		And key_ordinal != 0	--Only interested in the Clustered index key columns

 

From the above result, we can see that a uniquifier is added, that its data type is integer, and that there is no column name associated with it.

Third, we can use DBCC PAGE to delve into the physical data page and see that the uniquifier is added at the page level.

Use [AdventureWorksDW2012]
Go

Select	P.database_id
		, P.allocated_page_file_id
		, P.allocated_page_page_id
		, P.allocation_unit_type_desc
From	sys.indexes I
			Cross Apply
		sys.dm_db_database_page_allocations(DB_ID(), I.object_id, I.index_id, Null, 'limited') P
Where 
		I.object_id = Object_id ('FactResellerSales_clustered_nonunique')
		And P.is_iam_page = 0
		And P.is_allocated = 1
		And P.is_mixed_page_allocation = 0
		And I.Type = 1  --Type 1 is clustered index (type 0 is heap)
Go

DBCC traceon(3604, -1)
Go

DBCC page (
			7,				--Database ID
			1,				--Page File ID
			1033784,		--Page ID, check with your own pageid
			3				--Command output style
			)

The highlighted page output shows that a 4-byte integer uniquifier has been added.

 

This concludes the post. Thanks for reading.

Previous Related posts:

1. SQL Server 2012 Columnstore Index In Depth Part 1: Make a bigger AdventureworksDW database for Testing

2. SQL Server 2012 Columnstore Index In Depth Part 2: Columnstore Index Physical Data and Size Calculation   

 

 

SQL Server 2012 Columnstore Index In Depth Part 2: Columnstore Index Physical Data and Size Calculation

April 20, 2012 at 3:41 AM by Steven Wang

In my previous blog, SQL Server 2012 Columnstore Index In Depth Part 1: Make a bigger AdventureworksDW database for Testing, I created a bigger FactResellerSalesBig table in the AdventureworksDW2012 database and created a nonclustered columnstore index, IX_FactResellerSalesBig_NCI, on it. Now we can check how a columnstore index is stored, and calculate the size of the nonclustered columnstore index as a whole or even of an individual column within it.

1. How Does a Columnstore Index Store Data?

A columnstore index is broken into one or more segments per column for each partition (when a table is not partitioned, it has a single partition), and each segment consists of multiple physical data pages. A full segment contains 1,048,576 rows. Depending on the number of threads used to create the columnstore index, the final group of segments for a column in each partition (normally as many segments as threads) may hold fewer than 1,048,576 rows each. If the base table has fewer than 1,048,576 rows, only 1 segment per column will be created. You can inspect the segments with a query like the sketch below.
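
Here is that sketch, listing the per-column segment row counts for the FactResellerSalesBig table created in Part 1:

Select	S.column_id
		, S.segment_id
		, S.row_count		--1,048,576 for a full segment
From	sys.column_store_segments S
			Inner Join
		sys.partitions P
			On S.hobt_id = P.hobt_id
Where	P.object_id = Object_id('FactResellerSalesBig')
Order by S.column_id, S.segment_id;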

A columnstore index segment is internally stored as a BLOB. This can be verified using the DBCC PAGE command.

Before using DBCC PAGE, we need to know the page IDs allocated to the columnstore index we are going to examine. As we are using SQL Server 2012, we now have 2 ways to get the page IDs:

a) use DBCC IND

b) use the newly added dynamic management view in SQL Server 2012: sys.dm_db_database_page_allocations

Although both are undocumented, using sys.dm_db_database_page_allocations has many advantages over DBCC IND. I am not going to discuss this DM view in detail. Kalen Delaney has an article on SQLMag, SQL Server's Undocumented Changes, which has a good explanation of how to use it.

For this post, based on the FactResellerSalesBig table I created (refer to my last post), I will use the query below to get the page information:

 

Use [AdventureWorksDW2012]
Go

Select	P.database_id
		, P.allocated_page_file_id
		, P.allocated_page_page_id
		, P.allocation_unit_type_desc
From	sys.indexes I
			Cross Apply
		sys.dm_db_database_page_allocations(DB_ID(), I.object_id, I.index_id, Null, 'limited') P
Where 
		I.object_id = Object_id ('FactResellerSalesBig')
		And P.is_iam_page = 0
		And P.is_allocated = 1
		And P.is_mixed_page_allocation = 0
		And I.Type = 6  --Type 6 is 'NONCLUSTERED COLUMNSTORE'
Go

From the demo AdventureworksDW2012 database on my computer, I got the results below:

From the result, we can already tell that the columnstore index is stored using the BLOB data type. But we can delve further into the physical page level to see how the data is stored.

DBCC page (
			7,				--Database ID
			1,				--Page File ID
			270800,			--Page ID
			3				--Command output style
			)

 

The highlighted information from the DBCC PAGE output confirms that BLOB data is used to store columnstore indexes.

 

2. Calculate the Size of A Nonclustered Columnstore Index

Based on the columnstore index topic in Books Online, we can calculate the size of the nonclustered columnstore index IX_FactResellerSalesBig_NCI as below:

SELECT SUM(on_disk_size_MB) AS TotalSizeInMB
FROM
(
   (SELECT SUM(css.on_disk_size)/(1024.0*1024.0) on_disk_size_MB
    FROM sys.indexes AS i
    JOIN sys.partitions AS p
        ON i.object_id = p.object_id 
    JOIN sys.column_store_segments AS css
        ON css.hobt_id = p.hobt_id
    WHERE i.object_id = object_id('FactResellerSalesBig') 
    AND i.type_desc = 'NONCLUSTERED COLUMNSTORE') 
  UNION ALL
   (SELECT SUM(csd.on_disk_size)/(1024.0*1024.0) on_disk_size_MB
    FROM sys.indexes AS i
    JOIN sys.partitions AS p
        ON i.object_id = p.object_id 
    JOIN sys.column_store_dictionaries AS csd
        ON csd.hobt_id = p.hobt_id
    WHERE i.object_id = object_id('FactResellerSalesBig') 
    AND i.type_desc = 'NONCLUSTERED COLUMNSTORE') 
) AS SegmentsPlusDictionary

The result of this query is 242.34 MB. However, from the database storage point of view, this calculation does not truly reflect the size of the index data pages in the database. The sizes reported by both system catalog views are BLOB data sizes, which do not account for the metadata, page headers, and other per-page overheads of holding the BLOB data.

Thanks to Eric Hanson, Principal Program Manager at Microsoft, who cleared up my doubt and confirmed that the calculation below is accurate.

SELECT	SUM(s.used_page_count) / 128.0 on_disk_size_MB
FROM	sys.indexes AS i
		JOIN sys.dm_db_partition_stats AS S
			ON	i.object_id = S.object_id 
				and I.index_id = S.index_id
WHERE	i.object_id = object_id('FactResellerSalesBig') 
		AND i.type_desc = 'NONCLUSTERED COLUMNSTORE'

The result of this query is: 250.37 MB

As the size of the base FactResellerSalesBig table is 2626 MB, the columnstore index compression ratio is around 10:1 (2626 / 250.37 ≈ 10.5).
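
As a sketch, both sizes and the ratio can be computed in one script (assuming index_id 0 or 1 represents the base heap or clustered index):

Declare	@Base_MB decimal(10,2), @CS_MB decimal(10,2);

Select	@Base_MB = Sum(used_page_count) / 128.0
From	sys.dm_db_partition_stats
Where	object_id = Object_id('FactResellerSalesBig')
		And index_id In (0, 1);		--Base table: heap or clustered index

Select	@CS_MB = Sum(S.used_page_count) / 128.0
From	sys.indexes I
			Inner Join
		sys.dm_db_partition_stats S
			On	I.object_id = S.object_id
				And I.index_id = S.index_id
Where	I.object_id = Object_id('FactResellerSalesBig')
		And I.type_desc = 'NONCLUSTERED COLUMNSTORE';

Select	@Base_MB As Base_Table_MB
		, @CS_MB As Columnstore_MB
		, @Base_MB / @CS_MB As Compression_Ratio;	--Roughly 2626 / 250.37, about 10.5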

3. Calculate the Individual Column Size of A Nonclustered Columnstore Index

As columnstore indexes store column data separately, we can calculate the compressed size of each column in a columnstore index. The column sizes can be calculated using the script below, originally written by Eric Hanson, Principal Program Manager at Microsoft. I have modified it to handle the situation where not all of a table's columns are included in the columnstore index.

-- size per column 
with segment_size_by_column as ( 
	SELECT 
			p.object_id as table_id, 
			css.column_id, 
			SUM (css.on_disk_size)/1024/1024.0 AS segment_size_mb 
	FROM	sys.partitions AS p 
				JOIN 
			sys.column_store_segments AS css 
				ON p.hobt_id = css.hobt_id 
	GROUP BY p.object_id, css.column_id 
), 
dictionary_size_by_column as ( 
	SELECT 
			p.object_id as table_id, 
			csd.column_id, 
			SUM (csd.on_disk_size)/1024/1024.0 AS dictionary_size_mb 
	FROM	sys.partitions AS p 
				JOIN 
			sys.column_store_dictionaries AS csd 
				ON p.hobt_id = csd.hobt_id 
	GROUP BY p.object_id, csd.column_id 
) 

--As in the real world, not all the columns in a table will be or can be included in a nonclustered columnstore index, we need to join
--to the sys.index_columns to get the correct column id.
Select 
		Object_Name(s.table_id) as table_name, 
		C.column_id,	
		col_name(S.table_id, C.column_id) as column_name, 
		s.segment_size_mb, 
		d.dictionary_size_mb, 
		s.segment_size_mb + isnull(d.dictionary_size_mb, 0) total_size_mb 
from	segment_size_by_column s 
			join 
		sys.indexes I -- Join to sys.Indexes system table
			ON I.object_id = s.table_id 
			join 
		sys.index_columns c --Join to sys.Index_columns
			ON	c.object_id = s.table_id 
				And I.index_id = C.index_id
				and c.index_column_Id = s.column_id --Need to join to the index_column_id with the column_id from segment_size_by_column
			left outer join 
		dictionary_size_by_column d 
			on	s.table_id = d.table_id 
				and s.column_id = d.column_id 
Where	I.type_desc =  'NONCLUSTERED COLUMNSTORE'
order by total_size_mb desc

(Note: if you have problems copying and pasting the script into SSMS, paste it into Office Word first and then from Word into SSMS.)

 

Thanks for reading.

 

 

xVelocity Memory Optimized Columnstore indexes: A Gem of SQL Server 2012

April 18, 2012 at 5:06 AM by Steven Wang

I presented the above-titled session at the Auckland SQL User Group.

Abstract:

Columnstore indexes, formerly known as project "Apollo", are a new feature of SQL Server 2012. Columnstore indexes deliver blazing-fast query performance, particularly for star-join queries. It is not surprising to see query performance increase by up to 100 times. In this session, Steven will take you through the nitty-gritty and internals of columnstore indexes; he will also talk through the columnstore index do's and don'ts, limitations, tips and tricks, and the possible impact on future BI solution design.

Thanks to all the attendees!

I have attached the slide deck and SQL code to this post.

 

ColumnstoreIndex_SQLUserGroup.pptx (1.46 mb)

SQL_Scripts.zip (15.09 kb)

SQL Server 2012 Columnstore Index In Depth Part 1: Make a bigger AdventureworksDW database for Testing

April 12, 2012 at 4:00 AM by Steven Wang

The new xVelocity memory-optimized columnstore index feature in SQL Server 2012 is designed to deliver blazing-fast performance for large data warehouse queries. But I believe many people who are interested in this fantastic new feature may struggle to find a big sample data warehouse database for trying out its capabilities, such as batch-mode processing, segment elimination, the size of memory grants, etc.

The new sample data warehouse database, AdventureworksDW2012, which you can download from msftdbprodsamples.codeplex.com, is not big enough to let us experience many columnstore index features. We need to make it bigger to explore them.

Once the AdventureworksDW2012 data file is downloaded, use the script below to attach the database:

USE [master]
GO
Declare @Filepath nvarchar(128)
Declare @SQL nvarchar(max)

Set	@Filepath = 'H:\Applications\database\SQL 2012\SampleDBs\adventureworks\' 
				--Change the path to your file path
				--Where your downloaded AdventureWorksDW2012_Data.mdf is located

Set @Filepath = IIF(Right(@Filepath,1) = '\', @Filepath, @Filepath + '\') + 'AdventureWorksDW2012_Data.mdf'

Set	@SQL = 'CREATE DATABASE [AdventureWorksDW2012] ON ( FILENAME = ''' + @Filepath + ''') FOR ATTACH'

Exec (@SQL)
GO

 

The diagram below shows the table sizes in the AdventureworksDW2012 sample database:

The biggest table in AdventureworksDW2012 is FactProductInventory, with 776,286 rows. That table is too narrow to be used for columnstore index testing, and a row count below 1 million is also too small.

A good candidate for a nonclustered columnstore index is FactResellerSales, but this table has only 60,855 rows. To get a good performance comparison between B-Tree indexes and the columnstore index, and to try the other columnstore features, a table with about 100 million rows would be ideal. However, in most cases we don't have that much capacity on a laptop or home computer. I decided to pump this table up to over 10 million rows, a size that is practical for most users and still large enough to demonstrate most columnstore index behaviour.

First of all, we don't want to simply duplicate the existing rows, as repeated data in the table would skew the query performance picture we get from the columnstore index. Analyzing the FactResellerSales table, the easiest approach is to enlarge the DimReseller table, since a particular SalesOrderNumber in the fact table is bound to a single reseller. I will repeat each sales order line (SalesOrderNumber + SalesOrderLineNumber is the primary key of the fact table) 180 times with a randomly picked reseller and a randomized order quantity, pumping the table up to 10,953,900 rows (60,855 × 180).

As DimReseller originally has only 701 rows, I will first make a bigger DimReseller table. The script below makes a DimResellerBig table that is 50 times bigger than DimReseller:

USE [AdventureworksDW2012]
GO
--Create a bigger table for DimReseller
SELECT 
		Cast(Row_Number() Over(Order by (Select 0)) as int) As [ResellerKey]
		, A.[GeographyKey]
		, Cast('AW' + Format(Row_Number() Over(Order by (Select 0)), '0000000000') as Nvarchar(15)) As [ResellerAlternateKey]
		, A.[Phone]
		, A.[BusinessType]
		, Cast(A.[ResellerName] +  Format(Row_Number() Over(Order by (Select 0)), ' 0000000000') As nvarchar(50)) As ResellerName
		, A.[NumberEmployees]
		, A.[OrderFrequency]
		, A.[OrderMonth]
		, A.[FirstOrderYear]
		, A.[LastOrderYear]
		, A.[ProductLine]
		, A.[AddressLine1]
		, A.[AddressLine2]
		, A.[AnnualSales]
		, A.[BankName]
		, A.[MinPaymentType]
		, A.[MinPaymentAmount]
		, A.[AnnualRevenue]
		, A.[YearOpened]
Into	DimResellerBig
FROM	[dbo].[DimReseller] A
			Cross Join
		Master..spt_values B
Where	B.Type = 'P'
			And
		B.Number Between 1 and 50
GO


--Add indexes and constraints for DimResellerBig
Alter Table dbo.DimResellerBig Alter Column [ResellerKey] Int Not Null;

ALTER TABLE [dbo].[DimResellerBig] ADD  CONSTRAINT [PK_DimResellerBIG_ResellerKey] PRIMARY KEY CLUSTERED 
(
	[ResellerKey] ASC
);

ALTER TABLE [dbo].[DimResellerBig] ADD  CONSTRAINT [AK_DimResellerBig_ResellerAlternateKey] UNIQUE NONCLUSTERED 
(
	[ResellerAlternateKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_DimResellerBig_GeographyKey] ON [dbo].[DimResellerBig]
(
	[GeographyKey] ASC
);

GO

 

Once I have the DimResellerBig table, I can use the script below to make my big FactResellerSalesBig table:

USE [AdventureWorksDW2012]
GO
--The script is used to make a bigger FactResellerSales table, called FactResellerSalesBig.
--When a RowMultiplier 180 is used the FactResellerSalesBig table will be 10,953,900 rows
--The query takes 25 minutes to run on my laptop with a 4-core processor, a 7200 rpm hard disk and 8 GB RAM


Declare		@OrderCountTable Table (OrderDateKey INT, OrderCount Int)
Declare		@OrderDateKey Int = 2012
			, @OrderCount int
			, @RowMultiplier int = 180	--180 times more rows than the original FactResellerSales Table
										--you can make it bigger or smaller by change this number

Insert Into	@OrderCountTable
SELECT		Distinct [OrderDateKey]
			, Count(Distinct SalesOrderNumber) As OrderCount
FROM		[dbo].[FactResellerSales]
Group by	[OrderDateKey];


If Object_ID('[dbo].[FactResellerSalesBig]') Is Not Null
		Drop Table [dbo].[FactResellerSalesBig];

Select	Top(0) *
Into	FactResellerSalesBig
From	[dbo].[FactResellerSales];

While Exists (Select * From @OrderCountTable)
	Begin
		Select	Top(1) 
				@OrderDateKey = OrderDateKey
				,  @OrderCount =  OrderCount
		From	@OrderCountTable;

		Insert into FactResellerSalesBig with(Tablock)
		Select
				R.[ProductKey]
				, R.[OrderDateKey]
				, R.[DueDateKey]
				, R.[ShipDateKey]
				, Y.[ResellerKey]
				, R.[EmployeeKey]
				, R.[PromotionKey]
				, R.[CurrencyKey]
				, Y.[SalesTerritoryKey]
				, Cast(R.[SalesOrderNumber] + Format(Y.RowNum, '000') AS nvarchar(20)) As SalesOrderNumber
				, R.[SalesOrderLineNumber]
				, R.[RevisionNumber]
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) As OrderQuantity
				, R.[UnitPrice]
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) * R.[UnitPrice] As ExtendedAmount
				, R.[UnitPriceDiscountPct]
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) * R.[UnitPrice] * R.[UnitPriceDiscountPct] As DiscountAmount
				, R.[ProductStandardCost]
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) *  R.[ProductStandardCost] As TotalProductCost
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) * R.[UnitPrice] * (1 - R.[UnitPriceDiscountPct]) As SalesAmount
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) * R.[UnitPrice] * (1 - R.[UnitPriceDiscountPct]) * 0.08 As TaxAmt
				, R.[Freight]
				, Cast(R.[CarrierTrackingNumber] +  Format(Y.RowNum, '-000') As nvarchar(25)) As CarrierTrackingNumber
				, Cast(R.[CustomerPONumber] + Format(Y.RowNum, '000') AS nvarchar(25)) As CustomerPONumber
				, R.[OrderDate]
				, R.[DueDate]
				, R.[ShipDate]
		From
			(
				SELECT 
						[ProductKey]
						,[OrderDateKey]
						,[DueDateKey]
						,[ShipDateKey]
						,[ResellerKey]
						,[EmployeeKey]
						,[PromotionKey]
						,[CurrencyKey]
						,[SalesTerritoryKey]
						,[SalesOrderNumber]
						,[SalesOrderLineNumber]
						,[RevisionNumber]
						,[OrderQuantity]
						,[UnitPrice]
						,[ExtendedAmount]
						,[UnitPriceDiscountPct]
						,[DiscountAmount]
						,[ProductStandardCost]
						,[TotalProductCost]
						,[SalesAmount]
						,[TaxAmt]
						,[Freight]
						,[CarrierTrackingNumber]
						,[CustomerPONumber]
						,[OrderDate]
						,[DueDate]
						,[ShipDate]
						, Dense_Rank() Over(Partition by [OrderDateKey] Order by SalesOrderNumber) As OrderNumber
				FROM	[dbo].[FactResellerSales] 
				Where	OrderDateKey =  @OrderDateKey
			) R
					Cross Apply
			(

				SELECT	TOP (@RowMultiplier) 
						A.[ResellerKey]
						, B.SalesTerritoryKey
						, Row_Number() Over(Order by Checksum(newid())) As RowNum
						, RAND(CHECKSUM(NEWID())) * 2 As QuantityMultiplier
				FROM	[DimResellerBig] A
							Inner join
						[dbo].[DimGeography] B
							on A.[GeographyKey] = B.GeographyKey
							Cross Join
						Master..spt_values C
				Where	C.Type = 'P'
							And
						C.Number Between 1 and @OrderCount
						and R.OrderNumber = C.number		
			) Y

		Print 'The records for order date ' + Cast(@OrderDateKey as nvarchar(8)) + ' have been multiplied ' + Cast(@RowMultiplier as nvarchar(6)) + ' times';

		Delete Top(1) From	@OrderCountTable;

	End

Go

Create Clustered Index IX_FactResellerSalesBig_1 On [dbo].[FactResellerSalesBig] ([OrderDateKey] ASC, [ResellerKey] Asc);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_CurrencyKey] ON [dbo].[FactResellerSalesBig]
(
	[CurrencyKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_DueDateKey] ON [dbo].[FactResellerSalesBig]
(
	[DueDateKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_EmployeeKey] ON [dbo].[FactResellerSalesBig]
(
	[EmployeeKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_ProductKey] ON [dbo].[FactResellerSalesBig]
(
	[ProductKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_PromotionKey] ON [dbo].[FactResellerSalesBig]
(
	[PromotionKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_ShipDateKey] ON [dbo].[FactResellerSalesBig]
(
	[ShipDateKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_CarrierTrackingNumber] ON [dbo].[FactResellerSalesBig]
(
	[CarrierTrackingNumber] ASC
);


Create NonClustered Columnstore Index IX_FactResellerSalesBig_NCI On [dbo].[FactResellerSalesBig]
(
[ProductKey]
,[OrderDateKey]
,[DueDateKey]
,[ShipDateKey]
,[ResellerKey]
,[EmployeeKey]
,[PromotionKey]
,[CurrencyKey]
,[SalesTerritoryKey]
,[SalesOrderNumber]
,[SalesOrderLineNumber]
,[RevisionNumber]
,[OrderQuantity]
,[UnitPrice]
,[ExtendedAmount]
,[UnitPriceDiscountPct]
,[DiscountAmount]
,[ProductStandardCost]
,[TotalProductCost]
,[SalesAmount]
,[TaxAmt]
,[Freight]
,[CarrierTrackingNumber]
,[CustomerPONumber]
,[OrderDate]
,[DueDate]
,[ShipDate]
)

GO

(Note: if you have problems copying and pasting the script into SSMS, paste it into Office Word first and then from Word into SSMS.)

Now we have a bigger table that is still meaningful, thanks to the randomized resellers and quantities. The table sizes after the 2 bigger tables are added are shown below:

 

As you can see, we now have a big FactResellerSalesBig table with 10,953,900 rows and 2.6 GB of data. In the next few blogs, I'm going to dig into the columnstore index to see what makes it different.

Link:

SQL Server 2012 Columnstore Index In Depth Part 2: Columnstore Index Physical Data and Size Calculation

 

The Use of the Denali 'With Result Sets' Feature in an SSIS Data Flow Task

October 15, 2011 at 4:58 AM by Steven Wang

The new T-SQL WITH RESULT SETS feature in SQL Denali can be very useful when using a stored procedure as the data source of an SSIS data flow task.

Before SQL Denali, there were some limitations to using a stored procedure as a data flow task data source:

1. If you wanted the data source to use different column names, you had to either change the stored procedure or use a Derived Column component; neither is convenient.
2. The same applied to data type changes: to change a column's data type, you again had to either change the stored procedure or use a Derived Column component.
3. If your stored procedure ended with dynamic SQL execution, such as Exec (@SQL) or sp_executesql @SQL, SSIS was not able to resolve the result columns. A common work-around was to define a table variable inside the stored procedure, insert the dynamic SQL output into the table variable, and then select the data from it. This is very awkward, and performance is bad when the data set is big.

 
The new T-SQL WITH RESULT SETS feature solves these problems.

The syntax of WITH RESULT SETS is simple, as shown below (for more information on the T-SQL EXECUTE statement, see BOL; the link is here):
 
Exec Stored_Proc_Name @Variables
With Result Sets
(
	(
		Col1 Data_type,
		Col2 Data_type,
		...		--List all columns here
	)
);
 
Here is a simple example demonstrating this functionality as an SSIS data flow task data source, solving problem 3 above.

1. Create a simple SP:
USE [AdventureWorksDWDenali]  --Denali sample DW database, free download
GO
CREATE Proc [dbo].[usp_UseWithResultSet_Dynamic_SQL_Denali]
As
Begin
Declare @SQL varchar(Max)

Set @SQL = '
SELECT [GeographyKey]
,[City]
,[StateProvinceName]
,[EnglishCountryRegionName]
FROM [dbo].[DimGeography]
'
Exec (@SQL);

End
GO

2. Create an SSIS data flow task and use EXEC [dbo].[usp_UseWithResultSet_Dynamic_SQL_Denali] as the data source (SQL command). When you click Preview, you get an error like the one below:

 

3. By using WITH RESULT SETS, you can rename the columns and change the data types (I changed columns from nvarchar to varchar), and best of all, there is no error when using this SP as the data source. See below:
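
As a minimal sketch, the SQL command in the data source might look like the following (the new column names and varchar lengths are assumptions for illustration):

Exec [dbo].[usp_UseWithResultSet_Dynamic_SQL_Denali]
With Result Sets
(
	(
		GeographyKey	int,
		City			varchar(30),	--Retyped from nvarchar
		StateProvince	varchar(50),	--Renamed from StateProvinceName
		CountryRegion	varchar(50)		--Renamed from EnglishCountryRegionName
	)
);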

 

4. Terminate with a simple Union All component and run. It succeeds:

Posted in: SQL Server 2012 | SSIS | T-SQL
