SMAC: Social, Mobility, Analytics, and Cloud -- Bringing Technology Together

August 19, 2013 at 4:30 AM by Steven Wang
Future Technology, an infographic by Wipro.

Posted in: Big Data


SSRS 2012: Report Server cannot load the TERADATA/SQLPDW Extension Error

May 6, 2013 at 7:49 PM by Steven Wang

 

In the SQL Server 2012 Reporting Services server event log, you might see these 2 errors:

1. Report Server cannot load the SQLPDW Extension

2. Report Server cannot load the TERADATA Extension

If you are not using the TERADATA/SQLPDW functionality in your reports, you can safely ignore these errors. They appear because the TERADATA and SQLPDW data extensions are turned on by default in the report server configuration file.

Even if you are not using the TERADATA/SQLPDW functionality, it is still a good idea to suppress the errors logged to the event log.

Open the Reporting Services configuration file rsreportserver.config under your SSRS installation folder. If you installed SSRS 2012 with the default settings, the file will be under:

C:\Program Files\Microsoft SQL Server\MSRS11.MSSQLSERVER\Reporting Services\ReportServer\

If you locate the Data section, you will see that the TERADATA and SQLPDW data extensions are turned on.

If you comment out these 2 extensions and save the file, those error messages will no longer be generated.
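The relevant entries sit in the Data section of the Extensions element. They look roughly like the following; the Type attribute values are placeholders here, as the exact values depend on your installation, so comment out the entries in your own copy of the file rather than pasting these:

<Extensions>
	<Data>
		...
		<!-- Commented out to suppress the TERADATA/SQLPDW load errors: -->
		<!--
		<Extension Name="TERADATA" Type="..." />
		<Extension Name="SQLPDW" Type="..." />
		-->
	</Data>
</Extensions>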

 


SQL Server 2012 MDS Minimum Windows Server Requirements

April 22, 2013 at 6:49 PM by Steven Wang

In order for SQL Server 2012 MDS to function properly under Windows Server 2008 R2, the following minimum Windows features must be in place.

1. Add the Application Server role

The following role services need to be included for the Application Server role:

• .NET Framework 3.5.1

• Web Server (IIS) Support

• HTTP Activation

As illustrated in the diagram below (ticked services):

 

2. Add the Web Server (IIS) role

Make sure that the highlighted services are ticked.

3. Check additional Windows features

Make sure the features listed below (highlighted) are installed:

Posted in: Master Data Service


Script out Indexes and Index Option Settings for a SQL Server Database

December 11, 2012 at 2:58 PM by Steven Wang

Sometimes we need to script out index definitions and their option settings. If you have source control, you are lucky, as all index definitions should be kept in your source control system. If you just want to script out the indexes for a single table, you can also do so by selecting all indexes in the Object Explorer Details pane. But a query that can easily script out the index definitions and their settings makes testing and index creation automation much easier.

In SQL Server, the index definition and option settings are not stored in a single table; you have to join a few system tables or DMVs together to get all of that information. Here I will share the script I use to script out all the indexes.

My considerations:

1. Only user-defined indexes are covered;

2. Clustered indexes created by a Primary Key constraint are not included, though it could be done;

3. Spatial indexes and XML indexes are not considered;

4. The data compression setting for partitioned indexes is simplified here, as it is fairly complicated to derive in a single select statement;

5. The Drop_Existing and Sort_In_TempDB settings are hard-coded in the script; depending on your situation you can change them.

The script is as below:

Select		A.[object_id]
			, OBJECT_NAME(A.[object_id]) AS Table_Name
			, A.Index_ID
			, A.[Name] As Index_Name
			, CAST(
					Case	When A.type = 1 AND is_unique = 1 Then 'Create Unique Clustered Index ' 
							When A.type = 1 AND is_unique = 0 Then 'Create Clustered Index ' 
							When A.type = 2 AND is_unique = 1 Then 'Create Unique NonClustered Index '
							When A.type = 2 AND is_unique = 0 Then 'Create NonClustered Index '
					End
					+ quotename(A.[Name]) + ' On ' + quotename(S.name) + '.' + quotename(OBJECT_NAME(A.[object_id])) + ' ('
					+ Stuff(
							(
								Select
										',[' + COL_NAME(A.[object_id],C.column_id)
										+ Case When C.is_descending_key = 1 Then '] Desc' Else '] Asc' End
								From	sys.index_columns C WITH (NOLOCK)
								Where	A.[Object_ID] = C.object_id
										And A.Index_ID = C.Index_ID
										And C.is_included_column = 0
								Order by C.key_Ordinal Asc
								For XML Path('')
							)
							,1,1,'') + ') '

					+ CASE WHEN A.type = 1 THEN ''
							ELSE Coalesce('Include ('
									+ Stuff(
											(
												Select
														',' + QuoteName(COL_NAME(A.[object_id],C.column_id))
												From	sys.index_columns C WITH (NOLOCK)
												Where	A.[Object_ID] = C.object_id
														And A.Index_ID = C.Index_ID
														And C.is_included_column = 1
												Order by C.index_column_id Asc
												For XML Path('')
											)
											,1,1,'') + ') '
						,'') End
					+ Case When A.has_filter = 1 Then 'Where ' + A.filter_definition Else '' End
					+ ' With (Drop_Existing = OFF, SORT_IN_TEMPDB = ON'
					--if the same index already exists, you should set Drop_Existing = ON
					--SORT_IN_TEMPDB = ON is recommended, but base it on your own environment.
					+ ', Fillfactor = ' + Cast(Case When fill_factor = 0 Then 100 Else fill_factor End As varchar(3)) 
					+ Case When A.[is_padded] = 1 Then ', PAD_INDEX = ON' Else ', PAD_INDEX = OFF' END
					+ Case When D.[no_recompute] = 1 Then ', STATISTICS_NORECOMPUTE  = ON' Else ', STATISTICS_NORECOMPUTE  = OFF' End                  
					+ Case When A.[ignore_dup_key] = 1 Then ', IGNORE_DUP_KEY = ON' Else ', IGNORE_DUP_KEY = OFF' End
					+ Case When A.[ALLOW_ROW_LOCKS] = 1 Then ', ALLOW_ROW_LOCKS = ON' Else ', ALLOW_ROW_LOCKS = OFF' END
					+ Case When A.[ALLOW_PAGE_LOCKS] = 1 Then ', ALLOW_PAGE_LOCKS = ON' Else ', ALLOW_PAGE_LOCKS = OFF' End  
					+ Case When P.[data_compression] = 0 Then ', DATA_COMPRESSION = NONE' 
							When P.[data_compression] = 1 Then ', DATA_COMPRESSION = ROW' 
							Else ', DATA_COMPRESSION = PAGE' End                 
					+ ') On ' 
					+ Case when C.type = 'FG' THEN quotename(C.name) 
							ELSE quotename(C.name) + '(' + F.Partition_Column + ')' END + ';' --if it uses partition scheme then need partition column
			As nvarchar(Max)) As Index_Create_Statement
			, C.name AS FileGroupName
			, 'DROP INDEX ' + quotename(A.[Name]) + ' On ' + quotename(S.name) + '.' + quotename(OBJECT_NAME(A.[object_id])) + ';' AS Index_Drop_Statement
From		SYS.Indexes A WITH (NOLOCK)
				INNER JOIN
			sys.objects B WITH (NOLOCK)
				ON	A.object_id = B.object_id
				INNER JOIN 
			SYS.schemas S
				ON	B.schema_id = S.schema_id            
				INNER JOIN
			SYS.data_spaces C WITH (NOLOCK)
				ON	A.data_space_id =   C.data_space_id   
				INNER JOIN
			SYS.stats D WITH (NOLOCK)
				ON	A.object_id = D.object_id
					AND A.index_id = D.stats_id  
				Inner Join
			--The code below finds out which data compression type is used by the index. If an index is not partitioned, this is easy, as only one data compression
			--type can be used. If the index is partitioned, each partition can be configured to use a different data compression. This is hard to generalize;
			--for simplicity, I use the data compression type shared by most of the index partitions for all partitions. You can later rebuild an index partition to
			--the appropriate data compression type you want to use.
			(
				select object_id, index_id, Data_Compression, ROW_NUMBER() Over(Partition By object_id, index_id Order by COUNT(*) Desc) As Main_Compression
				From sys.partitions WITH (NOLOCK)
				Group BY object_id, index_id, Data_Compression
			)	P
				ON	A.object_id = P.object_id
					AND A.index_id = P.index_id
					AND P.Main_Compression = 1
				Outer APPLY
			(
				SELECT	COL_NAME(A.object_id, E.column_id) AS Partition_Column
				From	sys.index_columns E WITH (NOLOCK)
				WHERE	E.object_id = A.object_id
						AND E.index_id = A.index_id
						AND E.partition_ordinal = 1
			) F     
Where		A.type IN (1,2) --clustered and nonclustered
			AND B.Type != 'S'
			AND is_primary_key = 0 --this is not for primary key constraint
			AND OBJECT_NAME(A.[object_id]) not like 'queue_messages_%'
			AND OBJECT_NAME(A.[object_id]) not like 'filestream_tombstone_%'
			AND OBJECT_NAME(A.[object_id]) not like 'sys%' --if you have indexes that start with sys, remove this filter
OPTION (Recompile);
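For illustration, one row of the generated Index_Create_Statement column looks like the following; the table, index, and column names here are invented:

Create NonClustered Index [IX_FactSales_DateKey] On [dbo].[FactSales] ([DateKey] Asc)
Include ([SalesAmount])
With (Drop_Existing = OFF, SORT_IN_TEMPDB = ON, Fillfactor = 100, PAD_INDEX = OFF,
      STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
      ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = NONE) On [PRIMARY];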

 

I have also attached a SQL script file here:

Scriptout_Indexes.sql (4.95 kb)


Advanced Data Cleaning Techniques in SQL Server -- NZ BI User Group

November 9, 2012 at 10:07 PM by Steven Wang

I will present at the NZ BI User Group on the topic Advanced Data Cleaning Techniques in SQL Server, on 20 November at Microsoft Auckland.

Abstract:

In the real world, data sources are too often full of dirty data due to data entry errors and differences in data representation among data sources. Poor data quality has a very negative impact on business intelligence and decision support systems. The data cleaning process is a critically important phase of a business intelligence project.

In this session, some practical data cleaning techniques will be discussed, including but not limited to the list below:

  • Data Profiling
  • SQL Server Window Functions
  • Script Transformation with Regular Expression
  • Fuzzy Grouping
  • Fuzzy Lookup
  • DQS Cleansing (Brief Introduction)
This is a 300 to 400 level session, and some fuzzy logic internals will also be touched on.
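As a small taste of the window functions technique, the sketch below removes duplicate rows from a staging table, keeping only the most recently modified row per logical key. The table and column names are invented for illustration:

With Ranked As
(
	Select	*
			, Row_Number() Over(Partition By FirstName, LastName, Email
								Order By ModifiedDate Desc) As RowNum
	From	dbo.Staging_Customer
)
Delete From Ranked
Where	RowNum > 1;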
 
The presentation files are attached below:
 
 

NZBI_User_Group_Data_Cleaning.zip (3.87 mb)


Posted in: Data Cleaning | DQS | SSIS


Using SQL Server 2012 Window Functions to Solve Common T-SQL Challenges

September 10, 2012 at 1:55 AM by Steven Wang

Thanks to all who attended my session at TechEd New Zealand 2012. I hope you enjoyed the session. I have attached the presentation slide deck and demo code to this blog post.

Session Abstract:

SQL Server 2012 has introduced several new window functions and enhanced support for window aggregate functions by introducing window order and frame clauses, as well as support for offset functions. In this session, the presenter will apply these new functions to solve some of the most frequently asked questions in the MSDN T-SQL forum. If you have ever faced the challenge of calculating a moving average, identifying gaps between records, combining consecutive and/or overlapping records, or calculating running totals, then this session is for you.
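To give one flavour of the new frame clause, a running total per group can now be written directly; the table and column names below are invented for illustration:

Select	AccountID
		, TransactionDate
		, Amount
		, Sum(Amount) Over(	Partition By AccountID
							Order By TransactionDate
							Rows Between Unbounded Preceding And Current Row) As Running_Total
From	dbo.AccountTransaction;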

 

Session Video:

Using SQL Server 2012 Window Functions 

Session Slides:

DBI309_Using_Window_Functions.pptx (1.79 mb)

Demo Code and DB:

TechEdNZ2012.bak (5.27 mb)

TechEdNZ_2012_Demo.zip (258.64 kb)

 

 

An Odd Behaviour When Using SQL Server 2012 Profiler for Analysis Services from SSMS

July 19, 2012 at 7:15 PM by Steven Wang

Environment:

1. Windows Server 2008 R2 Enterprise Edition

2. A new server with multiple SQL Server 2012 Analysis Services instances installed.

3. SSMS is installed.

4. The SQL Server 2012 database engine is not installed on the server.

5. SQL Server 2012 CU2 is applied.

Problem:

When I tried to use SQL Server Profiler from SSMS on the new server to run a trace on any SQL Server 2012 SSAS instance, as below:

I got the error message below.

Indeed, this error message is not new to SQL Server 2012. Nikesh Mhatre wrote about this issue on an MSDN blog; please see this link: Unable to read trace definition file Microsoft SQL Server TraceDefinition n.n1.n2.xml

However, the problem I had is different even though the error message is similar.

I then used another server, which has both the SQL Server database engine and SSAS installed, and tried in the same way to connect to the server mentioned above. I still got the same error message.

 

Solution:

The solution to this issue is indeed very simple; there are in fact 2 ways:

Solution 1: Instead of launching SQL Server Profiler from SSMS, launch the Profiler from Start, All Programs, Microsoft SQL Server 2012, Performance Tools, then click SQL Server Profiler, as depicted below. By launching the Profiler from here, I had no problem connecting to the SSAS instance.

 

Solution 2: After launching the Profiler from SSMS, I first used it to connect to any SQL Server, either SQL Server 2008 R2 or SQL Server 2012, using Database Engine as the server type, as below. That worked without a problem.

After I had connected to a database engine trace, I was able to launch a new trace on the SSAS instance with no problem. It seems very odd.

 

I'm not sure whether this is a bug in launching SQL Server Profiler from SSMS 2012 for SSAS instances, or whether some specific permission needs to be set.

I hope this post helps if you have the same problem. If you know the root cause of the problem, please let me know. Thanks.

Intelligently Back Up SQL Server Analysis Services Databases

May 3, 2012 at 9:50 PM by Steven Wang

There are normally 2 main approaches to backing up SSAS databases:

1. Back up SSAS databases at a scheduled time

2. Back up SSAS databases whenever a process is performed

Although there is nothing wrong with these 2 approaches, neither is ideal. Both are hard to manage, especially when there are lots of databases under the same SSAS instance. A better way is to intelligently back up SSAS databases based on data changes. A job can be set up to check the SSAS databases for changes at a scheduled interval; if a change to either data or schema is detected for a database, it will be backed up, and if nothing has changed, no new backup is needed. In this post, I will use an SSIS package to intelligently manage SSAS database backups. My solution is based on the logic below:

1. Only modified SSAS databases (data or schema) will be backed up.

2. A newly added SSAS database will always be backed up.

3. SSAS backups can be traced; the backup history will be kept.

4. Backup file retention can be configured (how many backup files need to be kept?).

To start with, I need 2 SSAS maintenance tables to manage the backup solution. I use the script below for my tables. (Some columns in this script are not specific to the backup purpose, but as they are related to the SSIS package, they are kept.)

CREATE TABLE [dbo].[Database_Backup_Admin]
(
	[Database_Name] [nvarchar](128) NOT NULL,
	[Data_Modified] [datetime] NULL,
	[Backup_Date] [datetime] NULL,
	[DNS_Instance_Name] [nvarchar](128) NOT NULL,
	[Instance_Type] [nvarchar](50) NOT NULL,
	[Backup_file_to_Keep] [tinyint] NULL,
	[Is_Current] [bit] NULL,
	[Is_Backuped_Latest] [bit] NULL,
	[CacheWarm_Date] [datetime] NULL,
	[Is_CacheWarm_Current] [bit] NULL,
	[Is_CacheWarm_On] [bit] NULL,
	CONSTRAINT [pk_Database_BackUP_Admin] PRIMARY KEY CLUSTERED 
	(
		[Database_Name] ASC,
		[DNS_Instance_Name] ASC,
		[Instance_Type] ASC
	)
);

CREATE TABLE [dbo].[Database_Backup_History]
(
	[Database_Name] [nvarchar](128) NOT NULL,
	[Backup_Date] [datetime] NOT NULL,
	[DNS_Instance_Name] [nvarchar](128) NOT NULL,
	[Instance_Type] [nvarchar](50) NOT NULL,
	[Backup_file_Name] [nvarchar](128) NULL,
	[Backup_file_Size] [int] NULL,
	[Backup_start_Time] [datetime] NULL,
	[Backup_End_Time] [datetime] NULL,
	[Backup_Duration_Second] [int] NULL,
	CONSTRAINT [pk_Database_Backup_History] PRIMARY KEY CLUSTERED 
	(
		[Database_Name] ASC,
		[Backup_Date] ASC,
		[DNS_Instance_Name] ASC,
		[Instance_Type] ASC
	)
);

 

Once the tables are created, I created an SSIS package with the layout as below:

From the package task flow, the first 2 steps check whether all the databases in our Database_Backup_Admin table are up to date. If a database was on the server but has since been removed, the Is_Current column for that database is set to 0. If a new database has been added to the server, it is loaded into the table. A slowly changing dimension data transformation is used for this check, and the DBSCHEMA_CATALOGS DMV is used to list the SSAS databases on the server instance. The data flow task for the database check is as below:

SSAS database data changes can be detected via the MDSCHEMA_CUBES DMV: its LAST_DATA_UPDATE column carries a timestamp of each cube's last data change. The package updates the Data_Modified column of the Database_Backup_Admin table based on the LAST_DATA_UPDATE column of MDSCHEMA_CUBES.
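Both DMVs can be queried directly in an MDX query window against the SSAS instance (run each query separately); a minimal sketch:

--Databases on the instance:
Select	[CATALOG_NAME], [DATE_MODIFIED]
From	$SYSTEM.DBSCHEMA_CATALOGS

--Per-cube data change timestamps (in UTC):
Select	[CATALOG_NAME], [CUBE_NAME], [LAST_DATA_UPDATE]
From	$SYSTEM.MDSCHEMA_CUBES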

After the Data_Modified column is updated, a comparison between Data_Modified and Backup_Date is performed. If Backup_Date is older than Data_Modified, Is_Backuped_Latest is set to False, which indicates that a new backup is needed.
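In T-SQL terms, the comparison amounts to something like the following against the Database_Backup_Admin table defined earlier (a sketch of the package logic, not the package's literal code):

Update	dbo.Database_Backup_Admin
Set		Is_Backuped_Latest = 0
Where	Is_Current = 1
		And (Backup_Date Is Null Or Backup_Date < Data_Modified);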

The last Foreach Loop container backs up each SSAS database using the XMLA Backup command. Logic is also included to remove older backup files, based on the configured number of backup files to keep. The script used to remove the old backup files:

Imports System
Imports System.Collections	'needed for the non-generic IComparer implemented below
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO
Imports System.Windows.Forms

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
	Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

	Enum ScriptResults
		Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
		Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
	End Enum
	
    Public Class CompareFileLastWriteTime
        Implements IComparer
        Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer Implements IComparer.Compare
            'this "Compare" code is borrowed from: http://geekswithblogs.net/ntsmith/archive/2006/08/17/88250.aspx
            Dim File1 As FileInfo
            Dim File2 As FileInfo

            File1 = DirectCast(x, FileInfo)
            File2 = DirectCast(y, FileInfo)

            Compare = DateTime.Compare(File2.LastWriteTime, File1.LastWriteTime)
        End Function
    End Class

	Public Sub Main()
		'
		' Add your code here

        Dim BackupDir As String = Dts.Variables("BackupDir").Value.ToString
        Dim DatabaseName As String = Dts.Variables("DatabaseName").Value.ToString
        Dim FilesToKeep As Integer = CInt(Dts.Variables("FilesToKeep").Value)
        Dim NewBackupFile = New FileInfo(BackupDir & Dts.Variables("BackupFileName").Value.ToString)
        Dim directory = New DirectoryInfo(BackupDir)
        Dim AllBackups = directory.GetFiles(DatabaseName & "_*")
        Dim filename As String = ""
        Dim oldesttime As Date = Now
        Dim FilesCount As Integer = 0

        Array.Sort(AllBackups, New CompareFileLastWriteTime)
   
        Dts.Variables("BackupEndTime").Value = Now

        For Each entry In AllBackups
            If oldesttime > entry.LastWriteTime Then
                oldesttime = entry.LastWriteTime
                filename = entry.FullName
                FilesCount += 1
                If FilesCount > FilesToKeep AndAlso filename <> "" Then
                    File.Delete(filename)
                    FilesCount -= 1
                End If
            End If
        Next

        Dts.Variables("BackupFileSize").Value = CInt(NewBackupFile.Length / 1024)

		Dts.TaskResult = ScriptResults.Success
	End Sub

End Class

 

The package has the below variables:

Of these variables, the following need to be configured:

1. InstanceName

2. DatabaseName

3. BackupDir (the OLAP backup directory; note that you need to replace each single \ with \\ if you set the value through a SQL Agent job)

4. TimeZone (as the SSAS DMVs record times in UTC, you need to set this TimeZone offset to your local time; for instance, New Zealand is 12)

5. BackupFilesToKeep (How many backup files need to be kept)

The BackupXMLA variable is an expression, which changes dynamically on each loop iteration:

"<Backup xmlns=\"http://schemas.microsoft.com/analysisservices/2003/engine\">
  <Object>
    <DatabaseID>" + @[User::DatabaseName]  + "</DatabaseID>
  </Object>
  <File>" + @[User::BackupFileName]  + "</File>
</Backup>"

This is the command that performs the SSAS database backup.
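For example, once the expression is evaluated on one loop iteration, the resulting XMLA might look like this (the database and file names are invented):

<Backup xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>SalesCube</DatabaseID>
  </Object>
  <File>SalesCube_20120503.abf</File>
</Backup>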

 I have attached the package here.

Intelligent_OLAPDB_Backup_BICOE.dtsx (289.11 kb)

 

A SQL Agent job can be set up to perform the backup; remember to set the variables below to your own values:

 

Use 4 Different Ways to Monitor Database Log Size and its Usage

April 28, 2012 at 3:02 AM by Steven Wang

In SQL Server, there are often different ways to achieve the same goal. This is also true for monitoring the database log size and its usage.

1. DBCC SQLPERF(LOGSPACE)

Typically, we use DBCC SQLPERF(LOGSPACE) to monitor the log size and the percent of space used for all databases. This command is very simple and straightforward to use. However, if you want to use the information it outputs programmatically, there is a little bit of work to do. The script below is one example that gives you the log space information programmatically using DBCC SQLPERF(LOGSPACE):

Declare @LogUsage Table (
							Database_Name sysname
							, Log_Size_MB Float
							, Log_Used_Percent Decimal(8,5)
							, Status tinyint
						);

Insert Into @LogUsage
Exec ('DBCC SQLPERF (LOGSPACE);');

Select	Database_Name
		, Log_Size_MB
		, Log_Size_MB * Log_Used_Percent / 100 As Log_Used_Size_MB
		, Log_Used_Percent
From	@LogUsage

2. sys.dm_os_performance_counters  

The second and less commonly used technique for monitoring log space usage is the sys.dm_os_performance_counters dynamic management view. This view keeps records of the log file size (and also the data file size) as counter names for all databases. As I am only concerned with the log file size for this purpose, I use the script below to monitor log space usage:

Select		Database_Name
			, [Log File(s) Size (KB)] / 1024.0 As Log_Size_MB
			, [Log File(s) Used Size (KB)] / 1024.0 As Log_Used_Size_MB
			,  ([Log File(s) Used Size (KB)] * 1.0 / [Log File(s) Size (KB)]) * 100 As Log_Used_Percent
From
(
select	instance_Name as Database_Name, counter_name, cntr_value as Log_Size
From	sys.dm_os_performance_counters
Where	counter_name in		(
								'Log File(s) Size (KB)'
								, 'Log File(s) Used Size (KB)'
							)     
			And
		instance_name not in	(
									'mssqlsystemresource'
									, '_Total'
								)  
) X
Pivot	(	Max(Log_Size)
			For Counter_Name 
			In
				(
					[Log File(s) Size (KB)]
					, [Log File(s) Used Size (KB)]
				)  
		) myPiv

If you are only concerned with one database, you can filter on the database name in instance_name. Although I calculated the log used percent myself, sys.dm_os_performance_counters does in fact have a counter named 'Percent Log Used', but its value carries no decimal places.
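For completeness, reading that whole-number counter directly looks like this:

Select	instance_name As Database_Name
		, cntr_value As Log_Used_Percent
From	sys.dm_os_performance_counters
Where	counter_name = 'Percent Log Used'
		And instance_name Not In ('mssqlsystemresource', '_Total');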

 3. sys.database_files

The sys.database_files catalog view has information about the data files and log file of the currently referenced database. This is different from the preceding 2 ways, which return log information for all databases. As there is no space-used information in this view, we need to use the FILEPROPERTY function to get the log space used value, as below:

Select	DB_Name()
		, size / 128.0 As Log_Size_MB
		, FILEPROPERTY(name, 'spaceused') / 128.0 As Log_Used_Size_MB
		, (FILEPROPERTY(name, 'spaceused') * 1.0 / size) * 100 As Log_Used_Percent
From	sys.database_files
Where	type_desc = 'LOG'

Although this script is for log space usage only, it can also be used to monitor data file sizes by taking off the where clause.
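For example, dropping the filter returns every file of the current database:

Select	DB_Name()
		, name
		, type_desc
		, size / 128.0 As Size_MB
		, FILEPROPERTY(name, 'spaceused') / 128.0 As Used_Size_MB
From	sys.database_files;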

4. sys.dm_db_log_space_usage (SQL 2012 only)

This dynamic management view is not even documented in BOL, but it is a very simple view. As the view returns the log size in bytes, it is more useful to convert the bytes to MB, as below:

Select	DB_Name(Database_id)
		, total_log_size_in_bytes / 1024 / 1024.0 As Log_Size_MB
		, used_log_space_in_bytes / 1024 / 1024.0 As Log_Used_Size_MB
		, used_log_space_in_percent AS Log_Used_Percent
From		sys.dm_db_log_space_usage

Given that this view is so simple, I believe it will be easily adopted. However, it returns the log space statistics for the current database only.

In SQL Server 2012, there is indeed another DMV added for data file space as well: sys.dm_db_file_space_usage.
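A minimal sketch of that DMV (like sys.dm_db_log_space_usage, it covers the current database only):

Select	DB_Name(database_id) As Database_Name
		, file_id
		, total_page_count / 128.0 As File_Size_MB
		, allocated_extent_page_count / 128.0 As Allocated_Size_MB
From	sys.dm_db_file_space_usage;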

 

 

SQL Server 2012 Columnstore Index In Depth Part 3: the Myth of an Extra Column in sys.column_store_segments

April 22, 2012 at 4:47 AM by Steven Wang

There are 2 new system catalog views added in SQL Server 2012 for columnstore indexes:

1) sys.column_store_segments

2) sys.column_store_dictionaries

(Note: there was another view, sys.column_store_index_stats, before the RTM version, but it is no longer available in the final release. I am not sure why it was dropped.)

These 2 system views are very useful for finding out more about a columnstore index in your database, such as its segments, data size, etc.

For both system catalog views, BOL states that each 'Contains a row for each column in a columnstore index', which means each column in a columnstore index will have a row in these system views. While this statement is true, the reverse is not always true: sometimes you will notice an extra column in the sys.column_store_segments view. What is this extra column? Let's find out.

As the original FactResellerSales table is a small table, we can use it to perform the test for this purpose. I will create a nonclustered columnstore index directly on this table covering all the columns (27 columns):

Use [AdventureWorksDW2012]
Go

If Exists (	Select *
			From sys.indexes
			Where	type = 6
					And Object_id = Object_id('dbo.FactResellerSales')
			)
	Begin
		Declare @IndexName sysname
				, @SQL nvarchar(max)

		Select	@IndexName = Name
		From	sys.indexes
		Where	type = 6
				And Object_id = Object_id('dbo.FactResellerSales')
		Set @SQL = 'DROP INDEX ' + quotename(@IndexName) + ' ON [dbo].[FactResellerSales];'
		
		Exec (@SQL);

		--DROP INDEX [idx_FactResellerSales_CI] ON [dbo].[FactResellerSales];
	End
--columns

CREATE NONCLUSTERED COLUMNSTORE INDEX [idx_FactResellerSales_CI] 
ON [dbo].[FactResellerSales]
(
	[ProductKey],
	[OrderDateKey],
	[DueDateKey],
	[ShipDateKey],
	[ResellerKey],
	[EmployeeKey],
	[PromotionKey],
	[CurrencyKey],
	[SalesTerritoryKey],
	[SalesOrderNumber],
	[SalesOrderLineNumber],
	[RevisionNumber],
	[OrderQuantity],
	[UnitPrice],
	[ExtendedAmount],
	[UnitPriceDiscountPct],
	[DiscountAmount],
	[ProductStandardCost],
	[TotalProductCost],
	[SalesAmount],
	[TaxAmt],
	[Freight],
	[CarrierTrackingNumber],
	[CustomerPONumber],
	[OrderDate],
	[DueDate],
	[ShipDate]
);
GO

Select	S.*
From	sys.column_store_segments S
			Inner Join
		sys.Partitions P
			On	S.hobt_id = P.hobt_id
			Inner Join
		sys.indexes I
			On	i.object_id = P.object_id
				And I.index_id = P.index_id
Where	I.Object_id = Object_Id('FactResellerSales')
		And I.Type = 6

From the results we can see that there are exactly 27 columns in sys.column_store_segments. So far so good. Now let's create a table with exactly the same structure as FactResellerSales, without any B-Tree indexes, and create a nonclustered columnstore index on it:

Use [AdventureWorksDW2012]
Go
Select	*
Into	FactResellerSales_Heap
From	[dbo].[FactResellerSales];

CREATE NONCLUSTERED COLUMNSTORE INDEX [idx_FactResellerSales_Heap_CI] 
ON [dbo].[FactResellerSales_Heap]
(
	[ProductKey],
	[OrderDateKey],
	[DueDateKey],
	[ShipDateKey],
	[ResellerKey],
	[EmployeeKey],
	[PromotionKey],
	[CurrencyKey],
	[SalesTerritoryKey],
	[SalesOrderNumber],
	[SalesOrderLineNumber],
	[RevisionNumber],
	[OrderQuantity],
	[UnitPrice],
	[ExtendedAmount],
	[UnitPriceDiscountPct],
	[DiscountAmount],
	[ProductStandardCost],
	[TotalProductCost],
	[SalesAmount],
	[TaxAmt],
	[Freight],
	[CarrierTrackingNumber],
	[CustomerPONumber],
	[OrderDate],
	[DueDate],
	[ShipDate]
);
GO

Select	S.*
From	sys.column_store_segments S
			Inner Join
		sys.Partitions P
			On	S.hobt_id = P.hobt_id
			Inner Join
		sys.indexes I
			On	i.object_id = P.object_id
				And I.index_id = P.index_id
Where	I.Object_id = Object_Id('FactResellerSales_Heap')
		And I.Type = 6

This query returns 28 rows, and the maximum column_id value in sys.column_store_segments is 28. This 28th column in sys.column_store_segments is one column more than the base table has. Before we try to conclude what this column is, I'm going to do another test. I will create a table with exactly the same structure as FactResellerSales, create a non-unique clustered index on the newly created table, and then create a nonclustered columnstore index on it:

Use [AdventureWorksDW2012]
Go
Select	*
Into	FactResellerSales_Clustered_nonunique
From	[dbo].[FactResellerSales];

Create clustered index idx_FactResellerSales_Clustered_nonunique_1 on  [dbo].[FactResellerSales_Clustered_nonunique] ([OrderDateKey]);

CREATE NONCLUSTERED COLUMNSTORE INDEX [idx_FactResellerSales_Clustered_nonunique_CI] 
ON [dbo].[FactResellerSales_Clustered_nonunique]
(
	[ProductKey],
	[OrderDateKey],
	[DueDateKey],
	[ShipDateKey],
	[ResellerKey],
	[EmployeeKey],
	[PromotionKey],
	[CurrencyKey],
	[SalesTerritoryKey],
	[SalesOrderNumber],
	[SalesOrderLineNumber],
	[RevisionNumber],
	[OrderQuantity],
	[UnitPrice],
	[ExtendedAmount],
	[UnitPriceDiscountPct],
	[DiscountAmount],
	[ProductStandardCost],
	[TotalProductCost],
	[SalesAmount],
	[TaxAmt],
	[Freight],
	[CarrierTrackingNumber],
	[CustomerPONumber],
	[OrderDate],
	[DueDate],
	[ShipDate]
);
GO

Select	S.*
From	sys.column_store_segments S
			Inner Join
		sys.Partitions P
			On	S.hobt_id = P.hobt_id
			Inner Join
		sys.indexes I
			On	i.object_id = P.object_id
				And I.index_id = P.index_id
Where	I.Object_id = Object_Id('FactResellerSales_Clustered_nonunique')
		And I.Type = 6

Again the result returns 28 rows, and the 28th column in sys.column_store_segments is an extra column beyond the base table.

Now we have a clear picture:

So why does a nonclustered columnstore index on a heap table have an extra column in sys.column_store_segments?

It seems that nonclustered columnstore indexes, like B-Tree nonclustered indexes, need a RID (Row ID) to refer back to the base heap table.

For a table with a non-unique clustered index, a 4-byte integer uniquifier is added to the table. We can examine this in three different ways.

If you use the old sysindexes table, its keycnt column tells you exactly how many fields are used to form the key of an index.

Select	'FactResellerSales_Clustered_Nonunique' As Table_Name
		, Keycnt as clustered_index_keys
From	sysindexes A, sys.indexes B
Where	A.id = Object_id ('FactResellerSales_Clustered_Nonunique')
		And A.id = b.object_id
		And A.indid = B.index_id
		And	B.Type = 6

This returns 2. As we only used ([OrderDateKey]) to create the idx_FactResellerSales_Clustered_nonunique_1 clustered index, this indicates that a uniquifier was added by the system. sysindexes is a SQL Server 2000 system table, included as a view for backward compatibility only. Although it is still available, it is better to avoid it, and this view merely tells the count of key columns with no further information.

There is a new undocumented system view, sys.system_internals_partition_columns, which has much more detailed information about the columns in a table partition. From this view, I can clearly see that the additional uniquifier column has been added for this non-unique clustered index. Run the script below:

Select	Object_Name(B.object_id) As Table_Name
		, partition_column_id
		, Col_Name(B.object_id, partition_column_id) As Column_Name
		, C.Name AS Data_Type
		, Key_ordinal
		, Count(Key_Ordinal) Over() As Key_Count --Total number of keys for the clustered index
		, is_uniqueifier
From	sys.system_internals_partition_columns A
			Inner Join
		sys.partitions B
			On	A.partition_id = B.partition_id
			Inner Join
		sys.types C
			On	A.system_type_id = C.system_type_id
Where	B.object_id = object_id('FactResellerSales_Clustered_nonunique')
		And B.index_id = 1		--Clustered index
		And key_ordinal != 0	--Only interested in the Clustered index key columns

 

From the above result, we can see that a uniquifier has been added, its data type is integer, and there is no column name associated with it.

We can also use DBCC PAGE to delve into the physical data page and see that the uniquifier is added at the page level.

Use [AdventureWorksDW2012]
Go

Select	P.database_id
		, P.allocated_page_file_id
		, P.allocated_page_page_id
		, P.allocation_unit_type_desc
From	sys.indexes I
			Cross Apply
		sys.dm_db_database_page_allocations(DB_ID(), I.object_id, I.index_id, Null, 'limited') P
Where 
		I.object_id = Object_id ('FactResellerSales_clustered_nonunique')
		And P.is_iam_page = 0
		And P.is_allocated = 1
		And P.is_mixed_page_allocation = 0
		And I.Type = 1  --Type 0 is heap
Go

DBCC traceon(3604, -1)
Go

DBCC page (
			7,				--Database ID
			1,				--Page File ID
			1033784,		--Page ID, check with your own pageid
			3				--Command output style
			)

The highlighted page output shows that a 4-byte integer uniquifier has been added.

 

This concludes the post. Thanks for reading.

Previous Related posts:

1. SQL Server 2012 Columnstore Index In Depth Part 1: Make a bigger AdventureworksDW database for Testing

2. SQL Server 2012 Columnstore Index In Depth Part 2: Columnstore Index Physical Data and Size Calculation