SQL Server 2012 Columnstore Index In Depth Part 2: Columnstore Index Physical Data and Size Calculation

April 20, 2012 at 3:41 AMSteven Wang

In my previous blog, SQL Server 2012 Columnstore Index In Depth Part 1: Make a bigger AdventureworksDW database for Testing, I created a bigger FactResellerSalesBig table in the AdventureworksDW2012 database and built a nonclustered columnstore index, IX_FactResellerSalesBig_NCI, on it. Now we can examine how a columnstore index is stored and calculate the size of the nonclustered columnstore index as a whole, or even of an individual column within the columnstore index.

1. How does a Columnstore Index Store Data

A columnstore index is broken into one or more segments per column for each partition (a non-partitioned table has a single partition), and each segment consists of multiple physical data pages. A full segment contains 1,048,576 rows. Depending on the number of threads used to create the columnstore index, the trailing group of segments (normally one per thread) for a column in each partition may hold fewer than 1,048,576 rows. If the base table has fewer than 1,048,576 rows, only one segment per column is created.
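To see this layout for yourself, the catalog views expose one row per segment. The query below is a quick sketch (it assumes the FactResellerSalesBig table and its nonclustered columnstore index from Part 1) listing the number of rows in every segment of every column:

Use [AdventureWorksDW2012]
Go

Select	css.column_id
		, css.segment_id
		, css.row_count		--a full segment holds 1,048,576 rows
From	sys.partitions AS p
			Join
		sys.column_store_segments AS css
			On css.hobt_id = p.hobt_id
Where	p.object_id = Object_ID('FactResellerSalesBig')
Order by css.column_id, css.segment_id
Go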

A columnstore index segment is internally stored as a BLOB data type. This can be verified with the DBCC PAGE command.

Before using DBCC PAGE, we need to know the page IDs allocated to the columnstore index we are going to examine. In SQL Server 2012 we now have two ways to get the page IDs:

a) use DBCC IND

b) use the newly added dynamic management function in SQL Server 2012: sys.dm_db_database_page_allocations

Although both are undocumented, sys.dm_db_database_page_allocations has many advantages over DBCC IND. I am not going to discuss this DMF in detail here; Kalen Delaney has an article on SQLMag, SQL Server's Undocumented Changes, with a good explanation of how to use it.

For this post, based on the FactResellerSalesBig table I created (refer to my last post), I will use the query below to get the page information:

 

Use [AdventureWorksDW2012]
Go

Select	P.database_id
		, P.allocated_page_file_id
		, P.allocated_page_page_id
		, P.allocation_unit_type_desc
From	sys.indexes I
			Cross Apply
		sys.dm_db_database_page_allocations(DB_ID(), I.object_id, I.index_id, Null, 'limited') P
Where 
		I.object_id = Object_id ('FactResellerSalesBig')
		And P.is_iam_page = 0
		And P.is_allocated = 1
		And P.is_mixed_page_allocation = 0
		And I.Type = 6  --Type 6 is 'NONCLUSTERED COLUMNSTORE'
Go

From the demo database AdventureworksDW2012 on my computer, I got the results below:

From the result we can already tell that a BLOB data type is used to store the columnstore index, but we can delve further into the physical page level to see how the data is stored.

DBCC PAGE (
			7,				--Database ID
			1,				--File ID
			270800,			--Page ID
			3				--Command output style
			)
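Note that DBCC PAGE sends its output to the error log by default, so turn trace flag 3604 on first to have the output returned to your SSMS session. The snippet below simply wraps the same call; substitute the database ID, file ID and page ID returned by the allocation query above on your own system:

DBCC TRACEON (3604);
GO
DBCC PAGE (7, 1, 270800, 3);	--Database ID, File ID, Page ID, output style
GO
DBCC TRACEOFF (3604);
GO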

 

The highlighted information in the DBCC PAGE output confirms that BLOB storage is used for columnstore index segments.

 

2. Calculate the Size of A Nonclustered Columnstore Index

Based on the columnstore index topic in Books Online, we can calculate the size of the nonclustered columnstore index IX_FactResellerSalesBig_NCI as below:

SELECT SUM(on_disk_size_MB) AS TotalSizeInMB
FROM
(
   (SELECT SUM(css.on_disk_size)/(1024.0*1024.0) on_disk_size_MB
    FROM sys.indexes AS i
    JOIN sys.partitions AS p
        ON i.object_id = p.object_id 
    JOIN sys.column_store_segments AS css
        ON css.hobt_id = p.hobt_id
    WHERE i.object_id = object_id('FactResellerSalesBig') 
    AND i.type_desc = 'NONCLUSTERED COLUMNSTORE') 
  UNION ALL
   (SELECT SUM(csd.on_disk_size)/(1024.0*1024.0) on_disk_size_MB
    FROM sys.indexes AS i
    JOIN sys.partitions AS p
        ON i.object_id = p.object_id 
    JOIN sys.column_store_dictionaries AS csd
        ON csd.hobt_id = p.hobt_id
    WHERE i.object_id = object_id('FactResellerSalesBig') 
    AND i.type_desc = 'NONCLUSTERED COLUMNSTORE') 
) AS SegmentsPlusDictionary

The result of this query is 242.34 MB. However, from a data storage point of view this calculation does not truly reflect the size of the index data pages in the database. The sizes from both system catalog views are the BLOB data sizes, which do not account for the metadata, page headers and other per-page overhead needed to hold the BLOB data.

Thanks to Eric Hanson, Principal Program Manager at Microsoft, who cleared up my doubt and confirmed that the calculation below is accurate.

SELECT	SUM(s.used_page_count) / 128.0 on_disk_size_MB
FROM	sys.indexes AS i
		JOIN sys.dm_db_partition_stats AS S
			ON	i.object_id = S.object_id 
				and I.index_id = S.index_id
WHERE	i.object_id = object_id('FactResellerSalesBig') 
		AND i.type_desc = 'NONCLUSTERED COLUMNSTORE'

The result of this query is: 250.37 MB

As the base FactResellerSalesBig table is 2,626 MB, the columnstore index compression ratio works out to roughly 10 : 1.
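If you prefer to derive the ratio with a single query rather than looking up the base table size separately, the sketch below compares the on-disk size of the base table (its clustered index) with the nonclustered columnstore index using sys.dm_db_partition_stats; dividing base_table_mb by columnstore_mb gives the compression ratio:

Select	Sum(Case When i.type_desc = 'NONCLUSTERED COLUMNSTORE'
				 Then s.used_page_count Else 0 End) / 128.0	As columnstore_mb
		, Sum(Case When i.index_id In (0, 1)
				 Then s.used_page_count Else 0 End) / 128.0	As base_table_mb
From	sys.indexes AS i
			Join
		sys.dm_db_partition_stats AS s
			On	i.object_id = s.object_id
				And i.index_id = s.index_id
Where	i.object_id = Object_ID('FactResellerSalesBig')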

3. Calculate the Individual Column Size of A Nonclustered Columnstore Index

As a columnstore index stores each column's data separately, we can calculate the compressed size of each column in the index. The column sizes can be calculated with the script below. It was originally written by Eric Hanson, Principal Program Manager at Microsoft; I have modified it to handle the case where not all of the table's columns are included in the columnstore index.

-- size per column 
with segment_size_by_column as ( 
	SELECT 
			p.object_id as table_id, 
			css.column_id, 
			SUM (css.on_disk_size)/1024/1024.0 AS segment_size_mb 
	FROM	sys.partitions AS p 
				JOIN 
			sys.column_store_segments AS css 
				ON p.hobt_id = css.hobt_id 
	GROUP BY p.object_id, css.column_id 
), 
dictionary_size_by_column as ( 
	SELECT 
			p.object_id as table_id, 
			csd.column_id, 
			SUM (csd.on_disk_size)/1024/1024.0 AS dictionary_size_mb 
	FROM	sys.partitions AS p 
				JOIN 
			sys.column_store_dictionaries AS csd 
				ON p.hobt_id = csd.hobt_id 
	GROUP BY p.object_id, csd.column_id 
) 

--In the real world not all columns of a table will be (or can be) included in a nonclustered columnstore index,
--so we join to sys.index_columns to get the correct column id.
Select 
		Object_Name(s.table_id) as table_name, 
		C.column_id,	
		col_name(S.table_id, C.column_id) as column_name, 
		s.segment_size_mb, 
		d.dictionary_size_mb, 
		s.segment_size_mb + isnull(d.dictionary_size_mb, 0) total_size_mb 
from	segment_size_by_column s 
			join 
		sys.indexes I -- Join to sys.Indexes system table
			ON I.object_id = s.table_id 
			join 
		sys.index_columns c --Join to sys.Index_columns
			ON	c.object_id = s.table_id 
				And I.index_id = C.index_id
				and c.index_column_Id = s.column_id --Need to join to the index_column_id with the column_id from segment_size_by_column
			left outer join 
		dictionary_size_by_column d 
			on	s.table_id = d.table_id 
				and s.column_id = d.column_id 
Where	I.type_desc =  'NONCLUSTERED COLUMNSTORE'
order by total_size_mb desc

 (Note: if you have problems copying and pasting the script into SSMS, paste it into Office Word first and then copy it from Word into SSMS.)

 

Thanks for reading.

 

 

xVelocity Memory Optimized Columnstore indexes: A Gem of SQL Server 2012

April 18, 2012 at 5:06 AMSteven Wang

I was presenting at Auckland SQL User Group for the titled session.

Abstract:

Columnstore indexes, formerly known as project "Apollo", are a new feature of SQL Server 2012. Columnstore indexes deliver blazing-fast query performance, particularly for star-join queries; it is not unusual to see queries run up to 100 times faster. In this session, Steven will take you through the nitty-gritty and internals of columnstore indexes; he will also talk through the columnstore index do's and don'ts, limitations, tips and tricks, and the possible impact on future BI solution design.

Thanks to all attendees!

I have attached the slide deck and SQL code to this post.

 

ColumnstoreIndex_SQLUserGroup.pptx (1.46 mb)

SQL_Scripts.zip (15.09 kb)

SQL Server 2012 Columnstore Index In Depth Part 1: Make a bigger AdventureworksDW database for Testing

April 12, 2012 at 4:00 AMSteven Wang

The new xVelocity memory-optimized columnstore index feature in SQL Server 2012 is designed to deliver blazing-fast performance for large data warehouse queries. But I believe many people who are interested in this fantastic new feature may struggle to find a sample data warehouse database that is big enough to try the new columnstore capabilities, such as batch-mode processing, segment elimination, the size of the memory grant and so on.

The new sample data warehouse database, AdventureworksDW2012, which you can download from msftdbprodsamples.codeplex.com, is not big enough for us to experience many of the columnstore index features. We need to make it bigger to explore them.

Once the AdventureworksDW2012 Data file is downloaded, use the below script to attach the database:

USE [master]
GO
Declare @Filepath nvarchar(128)
Declare @SQL nvarchar(max)

Set	@Filepath = 'H:\Applications\database\SQL 2012\SampleDBs\adventureworks\' 
				--Change the path to your file path
				--Where your downloaded AdventureWorksDW2012_Data.mdf is located

Set @Filepath = IIF(Right(@Filepath,1) = '\', @Filepath, @Filepath + '\') + 'AdventureWorksDW2012_Data.mdf'

Set	@SQL = 'CREATE DATABASE [AdventureWorksDW2012] ON ( FILENAME = ''' + @Filepath + ''') FOR ATTACH'

Exec (@SQL)
GO

 

The diagram below shows the table sizes of the AdventureworksDW2012 sample database:

The biggest table in AdventureworksDW2012 is FactProductInventory with 776,286 rows. That table is too narrow to be useful for columnstore index testing, and a row count below 1 million is also too small.

A good candidate for a nonclustered columnstore index is FactResellerSales, but that table only has 60,855 rows. To get a meaningful performance comparison between B-tree indexes and a columnstore index, and to exercise the other columnstore features, a table with around 100 million rows would be ideal. However, in most cases our laptops or home computers don't have that much capacity, so I decided to pump this table up to just over 10 million rows; that size is practical for most users and is still big enough to experience most of the columnstore index behaviour.

First of all, we don't want to simply duplicate the existing rows, as repeated data would skew the query performance we measure with the columnstore index. Analysing the FactResellerSales table, the easiest approach is to enlarge the DimReseller table, since a particular SalesOrderNumber in the fact table is bound to a single reseller. I will repeat each sales order line (SalesOrderNumber + SalesOrderLineNumber is the primary key of the fact table) 180 times with a randomly picked reseller and a randomized order quantity, pumping the table up to 10,953,900 rows.

As DimReseller originally has only 701 rows, I will first make a bigger DimReseller table. The script below creates a DimResellerBig table that is 50 times bigger than DimReseller:

USE [AdventureworksDW2012]
GO
--Create a bigger table for DimReseller
SELECT 
		Cast(Row_Number() Over(Order by (Select 0)) as int) As [ResellerKey]
		, A.[GeographyKey]
		, Cast('AW' + Format(Row_Number() Over(Order by (Select 0)), '0000000000') as Nvarchar(15)) As [ResellerAlternateKey]
		, A.[Phone]
		, A.[BusinessType]
		, Cast(A.[ResellerName] +  Format(Row_Number() Over(Order by (Select 0)), ' 0000000000') As nvarchar(50)) As ResellerName
		, A.[NumberEmployees]
		, A.[OrderFrequency]
		, A.[OrderMonth]
		, A.[FirstOrderYear]
		, A.[LastOrderYear]
		, A.[ProductLine]
		, A.[AddressLine1]
		, A.[AddressLine2]
		, A.[AnnualSales]
		, A.[BankName]
		, A.[MinPaymentType]
		, A.[MinPaymentAmount]
		, A.[AnnualRevenue]
		, A.[YearOpened]
Into	DimResellerBig
FROM	[dbo].[DimReseller] A
			Cross Join
		Master..spt_values B
Where	B.Type = 'P'
			And
		B.Number Between 1 and 50
GO


--Add indexes and constraints for DimResellerBig
Alter Table dbo.DimResellerBig Alter Column [ResellerKey] Int Not Null;

ALTER TABLE [dbo].[DimResellerBig] ADD  CONSTRAINT [PK_DimResellerBIG_ResellerKey] PRIMARY KEY CLUSTERED 
(
	[ResellerKey] ASC
);

ALTER TABLE [dbo].[DimResellerBig] ADD  CONSTRAINT [AK_DimResellerBig_ResellerAlternateKey] UNIQUE NONCLUSTERED 
(
	[ResellerAlternateKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_DimResellerBig_GeographyKey] ON [dbo].[DimResellerBig]
(
	[GeographyKey] ASC
);

GO

 

Once I have the DimResellerBig table, I can use the below script to make my big FactResellerSalesBig table:

USE [AdventureWorksDW2012]
GO
--The script is used to make a bigger FactResellerSales table, called FactResellerSalesBig.
--When a RowMultiplier 180 is used the FactResellerSalesBig table will be 10,953,900 rows
--The query takes 25 minutes to run on my laptop with a 4-core processor, a 7200 rpm hard disk and 8 GB RAM


Declare		@OrderCountTable Table (OrderDateKey INT, OrderCount Int)
Declare		@OrderDateKey Int = 2012
			, @OrderCount int
			, @RowMultiplier int = 180	--180 times more rows than the original FactResellerSales Table
										--you can make it bigger or smaller by changing this number

Insert Into	@OrderCountTable
SELECT		Distinct [OrderDateKey]
			, Count(Distinct SalesOrderNumber) As OrderCount
FROM		[dbo].[FactResellerSales]
Group by	[OrderDateKey];


If Object_ID('[dbo].[FactResellerSalesBig]') Is Not Null
		Drop Table [dbo].[FactResellerSalesBig];

Select	Top(0) *
Into	FactResellerSalesBig
From	[dbo].[FactResellerSales];

While Exists (Select * From @OrderCountTable)
	Begin
		Select	Top(1) 
				@OrderDateKey = OrderDateKey
				,  @OrderCount =  OrderCount
		From	@OrderCountTable;

		Insert into FactResellerSalesBig with(Tablock)
		Select
				R.[ProductKey]
				, R.[OrderDateKey]
				, R.[DueDateKey]
				, R.[ShipDateKey]
				, Y.[ResellerKey]
				, R.[EmployeeKey]
				, R.[PromotionKey]
				, R.[CurrencyKey]
				, Y.[SalesTerritoryKey]
				, Cast(R.[SalesOrderNumber] + Format(Y.RowNum, '000') AS nvarchar(20)) As SalesOrderNumber
				, R.[SalesOrderLineNumber]
				, R.[RevisionNumber]
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) As OrderQuantity
				, R.[UnitPrice]
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) * R.[UnitPrice] As ExtendedAmount
				, R.[UnitPriceDiscountPct]
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) * R.[UnitPrice] * R.[UnitPriceDiscountPct] As DiscountAmount
				, R.[ProductStandardCost]
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) *  R.[ProductStandardCost] As TotalProductCost
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) * R.[UnitPrice] * (1 - R.[UnitPriceDiscountPct]) As SalesAmount
				, Ceiling(R.[OrderQuantity] * Y.QuantityMultiplier) * R.[UnitPrice] * (1 - R.[UnitPriceDiscountPct]) * 0.08 As TaxAmt
				, R.[Freight]
				, Cast(R.[CarrierTrackingNumber] +  Format(Y.RowNum, '-000') As nvarchar(25)) As CarrierTrackingNumber
				, Cast(R.[CustomerPONumber] + Format(Y.RowNum, '000') AS nvarchar(25)) As CustomerPONumber
				, R.[OrderDate]
				, R.[DueDate]
				, R.[ShipDate]
		From
			(
				SELECT 
						[ProductKey]
						,[OrderDateKey]
						,[DueDateKey]
						,[ShipDateKey]
						,[ResellerKey]
						,[EmployeeKey]
						,[PromotionKey]
						,[CurrencyKey]
						,[SalesTerritoryKey]
						,[SalesOrderNumber]
						,[SalesOrderLineNumber]
						,[RevisionNumber]
						,[OrderQuantity]
						,[UnitPrice]
						,[ExtendedAmount]
						,[UnitPriceDiscountPct]
						,[DiscountAmount]
						,[ProductStandardCost]
						,[TotalProductCost]
						,[SalesAmount]
						,[TaxAmt]
						,[Freight]
						,[CarrierTrackingNumber]
						,[CustomerPONumber]
						,[OrderDate]
						,[DueDate]
						,[ShipDate]
						, Dense_Rank() Over(Partition by [OrderDateKey] Order by SalesOrderNumber) As OrderNumber
				FROM	[dbo].[FactResellerSales] 
				Where	OrderDateKey =  @OrderDateKey
			) R
					Cross Apply
			(

				SELECT	TOP (@RowMultiplier) 
						A.[ResellerKey]
						, B.SalesTerritoryKey
						, Row_Number() Over(Order by Checksum(newid())) As RowNum
						, RAND(CHECKSUM(NEWID())) * 2 As QuantityMultiplier
				FROM	[DimResellerBig] A
							Inner join
						[dbo].[DimGeography] B
							on A.[GeographyKey] = B.GeographyKey
							Cross Join
						Master..spt_values C
				Where	C.Type = 'P'
							And
						C.Number Between 1 and @OrderCount
						and R.OrderNumber = C.number		
			) Y

		Print 'The records for order date ' + Cast(@OrderDateKey as nvarchar(8)) + ' have been multiplied ' + Cast(@RowMultiplier as nvarchar(6)) + ' times';

		Delete Top(1) From	@OrderCountTable;

	End

Go

Create Clustered Index IX_FactResellerSalesBig_1 On [dbo].[FactResellerSalesBig] ([OrderDateKey] ASC, [ResellerKey] Asc);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_CurrencyKey] ON [dbo].[FactResellerSalesBig]
(
	[CurrencyKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_DueDateKey] ON [dbo].[FactResellerSalesBig]
(
	[DueDateKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_EmployeeKey] ON [dbo].[FactResellerSalesBig]
(
	[EmployeeKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_ProductKey] ON [dbo].[FactResellerSalesBig]
(
	[ProductKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_PromotionKey] ON [dbo].[FactResellerSalesBig]
(
	[PromotionKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_ShipDateKey] ON [dbo].[FactResellerSalesBig]
(
	[ShipDateKey] ASC
);

CREATE NONCLUSTERED INDEX [IX_FactResellerSalesBig_CarrierTrackingNumber] ON [dbo].[FactResellerSalesBig]
(
	[CarrierTrackingNumber] ASC
);


Create NonClustered Columnstore Index IX_FactResellerSalesBig_NCI On [dbo].[FactResellerSalesBig]
(
[ProductKey]
,[OrderDateKey]
,[DueDateKey]
,[ShipDateKey]
,[ResellerKey]
,[EmployeeKey]
,[PromotionKey]
,[CurrencyKey]
,[SalesTerritoryKey]
,[SalesOrderNumber]
,[SalesOrderLineNumber]
,[RevisionNumber]
,[OrderQuantity]
,[UnitPrice]
,[ExtendedAmount]
,[UnitPriceDiscountPct]
,[DiscountAmount]
,[ProductStandardCost]
,[TotalProductCost]
,[SalesAmount]
,[TaxAmt]
,[Freight]
,[CarrierTrackingNumber]
,[CustomerPONumber]
,[OrderDate]
,[DueDate]
,[ShipDate]
)

GO

 (Note: if you have problems copying and pasting the script into SSMS, paste it into Office Word first and then copy it from Word into SSMS.)

We now have a bigger, yet still meaningful, fact table thanks to the randomized resellers and quantities. The table sizes after the two big tables are added are shown in the screenshot below.
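(If you want to reproduce a similar size listing yourself, a query along the lines of the sketch below, based on sys.dm_db_partition_stats, returns the row count and reserved space in MB for every table in the database.)

Select	t.name									As table_name
		, Sum(Case When p.index_id In (0, 1) Then p.row_count Else 0 End)	As row_count
		, Sum(p.reserved_page_count) / 128.0	As reserved_mb
From	sys.tables AS t
			Join
		sys.dm_db_partition_stats AS p
			On p.object_id = t.object_id
Group by t.name
Order by reserved_mb Desc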

 

As you can see, we now have a big FactResellerSalesBig table with 10,953,900 rows and about 2.6 GB of data. In the next few blogs, I'm going to dig into the columnstore index to see what makes it different.

Link:

SQL Server 2012 Columnstore Index In Depth Part 2: Columnstore Index Size Calculation 

 

Automatically Transfer and Synchronize SQL Server logins, SIDs, Passwords and Permissions

April 5, 2012 at 3:54 AMSteven Wang

Subject: transfer and synchronize SQL Server logins, SIDs, SQL login passwords and permissions from one server instance to another by using an SSIS package

This technique applies to: SQL Server migration, and primary/secondary server login synchronization including SID, password and permission transfer. It is applicable to SQL Server 2008 and later.

  

1. Real world Issues

Too often, when a principal server fails over to its mirror server, the SQL Server logins turn out not to be fully in sync with the principal. The same applies to log-shipping servers, replicated sites and other DR scenarios, such as backing up a database on one server and restoring it on another.

As I mentioned in my last post, A simple Way to Migrate SQL Logins from One Instance to Another, transferring server logins from one server to another is not convenient.

2. Login Synchronization Consideration

It would be nice to have an automated process or script that transfers logins from one server to another, including SIDs, passwords and permissions.

Typically, to transfer SQL Server logins from one server to another automatically, the following needs to be considered:

A) Logins (either Windows or SQL) whose SIDs exist on the destination server but not on the source need to be dropped from the destination (a quick ad-hoc query for spotting these SID mismatches is sketched after this list).

B) SQL logins whose passwords differ between the source and destination servers need to be altered on the destination.

C) Logins whose names on the destination do not match the source (for the same SID) need to be renamed to match the source.

D) Logins that exist on the source server but not on the destination need to be created on the destination.

E) Logins whose is_disabled value differs between source and destination need to be altered on the destination.

F) Logins' server role assignments need to be synchronized from source to destination.

G) Login permissions need to be synchronized from source to destination.

H) There are other items to consider, such as the default language, but those above are the most important ones.
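Before automating all of this in SSIS, a quick ad-hoc check can highlight the SID mismatches described in A). The sketch below assumes a linked server to the source instance named [SourceServer] (the name is illustrative; adjust it to your environment):

--Destination logins whose SID is missing from, or different on, the source instance
Select	d.name
		, d.sid		As destination_sid
		, s.sid		As source_sid
From	sys.server_principals AS d
			Left Outer Join
		[SourceServer].master.sys.server_principals AS s
			On d.name = s.name
Where	d.type In ('S', 'U', 'G')
		And d.name Not Like '##%##'
		And d.sid != 0x01
		And (s.sid Is Null Or s.sid != d.sid)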

I have transformed these requirements into the SSIS tasks as below:

A) Drop logins in destination where SIDs are different

B) Drop logins in destination where passwords are different (Drop is simpler than alter.)

C) Drop logins in destination where names are different (Drop is simpler than alter)

D) Check and Sync Logins

E) Alter logins in destination where Is_Disabled are different

F) Check and Sync role members

G) Check and sync permissions

The SSIS package lay out as below:

 3. SSIS Package design

The package contains 7 containers, and each container contains 3 tasks: a Data Flow task, a Foreach Loop task and an Execute SQL task.

The general idea is to use the Data Flow task to generate the SQL script and store the query text in a Recordset destination, which is assigned to an object variable, Query_Dataset; the Foreach Loop task uses an ADO enumerator to loop through the query records in the Query_Dataset variable and assign each one to a string variable, SQL_Query; the Execute SQL task then runs the string query variable to apply the change.

The variables have been setup as below:

 A) Drop logins in destination where SIDs are different Task Design

The data flow task: Get login with SIDs diff task has the below design:

It uses a Merge Join to find any SIDs that exist in the destination but not in the source. The Conditional Split is used to split out any NULL SID values coming from the login source in the Merge Join component.

The derived column component is used to generate the DDL query like: “Drop Login MyLoginName;”

The query text is stored via the Recordset destination, which is configured with the Query_Dataset variable.

 

B) Drop logins in destination where passwords are different task design

The data flow task: Get Login with Password diff task design is depicted as below:

C) Drop logins in destination where names are different Task Design

A login with the same SID on the source and destination servers can have a different name if the login was renamed on the source. These logins therefore need to be synced to the destination. To simplify the process, I simply drop those logins on the destination and re-import them from the source later.

The data flow task: Get login with Name diff is depicted as below:

 

D) Check and Sync Logins task design

In this task I will use the SQL script I have created in my last post to generate the Create Login DDL queries.

The query will script out the source login SIDs and also passwords if they are SQL logins.

The data flow task: get server login diff task is depicted as below:

 

The script extracts both Windows logins and SQL logins but leaves out type R (server role), C (login mapped to a certificate) and K (login mapped to an asymmetric key). In SQL Server 2008 the server roles are fixed, and the C and K types are more complicated, so I want to keep things simple here.

The script in Login source is as below:

Select	Upper(A.name) As Name,
		Case When A.type IN ('G', 'U')
			Then
				'Create Login ' + QUOTENAME(A.name)
				+ ' From Windows With DEFAULT_DATABASE=[Master]' --+ quotename(A.default_database_Name)
				+ ', DEFAULT_LANGUAGE=' + quotename(A.default_language_Name)
			When A.type IN ('S')
			Then
				'Create Login ' + QUOTENAME(A.name)
				+ ' With Password=' + CONVERT(varchar(max), B.password_hash, 1) + ' hashed'
				+ ', DEFAULT_DATABASE=[Master]' --+ quotename(B.default_database_Name)
				+ ', DEFAULT_LANGUAGE=' + quotename(B.default_language_Name)
				+ ', CHECK_POLICY=' + Case B.is_policy_checked when 0 then 'OFF' When 1 Then 'On' End
				+ ', CHECK_EXPIRATION=' + Case B.is_expiration_checked when 0 then 'OFF' When 1 Then 'On' End
				+ ', SID=' + CONVERT(varchar(max), A.SID, 1)
		End As SRVLogin
From	sys.server_principals A
			Left Outer Join
		sys.sql_logins B
			On A.name = B.Name
Where	A.type Not In ('R', 'C', 'K')
		And A.name Not like '##%##'
		And A.sid != 0x01

 E) Alter logins in destination where Is_Disabled are different

This task is used to keep the is_disabled value in sync between source and destination. The data flow task get login with IsDisable diff is depicted below:

F) Check and Sync role members task design

This task is used to sync the server role members between source and destination. I get each role name and its members from both the source and destination servers. A Full Join component is used to find which role memberships exist in the source but not in the destination, and vice versa. For logins that are in a server role on the source but not on the destination, an sp_addsrvrolemember script is created; conversely, for logins that are in a server role on the destination but not on the source, an sp_dropsrvrolemember script is created.

Both scripts are combined and saved into the Query_Dataset variable. The data flow task Check role Member is depicted below:

 

G) Check and sync permissions task design

This task is used to sync the login permissions from source to destination. The query below is used to script out the login permissions:

Select	Distinct
		A.name
		, B.Permission_Name
		, B.state
		, B.state_desc
From	sys.server_principals A
			Inner Join
		sys.server_permissions B
			On A.principal_id = B.grantee_principal_id
Where	A.type Not In ('R', 'C', 'K')
		And A.name Not like '##%##'
		And A.sid != 0x01

A Full Join is used to find which permissions are applied to a login on the source but not on the destination, and vice versa.

Permissions that are applied to a login on the source but not on the destination go down the Permission_Applied route, and the expression below is used to create the permission grant script:

(DT_WSTR,2000)([state_desc source] + " " + [Permission_Name source] + " TO [" + [name source] + ([state source] == "W" ? "] WITH GRANT OPTION;" : "];"))

Permissions that are applied to a login on the destination but not on the source go down the Permission_Removed route, and the expression below is used to create the permission revoke script:

(DT_WSTR,2000)("REVOKE " + [Permission_Name dest] + " TO [" + [name dest] + "];")

The data flow task Check Permissions is depicted below:

 

4. SSIS Package Automation

A SQL Agent Integration Services job can be used to automate the login synchronization process. The procedure is simply to create a job that runs the package.

For example, a job name can be: Apply Login Changes from Source to Destination.

And remember to set the source and destination server variable values to your own server names, like below:

 

Once the job is set up, you can schedule it to run automatically at your preferred intervals.

You can also create a server-level DDL trigger to run the job automatically whenever a DDL command is performed on logins. An example of such a trigger:

Create Trigger trigger_ApplyLoginsToDestination
On All Server
After Create_Login, Alter_Login, Drop_Login
As
Exec msdb..sp_start_job 'Apply Login Changes from Source to Destination' --change to your own job name
Print 'You have synced your login changes from source to destination server.'

You can download this package from the attachment in this post.

Thanks for reading.

 

SQLServer_Login_Sync_Blog.zip (47.20 kb)

A simple Way to Migrate SQL Logins from One Instance to Another

March 18, 2012 at 7:39 PMSteven Wang

Subject: transfer SQL logins with their original passwords and SIDs

Scope: this technique applies to SQL Server 2008 and later.


When we upgrade or migrate a SQL Server instance to a new instance, if we have lots of SQL logins the migration process gets bogged down by two thorny issues:

  • SQL Login SIDs
  • SQL Login passwords

 

The SSIS Transfer Logins Task can easily transfer Windows logins and groups, but not SQL logins. When the Transfer Logins Task is used to transfer SQL logins, it generates new passwords and SIDs for the transferred logins, and those logins are disabled on the destination SQL Server instance. This is not very useful in real-world work.

 

When a SQL login is created on the new SQL Server instance, a new SID is generated and bound to the login. As a result, the database users in the migrated databases are no longer mapped to the newly created SQL logins, because they now have different SIDs.

Generally speaking, a SQL user can be remapped by using ALTER USER ... WITH LOGIN = [login_name] or sp_change_users_login (a deprecated procedure; ALTER USER is preferred). However, when there are lots of SQL logins this technique becomes cumbersome. In particular, in some environments a production database is frequently restored to a test environment, and then you need to fix the SQL user mappings every time.
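For a handful of users the remapping itself is a one-liner. A minimal sketch, with purely illustrative database, user and login names:

USE [MyMigratedDatabase];	--illustrative database name
GO
--Re-point the orphaned database user at the login of the same name
ALTER USER [AppUser] WITH LOGIN = [AppUser];
GO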

As for the SQL login password, it is also very awkward to have to retype the password when creating the SQL login again on the new server instance.

Although the SQL login SIDs and passwords are stored in the old instance's master database, before SQL Server 2008 there was no easy way to script out this binary information and transfer it to a new instance. There are a few very clever techniques in the SQL community, mostly using a fairly complicated function to convert the binary data into a string and then back into binary data on the destination instance. In SQL Server 2008 and later, however, these techniques become overkill.

In SQL Server 2008, the CONVERT function gained a new capability to convert binary data types to other data types (check Books Online for more details). The syntax is:

CONVERT ( data_type [ ( length ) ] , expression [ , style ] )

Example: 

CONVERT(varchar(max), 0xE4305FD31D353546B4EC6E56C906E912, 1)

 

When the expression is binary(n) or varbinary(n), three style options are available: 0 (the default), 1 and 2.

With style 0 (the default), binary bytes are translated to ASCII characters; each byte is converted 1:1.

With style 1, binary bytes are translated to a character string and the prefix 0x is added to the left of the result.

With style 2, binary bytes are translated to a character string without the 0x prefix.

SQL login SIDs and password hashes both use the 0x prefix, so we can use binary style 1 to script out the SIDs and hashed passwords from the old server instance and apply them to the new instance.
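A quick way to see the difference between the three styles (the byte value is arbitrary; 0x4142 is simply the ASCII codes for 'AB'):

SELECT	CONVERT(varchar(max), 0x4142, 0) AS style_0		-- 'AB'     (bytes read as ASCII characters)
		, CONVERT(varchar(max), 0x4142, 1) AS style_1	-- '0x4142' (hex string with the 0x prefix)
		, CONVERT(varchar(max), 0x4142, 2) AS style_2	-- '4142'   (hex string without the prefix)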

With this CONVERT capability, the code to script out the SQL logins becomes very simple; an example script is below:

Select 
'Create Login ' + QUOTENAME(A.name) 
+ ' With Password=' + CONVERT(varchar(max), A.password_hash, 1) + ' hashed'		--script out the password
+ ', DEFAULT_DATABASE=' + quotename(A.default_database_Name) --use master if you apply the script to a mirrored or log-shipped server, as the user database is not in a usable state there
+ ', DEFAULT_LANGUAGE=' + quotename(A.default_language_Name)
+ ', CHECK_POLICY=' + Case A.is_policy_checked when 0 then 'OFF' When 1 Then 'On' End 
+ ', CHECK_EXPIRATION=' + Case A.is_expiration_checked when 0 then 'OFF' When 1 Then 'On' End
+ ', SID=' + CONVERT(varchar(max), A.SID, 1)		--script out the SIDs
 As SQLLogin
From 
sys.sql_logins A
Where A.name Not like '##%##'  --filter out the system-generated SQL logins
And A.sid != 0x01 --the sa SID is always the same

 (Note: if you have problems copying and pasting the script into SSMS, paste it into Office Word first and then copy it from Word into SSMS.)

You can use this script to store the SQL login creation statements in a table on the target server and then use a cursor or loop to execute them to transfer the SQL logins.

In the next blog, I will go into more depth on how to use an SSIS package to transfer logins and SQL Server permissions from one server to another.

The Problem of '##xp_cmdshell_proxy_account##' credential could not be created

December 27, 2011 at 5:59 PMSteven Wang

When a user that is not a member of the sysadmin fixed server role tries to use the xp_cmdshell command, the ##xp_cmdshell_proxy_account## credential must be set up. If this proxy credential does not exist, xp_cmdshell will fail.

We use this system stored procedure to set up the credential:

EXEC sp_xp_cmdshell_proxy_account 'DomainName\AccountName', 'Password';

When you try to set up the xp_cmdshell proxy account in SSMS on Windows Server 2008 R2, you might get the error message below:

 Msg 15137, Level 16, State 1, Procedure sp_xp_cmdshell_proxy_account, Line 1

An error occurred during the execution of sp_xp_cmdshell_proxy_account. Possible reasons: the provided account was invalid or the '##xp_cmdshell_proxy_account##' credential could not be created. Error code: '5'.

This error message doesn't really tell you the cause of the problem: you might have used a valid Windows account and password and still be unable to create the credential.

In my experience, the problem comes from Windows Server UAC. Because the account you are running SSMS under doesn't have permission to create the credential, you need to run SSMS as administrator, as below:

After SSMS is running as administrator, I reran the same stored procedure and it succeeded. I hope this helps you as well.
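For completeness, the full setup from an elevated SSMS session looks roughly like the sketch below; the login name is purely illustrative, and xp_cmdshell itself must also be enabled through sp_configure:

USE [master];
GO
--Enable xp_cmdshell if it is not already on
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;
GO
--Create the proxy credential used when a non-sysadmin calls xp_cmdshell
EXEC sp_xp_cmdshell_proxy_account 'DomainName\AccountName', 'Password';
GO
--The caller also needs EXECUTE permission on xp_cmdshell in master (names are illustrative)
CREATE USER [MyAppLogin] FOR LOGIN [MyAppLogin];
GRANT EXECUTE ON xp_cmdshell TO [MyAppLogin];
GO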

Posted in: Database Administration

Tags: ,

Bringing ETL to the Masses - Microsoft Codename "Data Explorer"

October 25, 2011 at 4:32 AMSteven Wang

One of the new projects Microsoft announced at this year's PASS Summit, Data Explorer, is very interesting and may well have a big impact on the way business users and information workers perform BI analysis.

According to the project lead, Tim Mallalieu, the project was originally codenamed Montego (which sounds more like a usual codename). At the PASS Summit the cloud version of the project was presented, but according to Tim they also have a client version of Data Explorer that runs as an Excel add-in and operates in much the same way as PowerPivot. From the first look at the interface, it used the name Powerimport. (Please note the diagrams used here are from the early Montego project, not the new Data Explorer interface; around November 2011 there may be a new demo available.)

 

 

After I watched Tim's walkthrough of the Montego client, the mashup component of Data Explorer, I was very impressed with this project. Basically, this Excel add-in can integrate different data sources: cloud databases, RSS feeds, websites, Excel workbooks, SQL databases and so on. You can perform data transformations on each data source (table), and the ribbon-style transformation tools are very easy to understand and use:


The workflow and the concepts look very easy to grasp for experienced Excel users. The data sources (tables) can be easily merged, nested, transposed and pivoted:

Data manipulation, de-duplication, grouping and aggregation can all be performed:

 

Once you complete the data mashup, it can be pushed back to Excel in exactly the same way as you would with PowerPivot.

To me, the whole Montego mashup process is like a greatly simplified, easy-to-use yet very powerful version of an SSIS data flow task. While using SSIS effectively requires a steep learning curve, this mashup is very easy to use and understand.

According to Tim, a mashup can be reused and refreshed automatically from its data sources. They are also working on integrating the tool with PowerPivot and the newly announced Power View (previously known as Crescent).

Given that Excel is still the most popular end-user BI tool, I believe the new tool will bring ETL to the masses!

Looking forward to the trial version.

You may take some time to watch Tim's Demo video: Early walkthrough of the Montego client

Posted in: Data Explorer

Tags:

Using Trace Flag 3042 for Backup Compression

October 19, 2011 at 8:56 PMSteven Wang

While we were replacing Redgate SQL Backup with SQL Server 2008 R2 native backup compression, we noticed a huge backup file size increase at the beginning of the backup, which nearly filled up our backup disk volume.

The database we were backing up was close to 2 terabytes, and using Redgate SQL Backup with compression the final backup size was roughly 376 GB, as below:

When we used the SQL Server native backup with compression

 

Backup database MyDB to disk = N'I:\Backups\CompressionBackup_Without_TF.bak' With init, compression

 

the initial file size went up to 638 GB, roughly one third of the original database size, as below:

This is a huge 262 GB increase in backup file size, which completely threw off our backup disk capacity plan based on the Redgate compression ratio.

Surprisingly, once the backup completed, the file shrank back to around 380 GB, a similar size to the Redgate SQL Backup compressed file.

This is in fact the default behaviour of SQL Server native backup compression. For a compressed backup it is hard for SQL Server to determine the final size up front, so it simply creates an initial backup file of one third of the database size (the pre-allocation algorithm). If more disk space is needed during the backup, it dynamically extends the file; if the final backup turns out to be smaller than the initial one third of the database size, it truncates the file to the actual size.

This was not good enough for us because, as mentioned above, we had already planned our capacity based on the Redgate SQL Backup ratio; at some point we might not have enough disk space to hold the initial backup file. We really want the backup to use only the actual space required for the compressed backup.

This issue was solved by introducing trace flag 3042. Trace flag 3042 bypasses the pre-allocation algorithm and grows the file as needed, with a small performance penalty.

This trace flag was not officially documented, but a few months ago it was announced in a Microsoft support article; see Space requirements for backup devices in SQL Server.

With the trace flag, you can see that the initial backup file size is 0 and the file grows up to its final size.

Backup compression with trace flag 3042 on:

 

dbcc traceon(3042)
Go

Backup database BiW to disk = N'I:\Backups\CompressionBackup_With_TF.bak' With init, compression
Go

dbcc traceoff(3042)
Go

 

The initial file size was 0, as below:

You can turn this trace flag on globally by adding -T3042 to the startup parameters.
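If a restart is not convenient, the flag can also be switched on globally at runtime (this does not persist across a restart):

dbcc traceon(3042, -1)
Go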

Posted in: Backup | Trace Flag

Tags: ,

The Use of Denali 'With Result Sets' Feature in SSIS Data Flow Task

October 15, 2011 at 4:58 AMSteven Wang

The new T-SQL feature 'With Result Sets' in SQL Denali can be very useful when using a stored procedure as the data source of an SSIS data flow task.

Before SQL Denali, there were some limitations on using a stored procedure as an SSIS data flow data source, for example:

1. If you want the data source to use different column names, you have to either change the stored procedure or use a Derived Column component, neither of which is convenient;
2. The same goes for data type changes: if you want to change a column's data type, you again have to either change the stored procedure or use a Derived Column component;
3. If your stored procedure ends with dynamic SQL execution, such as Exec (@SQL) or sp_executesql @SQL, SSIS is not able to resolve the returned columns. A common workaround is to define a table variable inside the stored procedure, insert the results of the dynamic SQL into it and then select from it. This is awkward, and the performance is poor when the data set is big.

 
The new T-SQL feature 'With Result Sets' solves this problem.

The syntax of With Result Sets is simple, as shown below (for more information on the EXECUTE statement, see BOL; the link is here):
 
Exec Stored_Proc_Name @Variables
With Result Sets
(
(
Col1 Data_type,
Col2 Data_type,
.....
List All Columns here
)
);
 
Here is a simple example that demonstrates this functionality as an SSIS data flow data source, solving problem 3 above.

1. Create a simple SP
USE [AdventureWorksDWDenali]  --Denali sample DW database, free download
GO
CREATE Proc [dbo].[usp_UseWithResultSet_Dynamic_SQL_Denali]
As
Begin
Declare @SQL varchar(Max)

Set @SQL = '
SELECT [GeographyKey]
,[City]
,[StateProvinceName]
,[EnglishCountryRegionName]
FROM [dbo].[DimGeography]
'
Exec (@SQL);

End
GO

2. Create an SSIS data flow task and use exec [dbo].[usp_UseWithResultSet_Dynamic_SQL_Denali] as the data source (SQL command). When you click Preview, you get an error like the one below:

 

3. By using With Result Sets, you can rename the columns and change the data types (I changed the columns from nvarchar to varchar), and best of all, there is no error when using this SP as the data source. See below:
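For reference, the SQL command used in the OLE DB source would look roughly like the sketch below; the new column names and the varchar lengths are just illustrative choices:

Exec [dbo].[usp_UseWithResultSet_Dynamic_SQL_Denali]
With Result Sets
(
(
GeographyKey int,
City varchar(30),
StateProvince varchar(50),
CountryRegion varchar(50)
)
);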

 

4. Terminate the data flow with a simple Union All component and run it. It succeeds:

Posted in: SQL Server 2012 | SSIS | T-SQL

Tags: , ,

Minimal Logging & Data Manoeuvring On very Large Tables

August 28, 2011 at 12:53 AMSteven Wang

Thanks all for attending my session at Auckland MS CodeCamp Summit 2011.

Please see the attached slides and code presented in the session.

Minimal_logging_maneuvering_data.pptx (2.50 mb)

01_Full_vs_Minimal.sql (3.02 kb)

02_Heap_Tablock_On_Off.sql (1.09 kb)

03_Clustered_Tablock_On_Off.sql (1.21 kb)

04_Clustered_trace610.sql (1.46 kb)

Posted in: Data Load | Data warehouse | Transaction Log

Tags: , , ,