Prediction of Correctness on Barbell Lifting Exercise Based on Accelerometer Data

November 22, 2015 at 1:45 AM by Steven Wang

Posted in: Data Cleaning | Big Data | Data Load | Machine Learning | Random Forest | RPART | Predictive Analytics | R


Embedded Power View Report Rendering Issue in Different Browsers

March 17, 2015 at 3:19 PM by Steven Wang

The Power View embedded report view is indeed a very nice feature and opens up big opportunities for ISVs, companies and individuals to share data and visualizations. Please check this blog post, Embed Power View Interactive Reports in Your Blogs and Websites, for more details about how to embed a Power View report.

While this is great news, I have also found some frustrations. The embedded view is rendered nicely in IE 9, but there are issues with IE 11 and other major browsers.

I have put together the main issues I found. If you have the same issue, or a different one, please share your experience with me. Thanks.

Posted in: Data Explorer | Power BI | Power View


Top Australian Primary Schools: Ranking, Location and Change

February 28, 2015 at 10:40 PM by Steven Wang

Just for fun, I created this AU top primary school ranking dashboard. Just to be clear, I bear no responsibility for the accuracy of the school ranking. The main data source is from here -

For a separate web page view, click Australian Top Primary School



SSRS 2012: Report Server cannot load the TERADATA/SQLPDW Extension Error

May 6, 2013 at 7:49 PM by Steven Wang


In the SQL Server 2012 Reporting Services server event log, you might see these 2 errors:

1. Report Server cannot load the SQLPDW Extension

2. Report Server cannot load the TERADATA Extension

If you are not using the TERADATA/SQLPDW functionality in your reports, you can safely ignore these errors. They occur because the TERADATA and SQLPDW data extensions are turned on by default in the report server configuration file.

Even if you are not using the TERADATA/SQLPDW functionality, it is still a good idea to suppress these errors so they are not logged into the event log.

Open the Reporting Services configuration file rsreportserver.config under your SSRS installation folder. If you installed SSRS 2012 with the default settings, the file will be under:

C:\Program Files\Microsoft SQL Server\MSRS11.MSSQLSERVER\Reporting Services\ReportServer\

If you locate the Data section, you will see that the TERADATA and SQLPDW data extensions are turned on.

If you comment out these 2 extensions as below and save the file, those error messages will no longer be generated.
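For reference, this is a sketch of what the relevant part of rsreportserver.config looks like after the change; the Type attribute values are elided here because they vary by version and installation, so wrap the exact lines from your own file in the comment:

```xml
<Data>
  <!-- ...other data extensions stay enabled... -->
  <!-- Commented out to stop the "cannot load extension" event log errors:
  <Extension Name="SQLPDW" Type="..." />
  <Extension Name="TERADATA" Type="..." />
  -->
</Data>
```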




SQL Server 2012 MDS Windows Server Minimum Requirements

April 22, 2013 at 6:49 PM by Steven Wang

In order for SQL Server 2012 MDS to function properly under Windows Server 2008 R2, the following minimum Windows features must be installed.

1. Add the Application Server role

The following role services for the Application Server role need to be included:

  • .NET Framework 3.5.1
  • Web Server (IIS) Support
  • HTTP Activation

As illustrated in the diagram below (ticked services):


2. Add the Web Server (IIS) role

Make sure that the highlighted services below are ticked.

3. Check additional Windows features

Make sure the features listed below (highlighted) are installed:

Posted in: Master Data Service


Script out Indexes and Index Option Settings for a SQL Server Database

December 11, 2012 at 2:58 PM by Steven Wang

Sometimes we need to script out index definitions and their option settings. If you have source control, you are lucky, as all index definitions should be kept in your source control system. If you only want to script out the indexes for one table, you can also do so by selecting all of its indexes in the Object Explorer Details pane. But a query that can easily script out the index definitions and their settings makes testing and index creation automation much easier.

In SQL Server, the index definitions and option settings are not stored in a single table; you have to join a few system tables and DMVs together to get all of that information. Here I will share the script I use to script out all the indexes.

My considerations:

1. Only user-defined indexes are scripted;

2. Clustered indexes created by a Primary Key constraint are not included, though it can be done;

3. Spatial indexes and XML indexes are not considered;

4. The data compression setting for partitioned indexes is simplified here, as it is pretty complicated to handle in a single select statement;

5. The Drop_Existing and Sort_In_TempDB settings are hard-coded in the script; depending on your situation you can change them.

The script is as below:

Select		A.[object_id]
			, OBJECT_NAME(A.[object_id]) AS Table_Name
			, A.Index_ID
			, A.[Name] As Index_Name
			, CAST(
					Case	When A.type = 1 AND is_unique = 1 Then 'Create Unique Clustered Index ' 
							When A.type = 1 AND is_unique = 0 Then 'Create Clustered Index ' 
							When A.type = 2 AND is_unique = 1 Then 'Create Unique NonClustered Index '
							When A.type = 2 AND is_unique = 0 Then 'Create NonClustered Index '
					End
					+ quotename(A.[Name]) + ' On ' + quotename(S.[Name]) + '.' + quotename(OBJECT_NAME(A.[object_id])) + ' ('
					+ Stuff(
							(	Select	',[' + COL_NAME(A.[object_id],C.column_id)
										+ Case When C.is_descending_key = 1 Then '] Desc' Else '] Asc' End
								From	sys.index_columns C WITH (NOLOCK)
								Where	A.[Object_ID] = C.object_id
										And A.Index_ID = C.Index_ID
										And C.is_included_column = 0
								Order by C.key_Ordinal Asc
								For XML Path('')
							),1,1,'') + ') '

					+ CASE WHEN A.type = 1 THEN ''
							ELSE Coalesce('Include ('
									+ Stuff(
											(	Select	',' + QuoteName(COL_NAME(A.[object_id],C.column_id))
												From	sys.index_columns C WITH (NOLOCK)
												Where	A.[Object_ID] = C.object_id
														And A.Index_ID = C.Index_ID
														And C.is_included_column = 1
												Order by C.index_column_id Asc
												For XML Path('')
											),1,1,'') + ') '
						,'') End
					+ Case When A.has_filter = 1 Then 'Where ' + A.filter_definition Else '' End
					+ ' With (Drop_Existing = OFF, SORT_IN_TEMPDB = ON'
					--when the same index already exists, you had better set Drop_Existing = ON
					--SORT_IN_TEMPDB = ON is recommended, but set it based on your own environment
					+ ', Fillfactor = ' + Cast(Case When fill_factor = 0 Then 100 Else fill_factor End As varchar(3)) 
					+ Case When A.[is_padded] = 1 Then ', PAD_INDEX = ON' Else ', PAD_INDEX = OFF' END
					+ Case When D.[no_recompute] = 1 Then ', STATISTICS_NORECOMPUTE = ON' Else ', STATISTICS_NORECOMPUTE = OFF' End
					+ Case When A.[ignore_dup_key] = 1 Then ', IGNORE_DUP_KEY = ON' Else ', IGNORE_DUP_KEY = OFF' End
					+ Case When A.[ALLOW_ROW_LOCKS] = 1 Then ', ALLOW_ROW_LOCKS = ON' Else ', ALLOW_ROW_LOCKS = OFF' END
					+ Case When A.[ALLOW_PAGE_LOCKS] = 1 Then ', ALLOW_PAGE_LOCKS = ON' Else ', ALLOW_PAGE_LOCKS = OFF' End  
					+ Case When P.[data_compression] = 0 Then ', DATA_COMPRESSION = NONE' 
							When P.[data_compression] = 1 Then ', DATA_COMPRESSION = ROW' 
							Else ', DATA_COMPRESSION = PAGE' End                 
					+ ') On ' 
					+ Case When C.type = 'FG' THEN quotename(C.[Name])
							ELSE quotename(C.[Name]) + '(' + F.Partition_Column + ')' END + ';' --if it uses a partition scheme then the partition column is needed
			As nvarchar(Max)) As Index_Create_Statement
			, C.[Name] AS FileGroupName
			, 'DROP INDEX ' + quotename(A.[Name]) + ' On ' + quotename(S.[Name]) + '.' + quotename(OBJECT_NAME(A.[object_id])) + ';' AS Index_Drop_Statement
From		sys.indexes A WITH (NOLOCK)
				Inner Join
			sys.objects B WITH (NOLOCK)
				ON	A.object_id = B.object_id
				Inner Join
			sys.schemas S WITH (NOLOCK)
				ON	B.schema_id = S.schema_id            
				Inner Join
			sys.data_spaces C WITH (NOLOCK)
				ON	A.data_space_id = C.data_space_id   
				Inner Join
			sys.stats D WITH (NOLOCK)
				ON	A.object_id = D.object_id
					AND A.index_id = D.stats_id  
				Inner Join
			--The below code is to find out what data compression type is used by the index. If an index is not partitioned, it is easy as only one data compression
			--type can be used. If the index is partitioned, then each partition can be configured to use a different data compression. This is hard to generalize;
			--for simplicity, I just use the data compression type used most among the index partitions for all partitions. You can later rebuild the index partitions to
			--the appropriate data compression type you want to use.
			(	select object_id, index_id, Data_Compression, ROW_NUMBER() Over(Partition By object_id, index_id Order by COUNT(*) Desc) As Main_Compression
				From sys.partitions WITH (NOLOCK)
				Group BY object_id, index_id, Data_Compression
			)	P
				ON	A.object_id = P.object_id
					AND A.index_id = P.index_id
					AND P.Main_Compression = 1
				Outer APPLY
			(	SELECT	COL_NAME(A.object_id, E.column_id) AS Partition_Column
				From	sys.index_columns E WITH (NOLOCK)
				WHERE	E.object_id = A.object_id
						AND E.index_id = A.index_id
						AND E.partition_ordinal = 1
			) F     
Where		A.type IN (1,2) --clustered and nonclustered
			AND B.Type != 'S'
			AND is_primary_key = 0 --exclude indexes that back a primary key constraint
			AND OBJECT_NAME(A.[object_id]) not like 'queue_messages_%'
			AND OBJECT_NAME(A.[object_id]) not like 'filestream_tombstone_%'
			AND OBJECT_NAME(A.[object_id]) not like 'sys%' --if you have indexes starting with sys then remove this filter
OPTION (Recompile);


I have also attached a SQL script file here:

Scriptout_Indexes.sql (4.95 kb)




Advanced Data Cleaning Techniques in SQL Server -- NZ BI User Group

November 9, 2012 at 10:07 PM by Steven Wang

I will present at the NZ BI User Group on the topic Advanced Data Cleaning Techniques in SQL Server on 20th November at Microsoft Auckland.


In the real world, data sources are too often full of dirty data due to data entry errors and differences in data representation among sources. Poor data quality has a very negative impact on business intelligence and decision support systems. The data cleaning process is a paramount phase of a business intelligence project.

In this session, some practical data cleaning techniques will be discussed, including but not limited to the list below:

  • Data Profiling
  • SQL Server Window Functions
  • Script Transformation with Regular Expression
  • Fuzzy Grouping
  • Fuzzy Lookup
  • DQS Cleansing (Brief Introduction)
This is a 300 to 400 level session and some fuzzy logic internals will also be touched on.
The presentation files are attached below: (3.87 mb)



Posted in: Data Cleaning | DQS | SSIS


Using SQL Server 2012 Window Functions to Solve Common T-SQL Challenges

September 10, 2012 at 1:55 AM by Steven Wang

Thanks to all who attended my session at the TechED New Zealand 2012. I hope that you have enjoyed the session. I have attached the presentation slides deck and Demo code in this blog.

Session Abstract:

SQL Server 2012 has introduced several new window functions and enhanced support for window aggregate functions by introducing window order and frame clauses, as well as support for offset functions. In this session, the presenter will apply these new functions to solve some of the most frequently asked questions in the MSDN T-SQL forum. If you have ever faced the challenge of calculating a moving average, identifying gaps between records, combining consecutive and/or overlapping records, or calculating running totals, then this session is for you.
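To give a flavour of the techniques covered, a running total can be expressed with the new window frame clause; the table and column names below are hypothetical:

```sql
-- Running total per customer, ordered by date (hypothetical dbo.Orders table)
Select		Customer_ID
			, Order_Date
			, Amount
			, SUM(Amount) Over(Partition By Customer_ID
								Order By Order_Date
								Rows Between Unbounded Preceding And Current Row) As Running_Total
From		dbo.Orders;
```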


Session Video:

Using SQL Server 2012 Window Functions 

Session Slides:

DBI309_Using_Window_Functions.pptx (1.79 mb)

Demo Code and DB:

TechEdNZ2012.bak (5.27 mb) (258.64 kb)



Posted in: SQL Server 2012 | T-SQL | Window Functions


An Odd Behaviour when Using SQL Server 2012 Profiler for Analysis Services from SSMS

July 19, 2012 at 7:15 PM by Steven Wang


1. Windows Server 2008 R2 Enterprise Edition

2. A new server with multiple SQL Server 2012 Analysis Services instances installed

3. SSMS is installed

4. The SQL Server 2012 database engine is not installed on the server

5. SQL Server 2012 CU2 is applied


When I tried to use SQL Server Profiler from SSMS on the new server to run a trace on any SQL Server 2012 SSAS instance, as below:

I got the error message below.

Indeed, this error message is not new to SQL Server 2012. Nikesh Mhatre had an article on the MSDN blog for this issue. Please see this link: Unable to read trace definition file Microsoft SQL Server TraceDefinition n.n1.n2.xml

However, the problem I had is different even though the error message is similar.

I then used another server which has both the SQL Server database engine and SSAS installed, and tried to connect to the server mentioned above in the same way. I still got the same error message.



The solution to this issue is indeed very simple. There are 2 solutions:

Solution 1: Instead of launching SQL Server Profiler from SSMS, launch the Profiler from Start, All Programs, Microsoft SQL Server 2012, Performance Tools, then click SQL Server Profiler as depicted below. By launching the Profiler from here, I have no problem connecting to the SSAS instance.


Solution 2: After launching the Profiler from SSMS, I used it to connect to any SQL Server, either SQL Server 2008 R2 or SQL Server 2012, using Database Engine as the Server Type first, as below. That works with no problem.

After I connected to a Database engine trace, I was able to launch a new trace on the SSAS instance with no problem. It seems very odd.


I’m not sure if this is a bug in launching SQL Server Profiler from SSMS 2012 for SSAS instances, or whether there is a specific permission that needs to be set.

I hope this post helps if you have the same problem. If you know the root cause of the problem, please let me know. Thanks.

Posted in: SQL Server 2012 | SQL Server Profiler


Intelligently Backup SQL Server Analysis Service Databases

May 3, 2012 at 9:50 PM by Steven Wang

There are normally 2 main approaches to backing up SSAS databases:

1. Backup SSAS databases at a scheduled time

2. Backup SSAS databases whenever a process has been performed

Although there is nothing wrong with these 2 approaches, neither is the best. They are both not very easy to manage, especially when there are lots of databases under the same SSAS instance. A better way is to intelligently back up SSAS databases based on data change. A job can be set up to check the SSAS databases for changes at a scheduled interval; if a change, either to data or to schema, is detected for an SSAS database, it will be backed up; if nothing has changed, a new backup is not needed. In this post, I will use an SSIS package to intelligently manage the SSAS database backups. My solution is based on the logic below:

1. Only modified SSAS databases (data or schema) will be backed up

2. A newly added SSAS database will always be backed up

3. SSAS backups can be traced; the backup history will be kept

4. Backup file retention can be configured (how many backup files need to be kept?)

To start with, I need 2 SSAS maintenance tables to manage the backup solution. I use the script below for my tables. (Some columns in this script are not specific to the backup purpose, but as they are related to the SSIS package, they are kept.)

CREATE TABLE [dbo].[Database_Backup_Admin](
	[Database_Name] [nvarchar](128) NOT NULL,
	[Data_Modified] [datetime] NULL,
	[Backup_Date] [datetime] NULL,
	[DNS_Instance_Name] [nvarchar](128) NOT NULL,
	[Instance_Type] [nvarchar](50) NOT NULL,
	[Backup_file_to_Keep] [tinyint] NULL,
	[Is_Current] [bit] NULL,
	[Is_Backuped_Latest] [bit] NULL,
	[CacheWarm_Date] [datetime] NULL,
	[Is_CacheWarm_Current] [bit] NULL,
	[Is_CacheWarm_On] [bit] NULL,
	PRIMARY KEY CLUSTERED
	(
		[Database_Name] ASC,
		[DNS_Instance_Name] ASC,
		[Instance_Type] ASC
	)
)

CREATE TABLE [dbo].[Database_Backup_History](
	[Database_Name] [nvarchar](128) NOT NULL,
	[Backup_Date] [datetime] NOT NULL,
	[DNS_Instance_Name] [nvarchar](128) NOT NULL,
	[Instance_Type] [nvarchar](50) NOT NULL,
	[Backup_file_Name] [nvarchar](128) NULL,
	[Backup_file_Size] [int] NULL,
	[Backup_start_Time] [datetime] NULL,
	[Backup_End_Time] [datetime] NULL,
	[Backup_Duration_Second] [int] NULL,
	PRIMARY KEY CLUSTERED
	(
		[Database_Name] ASC,
		[Backup_Date] ASC,
		[DNS_Instance_Name] ASC,
		[Instance_Type] ASC
	)
)

Once the tables are in place, I created an SSIS package with the layout below:

From the package task flow, the first 2 steps check whether all the databases in our Database_Backup_Admin table are up to date. If a database was on the server but has now been removed, the Is_Current column for that database will be set to 0. If a new database has been added to the server, it will be loaded into the table. A slowly changing dimension data transformation is used for this check. The DBSchema_Catalogs DMV is used to check the SSAS databases on the server instance. The data flow task for the database check is as below:

SSAS database data changes can be detected with the MDSchema_Cubes DMV. The last_data_update column of MDSchema_Cubes carries a time stamp for the cube data change. The package updates the Data_Modified column of the Database_Backup_Admin table based on the last_data_update column of MDSchema_Cubes.

After the Data_Modified column is updated, a comparison between Data_Modified and Backup_Date is performed. If Backup_Date is older than Data_Modified, then Is_Backuped_Latest is marked as False, which indicates a new backup is needed.
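As a sketch of these two steps (the package implements them inside data flow tasks rather than as standalone queries, so treat this as illustrative):

```sql
-- Run against the SSAS instance: each cube's last data update time (UTC)
Select		[CATALOG_NAME]
			, [CUBE_NAME]
			, [LAST_DATA_UPDATE]
From		$SYSTEM.MDSchema_Cubes;

-- Then, in the admin database: flag databases whose backup is out of date
Update		dbo.Database_Backup_Admin
Set			Is_Backuped_Latest = 0
Where		Is_Current = 1
			AND (Backup_Date IS NULL OR Backup_Date < Data_Modified);
```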

The last For Each Loop container is used to back up each SSAS database using an XMLA backup command. Logic is also applied to remove older backup files based on the configured number of backup files to keep. The script used to remove the old backup files:

Imports System
Imports System.Collections
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO
Imports System.Windows.Forms

<System.AddIn.AddIn("ScriptMain", Version:="1.0", Publisher:="", Description:="")> _
<System.CLSCompliantAttribute(False)> _
Partial Public Class ScriptMain
	Inherits Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase

	Enum ScriptResults
		Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success
		Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
	End Enum
    Public Class CompareFileLastWriteTime
        Implements IComparer
        Public Function Compare(ByVal x As Object, ByVal y As Object) As Integer Implements IComparer.Compare
            'this "Compare" code is borrowed from:
            Dim File1 As FileInfo
            Dim File2 As FileInfo

            File1 = DirectCast(x, FileInfo)
            File2 = DirectCast(y, FileInfo)

            Compare = DateTime.Compare(File2.LastWriteTime, File1.LastWriteTime)
        End Function
    End Class

	Public Sub Main()
		' Add your code here

        Dim BackupDir As String = Dts.Variables("BackupDir").Value.ToString
        Dim DatabaseName As String = Dts.Variables("DatabaseName").Value.ToString
        Dim FilesToKeep As Integer = CInt(Dts.Variables("FilesToKeep").Value)
        Dim NewBackupFile = New FileInfo(BackupDir & Dts.Variables("BackupFileName").Value.ToString)
        Dim directory = New DirectoryInfo(BackupDir)
        Dim AllBackups = directory.GetFiles(DatabaseName & "_*")
        Dim FilesCount As Integer = 0

        'Sort the backup files newest first, then delete any file beyond the number to keep
        Array.Sort(AllBackups, New CompareFileLastWriteTime)
        Dts.Variables("BackupEndTime").Value = Now

        For Each entry In AllBackups
            FilesCount += 1
            If FilesCount > FilesToKeep Then
                entry.Delete()
            End If
        Next

        Dts.Variables("BackupFileSize").Value = CInt(NewBackupFile.Length / 1024)

		Dts.TaskResult = ScriptResults.Success
	End Sub

End Class


The package has the below variables:

Of these, the following variables need to be configured:

1. InstanceName

2. DatabaseName

3. BackupDir (the OLAP backup directory; please note that you need to replace each single \ with \\ if you set the value through a SQL Agent job)

4. TimeZone (As the SSAS DMVs record times in UTC, you need to set this TimeZone offset to your local time; for instance, New Zealand is 12.)

5. BackupFilesToKeep (How many backup files need to be kept)
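As a small illustration of point 4, the package effectively shifts the DMV's UTC timestamp into local time with a DATEADD; the literal timestamp below is hypothetical and 12 is the New Zealand offset:

```sql
-- Shift a UTC timestamp from the SSAS DMVs into local time (offset in hours)
Declare @TimeZone int = 12;
Select DATEADD(hour, @TimeZone, CAST('2012-05-03 09:50:00' AS datetime)) As Local_Time;
```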

The BackupXMLA variable is an expression, which changes dynamically on each loop:

"<Backup xmlns=\"http://schemas.microsoft.com/analysisservices/2003/engine\">
  <Object>
    <DatabaseID>" + @[User::DatabaseName]  + "</DatabaseID>
  </Object>
  <File>" + @[User::BackupFileName]  + "</File>
</Backup>"

 This is the command to do the SSAS database backup.

 I have attached the package here.

Intelligent_OLAPDB_Backup_BICOE.dtsx (289.11 kb)


A SQL Agent job can be set up to perform the backup; remember to set the variables below to your own values:


Posted in: Backup | SSAS | SSIS
