Bringing ETL to the Masses - Microsoft Codename "Data Explorer"

October 25, 2011 at 4:32 AM | Steven Wang

One of the new projects Microsoft announced at this year's PASS Summit, Data Explorer, is very interesting and could have a big impact on the way business users and information workers perform BI analysis.

According to the project lead, Tim Mallalieu, the project was originally codenamed Montego (which sounds more like a usual codename). At the PASS Summit the cloud version of the project was presented, but according to Tim they also have a client version of Data Explorer which runs as an Excel add-in and operates in much the same way as PowerPivot. From a first look at the interface, it used the name Powerimport. (Please note the diagrams used here are from the early Montego project, not the new Data Explorer interface. Wait for November 2011; maybe there will be a new demo available.)

 

 

After I looked at Tim's walkthrough of the Montego client - the mashup component of Data Explorer - I was very impressed with this project. Basically, this Excel add-in can integrate different data sources: cloud databases, RSS feeds, websites, Excel workbooks, SQL databases and so on, and you can perform data transformations on each data source (table). The ribbon-style transformation tools are very easy to understand and use:


The workflow and the concepts look very easy to pick up for experienced Excel users. The data sources (tables) can be easily merged, nested, transposed and pivoted:

Data manipulations such as de-duplication, grouping and aggregation can be performed:

 

Once you complete the data mashup, it can be pushed back to Excel in much the same way as you do with PowerPivot.

To me, the whole Montego mashup process is like a greatly simplified, easy-to-use yet very powerful version of an SSIS data flow task. While using SSIS effectively requires a steep learning curve, this mashup is very easy to use and understand.

According to Tim, a mashup can be reused and automatically refreshed from its data sources. They are also working on integrating the tool with PowerPivot and the newly announced Power View (previously known as Crescent).

Given that Excel is still the most popular end-user BI tool, I believe the new tool really will bring ETL to the masses!

Looking forward to the trial version.

You may take some time to watch Tim's Demo video: Early walkthrough of the Montego client

Posted in: Data Explorer


Using Trace Flag 3042 for Backup Compression

October 19, 2011 at 8:56 PM | Steven Wang

While we were replacing Redgate SQL Backup with the SQL Server 2008 R2 native backup compression, we noticed a huge backup file size increase at the beginning of the backup, which nearly filled up our backup disk volume.

The database we were backing up was close to 2 terabytes in size, and with Redgate SQL Backup compression the final backup file was roughly 376 GB, as below:

When we used the SQL Server native backup with compression:

 

BACKUP DATABASE MyDB TO DISK = N'I:\Backups\CompressionBackup_Without_TF.bak' WITH INIT, COMPRESSION;

 

the initial file size went up to 638 GB, which is roughly one third of the original database size, as below:

That is a huge 262 GB increase in backup file size, which completely threw out our backup disk capacity plan based on the Redgate compression ratio.

Surprisingly, once the backup completed, the file size shrank back to around 380 GB, a similar size to the Redgate SQL Backup compressed file.
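Incidentally, if you would rather confirm the final compressed size from the backup history than from the file on disk, msdb records both the logical backup size and the compressed size. A minimal sketch (assuming the MyDB database name used in the command above):

-- Compare the uncompressed (logical) backup size with the compressed size on disk
SELECT TOP (5)
       database_name,
       backup_finish_date,
       backup_size / 1073741824.0            AS backup_size_GB,
       compressed_backup_size / 1073741824.0 AS compressed_size_GB
FROM   msdb.dbo.backupset
WHERE  database_name = N'MyDB'
ORDER BY backup_finish_date DESC;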

This is in fact the default behaviour of SQL Server native backup compression. For a compressed backup, it is hard for SQL Server to determine up front what the final size will be, so it simply creates an initial backup file of one third of the database size (the pre-allocation algorithm); for our roughly 2 TB database that works out at about the 638 GB we saw. If more disk space is needed during the course of the backup, it dynamically extends the file as needed, and if the final backup size turns out smaller than the initial one third of the database size, it truncates the file to the actual size.

This is not good enough for us because, as mentioned above, we had already planned our capacity based on Redgate SQL Backup, and at some point we may not have enough disk space to hold the initial backup file. We really want the backup file to take only the actual space required by the compressed backup.

This issue was solved by introducing trace flag 3042. Trace flag 3042 bypasses the pre-allocation algorithm and grows the file as needed, at the cost of a small performance penalty.

This trace flag was not officially documented, but a few months ago it was announced in a Microsoft support article; see Space requirements for backup devices in SQL Server.

By using the trace flag, you will notice that the initial backup file size is 0 and that the file then grows to its final size.

Backup compression with trace flag 3042 on:

 

DBCC TRACEON (3042);
GO

BACKUP DATABASE BiW TO DISK = N'I:\Backups\CompressionBackup_With_TF.bak' WITH INIT, COMPRESSION;
GO

DBCC TRACEOFF (3042);
GO

 

The initial file size was 0 as below:

You can turn this trace flag on globally by adding -T3042 to the SQL Server startup parameters.
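If you prefer not to restart the instance, a trace flag can also be enabled globally from a query session by passing -1 to DBCC TRACEON. A minimal sketch (generic DBCC syntax, not something specific to the support article above):

-- Enable trace flag 3042 globally for the running instance (no restart required)
DBCC TRACEON (3042, -1);
GO

-- Check which trace flags are currently enabled
DBCC TRACESTATUS (-1);
GO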

Posted in: Backup | Trace Flag


The Use of Denali 'With Result Sets' Feature in SSIS Data Flow Task

October 15, 2011 at 4:58 AM | Steven Wang

The new T-SQL feature 'With Result Sets' in SQL Denali can be very useful when using a stored procedure as the data source of an SSIS data flow task.

Before SQL Denali, there were some limitations when using a stored procedure as the data source of a data flow task in SSIS, such as:

1. If you want the data source to use different column names, you have to either change the stored procedure or use a Derived Column component; neither is convenient.
2. The same applies to data type changes: if you want to change a column's data type, you again have to either change the stored procedure or use a Derived Column component.
3. If your stored procedure ends with a dynamic SQL execution, such as Exec (@SQL) or sp_executesql @SQL, SSIS is not able to resolve the returned columns. A common workaround is to define a table variable inside the stored procedure, use the dynamic SQL execution to insert the data into the table variable, and then select the data from it (see the sketch after this list); this is very awkward, and performance is poor when the data set is big.
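For reference, here is a rough sketch of the table-variable workaround mentioned in point 3; the procedure name, column names and types are only illustrative:

-- Pre-Denali workaround (sketch): materialize the dynamic result into a
-- table variable so that SSIS can resolve the output metadata
CREATE PROC dbo.usp_Dynamic_SQL_Workaround_Sketch
AS
BEGIN
    DECLARE @SQL nvarchar(MAX);
    SET @SQL = N'SELECT GeographyKey, City FROM dbo.DimGeography';

    DECLARE @Result TABLE
    (
        GeographyKey int,
        City         nvarchar(30)
    );

    -- INSERT ... EXEC loads the dynamic SQL result into the table variable
    INSERT INTO @Result (GeographyKey, City)
    EXEC (@SQL);

    SELECT GeographyKey, City
    FROM @Result;
END
GO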

 
The new T-SQL feature 'With Result Sets' solves these problems.

The syntax of With Result Sets is simple, as shown below (for more information on the T-SQL EXECUTE statement, see Books Online):
 
EXEC Stored_Proc_Name @Variables
WITH RESULT SETS
(
    (
        Col1 Data_type,
        Col2 Data_type,
        ...            -- list all columns here
    )
);
 
Here is a simple example that demonstrates this functionality as an SSIS data flow task data source, solving problem 3 stated above.

1. Create a simple SP
USE [AdventureWorksDWDenali]  --Denali sample DW database, free download
GO
CREATE PROC [dbo].[usp_UseWithResultSet_Dynamic_SQL_Denali]
AS
BEGIN
    DECLARE @SQL varchar(MAX);

    SET @SQL = '
    SELECT [GeographyKey]
          ,[City]
          ,[StateProvinceName]
          ,[EnglishCountryRegionName]
    FROM [dbo].[DimGeography]
    ';

    EXEC (@SQL);
END
GO

2. Create an SSIS data flow task and use exec [dbo].[usp_UseWithResultSet_Dynamic_SQL_Denali] as the data source (SQL command). When you click Preview, you get an error like the one below:

 

3. By using With Result Sets, you can rename the columns and change the data types (I changed the columns from nvarchar to varchar) and, best of all, there is no error when using this SP as the data source. See below:
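For clarity, the SQL command used in the data source looks roughly like the sketch below; the new column names and the varchar lengths are my own choices for the demo, not something the feature mandates:

EXEC [dbo].[usp_UseWithResultSet_Dynamic_SQL_Denali]
WITH RESULT SETS
(
    (
        GeographyKey   int,           -- kept as-is
        City           varchar(30),   -- changed from nvarchar to varchar
        StateProvince  varchar(50),   -- renamed and changed to varchar
        CountryRegion  varchar(50)    -- renamed and changed to varchar
    )
);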

 

4. Terminate the data flow with a simple Union All component and run it. It completes successfully:

Posted in: SQL Server 2012 | SSIS | T-SQL
