I my previous blog SQL Server 2012 Columnstore Index In Depth Part 1: Make a bigger AdventureworksDW database for Testing, I have created a bigger FactResellerSalesBig table in AdventureworksDW2012 database and a nonclustered columnstore index IX_FactResellerSalesBig_NCI is created. Now we can check how a columnstore index is stored and calculate the size of the nonclustered columnstore index as a whole or even an individual column within the columnstore index.
1. How does a Columnstore Index Store Data
A columnstore index is broken into one or multiple segments for each partition (when a table is not partitioned it has only one partition), which consist of multiple physical data pages. A full segment contains 1,048,576 rows. Based on the number of threads used to create a columnstore index, the ending segments group (normally the number of threads used) for a column in each partition might have different row number which is smaller than 1,048,576. If a base table only has less than 1,048,576 rows then only 1 segment per column will be created.
A columnstore index segment is internally stored as a BLOB data type. This can be verified by using DBCC Page command.
Before using DBCC Page command, we need to know the page ids allocated to the columnstore index we are going to examine. As we are using SQL server 2012, we now have 2 ways to get the page ids:
a) use DBCC IND
b) use the new added Dynamic management view in SQL server 2012: sys.dm_db_database_page_allocations
Although they are both undocumented, to use sys.dm_db_database_page_allocations has many advantages over DBCC IND. I am not going to discuss details about this DM view. Kalen Delaney has an article on SQLMag: SQL server's Undocumented Changes, which has a good explanation on how to use this DM view.
For this post, based on the table FactResellerSalesBig I created (refer to my last post), I will use the below query to get the page informations:
Use [AdventureWorksDW2012]
Go
Select P.database_id
, P.allocated_page_file_id
, P.allocated_page_page_id
, P.allocation_unit_type_desc
From sys.indexes I
Cross Apply
sys.dm_db_database_page_allocations(DB_ID(), I.object_id, I.index_id, Null, 'limited') P
Where
I.object_id = Object_id ('FactResellerSalesBig')
And P.is_iam_page = 0
And P.is_allocated = 1
And P.is_mixed_page_allocation = 0
And I.Type = 6 --Type 6 is 'NONCLUSTERED COLUMNSTORE'
Go
From the demo database AdventureworksDW2012 in my computer, I got the below results:

From the result, we already can know that the data type used to store the columnstore index is BLOB data type. But we can further delve into the physical page level to see how data is stored.
DBCC page (
7, --Database ID
1, --Page File ID
270800, --Page ID
3 --Commnad output style
)

The highlighted information from the DBCC Page output has testified that the BLOB data is used to store columnstore indexes.
2. Calculate the Size of A Nonclustered Columnstore Index
Based on the Columnstoreindex book online, We can calculate the size of the nonclustered columnstore index IX_FactResellerSalesBig_NCI as below:
SELECT SUM(on_disk_size_MB) AS TotalSizeInMB
FROM
(
(SELECT SUM(css.on_disk_size)/(1024.0*1024.0) on_disk_size_MB
FROM sys.indexes AS i
JOIN sys.partitions AS p
ON i.object_id = p.object_id
JOIN sys.column_store_segments AS css
ON css.hobt_id = p.hobt_id
WHERE i.object_id = object_id('FactResellerSalesBig')
AND i.type_desc = 'NONCLUSTERED COLUMNSTORE')
UNION ALL
(SELECT SUM(csd.on_disk_size)/(1024.0*1024.0) on_disk_size_MB
FROM sys.indexes AS i
JOIN sys.partitions AS p
ON i.object_id = p.object_id
JOIN sys.column_store_dictionaries AS csd
ON csd.hobt_id = p.hobt_id
WHERE i.object_id = object_id('FactResellerSalesBig')
AND i.type_desc = 'NONCLUSTERED COLUMNSTORE')
) AS SegmentsPlusDictionary
The result of this query is: 242.34 MB. However, from the database data storage point of view, this calculation does not truly reflect the size of the index data pages in the database. The data size from both systerm catalog views are the BLOB data size which doesn't account for the metadata and page headers and other overheads for a page to hold the blob data.
Thanks to Eric Hanson, the Principal Progran Manger at Microsoft, he cleared my doubt and confirmed that my calculation below is accurate.
SELECT SUM(s.used_page_count) / 128.0 on_disk_size_MB
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS S
ON i.object_id = S.object_id
and I.index_id = S.index_id
WHERE i.object_id = object_id('FactResellerSalesBig')
AND i.type_desc = 'NONCLUSTERED COLUMNSTORE'
The result of this query is: 250.37 MB
As the size of base table FactResellerSalesBig table is 2626 MB, we can get that the columnstore index compression ratio is around 10 : 1.
3. Calculate the Individual Column Size of A Nonclustered Columnstore Index
As columnstore indexes store column data separately, we can calculate each compressed column data size for a columnstore index. The column size can be caculated by using the below script. This script was orginally written by Eric Hanson, the Principal Program Manager of Microsoft. I have made a modification to take care of the situation when not all the columns are included in a columnstore index.
-- size per column
with segment_size_by_column as (
SELECT
p.object_id as table_id,
css.column_id,
SUM (css.on_disk_size)/1024/1024.0 AS segment_size_mb
FROM sys.partitions AS p
JOIN
sys.column_store_segments AS css
ON p.hobt_id = css.hobt_id
GROUP BY p.object_id, css.column_id
),
dictionary_size_by_column as (
SELECT
p.object_id as table_id,
csd.column_id,
SUM (csd.on_disk_size)/1024/1024.0 AS dictionary_size_mb
FROM sys.partitions AS p
JOIN
sys.column_store_dictionaries AS csd
ON p.hobt_id = csd.hobt_id
GROUP BY p.object_id, csd.column_id
)
--As in the real world, not all the columns in a table will be or can be included in a nonclustered columnstore index, we need to join
--to the sys.index_columns to get the correct column id.
Select
Object_Name(s.table_id) as table_name,
C.column_id,
col_name(S.table_id, C.column_id) as column_name,
s.segment_size_mb,
d.dictionary_size_mb,
s.segment_size_mb + isnull(d.dictionary_size_mb, 0) total_size_mb
from segment_size_by_column s
join
sys.indexes I -- Join to sys.Indexes system table
ON I.object_id = s.table_id
join
sys.index_columns c --Join to sys.Index_columns
ON c.object_id = s.table_id
And I.index_id = C.index_id
and c.index_column_Id = s.column_id --Need to join to the index_column_id with the column_id from segment_size_by_column
left outer join
dictionary_size_by_column d
on s.table_id = d.table_id
and s.column_id = d.column_id
Where I.type_desc = 'NONCLUSTERED COLUMNSTORE'
order by total_size_mb desc
(Note: If you have problem to copy and paste the script into the SSMS, copy it and paste to Office Word first and then paste to SSMS.)
Thanks for reading.