Pages

Thursday, May 1, 2014

Clustered Columnstore Index - Delta Store explained

SQL Server 2014 is using a combination of data structures for storing the data of a Clustered Columnstore Index. As depicted in the below diagram it has two parts:
  • Compressed Column Segments
  • Delta Store



Now let us see how the delta stores work.
Lets start with creating a simple table:


CREATE TABLE Int_Table (
 ID INT IDENTITY(1, 1)
 ,EmpId INT
 ,PinCode INT
 )
Insert a million records into the table:
SET NOCOUNT ON

DECLARE @Counter INT = 0

WHILE (@Counter < 1000000)
BEGIN
 INSERT INTO Int_Table (
  EmpId
  ,PinCode
  )
 SELECT ABS(CHECKSUM(NewId())) % 1000000
  ,ABS(CHECKSUM(NewId())) % 100000

 SET @Counter = @Counter + 1
END
The important tables that give the details of Columnstore indexes are:

SELECT *
FROM sys.column_store_segments

SELECT *
FROM sys.column_store_dictionaries

SELECT *
FROM sys.column_store_row_groups


You will not find anything in these as yet since we have not created a CCI (Clustered Columnstore Index)
Lets create a CCI now:

CREATE CLUSTERED COLUMNSTORE INDEX IDX ON INT_TABLE
 WITH (DROP_EXISTING = ON)

You will find 3 segments in column_store_segments - basically one for each column.
And nothing will be found in the column_store_dictionaries as we are dealing with integer columns here , no string types.
When the column_store_row_groups is queried the following is shown :




This things to notice here are:
  • the delta_store_hobt_id is NULL - This says the entire data is residing as a compressed column store.
  • the deleted_rows is 0. This is because SQL Server maintains a delete bitmap for the column store we will discuss about this a little more
Inserts:

When new data is inserted the data is not instantly ,compressed on the fly, put into the column store. 
The data goes into the Delta Store.SQL server runs a deamon periodically , the tuple mover, which moves the data from Delta Store. 
 Lets check that by inserting an additional 1000 rows. we use the same script as above but this time we will insert only 1000 rows.

Now if you query the row_groups you will see :



You will find an additional row with a non-NULL delta_store_hobt_id with the newly inserted 1000 rows which is our DeltaStore.

Deletes: 
The Deletes are tracked in a separate data structure called the DeleteBitMapdata
This is can be partially demonstrated.
Lets delete a bunch of rows. Since we have an ID column which is sequential, lets delete the last 1100 rows 

DELETE
FROM Int_table
WHERE id > 999900
Now when you query row_groups you will find 1100 in the deleted_rows column


Updates = deletes + inserts
For the Columnstore index case the updates are handled as Deletes + Inserts

UPDATE Int_Table
SET EmpID = ABS(CHECKSUM(NewId())) % 1000000
WHERE id > 900000
You will notice the number of rows updated will now be in the Delta store ( 99900) and also the deleted count will increase by the same amount:

Move things from Delta Store:
Now the tuple mover moves the data only when the delta store is big enough ( May be a million rows big).
If you want to force the SQL server to move the data to a column store you can re-build the index.

Other Links in the Clustered Column Store Series:
Internals of Clustered Column Store Index
"Clustered Columnstore Indexes" Limitations

No comments:

Post a Comment