SQL Server 2014 is using a combination of data structures for storing the data of a Clustered Columnstore Index. As depicted in the below diagram it has two parts:
- Compressed Column Segments
- Delta Store
Now let us see how the delta stores work.
Lets start with creating a simple table:
CREATE TABLE Int_Table ( ID INT IDENTITY(1, 1) ,EmpId INT ,PinCode INT )
Insert a million records into the table:
SET NOCOUNT ON DECLARE @Counter INT = 0 WHILE (@Counter < 1000000) BEGIN INSERT INTO Int_Table ( EmpId ,PinCode ) SELECT ABS(CHECKSUM(NewId())) % 1000000 ,ABS(CHECKSUM(NewId())) % 100000 SET @Counter = @Counter + 1 END
The important tables that give the details of Columnstore indexes are:
SELECT * FROM sys.column_store_segments SELECT * FROM sys.column_store_dictionaries SELECT * FROM sys.column_store_row_groups
You will not find anything in these as yet since we have not created a CCI (Clustered Columnstore Index)
Lets create a CCI now:
Lets create a CCI now:
CREATE CLUSTERED COLUMNSTORE INDEX IDX ON INT_TABLE WITH (DROP_EXISTING = ON)
You will find 3 segments in column_store_segments - basically one for each column.
And nothing will be found in the column_store_dictionaries as we are dealing with integer columns here , no string types.
When the column_store_row_groups is queried the following is shown :
And nothing will be found in the column_store_dictionaries as we are dealing with integer columns here , no string types.
When the column_store_row_groups is queried the following is shown :
This things to notice here are:
- the delta_store_hobt_id is NULL - This says the entire data is residing as a compressed column store.
- the deleted_rows is 0. This is because SQL Server maintains a delete bitmap for the column store we will discuss about this a little more
Inserts:
When new data is inserted the data is not instantly ,compressed on the fly, put into the column store.
The data goes into the Delta Store.SQL server runs a deamon periodically , the tuple mover, which moves the data from Delta Store.
Lets check that by inserting an additional 1000 rows. we use the same script as above but this time we will insert only 1000 rows.
Now if you query the row_groups you will see :
You will find an additional row with a non-NULL delta_store_hobt_id with the newly inserted 1000 rows which is our DeltaStore.
Deletes:
The Deletes are tracked in a separate data structure called the DeleteBitMapdata
This is can be partially demonstrated.
Lets delete a bunch of rows. Since we have an ID column which is sequential, lets delete the last 1100 rows
This is can be partially demonstrated.
Lets delete a bunch of rows. Since we have an ID column which is sequential, lets delete the last 1100 rows
DELETE FROM Int_table WHERE id > 999900
Now when you query row_groups you will find 1100 in the deleted_rows column
Updates = deletes + inserts
For the Columnstore index case the updates are handled as Deletes + Inserts
Updates = deletes + inserts
For the Columnstore index case the updates are handled as Deletes + Inserts
UPDATE Int_Table SET EmpID = ABS(CHECKSUM(NewId())) % 1000000 WHERE id > 900000
You will notice the number of rows updated will now be in the Delta store ( 99900) and also the deleted count will increase by the same amount:
Move things from Delta Store:
Now the tuple mover moves the data only when the delta store is big enough ( May be a million rows big).
If you want to force the SQL server to move the data to a column store you can re-build the index.
Other Links in the Clustered Column Store Series:
Internals of Clustered Column Store Index
"Clustered Columnstore Indexes" Limitations
Move things from Delta Store:
Now the tuple mover moves the data only when the delta store is big enough ( May be a million rows big).
If you want to force the SQL server to move the data to a column store you can re-build the index.
Other Links in the Clustered Column Store Series:
Internals of Clustered Column Store Index
"Clustered Columnstore Indexes" Limitations
No comments:
Post a Comment