POWER BI —Direct Query on Clustered Columnstore Index
When you don’t want to load your model into memory, you use a Direct Query mode. As you now, gap between an in-memory and DQ performance can be immense. We can use a Direct Query mode and attain a good effectiveness using a DQ on a Columnstore Clustered Indexed table.
I’ve prepared two identical tables.
The first: [GHCN_CCI] — Clustered Columnstore Index
CREATE CLUSTERED COLUMNSTORE INDEX CCI_GHC_CCI ON dbo.GHCN_CI
The second: [GHCN_RSI] — a rowstore containing two indexes: Clustered and Non-Clustered.
CREATE CLUSTERED INDEX IX_StationID_Alpha2 ON dbo.GHCN_RSI(StationID, Alpha2)
CREATE NONCLUSTERED INDEX IX_Date ON dbo.GHCN_RSI([Date])
Both tables contain 0.7 billion rows.
A Power BI model:
A first query:
DEFINE
MEASURE GHCN_CCI[COUNT_CCI] = COUNTROWS(GHCN_CCI)
MEASURE GHCN_CCI[SUM_CCI] = SUM(GHCN_CCI[Value])/10
MEASURE GHCN_CCI[AVG_CCI] = AVERAGE(GHCN_CCI[VALUE])/10
EVALUATE
SUMMARIZECOLUMNS(
'ghcnd-countries'[CountryName],
CALCULATETABLE(
GHCN_CCI,
GHCN_CCI[Type] = "TAVG",
'ghcnd-countries'[Alpha2] in {"PL"}
),
"COUNT", GHCN_CCI[COUNT_CCI],
"SUM", GHCN_CCI[SUM_CCI],
"AVG", GHCN_CCI[AVG_CCI]
)
The rowstore table is partly covered by indexes: 24.5 sec.
The columnstore table: 0.7 sec.
A second query, the rowstore table is covered by indexes:
DEFINE
MEASURE GHCN_CCI[COUNT_CCI] = COUNTROWS(GHCN_CCI)
MEASURE GHCN_CCI[SUM_CCI] = SUM(GHCN_CCI[Value])/10
MEASURE GHCN_CCI[AVG_CCI] = AVERAGE(GHCN_CCI[VALUE])/10
EVALUATE
SUMMARIZECOLUMNS(
'ghcnd-countries'[CountryName],
CALCULATETABLE(
GHCN_CCI,
DATESBETWEEN(GHCN_CCI[Date], "2020-01-01", "2020-10-01")
),
"COUNT", GHCN_CCI[COUNT_CCI],
"SUM", GHCN_CCI[SUM_CCI],
"AVG", GHCN_CCI[AVG_CCI]
)
The rowstore table: 37.7 sec. vs the columnstore table: 1.2 sec.
A third one, without filters:
DEFINE
MEASURE GHCN_CCI[COUNT_CCI] = COUNTROWS(GHCN_CCI)
MEASURE GHCN_CCI[SUM_CCI] = SUM(GHCN_CCI[Value])/10
MEASURE GHCN_CCI[AVG_CCI] = AVERAGE(GHCN_CCI[VALUE])/10
EVALUATE
SUMMARIZECOLUMNS(
'ghcnd-countries'[CountryName],
GHCN_CCI,
"COUNT", GHCN_CCI[COUNT_CCI],
"SUM", GHCN_CCI[SUM_CCI],
"AVG", GHCN_CCI[AVG_CCI]
)
The rowstore table : 39 sec. vs the columnstore table: 2.3 sec.
As you can see. During developing Power BI solutions. You may take an advantage of a columnstore index in combination with a Direct Query.