Tabular model — datetime columns optimization
I want to show you how you can optimize your model size by splitting datetime column into separate date and time columns and how it affects on model size.
I have Posts table in my StackOverflow SQL Server Database. It size is 120GB and 46 947 633 rows.
We don’t need all columns like Body, Tags, Title for analysis purposes, so I deleted a few.
I put that table into a SSAS model. It size grew to 6.2GB, not so good.
But we can notice that the largest columns are CreationDate, LastActivityDate and LastEditDate. The size of these three columns is 5.9GB.
What can we do?
Best approach is spitting datetime column into separate date and time column.
After we imported view into tabular model our model size is 3.9GB.
As we can see date columns are smaller, but time columns are large. Why? Because they contains milliseconds which implicit large cardinality. Lets try to trim milliseconds and see and what happens.
Our model size is 780MB.
On this stage we can start celebrationg our success. We deceased model size from 6.2GB to 780MB. But, do we need seconds in time columns? Seconds are rarely useful in reports and measures, so we can round time to full minutes. By the way lets do the same thing with CommunityOwnedDate and ClosedDate fields.
Model size is 597MB. We reduced model size by 90%.