Michal Molka
3 min readAug 28, 2020

--

Tabular model — datetime columns optimization

I want to show you how you can optimize your model size by splitting a datetime column into separate date and time columns and how it affects a model size.

Our example is the [Posts] table placed in the Stack Overflow SQL Server Database. Its size is 120GB and 46 947 633 rows.

We don’t need all columns like a [Body], a [Tags], a [Title] for analysis purposes, so I deleted a few.

This is our table. It’s size is 3.5GB in a database.

I placed that table into a SSAS model. Its initial size is 6.2 GB.

We can notice that the largest columns are a [CreationDate], a [LastActivityDate] and a [LastEditDate]. These three columns size is 5.9GB.

What can we do?

A better approach is to split the datetime column into separate date and time columns.

CREATE OR ALTER VIEW dbo.PostsSplitDateTime
AS
SELECT
[AnswerCount]
,[ClosedDate]
,[CommentCount]
,[CommunityOwnedDate]
,CONVERT(date, [CreationDate]) AS [CreationDate]
,CONVERT(time, [CreationDate]) AS [CreationTime]
,[FavoriteCount]
,CONVERT(date, [LastActivityDate]) AS [LastActivityDate]
,CONVERT(time, [LastActivityDate]) AS [LastActivityTime]
,CONVERT(date, [LastEditDate]) AS [LastEditDate]
,CONVERT(time, [LastEditDate]) AS [LastEditTime]
,[OwnerUserId]
,[PostTypeId]
,[Score]
,[ViewCount]
FROM [StackOverflow].[dbo].[PostsTabular]

After the view is imported into the tabular model our model equals 3.9GB.

As we can see date columns are smaller, but time columns are very large. Why? Because they contain milliseconds which implicates large cardinality. Let’s try to trim milliseconds and see what will happen.

CREATE OR ALTER VIEW dbo.PostsSplitDateTimeMilisecondsTrimmed
AS
SELECT
[AnswerCount]
,[ClosedDate]
,[CommentCount]
,[CommunityOwnedDate]
,CONVERT(date, [CreationDate]) AS [CreationDate]
,CONVERT(time(0), [CreationDate]) AS [CreationTime]
,[FavoriteCount]
,CONVERT(date, [LastActivityDate]) AS [LastActivityDate]
,CONVERT(time(0), [LastActivityDate]) AS [LastActivityTime]
,CONVERT(date, [LastEditDate]) AS [LastEditDate]
,CONVERT(time(0), [LastEditDate]) AS [LastEditTime]
,[OwnerUserId]
,[PostTypeId]
,[Score]
,[ViewCount]
FROM [StackOverflow].[dbo].[PostsTabular]

Our model size is 780MB.

On this stage we can start celebrating our success. We decreased the model size from 6.2GB to 780MB. But, do we need seconds in the time columns? Seconds are rarely useful in reports and measures, so we can round time to full minutes. You can attain this following this pattern:

CAST(CONVERT(time(0), [CreationDate]) AS smalldatetime) AS [CreationTime]

By the way, let’s apply all the previous steps to a [CommunityOwnedDate] and a [ClosedDate] fields.

Now the model size is 597MB. We reduced the model volume by 90%. 6.2 -> 0.6 GB

--

--