Power BI — decimal column optimization — the sad story

Michal Molka
Aug 5, 2022

When you deal with large Power BI models, every optimization method is at a premium. But sometimes you can hit a dead end: instead of improving performance and model size, you make things worse.

In the context of large datasets, a common practice in some companies is to split a decimal column into an integer column and a decimal (fraction) column. The following article shows that this practice is sometimes counterproductive.

Of course, it isn’t a rock-solid rule. Everything depends on the context and the dataset’s properties: the amount of data, column types, cardinality, and so on. The compression algorithms and the engine can behave in various ways depending on a particular data structure.

In order to do this exercise, I’ve created an exemplary Spark data frame inside an Azure Synapse Spark pool.

The data frame contains 1000 random categories created by the following code:
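The exact snippet from the notebook isn’t reproduced here; a minimal Python sketch of such a generator, assuming an arbitrary “CAT_” prefix and label length, could look like this:

```python
import random
import string

# Assumed label format: "CAT_" followed by 5 random alphanumeric characters.
# With 1000 draws out of ~60 million possible labels, duplicates are unlikely.
random.seed(1)
categories = [
    "CAT_" + "".join(random.choices(string.ascii_uppercase + string.digits, k=5))
    for _ in range(1000)
]
```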

The next step is to create a data frame…
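A possible sketch of that step, assuming a Synapse notebook where the `spark` session is already available and reusing the `categories` list from above:

```python
from pyspark.sql import functions as F

# 50 000 000 rows; each row is assigned one of the 1000 categories
# via a simple modulo on the generated row id.
categories_df = spark.createDataFrame(
    [(i, c) for i, c in enumerate(categories)], ["category_id", "category"]
)

df = (
    spark.range(0, 50_000_000)
    .withColumn("category_id", F.col("id") % 1000)
    .join(categories_df, "category_id")
    .drop("category_id")
)
```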

…and generate random decimal values.
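A hedged sketch of the value columns; the column names `value`, `value_int` and `value_frac`, as well as the decimal precision, are my assumptions rather than the exact ones used in the notebook:

```python
from pyspark.sql import functions as F

# Continues from the data frame created above:
# "value"      - the original decimal column,
# "value_int"  - its integer part,
# "value_frac" - its fractional part (the split described in the text).
df = (
    df
    .withColumn("value", F.round(F.rand(seed=42) * 100000, 2).cast("decimal(18,2)"))
    .withColumn("value_int", F.floor("value").cast("int"))
    .withColumn("value_frac", (F.col("value") - F.floor("value")).cast("decimal(3,2)"))
)
```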

The first column is a decimal type. The next two are the same column simply split in two: an integer part and a decimal (fraction) part.

So, we have a data frame which contains 50 000 000 rows and four columns.

In order to use this data inside Power BI, we need to save the data frame as a table.
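For example (the table name below is a placeholder, not necessarily the one used in the notebook):

```python
# Save the whole data frame as a Spark (lake database) table so it can be
# imported into Power BI through the Azure Synapse Analytics connector.
df.write.mode("overwrite").saveAsTable("pbi_compression_decimal")
```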

Let’s switch over to Power BI and get the data using the Azure Synapse Analytics workspace data source.

The following screen shows two imported tables. The data in both is identical; the only difference lies in the columns: Decimal vs. Integer + Decimal.

As you can see, the table containing the split columns is bigger than the one holding a single decimal column: 168 MB vs. 138 MB. This particular case is only 50 000 000 rows and 2 vs. 3 columns. The more rows there are, the bigger the difference becomes.

A second argument against this practice is the greater complexity of the DAX or Power Query code, because you have to operate on two columns instead of one. This is a trivial example, but in more complex code it can increase the likelihood of errors and make the code harder to comprehend.

Here is the Jupyter notebook used in this post: pbi_compression_decimal.ipynb

Of course, there are plenty of cases where you can achieve a better compression ratio using this technique. This one illustrates that it isn’t always a good solution and needs to be implemented carefully.
