Power BI - looking for a correlation

Michal Molka
2 min readOct 30, 2020

--

When it comes to Python or R. You can calculate a correlation coefficient pretty easy. But, how to do it in Power BI?

We will be looking for a correlation coefficient for a [Reputation] and a [Score] columns in the Stack Overflow database.

Here is the data structure.

As we see bellow, the correlation coefficient for the [Score] and the [Reputation] columns is 0.86

from sys import path
import pandas as pd
path.append("\\Program Files\\Microsoft.NET\\ADOMD.NET\\150")
from pyadomd import Pyadomd

CONNECTION_STRING = r"Provider=MSOLAP;Data Source=localhost:53914;Catalog=8c9e2314-8ba6-413f-8ba9-48527f217b26;"
QUERY_STRING = """EVALUATE StackOverflowCorrelation """

with Pyadomd(CONNECTION_STRING) as conn:
with conn.cursor().execute(QUERY_STRING) as cur:
users_df = pd.DataFrame(cur.fetchone(), columns=[i.name for i in cur.description])

users_df = users_df.fillna(0)
users_df.drop("StackOverflowCorrelation[UserId]", axis=1).corr()

Now, we can implement a correlation coefficient formula in the DAX.

Correlation = 
VAR FirstAvg = AVERAGE(StackOverflowCorrelation[Reputation])
VAR SecondAvg = AVERAGE(StackOverflowCorrelation[Score])
VAR Numerator =
SUMX(
StackOverflowCorrelation,
(StackOverflowCorrelation[Reputation] - FirstAvg)*(StackOverflowCorrelation[Score] - SecondAvg)
)
VAR Denumerator01 =
SUMX(
StackOverflowCorrelation,
POWER(
StackOverflowCorrelation[Reputation] - FirstAvg,
2
)
)
VAR Denumerator02 =
SUMX(
StackOverflowCorrelation,
POWER(
StackOverflowCorrelation[Score] - SecondAvg,
2
)
)


RETURN Numerator/SQRT(Denumerator01*Denumerator02)

And there is our result.

--

--

Michal Molka
Michal Molka

Written by Michal Molka

Architect | Azure | Power BI | Fabric | Power Platform | Infrastructure | Security | M365

No responses yet