PySpark: Python and SQL — queries

Michal Molka
Aug 19, 2022

Today’s post is a quick comparison between SQL and Python code inside Databricks. If you are a SQL developer, you can see how to query data through Python. If you are a Python developer, you can see how to do the same thing the SQL way.

Let’s open a Synapse Databricks notebook.

First, we need to create some data to work with in this exercise.

Create a new Spark database.
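As a rough sketch of this step (not the exact code from the notebook), a database can be created with plain Spark SQL issued from Python. The helper name `create_database` and the database name `demo_db` are made up for illustration. In a Databricks notebook the `spark` session is already defined, so you would call `create_database(spark, "demo_db")`.

```python
def create_database(spark, name="demo_db"):
    """Create a Spark database if it is missing and switch to it.

    `spark` is the SparkSession (predefined in Databricks notebooks).
    `name` is a hypothetical database name used only for this sketch.
    """
    # IF NOT EXISTS makes the cell safe to re-run.
    spark.sql(f"CREATE DATABASE IF NOT EXISTS {name}")
    # Make it the current database so unqualified table names resolve to it.
    spark.sql(f"USE {name}")
```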

Load the first file, save it to a table, and then read it into a data frame.
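A minimal sketch of what this step might look like, assuming the file is a CSV with a header row (the post does not say which format is used). The function name, path, and table name are placeholders, not the ones from the notebook.

```python
def load_csv_to_table(spark, path, table_name):
    """Read a CSV file, persist it as a table, and return it as a DataFrame.

    Assumptions for this sketch: the file is a CSV with a header row,
    and overwriting an existing table of the same name is acceptable.
    """
    df = (
        spark.read
        .option("header", True)       # first line holds the column names
        .option("inferSchema", True)  # let Spark guess the column types
        .csv(path)
    )
    # Save the data as a managed table so it can be queried with SQL.
    df.write.mode("overwrite").saveAsTable(table_name)
    # Read the table back so the DataFrame and the table share one source.
    return spark.table(table_name)
```

In a notebook this could be called as `first_df = load_csv_to_table(spark, "/path/to/first_file.csv", "demo_db.first_table")`, where both arguments are hypothetical.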

Load the second file into a data frame. It needs some cleaning first, so it will be saved to a table a bit later.

Once the data is cleaned, we can save it to a table.
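The post does not spell out what the cleaning involves, so the sketch below only illustrates the general pattern with some typical steps chosen as assumptions: normalizing column names, dropping duplicate rows, and dropping rows that are entirely null. The function name and table name are hypothetical.

```python
def clean_and_save(df, table_name):
    """Apply some generic clean-up to a DataFrame and save it as a table.

    The clean-up steps are illustrative assumptions, not the ones
    used in the original notebook.
    """
    cleaned = df
    # Normalize column names: trim, lower-case, spaces to underscores.
    for old_name in df.columns:
        new_name = old_name.strip().lower().replace(" ", "_")
        cleaned = cleaned.withColumnRenamed(old_name, new_name)
    # Remove exact duplicate rows, then rows where every column is null.
    cleaned = cleaned.dropDuplicates().na.drop(how="all")
    # Persist the cleaned data so it can be queried with SQL as well.
    cleaned.write.mode("overwrite").saveAsTable(table_name)
    return cleaned
```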

Here is part of the structure of the first data frame.

And the second one.

Now it is time to move on to the main topic.

Here is a Python example:
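To give a flavor of the Python side, here is a small sketch of a filter, group, and sort query written with DataFrame methods. The column names `category` and `amount` and the function name are invented for illustration and do not come from the notebook's data.

```python
def top_categories_python(df, min_amount=100):
    """Count rows per category above a threshold, largest group first.

    `category` and `amount` are hypothetical column names.
    """
    return (
        df.filter(f"amount > {min_amount}")  # like SQL WHERE
        .groupBy("category")                 # like SQL GROUP BY
        .count()                             # adds a column named "count"
        .orderBy("count", ascending=False)   # like SQL ORDER BY ... DESC
    )
```

Calling `top_categories_python(first_df).show()` in a notebook would print the result, assuming `first_df` is a data frame that has those two columns.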

And a SQL example:
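On the SQL side, a query of this kind can go into a `%sql` cell or be passed to `spark.sql` from Python. The sketch below takes the second route. The table name, the columns `category` and `amount`, and the alias `row_count` are placeholders rather than names from the notebook.

```python
def top_categories_sql(spark, table_name, min_amount=100):
    """Run a filter / group / sort query through Spark SQL.

    `category` and `amount` are hypothetical column names, and
    `table_name` is any table that has them.
    """
    query = f"""
        SELECT category, COUNT(*) AS row_count
        FROM {table_name}
        WHERE amount > {min_amount}
        GROUP BY category
        ORDER BY row_count DESC
    """
    # spark.sql returns a DataFrame, so the result can be chained
    # with further DataFrame methods or displayed with .show().
    return spark.sql(query)
```

Because `spark.sql` returns a regular data frame, the two styles can be mixed freely within one notebook.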

And here is the data returned:

A notebook containing the code: PySparkSQLComparison
