PySpark: Python and SQL — queries
Today’s post is a quick comparison of SQL and Python code in Databricks. If you are a SQL developer, you can see how to query data with Python. If you are a Python developer, you can see how to do the same thing the SQL way.
Let’s go into a Synapse Databricks notebook.
First, we need to prepare some data for the exercise.
Create a new Spark database.
Load the first file and save it to a table, then to a data frame.
Load the second file into a data frame. It needs some cleaning, so it will be saved to a table a bit later.
After the data is cleaned, we can save it to a table.
Here is a part of the structure for the first data frame.
And the second one.
Now it is time to move on to the main topic.
Here is a Python example:
And a SQL example:
And here is the data returned:
A notebook containing the code: PySparkSQLComparison