Databricks — Delta Live Tables
If you use Databricks, you probably want your ETL/ELT to be as simple and convenient as possible. And if you want tables that always stay consistent with their source, this cool functionality is for you.
Delta Live Tables are built from two main parts: a notebook, where the magic happens, and a continuous job. Let's start with the notebook.
We have three DLT tables. The first one, [dlt_source_table], simply takes data from a source, in this case the [iowa.iowa_delta] Delta table. This first table could be omitted from the code and the source referenced directly in the [dlt_iowa_corrected] table.
The second table, [dlt_iowa_corrected], is a DLT table where a value of 200 is added to the [Sale_Dollars] column. The third one, the [dlt_iowa_aggregated] table, filters the data and aggregates that column by [County]; its source is the previous table, [dlt_iowa_corrected].
You can also set data quality constraints (lines 10, 11 and 12 of the notebook code), where you define how the flow should behave if a particular constraint is violated. Multiple rules can be set at once.
Here is the code: delta_live_tables
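In case the embedded gist does not render, here is a minimal Python sketch of what such a notebook can look like. The table and column names follow the ones used in this article; the concrete expectation rules and the filter condition are illustrative assumptions, not the original code.

import dlt
from pyspark.sql import functions as F

# 1) Source table: takes the data 1:1 from the iowa.iowa_delta Delta table.
#    (spark is provided by the Databricks runtime inside a DLT notebook.)
@dlt.table(comment="Raw copy of iowa.iowa_delta")
def dlt_source_table():
    return spark.read.table("iowa.iowa_delta")

# 2) Corrected table: adds 200 to the Sale_Dollars column.
#    The expectations show the three ways a violated rule can be handled:
#    keep the row and log it, drop the row, or fail the update.
@dlt.table(comment="Sale_Dollars increased by 200")
@dlt.expect("sale_dollars_present", "Sale_Dollars IS NOT NULL")
@dlt.expect_or_drop("county_present", "County IS NOT NULL")
@dlt.expect_or_fail("sale_dollars_positive", "Sale_Dollars >= 0")
def dlt_iowa_corrected():
    return dlt.read("dlt_source_table").withColumn(
        "Sale_Dollars", F.col("Sale_Dollars") + 200
    )

# 3) Aggregated table: filters the data and sums Sale_Dollars per County.
@dlt.table(comment="Sale_Dollars aggregated by County")
def dlt_iowa_aggregated():
    return (
        dlt.read("dlt_iowa_corrected")
        .filter(F.col("Sale_Dollars") > 200)   # illustrative filter condition
        .groupBy("County")
        .agg(F.sum("Sale_Dollars").alias("Sale_Dollars_sum"))
    )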
Now that we have a notebook, it is time to do something with it. Go to Workflows -> Delta Live Tables and create a new pipeline. Setting up the pipeline is pretty simple, but let me emphasize two things: the pipeline has to be set as continuous, and if you want your DLT tables to be visible in a particular database, or to be able to call them from other notebooks, you should set a target database.
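The same setup can also be scripted instead of clicked through. Below is a hedged sketch using the Delta Live Tables Pipelines REST API; the workspace URL, token, notebook path, pipeline name and target database are placeholders, and the field names reflect my reading of that API rather than the exact configuration used here.

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder workspace URL
TOKEN = "<personal-access-token>"                         # placeholder token

pipeline_spec = {
    "name": "iowa_dlt_pipeline",          # hypothetical pipeline name
    "continuous": True,                   # keeps the DLT tables in sync with the source
    "target": "iowa",                     # database where the DLT tables become visible
    "libraries": [{"notebook": {"path": "/Users/<you>/delta_live_tables"}}],
}

resp = requests.post(
    f"{HOST}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=pipeline_spec,
)
resp.raise_for_status()
print(resp.json())   # the response contains the new pipeline_id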
When you have created the pipeline, hit the RUN button. The whole setup takes a while to configure…
…and then you can monitor the pipeline state.
As you can see, DLT tables are present inside the database.
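You can also double-check this from any notebook; here I assume the target database was set to iowa, which is just my guess.

spark.sql("SHOW TABLES IN iowa").show()
# dlt_source_table, dlt_iowa_corrected and dlt_iowa_aggregated should be listed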
Let’s add a new record to the source: the [iowa_delta] table.
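Adding the record can be done with a plain SQL insert. The column list and values below are assumptions about the real schema, which has more columns than the two used in this article.

spark.sql("""
    INSERT INTO iowa.iowa_delta (County, Sale_Dollars)
    VALUES ('Polk', 100.0)
""")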
After a few seconds, the pipeline status changes.
The new record has appeared in the synced [dlt_iowa_corrected] table.
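A quick way to confirm this from a notebook (again assuming the iowa target database and the hypothetical row inserted above):

spark.table("iowa.dlt_iowa_corrected") \
    .filter("County = 'Polk'") \
    .show()
# the new row should show up with Sale_Dollars increased by 200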