From the course: Databricks Certified Data Engineer Associate Cert Prep
Z-ordering optimization
- [Instructor] What is Z-ordering? It's an important technique for Delta Lake that colocates related information in the same set of files. The data-skipping algorithms in Delta Lake on Databricks use this co-locality automatically, which can dramatically reduce the amount of data that Delta Lake on Databricks needs to read. If we take a look at this SQL query here, you specify the columns to order by in the ZORDER BY clause. So we see OPTIMIZE events WHERE date >= current_timestamp() - INTERVAL 1 day, and here we get into the magic: let's go ahead and use ZORDER BY on the event type column. One thing you can expect is that when a column is used in query predicates and has high cardinality, meaning a large number of distinct values, Z-ordering really comes into play and helps you skip the parts of the data you don't need. So…
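To make the data-skipping idea concrete, here is a minimal Python sketch. It is a toy model, not Delta Lake's implementation: the bit-interleaving key is the textbook Z-order (Morton) construction, and the names `interleave_bits`, `split_into_files`, and `files_to_read` are hypothetical helpers invented for this illustration. Each "file" is just a list of rows, and a file counts as skippable when the value being searched for falls outside that file's min/max range for the column.

```python
def interleave_bits(x: int, y: int, bits: int = 8) -> int:
    """Build a Z-order (Morton) key by interleaving the bits of x and y."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return key


def split_into_files(rows, rows_per_file):
    """Chunk an ordered list of rows into fixed-size 'files'."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def files_to_read(files, column, value):
    """Count files whose min/max range for `column` could contain `value`."""
    count = 0
    for rows in files:
        values = [row[column] for row in rows]
        if min(values) <= value <= max(values):
            count += 1
    return count


# Toy table: every (x, y) pair on a 16 x 16 grid, 16 rows per file (assumed sizes).
rows = [(x, y) for x in range(16) for y in range(16)]

# Layout 1: plain sort on x, then y. Each file holds a single x value.
linear_files = split_into_files(sorted(rows), 16)

# Layout 2: sort on the interleaved key. Each file holds a 4 x 4 block of the grid.
z_files = split_into_files(sorted(rows, key=lambda r: interleave_bits(r[0], r[1])), 16)

for name, files in [("linear", linear_files), ("z-order", z_files)]:
    print(name,
          "x == 5 reads", files_to_read(files, 0, 5), "files;",
          "y == 5 reads", files_to_read(files, 1, 5), "files")
# linear x == 5 reads 1 files; y == 5 reads 16 files
# z-order x == 5 reads 4 files; y == 5 reads 4 files
```

In this toy setup the plain sort only helps filters on the first column, while the Z-ordered layout lets a filter on either column skip 12 of the 16 files. That is the trade-off to keep in mind when choosing which columns to Z-order by.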