Delta Lake
Delta Lake is an open table format that extends Parquet data files with a file-based transaction log. This provides ACID transactions and scalable metadata handling for your data lakes.
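To make the idea of a file-based transaction log more concrete, the following Python sketch replays the JSON commit files that Delta Lake keeps in a table's _delta_log directory and derives which Parquet data files belong to the latest snapshot. It is an illustration only and does not describe how the Embedded MPP reads tables. The helper names (live_files, write_commit, demo) and the toy file names are hypothetical, and the sketch ignores checkpoint files, so it assumes the full JSON commit history is still present.

```python
import json
import tempfile
from pathlib import Path


def live_files(table_dir):
    """Replay the JSON commits in _delta_log and return the data files of the latest snapshot."""
    log_dir = Path(table_dir) / "_delta_log"
    files = set()
    # Commit files are named by zero-padded version, so sorting by name sorts by version.
    for commit in sorted(log_dir.glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)  # one JSON action per line
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


def write_commit(log_dir, version, actions):
    """Write one toy commit file (hypothetical helper used only by the demo)."""
    path = log_dir / f"{version:020d}.json"
    path.write_text("\n".join(json.dumps(a) for a in actions))


def demo():
    """Build a two-commit toy log in a temporary directory and replay it."""
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp) / "_delta_log"
        log_dir.mkdir()
        # Version 0 adds two files; version 1 replaces one of them.
        write_commit(log_dir, 0, [
            {"add": {"path": "part-0.parquet"}},
            {"add": {"path": "part-1.parquet"}},
        ])
        write_commit(log_dir, 1, [
            {"remove": {"path": "part-0.parquet"}},
            {"add": {"path": "part-2.parquet"}},
        ])
        return sorted(live_files(tmp))


if __name__ == "__main__":
    print(demo())
```

Because every change is recorded as a new commit file rather than by rewriting data in place, readers always see a consistent set of files, which is what underpins the ACID guarantees mentioned above.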
The Denodo Embedded MPP includes a predefined catalog named delta connected to the Embedded Hive Metastore, enabling you to access Delta Lake tables immediately.
Here are the default properties for the delta catalog:
connector.name=delta
# connection to the Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083
hive.metastore-timeout=20s
hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
hive.parquet-batch-read-optimization-enabled=true
hive.pushdown-filter-enabled=true
Note
The delta catalog is a restricted catalog. This means it is not listed on the From MPP Catalogs tab of the Embedded MPP data source.
To query Delta Lake tables from Denodo, you must first register them through the Embedded MPP data source. This data source provides a graphical interface to explore Delta Lake tables, register them in the Embedded MPP, and then create the base views in Denodo.
To graphically explore and register Delta Lake tables from Object Storage, follow these steps:
1. In Denodo, open the Embedded MPP data source.
2. Connect it to your Object Storage in the Read Write tab (e.g., Amazon S3, Azure Data Lake Storage). For detailed steps, refer to the Object Storage Data in Open Table Formats section.
3. Go to the From object storage tab. You will then be able to explore the Delta Lake tables stored in your Object Storage.
4. Select the desired tables to register them in the Denodo Embedded MPP and create corresponding base views in Denodo.
Explore Delta Lake tables
Important
To explore Delta Lake tables graphically, you need the Denodo subscription bundle Enterprise Plus.
Features
The Denodo Embedded MPP data source provides the following features when working with Delta Lake tables:
- Graphical exploration and view creation: Explore Delta Lake datasets, create tables in the MPP and base views in Denodo via the From object storage tab of the Embedded MPP data source.
- External Metastore integration: Create base views over existing Delta Lake tables managed by an External Metastore via the From MPP Catalogs tab of the Embedded MPP data source.
  Note: To do this, you have to create a new catalog in the Embedded MPP that connects to the External Metastore, because the predefined delta catalog is restricted and cannot be accessed from the From MPP Catalogs tab.
- Querying
  Delta protocol version (3, 7) is supported starting with Denodo Embedded MPP 20241007.
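To find out which protocol version a given table declares, you can look for the protocol action in its Delta log. The Python sketch below scans the lines of one or more commit files and compares the result with the supported version quoted above. It assumes that (3, 7) denotes the pair (minReaderVersion, minWriterVersion); the names SUPPORTED, table_protocol and within_supported are hypothetical, and the check only compares version numbers without inspecting individual table features.

```python
import json

# Assumption: (3, 7) is read as (minReaderVersion, minWriterVersion).
SUPPORTED = (3, 7)


def table_protocol(commit_lines):
    """Return the last (reader, writer) protocol pair found in the given log lines, or None."""
    found = None
    for line in commit_lines:
        if not line.strip():
            continue
        action = json.loads(line)
        if "protocol" in action:
            proto = action["protocol"]
            # A later protocol action overrides an earlier one.
            found = (proto["minReaderVersion"], proto["minWriterVersion"])
    return found


def within_supported(protocol, supported=SUPPORTED):
    """True when both declared versions are at or below the supported pair."""
    reader, writer = protocol
    return reader <= supported[0] and writer <= supported[1]


if __name__ == "__main__":
    # Toy log lines standing in for the content of a commit file.
    sample = [
        '{"commitInfo": {"operation": "CREATE TABLE"}}',
        '{"protocol": {"minReaderVersion": 3, "minWriterVersion": 7, '
        '"readerFeatures": ["deletionVectors"], "writerFeatures": ["deletionVectors"]}}',
    ]
    proto = table_protocol(sample)
    print(proto, within_supported(proto))
```

In practice the input lines would come from the JSON files under the table's _delta_log directory in your Object Storage, passed in version order so that the most recent protocol action wins.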
Limitations
When working with Delta Lake tables through the Denodo Embedded MPP, be aware of the following current limitations:
- No bulk data load
- No full cache mode
- No remote tables
- No write operations
Supported Operations by Metastore Type
The following table summarizes the Delta Lake operations currently supported by the Denodo Embedded MPP for various Metastore types:
| Operation | Hive Metastore | AWS Glue Data Catalog | Unity Catalog (*) |
|---|---|---|---|
| Read | Yes | Yes | Yes |
| Create/Insert | No | No | No |
| Update | No | No | No |
| Delete | No | No | No |
(*) With the Embedded MPP connected to Unity Catalog, you can read both managed and external Databricks Delta Lake tables by enabling Iceberg reads via UniForm in Databricks.
