Delta Lake¶
Delta Lake is an open table format that extends Parquet data files with a file-based transaction log. This provides ACID transactions and scalable metadata handling for your data lakes.
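The transaction log lives alongside the data files as newline-delimited JSON commit files in a `_delta_log/` directory. As a rough illustration of that mechanism (a simplified sketch, not how Denodo or Presto implement it; the helper name is ours), replaying the log to find the table's active data files might look like:

```python
import json
import os
import tempfile

# Build a toy Delta table directory with a single commit file.
# Real Delta logs contain more action types (protocol, metaData, remove, ...).
table_dir = tempfile.mkdtemp()
log_dir = os.path.join(table_dir, "_delta_log")
os.makedirs(log_dir)

commit = [
    {"commitInfo": {"operation": "WRITE"}},
    {"add": {"path": "part-00000.parquet", "size": 1024, "dataChange": True}},
]
with open(os.path.join(log_dir, "00000000000000000000.json"), "w") as f:
    f.write("\n".join(json.dumps(action) for action in commit))

def active_files(table_path):
    """Replay the commit files in order, tracking files added and removed."""
    files = set()
    log = os.path.join(table_path, "_delta_log")
    for name in sorted(os.listdir(log)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(log, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files

print(active_files(table_dir))  # {'part-00000.parquet'}
```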
The Denodo Lakehouse Accelerator includes a predefined catalog named delta connected to the Embedded Hive Metastore, enabling you to access Delta Lake tables immediately.
Here are the default properties for the delta catalog:
connector.name=delta
# connection to the Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083
hive.metastore-timeout=20s
hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
hive.parquet-batch-read-optimization-enabled=true
hive.pushdown-filter-enabled=true
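Catalog configuration files use Java-style `key=value` properties syntax with `#` comments. As a hedged sketch of how such a file could be read from a script (the parser below is illustrative, not a Denodo utility):

```python
def parse_properties(text):
    """Parse Java-style key=value properties, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        props[key.strip()] = value.strip()
    return props

# Excerpt of the delta catalog properties shown above.
delta_catalog = """\
connector.name=delta
# connection to the Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083
hive.metastore-timeout=20s
"""

props = parse_properties(delta_catalog)
print(props["connector.name"])      # delta
print(props["hive.metastore.uri"])  # thrift://hive-metastore:9083
```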
Note
The delta catalog is a restricted catalog. This means it is not listed on the External Catalogs tab of the Denodo Lakehouse Accelerator data source.
Note
Presto on Velox Compatibility
Currently, Presto on Velox does not support operations on the Delta Lake catalog. To query Delta Lake tables, you must use the Presto Java engine (ensuring prestoOnVelox.enabled is set to false in your values.yaml).
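Assuming the conventional Helm nesting of the dotted key, the relevant `values.yaml` fragment would look something like this (the exact surrounding structure depends on your chart):

```yaml
# Disable Presto on Velox so the Presto Java engine handles Delta Lake queries
prestoOnVelox:
  enabled: false
```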
For a detailed list of supported features in Presto on Velox engine, see the Presto On Velox section.
To query Delta Lake tables from Denodo, you must create base views over them using the Denodo Lakehouse Accelerator data source. This data source provides a graphical interface to explore Delta Lake tables, register them in the Denodo Lakehouse Accelerator, and then create the base views in Denodo.
To graphically explore and register Delta Lake tables from Object Storage Routes, follow these steps:

1. In Denodo, open the Denodo Lakehouse Accelerator data source.
2. Connect it to your Object Storage (e.g., Amazon S3, Azure Data Lake Storage) in the Read Write tab. For detailed steps, refer to the Object Storage Data in Open Table Formats section.
3. Go to the From Object Storage Routes tab. You will then be able to explore the Delta Lake tables stored in your Object Storage.
4. Select the desired tables to register them in the Denodo Lakehouse Accelerator and create the corresponding base views in Denodo.
Note
You can use the DISCOVER_OBJECT_STORAGE_MPP_PROCEDURE stored procedure to automate this process.
Explore Delta Lake tables¶
Important
To explore Delta Lake tables graphically, you need the Denodo subscription bundle Enterprise Plus.
Features¶
The Denodo Lakehouse Accelerator data source provides the following features when working with Delta Lake tables:
Graphical exploration and view creation: Explore Data Lake datasets, create tables in the Denodo Lakehouse Accelerator, and create base views in Denodo via the From Object Storage Routes tab of the Denodo Lakehouse Accelerator data source.

External Metastore integration: Create base views over existing Delta Lake tables managed by an External Metastore via the External Catalogs tab of the Denodo Lakehouse Accelerator data source. Note: to achieve this, you have to create a new catalog in the Denodo Lakehouse Accelerator that connects to the External Metastore, because the predefined delta catalog is restricted and cannot be accessed from the External Catalogs tab.

Querying: Delta protocol version (3, 7) is supported starting with Denodo Lakehouse Accelerator 20241007.
Limitations¶
When working with Delta Lake tables through the Denodo Lakehouse Accelerator, be aware of the following current limitations:
No bulk data load
No full cache mode
No remote tables
No write operations
Supported Operations by Metastore Type¶
The following table summarizes the Delta Lake operations currently supported by the Denodo Lakehouse Accelerator for various Metastore types:
| Operation | Hive Metastore | AWS Glue Data Catalog | Unity Catalog (*) |
|---|---|---|---|
| Read | Yes | Yes | Yes |
| Create/Insert | No | No | No |
| Update | No | No | No |
| Merge | No | No | No |
| Delete | No | No | No |
(*) With the Denodo Lakehouse Accelerator connected to the Unity Catalog, you can read both managed and external Databricks Delta Lake tables by enabling Iceberg reads via UniForm in Databricks.
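The support matrix above can also be captured programmatically, e.g. for pre-flight checks in automation scripts. A hedged convenience sketch (the dictionary and helper names are ours, not a Denodo API):

```python
# Delta Lake operation support per Metastore type, mirroring the table above:
# only reads are currently supported, regardless of Metastore.
SUPPORTED_OPERATIONS = {
    "read": {"hive_metastore", "aws_glue", "unity_catalog"},
    "create_insert": set(),
    "update": set(),
    "merge": set(),
    "delete": set(),
}

def is_supported(operation, metastore):
    """Return True if the operation is supported for the given Metastore."""
    return metastore in SUPPORTED_OPERATIONS.get(operation, set())

print(is_supported("read", "unity_catalog"))     # True
print(is_supported("update", "hive_metastore"))  # False
```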
