Delta Lake

Delta Lake is an open table format that extends Parquet data files with a file-based transaction log. This provides ACID transactions and scalable metadata handling for your data lakes.

The Denodo Embedded MPP includes a predefined catalog named delta connected to the Embedded Hive Metastore, enabling you to access Delta Lake tables immediately.

Here are the default properties for the delta catalog:

connector.name=delta

# connection to the Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083
hive.metastore-timeout=20s

hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml

hive.parquet-batch-read-optimization-enabled=true
hive.pushdown-filter-enabled=true
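As a rough illustration of how these properties are structured, the following Python sketch parses the listing above into a dictionary and extracts the Metastore host and port from the Thrift URI. The helper name parse_properties is hypothetical and is not part of Denodo or Presto. It only handles the simple key=value form shown here, not the full Java properties syntax.

```python
from urllib.parse import urlsplit

# Default properties of the delta catalog, copied from the listing above.
DEFAULT_DELTA_CATALOG = """\
connector.name=delta

# connection to the Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083
hive.metastore-timeout=20s

hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml

hive.parquet-batch-read-optimization-enabled=true
hive.pushdown-filter-enabled=true
"""


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


props = parse_properties(DEFAULT_DELTA_CATALOG)

# Split the Thrift URI to see which Metastore endpoint the catalog targets.
metastore = urlsplit(props["hive.metastore.uri"])
print(props["connector.name"], metastore.hostname, metastore.port)
```

A check like this can help confirm which Metastore endpoint a catalog file points to before troubleshooting connectivity.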

Note

The delta catalog is a restricted catalog. This means it is not listed on the From MPP Catalogs tab of the Embedded MPP data source.

To query Delta Lake tables from Denodo, you first have to register them in the Embedded MPP through the Embedded MPP data source. This data source provides a graphical interface to explore Delta Lake tables, register them in the Embedded MPP, and then create the base views in Denodo.

To graphically explore and register Delta Lake tables from Object Storage, follow these steps:

  1. In Denodo, open the Embedded MPP data source.

  2. Connect it to your Object Storage in the Read Write tab (e.g., Amazon S3, Azure Data Lake Storage). For detailed steps, refer to the Object Storage Data in Open Table Formats section.

  3. Go to the From object storage tab.

  4. Explore the Delta Lake tables stored in your Object Storage.

  5. Select the desired tables to register them in the Denodo Embedded MPP and create corresponding base views in Denodo.

Explore Delta Lake tables

Important

To explore Delta Lake tables graphically, you need the Denodo subscription bundle Enterprise Plus.

Features

The Denodo Embedded MPP data source provides the following features when working with Delta Lake tables:

  • Graphical exploration and view creation: Explore Delta Lake datasets, create tables in the MPP, and create base views in Denodo via the From object storage tab of the Embedded MPP data source.

  • External Metastore Integration: Create base views over existing Delta Lake tables managed by an External Metastore via the From MPP Catalogs tab of the Embedded MPP data source.

    Note: To achieve this, you have to create a new catalog in the Embedded MPP that connects to the External Metastore. This is because the predefined delta catalog is restricted and cannot be accessed from the From MPP Catalogs tab.

  • Querying

    • Delta protocol version (3, 7), that is, reader version 3 and writer version 7, is supported starting with Denodo Embedded MPP 20241007.

  • Embedded MPP Acceleration
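For the protocol support noted under Querying, it can be useful to check which protocol version a table requires. A minimal sketch, assuming the table's _delta_log directory is reachable as a local path and that its JSON commit files still contain the protocol action (Parquet checkpoints are not read). The function name latest_protocol and the demo log are hypothetical, for illustration only.

```python
import json
import tempfile
from pathlib import Path


def latest_protocol(table_path):
    """Return (minReaderVersion, minWriterVersion) from the newest protocol action.

    Only JSON commit files are scanned. Returns None if no protocol action
    is found there (for example, if it lives only in a checkpoint).
    """
    log_dir = Path(table_path) / "_delta_log"
    protocol = None
    # Commit file names are zero-padded numbers, so lexical order is commit order.
    for commit in sorted(log_dir.glob("*.json")):
        if not commit.stem.isdigit():
            continue  # skip non-commit JSON files
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "protocol" in action:
                p = action["protocol"]
                protocol = (p["minReaderVersion"], p["minWriterVersion"])
    return protocol


# Demo against a tiny hand-written log (not a real table).
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "_delta_log"
    log.mkdir()
    (log / "00000000000000000000.json").write_text(
        json.dumps({"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}}) + "\n"
    )
    (log / "00000000000000000001.json").write_text(
        json.dumps({"protocol": {"minReaderVersion": 3, "minWriterVersion": 7,
                                 "readerFeatures": ["deletionVectors"],
                                 "writerFeatures": ["deletionVectors"]}}) + "\n"
    )
    demo_protocol = latest_protocol(tmp)

print(demo_protocol)  # (3, 7)
```

A later commit can upgrade the protocol, which is why the sketch keeps the last protocol action it sees rather than the first.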

Limitations

When working with Delta Lake tables through the Denodo Embedded MPP, be aware of the following current limitations:

  • No bulk data load

  • No full cache mode

  • No remote tables

  • No write operations

Supported Operations by Metastore Type

The following table summarizes the Delta Lake operations currently supported by the Denodo Embedded MPP for various Metastore types:

Operation       Hive Metastore   AWS Glue Data Catalog   Unity Catalog (*)
Read            Yes              Yes                     Yes
Create/Insert   No               No                      No
Update          No               No                      No
Delete          No               No                      No

(*) With the Embedded MPP connected to Unity Catalog, you can read both managed and external Databricks Delta Lake tables by enabling Iceberg reads via UniForm in Databricks.
