Delta Lake

Delta Lake is an open table format that extends Parquet data files with a file-based transaction log. This provides ACID transactions and scalable metadata handling for your data lakes.
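To make the transaction-log idea concrete, here is a simplified, illustrative sketch (not the actual Delta implementation; real logs also contain remove actions, table metadata, and checkpoints). A Delta table directory holds Parquet data files plus a `_delta_log/` directory of ordered, zero-padded JSON commit files; readers replay the log in order to determine which data files belong to the current snapshot:

```python
import json
import os
import tempfile

# Illustrative layout: a table directory with a _delta_log/ subdirectory.
root = tempfile.mkdtemp()
log_dir = os.path.join(root, "_delta_log")
os.makedirs(log_dir)

# Two hypothetical commits: each adds one Parquet data file to the table.
commits = [
    [{"add": {"path": "part-0000.parquet"}}],
    [{"add": {"path": "part-0001.parquet"}}],
]
for version, actions in enumerate(commits):
    name = f"{version:020d}.json"  # commit files are 20-digit zero-padded
    with open(os.path.join(log_dir, name), "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")

# Replaying the log in commit order yields the files of the latest snapshot.
active = []
for entry in sorted(os.listdir(log_dir)):
    with open(os.path.join(log_dir, entry)) as f:
        for line in f:
            action = json.loads(line)
            if "add" in action:
                active.append(action["add"]["path"])

print(active)  # ['part-0000.parquet', 'part-0001.parquet']
```

Because every change is a new numbered log entry rather than an in-place file edit, concurrent readers always see a consistent snapshot, which is what gives Delta Lake its ACID properties.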

The Denodo Lakehouse Accelerator includes a predefined catalog named delta connected to the Embedded Hive Metastore, enabling you to access Delta Lake tables immediately.

Here are the default properties for the delta catalog:

connector.name=delta

# connection to the Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083
hive.metastore-timeout=20s

hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml

hive.parquet-batch-read-optimization-enabled=true
hive.pushdown-filter-enabled=true

Note

The delta catalog is a restricted catalog. This means it is not listed on the External Catalogs tab of the Denodo Lakehouse Accelerator data source.

Note

Presto on Velox Compatibility

Currently, Presto on Velox does not support operations on the Delta Lake catalog. To query Delta Lake tables, you must use the Presto Java engine (ensuring prestoOnVelox.enabled is set to false in your values.yaml).
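The relevant fragment of `values.yaml` might look as follows (the exact nesting of the `prestoOnVelox` key is an assumption; check your chart's defaults):

```yaml
# values.yaml: Delta Lake queries require the Presto Java engine,
# so Presto on Velox must be disabled.
prestoOnVelox:
  enabled: false
```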

For a detailed list of supported features in Presto on Velox engine, see the Presto On Velox section.

To query Delta Lake tables from Denodo, you have to create base views over them using the Denodo Lakehouse Accelerator data source. This data source provides a graphical interface to explore Delta Lake tables, register them in the Denodo Lakehouse Accelerator, and then create the base views in Denodo.

To graphically explore and register Delta Lake tables from Object Storage Routes, follow these steps:

  1. In Denodo, open the Denodo Lakehouse Accelerator data source.

  2. Connect it to your Object Storage in the Read Write tab (e.g., Amazon S3, Azure Data Lake Storage). For detailed steps, refer to the Object Storage Data in Open Table Formats section.

  3. Go to the From Object Storage Routes tab.

  4. You will then be able to explore the Delta Lake tables stored in your Object Storage.

  5. Select the desired tables to register them in the Denodo Lakehouse Accelerator and create corresponding base views in Denodo.

Note

You can use the DISCOVER_OBJECT_STORAGE_MPP_PROCEDURE stored procedure to automate this process.

Explore Delta Lake tables

Important

To explore Delta Lake tables graphically, you need the Denodo subscription bundle Enterprise Plus.

Features

The Denodo Lakehouse Accelerator data source provides the following features when working with Delta Lake tables:

  • Graphical exploration and view creation: Explore Data Lake datasets, create tables in the Denodo Lakehouse Accelerator and base views in Denodo via the From Object Storage Routes tab of the Denodo Lakehouse Accelerator data source.

  • External Metastore Integration: Create base views over existing Delta Lake tables managed by an External Metastore via the External Catalogs tab of the Denodo Lakehouse Accelerator data source.

    Note: To achieve this, you have to create a new catalog in the Denodo Lakehouse Accelerator that connects to the External Metastore. This is because the predefined delta catalog is restricted and cannot be accessed from the External Catalogs tab.

  • Querying

    • Delta protocol version (3, 7), that is, reader version 3 and writer version 7, is supported starting with Denodo Lakehouse Accelerator 20241007

  • Embedded MPP Acceleration
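
To connect to an External Metastore, you can model the new catalog on the predefined delta catalog shown above. A hedged sketch (the file name and Metastore URI below are hypothetical; substitute your own values):

```properties
# /opt/presto-server/etc/catalog/delta_external.properties (hypothetical name)
connector.name=delta

# connection to the External Metastore instead of the embedded one
hive.metastore.uri=thrift://external-metastore.example.com:9083
hive.metastore-timeout=20s
```

Unlike the predefined delta catalog, a catalog defined this way is not restricted, so it appears on the External Catalogs tab of the Denodo Lakehouse Accelerator data source.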

Limitations

When working with Delta Lake tables through the Denodo Lakehouse Accelerator, be aware of the following current limitations:

  • No bulk data load

  • No full cache mode

  • No remote tables

  • No write operations

Supported Operations by Metastore Type

The following table summarizes the Delta Lake operations currently supported by the Denodo Lakehouse Accelerator for various Metastore types:

Operation       Hive Metastore   AWS Glue Data Catalog   Unity Catalog (*)
-------------   --------------   ---------------------   -----------------
Read            Yes              Yes                     Yes
Create/Insert   No               No                      No
Update          No               No                      No
Merge           No               No                      No
Delete          No               No                      No

(*) You can read both managed and external Databricks Delta Lake tables with the Denodo Lakehouse Accelerator connected to Unity Catalog by enabling Iceberg reads via UniForm in Databricks.
