
Delta Lake

Delta Lake is a table format that extends Parquet data files with a file-based transaction log, providing ACID transactions and scalable metadata handling.

The Denodo Embedded MPP is distributed with a predefined catalog named delta, which is connected to the Embedded Hive Metastore and is used to access Delta Lake tables.

delta.properties
connector.name=delta

# Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083

To query Delta Lake tables, you must first register them manually in the Embedded MPP’s Metastore with a CREATE TABLE statement. Because the schema and the list of data files are stored in the Delta Log at the table’s location, the column list you declare is only a placeholder: provide a single dummy column so that the Metastore does not reject the table with a "no columns" error:

Register a Delta Lake table in the MPP
CREATE TABLE delta.default.orders (
   dummy bigint
) WITH (
  external_location = 'abfs://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>',
  format = 'PARQUET'
);

The WITH clause of the CREATE TABLE statement can also be used to set other table properties. See Delta Lake Tables Properties.
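As a quick check after registering the table, the statements below are a minimal sketch. They assume that the MPP accepts the standard Presto statements SHOW TABLES and DESCRIBE, which this page does not state explicitly, and they reuse the delta.default.orders table from the example above. If the registration worked as described, DESCRIBE should report the columns recorded in the Delta Log rather than the dummy column.

```sql
-- List the tables registered in the default schema of the delta catalog.
SHOW TABLES FROM delta.default;

-- Inspect the columns the MPP reports for the registered table.
-- Assumption: the schema is resolved from the Delta Log, not from the dummy column.
DESCRIBE delta.default.orders;

-- Preview a few rows to confirm that the data files are readable.
SELECT * FROM delta.default.orders LIMIT 10;
```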

Once the Delta Lake table is registered, you can use the embedded data source in Denodo to create a base view on top of the table using the From MPP Catalogs tab.

Explore Delta Lake tables


Query Delta Lake tables directly

Alternatively, you can query a table without registering it in the Metastore by using the table location as the table name.

Query a Delta Lake table from MPP
SELECT * FROM
   delta."$path$"."abfs://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>";
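The same path-based reference can be used with ordinary query clauses. The sketch below is illustrative only: it keeps the placeholder location from the example above and assumes nothing about the table's columns.

```sql
-- Count the rows of the unregistered table.
SELECT COUNT(*) AS row_count
FROM delta."$path$"."abfs://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>";

-- Preview a few rows instead of reading the whole table.
SELECT *
FROM delta."$path$"."abfs://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>"
LIMIT 10;
```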

Features

The Denodo Embedded MPP provides the following features when working with Delta Lake tables:

  • Create base views over existing Delta Lake tables in the Embedded or External Metastore (From MPP Catalogs tab of the Embedded MPP data source)

  • Querying

Note

Delta protocol version (3, 7), that is, reader version 3 and writer version 7, is supported as of Denodo Embedded MPP 20241007.

Limitations

The following features are not supported for Delta Lake tables:

  • Graphically explore Delta Lake datasets, create tables in the MPP and base views in Denodo (From object storage tab of the Embedded MPP data source)

  • Bulk data load

  • Caching: full cache mode

  • Remote tables

Supported Operations by Metastore Type

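| Operation     | Embedded Hive Metastore | External Hive Metastore | AWS Glue Data Catalog |
|---------------|-------------------------|-------------------------|-----------------------|
| Read          | Yes                     | Yes                     | Yes                   |
| Create/Insert | No                      | No                      | No                    |
| Update        | No                      | No                      | No                    |
| Delete        | No                      | No                      | No                    |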
