Delta Lake

Delta Lake is a table format that extends Parquet data files with a file-based transaction log for ACID transactions and scalable metadata handling.

The Denodo Embedded MPP can read data stored in Delta Lake tables. For this, it needs a Metastore as the metadata catalog, which can be either the Embedded Hive Metastore or an External Metastore, and a catalog of type delta:

delta.properties
connector.name=delta

# Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083

To query Delta Lake tables, you must first register them manually in the Embedded MPP's Metastore with a CREATE TABLE statement. Because the schema and the list of data files are stored in the Delta Log at the table's location, you only need to declare a dummy column as the table schema. This avoids the error the Metastore raises for tables with no columns:

Register a Delta Lake table in the MPP
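CREATE TABLE delta.default.orders (
  -- Dummy column: avoids the Metastore's "no columns" error;
  -- the actual schema is read from the Delta Log
  dummy bigint
) WITH (
  -- Location of the Delta Lake table
  external_location = 'abfs://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>',
  format = 'PARQUET'
);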

The WITH clause of the CREATE TABLE can also be used to set other properties on the table. See Delta Lake Tables Properties.
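Once registered, the table can be queried through the delta catalog like any other MPP table. The following is a minimal sketch that reuses the delta.default.orders name from the previous example. The columns listed by SHOW COLUMNS are expected to reflect the schema stored in the Delta Log rather than the dummy column, assuming the connector reads the schema from the log as described above; check the result in your own environment.

```sql
-- Inspect the columns the MPP exposes for the registered table
-- (expected to come from the Delta Log, not from the dummy column)
SHOW COLUMNS FROM delta.default.orders;

-- Query the registered table like any other MPP table
SELECT *
FROM delta.default.orders
LIMIT 10;
```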

Once the Delta Lake table is registered, you can use the embedded data source in Denodo to create a Denodo base view on top of the table using the From MPP Catalogs tab.

Explore Delta Lake tables

Query Delta Lake tables directly

Another option is to query the table directly using the table location as the table name without registering it in the Metastore.

Query a Delta Lake table from MPP
SELECT * FROM
   delta."$path$"."abfs://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>";
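A path-based reference should work wherever a table name is accepted in a SELECT statement, so it can be combined with ordinary SQL. The sketch below counts the rows of a Delta Lake table without registering it. It reuses the placeholder location from the previous example, and row_count is only an illustrative alias.

```sql
-- Count the rows of a Delta Lake table addressed by its location.
-- Replace the placeholders with the real ABFS location of the table.
SELECT count(*) AS row_count
FROM delta."$path$"."abfs://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>";
```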

Features

The Denodo Embedded MPP provides the following features when working with Delta Lake tables:

  • Create base views over existing Delta Lake tables in the Metastore (Embedded or External)

  • Querying

  • Embedded MPP Acceleration

Limitations

The following features are not supported for Delta Lake tables:

  • Create base views over data stored in Delta Lake format

  • Bulk data load

  • Caching: full cache mode

  • Remote tables