Iceberg

Apache Iceberg is a high-performance table format for large analytic datasets. Iceberg tables allow schema evolution, partition evolution and table version rollback, without the need to rewrite or migrate tables.

The Denodo Embedded MPP is distributed with a predefined catalog named iceberg connected to the Embedded Hive Metastore for accessing Iceberg tables.

iceberg.properties
connector.name=iceberg

# Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083

Starting with Denodo Embedded MPP 8.0.20240506, you can register Iceberg tables whose data and metadata already exist in object storage. To do so, call the register_table procedure with the target schema, the desired table name and the location of the table metadata:

Register existing Iceberg table
CALL iceberg.system.register_table(
   'default',
   'denodo_iceberg_table',
   's3a://bucket/path/to/iceberg/table/')

Note: The S3 URI protocol used in the location field of the Iceberg table metadata must be the same as the one used in the register_table statement.
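
As a quick check after registration, the table can be queried through the iceberg catalog. The sketch below assumes the schema and table name from the register_table call above, and that the statements are run directly against the Embedded MPP (for example, from a Presto-compatible SQL client). Adapt the names to your environment.

```sql
-- List the tables in the schema passed to register_table;
-- the newly registered table should appear in the result.
SHOW TABLES FROM iceberg.default;

-- Read a few rows to confirm that the table data and metadata are reachable.
SELECT *
FROM iceberg.default.denodo_iceberg_table
LIMIT 10;
```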

Once the Iceberg table is registered, you can use the embedded data source in Denodo to create a Denodo base view on top of the table using the From MPP Catalogs tab.

Explore Iceberg tables

Iceberg and the Embedded MPP Iceberg connector support time travel through table snapshots, each identified by a unique snapshot ID. The snapshot IDs are stored in the $snapshots metadata table. You can list the available snapshots with the GET_ICEBERG_VIEW_SNAPSHOTS stored procedure and roll back the state of a table to a previous snapshot ID with the ROLLBACK_ICEBERG_VIEW_TO_SNAPSHOT stored procedure. These stored procedures were included in the 8.0u20240306 update.
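
To see which snapshot IDs exist for a table, the $snapshots metadata table can be queried like a regular table by appending the suffix to the table name. The following is a minimal sketch, assuming the denodo_iceberg_table registered in the default schema and the column names commonly exposed by the Presto Iceberg connector (committed_at, snapshot_id, parent_id, operation). Verify the column names against your Embedded MPP version.

```sql
-- The metadata table name contains '$', so it must be double-quoted.
-- Column names are an assumption based on the Presto Iceberg connector.
SELECT committed_at, snapshot_id, parent_id, operation
FROM iceberg.default."denodo_iceberg_table$snapshots"
ORDER BY committed_at DESC;
```

Each returned snapshot_id identifies a table state that a rollback can target.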

Features

The Denodo Embedded MPP provides the following features when working with Iceberg tables:

  • Create base views over existing Iceberg tables in the Embedded or External Metastore (From MPP Catalogs tab of the Embedded MPP data source)

  • Querying

  • Embedded MPP Acceleration

Limitations

The following features are not supported for Iceberg tables:

  • Graphically explore Iceberg datasets, create tables in the MPP and base views in Denodo (From object storage tab of the Embedded MPP data source)

  • Bulk data load

  • Caching: full cache mode

  • Remote tables

Supported Operations by Metastore Type

Operation       Embedded Hive Metastore   External Hive Metastore   AWS Glue Data Catalog
Read            Yes                       Yes                       Yes
Create/Insert   No                        No                        No
Update          No                        No                        No
Merge           No                        No                        No
Delete          No                        No                        No
