Iceberg
Apache Iceberg is a high-performance table format for large analytic datasets. Iceberg tables allow schema evolution, partition evolution and table version rollback, without the need to rewrite or migrate tables.
The Denodo Embedded MPP is distributed with a predefined catalog named iceberg, which is connected to the Embedded Hive Metastore for accessing Iceberg tables:
connector.name=iceberg
# Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083
Starting with Denodo Embedded MPP 8.0.20240506, Iceberg tables for which table data and metadata already exist in the Object Storage can be registered with the catalog using the register_table procedure by supplying the target schema, the desired table name and the location of the table metadata:
CALL iceberg.system.register_table(
'default',
'denodo_iceberg_table',
's3a://bucket/path/to/iceberg/table/')
Note: The S3 URI protocol used in the location field of the Iceberg table metadata must be the same as the one used in the register_table statement.
Once the Iceberg table is registered, you can use the embedded data source in Denodo to create a Denodo base view on top of the table using the From MPP Catalogs tab.
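As a quick sanity check after registration, the table can be listed and sampled through the iceberg catalog. The following is a minimal sketch, assuming the table was registered as default.denodo_iceberg_table as in the example above, and assuming the statements are run directly against the Embedded MPP with a client that accepts standard Presto-style SQL. It is an illustration, not part of the documented procedure.

```sql
-- List the tables visible in the target schema of the iceberg catalog.
-- The schema name "default" is taken from the register_table example.
SHOW TABLES FROM iceberg.default;

-- Sample a few rows to confirm that the registered metadata resolves to readable data.
SELECT *
FROM iceberg.default.denodo_iceberg_table
LIMIT 10;
```

If the table does not appear, the location passed to register_table and the URI protocol in the table metadata are the first things to check.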

Explore Iceberg tables
Iceberg and the Embedded MPP Iceberg connector support time travel via table snapshots, each identified by a unique snapshot ID. The snapshot IDs are stored in the $snapshots metadata table.
You can roll back the state of a table to a previous snapshot ID using the ROLLBACK_ICEBERG_VIEW_TO_SNAPSHOT and GET_ICEBERG_VIEW_SNAPSHOTS stored procedures. These stored procedures were introduced in the 8.0u20240306 update.
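To see which snapshot IDs are available for a table, the $snapshots metadata table can be queried like a regular table. The following is a minimal sketch, assuming the table default.denodo_iceberg_table from the registration example and the column names exposed by the Presto-style Iceberg connector (snapshot_id, parent_id, committed_at, operation). The columns available in a given Embedded MPP version may differ.

```sql
-- The metadata table name contains a "$", so the identifier must be double-quoted.
SELECT snapshot_id,
       parent_id,
       committed_at,
       operation
FROM iceberg.default."denodo_iceberg_table$snapshots"
ORDER BY committed_at DESC;  -- most recent snapshot first
```

A snapshot_id value returned by this query is the kind of identifier a rollback to a previous snapshot refers to.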
Features
The Denodo Embedded MPP provides the following features when working with Iceberg tables:
- Create base views over existing Iceberg tables in the Embedded or External Metastore (From MPP Catalogs tab of the Embedded MPP data source)
- Querying
Limitations
- Graphically explore Iceberg datasets, create tables in the MPP and base views in Denodo (From object storage tab of the Embedded MPP data source)
- Bulk data load
- Caching: full cache mode
- Remote tables
Supported Operations by Metastore Type
Operation | Embedded Hive Metastore | External Hive Metastore | AWS Glue Data Catalog |
---|---|---|---|
Read | Yes | Yes | Yes |
Create/Insert | No | No | No |
Update | No | No | No |
Merge | No | No | No |
Delete | No | No | No |