Iceberg

Apache Iceberg is a high-performance table format for large analytic datasets. Iceberg tables allow schema evolution, partition evolution and table version rollback, without the need to rewrite or migrate tables.

The Denodo Embedded MPP is distributed with a predefined catalog named iceberg connected to the Embedded Hive Metastore for accessing Iceberg tables.

iceberg.properties
connector.name=iceberg

# Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083

To query Iceberg tables you have to create those tables in the Denodo Embedded MPP. You can use the embedded data source in Denodo to graphically explore Iceberg tables, register them in the Denodo Embedded MPP and create the base views in Denodo.

Explore Iceberg tables

Important

To graphically explore Iceberg tables you need the Denodo subscription bundle Enterprise Plus. See Object Storage Data in Parquet, Delta and Iceberg format for more details on how to connect to object storage graphically.

Note

Iceberg is a restricted catalog, so it is not listed on the From MPP Catalogs tab of the Embedded MPP data source.

Iceberg and the Embedded MPP Iceberg connector support time travel via table snapshots, each identified by a unique snapshot ID. The snapshot IDs are stored in the $snapshots metadata table. To roll back a table to a previous snapshot, use the GET_ICEBERG_VIEW_SNAPSHOTS and ROLLBACK_ICEBERG_VIEW_TO_SNAPSHOT stored procedures.

Features

The Denodo Embedded MPP provides the following features when working with Iceberg tables:

Limitations

  • Inserts: it is not possible to insert data into views created from Iceberg tables using the From Object Storage tab of the Embedded MPP data source. See Manage Views Created from Object Storage for details.

Connecting to a Polaris Internal Catalog

The Polaris Catalog is an open-source catalog for Iceberg that implements Iceberg’s open REST Catalog specification. Its goal is to provide a shared data layer that lets multiple engines, such as Presto, Spark and Snowflake, read and write the same datasets. The Embedded MPP can connect to Polaris internal catalogs to read and write Iceberg tables.

You can define a new catalog to connect to Polaris for accessing Iceberg tables by adding a catalog properties file in prestocluster\presto\conf\catalog:

Polaris catalog config
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest.uri=https://xxxxx.snowflakecomputing.com/polaris/api/catalog
iceberg.rest.auth.type=OAUTH2
iceberg.rest.auth.oauth2.token=<generatedToken>
iceberg.catalog.warehouse=your_catalog_name
  • iceberg.catalog.type: rest, since the Polaris catalog implements Iceberg’s open REST Catalog specification.

  • iceberg.rest.uri: Polaris catalog endpoint.

  • iceberg.rest.auth.oauth2.token: The token to use for OAUTH2 authentication. You can obtain it with the following request:

    curl -i -X POST https://xxxxxx.snowflakecomputing.com/polaris/api/catalog/v1/oauth/tokens \
    -d 'grant_type=client_credentials&client_id=<client_id>&client_secret=<client_secret>&scope=PRINCIPAL_ROLE:ALL'
    

    where client_id and client_secret are the OAuth2 credentials returned by Polaris when creating a new principal.

  • iceberg.catalog.warehouse: The name of the Polaris catalog to connect to.
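
The token request and the catalog properties file described above can be sketched in Python using only the standard library. This is a minimal sketch, not part of the Denodo or Polaris tooling: the helper names build_token_request, fetch_token and render_catalog_properties are hypothetical, and fetch_token assumes the response is JSON with an access_token field, as is usual for OAuth2 token endpoints.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(catalog_uri, client_id, client_secret):
    """Build (without sending) the POST request for the token endpoint."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "PRINCIPAL_ROLE:ALL",
    }).encode("ascii")
    url = catalog_uri.rstrip("/") + "/v1/oauth/tokens"
    return urllib.request.Request(url, data=body, method="POST")


def fetch_token(request):
    """Send the request. Assumption: the JSON reply has an 'access_token' field."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]


def render_catalog_properties(catalog_uri, token, warehouse):
    """Return the text of a catalog properties file for a Polaris catalog."""
    lines = [
        "connector.name=iceberg",
        "iceberg.catalog.type=rest",
        f"iceberg.rest.uri={catalog_uri}",
        "iceberg.rest.auth.type=OAUTH2",
        f"iceberg.rest.auth.oauth2.token={token}",
        f"iceberg.catalog.warehouse={warehouse}",
    ]
    return "\n".join(lines) + "\n"
```

A caller would pass the Polaris catalog endpoint and the principal's credentials to build_token_request, hand the result to fetch_token, and write the output of render_catalog_properties to a file in the catalog configuration directory.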

Troubleshooting

ERROR: Presto or VDP receives an error: Invalid namespace

Cause

The OAuth2 token set in the Iceberg connector configuration is expired.

Solution

Regenerate the token by following the instructions in the Connecting to a Polaris Internal Catalog section.

ERROR: Presto or VDP receives an error: Request was not authenticated

Cause

The OAuth2 token set in the Iceberg connector configuration is expired.

Solution

Regenerate the token by following the instructions in the Connecting to a Polaris Internal Catalog section.
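
Once a new token has been generated, the iceberg.rest.auth.oauth2.token entry in the catalog properties file has to be updated. The following is a minimal sketch of that update in Python; the function name replace_token is hypothetical, and it operates on the file's text so that reading and writing the file is left to the caller.

```python
TOKEN_KEY = "iceberg.rest.auth.oauth2.token"


def replace_token(properties_text, new_token):
    """Return the properties text with the OAuth2 token entry replaced.

    If the key is missing, the entry is appended at the end.
    """
    updated = []
    replaced = False
    for line in properties_text.splitlines():
        if line.strip().startswith(TOKEN_KEY + "="):
            # Swap the expired token for the new one, keeping the key as-is.
            updated.append(f"{TOKEN_KEY}={new_token}")
            replaced = True
        else:
            updated.append(line)
    if not replaced:
        updated.append(f"{TOKEN_KEY}={new_token}")
    return "\n".join(updated) + "\n"
```

All other entries, including comments and blank lines, are passed through unchanged.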
