Iceberg¶
Apache Iceberg is a high-performance table format for large analytic datasets. Iceberg tables allow schema evolution, partition evolution and table version rollback, without the need to rewrite or migrate tables.
The Denodo Embedded MPP is distributed with a predefined catalog named iceberg, which is connected to the Embedded Hive Metastore for accessing Iceberg tables.
connector.name=iceberg
# Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083
To query Iceberg tables, you must first create them in the Denodo Embedded MPP. You can use the embedded data source in Denodo to graphically explore Iceberg tables, register them in the Denodo Embedded MPP and create the base views in Denodo.
Important
To graphically explore Iceberg tables you need the Denodo subscription bundle Enterprise Plus. See Object Storage Data in Parquet, Delta and Iceberg format for more details on how to connect to an object storage graphically.
Note
Iceberg is a restricted catalog, so it is not listed on the From MPP Catalogs tab of the Embedded MPP data source.
Iceberg and the Embedded MPP Iceberg connector support time travel via table snapshots, each identified by a unique snapshot ID.
The snapshot IDs are stored in the $snapshots metadata table. You can list the snapshots of a view with the GET_ICEBERG_VIEW_SNAPSHOTS stored procedure and roll back the state of a table to a previous snapshot ID with the ROLLBACK_ICEBERG_VIEW_TO_SNAPSHOT stored procedure.
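As a minimal sketch of how the snapshot IDs could be looked up before a rollback, the helper below builds a query over the $snapshots metadata table. The schema and table names (sales, orders) are hypothetical, and the column names (snapshot_id, committed_at, operation) are assumed from the Presto Iceberg connector; check them against your MPP version before relying on them.

```python
def snapshots_query(catalog: str, schema: str, table: str) -> str:
    """Build a query that lists the snapshots of an Iceberg table.

    The metadata table is addressed as "<table>$snapshots"; the name is
    double-quoted because it contains the '$' character.
    """
    return (
        "SELECT snapshot_id, committed_at, operation "
        f'FROM {catalog}.{schema}."{table}$snapshots" '
        "ORDER BY committed_at DESC"
    )


# Hypothetical schema and table names, used for illustration only.
print(snapshots_query("iceberg", "sales", "orders"))
```

The resulting statement can be run against the Embedded MPP to pick the snapshot ID to roll back to.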
Features¶
The Denodo Embedded MPP provides the following features when working with Iceberg tables:
- Graphically explore Iceberg datasets, create tables in the MPP and base views in Denodo (From object storage tab of the Embedded MPP data source)
- Create base views over existing Iceberg tables in an External Metastore (From MPP Catalogs tab of the Embedded MPP data source)
- Querying
Limitations¶
Inserts: it is not possible to insert data into views created from Iceberg tables using the From Object Storage tab of the Embedded MPP data source. See Manage Views Created from Object Storage for details.
Connecting to a Polaris Internal Catalog¶
The Polaris Catalog is an open-source catalog for Iceberg that implements Iceberg’s open REST Catalog specification. The goal of Polaris Catalog is to create a shared data layer that enables multiple engines to read and write to the same data sets, including Presto, Spark, Snowflake, etc. The Embedded MPP can connect to Polaris internal catalogs to read and write Iceberg tables.
You can define a new catalog to connect to Polaris for accessing Iceberg tables by adding a catalog properties file in prestocluster\presto\conf\catalog:
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest.uri=https://xxxxx.snowflakecomputing.com/polaris/api/catalog
iceberg.rest.auth.type=OAUTH2
iceberg.rest.auth.oauth2.token=<generatedToken>
iceberg.catalog.warehouse=your_catalog_name
iceberg.catalog.type: rest, since the Polaris catalog implements Iceberg’s open REST Catalog specification.
iceberg.rest.uri: the Polaris catalog endpoint.
iceberg.rest.auth.oauth2.token: the token to use for OAUTH2 authentication. You can obtain it with the following request:
curl -i -X POST https://xxxxxx.snowflakecomputing.com/polaris/api/catalog/v1/oauth/tokens -d 'grant_type=client_credentials&client_id=<client_id>&client_secret=<client_secret>&scope=PRINCIPAL_ROLE:ALL'
where client_id and client_secret are the OAuth2 credentials returned by Polaris when creating a new principal.
iceberg.catalog.warehouse: the name of the Polaris catalog to connect to.
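The token request and the catalog properties file can also be produced from a script. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes the token endpoint answers with a standard OAuth2 JSON body containing an access_token field.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(catalog_uri: str, client_id: str, client_secret: str):
    """Return the (url, form-encoded body) of the client-credentials request."""
    url = catalog_uri.rstrip("/") + "/v1/oauth/tokens"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "PRINCIPAL_ROLE:ALL",
    }).encode("ascii")
    return url, body


def fetch_token(catalog_uri: str, client_id: str, client_secret: str) -> str:
    """POST the request and return the token (performs a network call)."""
    url, body = build_token_request(catalog_uri, client_id, client_secret)
    request = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumption: the response is a standard OAuth2 JSON document.
    return payload["access_token"]


def render_catalog_properties(catalog_uri: str, token: str, warehouse: str) -> str:
    """Render the contents of the catalog properties file shown above."""
    lines = [
        "connector.name=iceberg",
        "iceberg.catalog.type=rest",
        f"iceberg.rest.uri={catalog_uri}",
        "iceberg.rest.auth.type=OAUTH2",
        f"iceberg.rest.auth.oauth2.token={token}",
        f"iceberg.catalog.warehouse={warehouse}",
    ]
    return "\n".join(lines) + "\n"
```

A typical use would be to call fetch_token with the Polaris endpoint and the principal's credentials, pass the result to render_catalog_properties, and write the returned text to a properties file in the catalog directory.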
Troubleshooting¶
ERROR: Presto or VDP receives an error: Invalid namespace
- Cause
The OAuth2 token set in the Iceberg connector configuration is expired.
- Solution
Regenerate the token by following the instructions in the Connecting to a Polaris Internal Catalog section.
ERROR: Presto or VDP receives an error: Request was not authenticated
- Cause
The OAuth2 token set in the Iceberg connector configuration is expired.
- Solution
Regenerate the token by following the instructions in the Connecting to a Polaris Internal Catalog section.