
Unity Catalog

If you are using Databricks Unity Catalog to manage your Iceberg or Delta Lake tables, you can configure the Denodo Embedded MPP to access them directly as an external Metastore.

Recommendation for Delta Lake Tables: The recommended way to access Delta Lake tables through Databricks Unity Catalog is to enable Iceberg reads via Uniform within Databricks. Uniform is a feature that exposes Delta Lake tables with an Iceberg-compatible metadata layer, allowing the Denodo Embedded MPP's Iceberg catalogs to read them. For instructions on enabling Uniform, refer to the official Databricks documentation: Iceberg reads via Uniform.
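For reference, enabling Iceberg reads on an existing Delta table is typically done by setting table properties in Databricks. The sketch below uses a hypothetical table name; check the Databricks documentation for the exact properties supported by your runtime version:

```sql
-- Run in a Databricks SQL environment (table name is hypothetical).
-- Exposes an Iceberg-compatible metadata layer on an existing Delta table.
ALTER TABLE main.sales.orders SET TBLPROPERTIES (
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg'
);
```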

Limitations when using Iceberg reads via Uniform:

  • Iceberg reads do not work on tables with deletion vectors enabled.

  • Delta tables with Iceberg reads enabled do not support VOID types.

  • All access through the Iceberg catalogs for Unity Catalog is read-only. Write operations are not supported.

To connect to Unity Catalog as an external Metastore, you must define a new catalog within your Denodo Embedded MPP configuration. The recommended method for defining new catalogs is by using the presto.catalog property in your values.yaml file. This approach simplifies management and upgrades. Once configured, this new catalog will be accessible from the From MPP Catalogs tab in the Denodo Embedded MPP data source.

Create Unity Views From MPP Catalog

Here is an example of an Iceberg catalog named unity configured to connect to Unity Catalog within your values.yaml:

  catalog:
    unity: |-
      connector.name=iceberg
      iceberg.catalog.type=rest
      iceberg.rest.uri=https://adb-xxxxxxxx.azuredatabricks.net/api/2.1/unity-catalog/iceberg-rest
      iceberg.rest.auth.type=OAUTH2
      iceberg.rest.auth.oauth2.token=YOUR_DATABRICKS_PAT
      iceberg.catalog.warehouse=external_catalog
      iceberg.hadoop.config.resources=/opt/presto-server/etc/catalog/core-site.xml
      hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
      hive.pushdown-filter-enabled=true
      hive.parquet-batch-read-optimization-enabled=true

As an alternative to using values.yaml, you can define a new catalog by creating a separate properties file directly in the presto/conf/catalog/ folder of the Embedded MPP Helm chart (e.g., presto/conf/catalog/unity.properties). The file name, unity in this example, will become the catalog name in Embedded MPP.
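For illustration, a presto/conf/catalog/unity.properties file mirroring the values.yaml example above might look like this (the endpoint, token, and catalog name are placeholders taken from that example):

```properties
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest.uri=https://adb-xxxxxxxx.azuredatabricks.net/api/2.1/unity-catalog/iceberg-rest
iceberg.rest.auth.type=OAUTH2
iceberg.rest.auth.oauth2.token=YOUR_DATABRICKS_PAT
iceberg.catalog.warehouse=external_catalog
```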

Properties

  iceberg.rest.uri
      REST API endpoint URI (required). Example:
      https://adb-xxxx.azuredatabricks.net/api/2.1/unity-catalog/iceberg

  iceberg.rest.auth.type
      The authentication type to use. Available values are NONE or OAUTH2
      (default: NONE). OAUTH2 requires either a credential or a token.

  iceberg.rest.auth.oauth2.token
      A Databricks personal access token (PAT).

  iceberg.catalog.warehouse
      The name of the Unity Catalog you want to connect to.

Note

Unity Catalog manages the metadata for your Delta Lake and Iceberg tables, but the actual data files are stored in object storage (e.g., Azure Data Lake Storage Gen2). Depending on the object storage location accessed by Unity Catalog, you may need to provide credentials for that storage to the Denodo Embedded MPP cluster.

If your tables are stored in Azure Data Lake Storage Gen2 and require shared key authentication, create a Kubernetes secret like this:

kubectl create secret generic mpp-credentials \
    --from-literal=METASTORE_DB_PASSWORD=hive \
    --from-literal=ABFS_STORAGE_KEY=abfsstoragekey

Additionally, you need to configure the Azure credentials in your values.yaml by setting:

objectStorage:
  azure:
    sharedKey:
      enabled: true
      account: "YOUR_ACCOUNT"

You will also need to ensure that the necessary Hadoop file system implementation properties are added to the prestocluster/presto/conf/core-site.xml:

<property>
 <name>fs.abfs.impl</name>
 <value>org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem</value>
</property>

<property>
 <name>fs.abfss.impl</name>
 <value>org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem</value>
</property>
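Once the catalog is deployed, you can check connectivity by running a few queries from the Embedded MPP. The sketch below assumes the catalog is named unity as in the example above; the schema and table names are hypothetical:

```sql
-- List schemas exposed by the Unity Catalog connection
SHOW SCHEMAS FROM unity;

-- Browse tables in a schema and read a sample (schema/table names hypothetical)
SHOW TABLES FROM unity.sales;
SELECT * FROM unity.sales.orders LIMIT 10;
```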

Supported Operations

The following table summarizes the operations supported by the Denodo Embedded MPP when connecting to Unity Catalog:

  Operation        Delta Lake (*)    Iceberg (**)
  Read             Yes               Yes
  Create/Insert    No                Yes (only Databricks Managed Iceberg)
  Update           No                Yes (only Databricks Managed Iceberg)
  Merge            No                No
  Delete           No                Yes (only Databricks Managed Iceberg)

(*) You can read both managed and external Databricks Delta Lake tables with the Embedded MPP connected to Unity Catalog by enabling Iceberg reads via Uniform in Databricks.

(**) You can read both managed and foreign Databricks Iceberg tables with the Embedded MPP connected to Unity Catalog.
