Catalogs

In the Denodo Embedded MPP, catalogs serve as the equivalent of data sources in Denodo Virtual DataPort (VDP). They define the connection parameters used to access various file and table formats and metastores.

The Denodo Embedded MPP is distributed with three predefined catalogs, hive, delta, and iceberg, each configured to connect to the Embedded Hive Metastore.

Note

These three predefined catalogs (hive, delta, and iceberg) are considered restricted catalogs. Their purpose is to facilitate the creation of tables when Denodo graphically explores datasets using the From object storage tab of the Embedded MPP data source.

Because these catalogs are managed exclusively by Denodo, they are not listed on the From MPP Catalogs tab of the Embedded MPP data source in Denodo.

Defining New Catalogs

Beyond the predefined catalogs, you have the flexibility to define new catalogs in the Denodo Embedded MPP. These new catalogs are required when you need to connect to external metastores (e.g., AWS Glue Data Catalog, external Hive Metastore).

Once defined, these new catalogs are listed on the From MPP Catalogs tab of the Denodo Embedded MPP data source, where you can create base views over the tables of the external metastore.

Additional catalogs in From MPP Catalogs

The recommended way to define new catalogs is the presto.catalog property in the values.yaml file. This approach offers several advantages:

  • It simplifies managing your catalog definitions during version upgrades of the Embedded MPP.

  • It facilitates managing different catalog configurations for various environments (e.g., development, testing, production).

  • It keeps your application configuration in one place.
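To illustrate the second advantage, the following Python sketch models how an environment-specific values file could override a shared catalog definition. It assumes a base values.yaml and a hypothetical production override file, both loaded here as plain dictionaries, and it mirrors the way Helm combines several -f values files: maps are merged key by key and scalar values, such as the multi-line catalog strings, are replaced by the later file. It does not model every Helm rule (for example, null values deleting keys).

```python
from copy import deepcopy
from typing import Any


def merge_values(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict with `override` merged into `base`.

    Nested dicts are merged recursively; any other value in `override`
    replaces the value in `base`. Neither input is modified.
    """
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# Shared definition (hypothetical region values, for illustration only).
base = {
    "presto": {
        "catalog": {
            "glue-iceberg": "connector.name=iceberg\nhive.metastore=glue\nhive.metastore.glue.region=us-east-1",
        }
    }
}

# Production override: replaces the whole catalog string for glue-iceberg.
prod = {
    "presto": {
        "catalog": {
            "glue-iceberg": "connector.name=iceberg\nhive.metastore=glue\nhive.metastore.glue.region=eu-west-1",
        }
    }
}

merged = merge_values(base, prod)
print(merged["presto"]["catalog"]["glue-iceberg"].splitlines()[-1])
# hive.metastore.glue.region=eu-west-1
```

Because each catalog is a single string value, an override file has to repeat the complete catalog definition, not only the line that changes.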

Below is an example of how to define a new Iceberg catalog named glue-iceberg that connects to an AWS Glue Data Catalog:

  presto:
    # -- Additional catalogs
    catalog:
      glue-iceberg: |-
        connector.name=iceberg
        iceberg.catalog.type=HIVE
        hive.metastore=glue
        # AWS region of the Glue Catalog, e.g., us-east-1
        hive.metastore.glue.region=xxx
        # ID of the Glue Catalog, usually your AWS Account ID
        hive.metastore.glue.catalogid=yyy
        hive.metastore.glue.aws-access-key=YOUR_ACCESS_KEY
        hive.metastore.glue.aws-secret-key=YOUR_SECRET_KEY
        hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
        hive.parquet-batch-read-optimization-enabled=true
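The key of each entry under catalog (glue-iceberg in the example) is the name of the new catalog. Presto registers one catalog per etc/catalog/&lt;name&gt;.properties file and names it after the file, so a reasonable mental model is that each entry ends up as such a file. The Python sketch below shows that mapping; render_catalog_files is a hypothetical helper, and the assumption that the Embedded MPP chart writes the entries this way is ours, not something stated by Denodo.

```python
import tempfile
from pathlib import Path


def render_catalog_files(catalogs: dict[str, str], catalog_dir: Path) -> list[Path]:
    """Hypothetical helper: write each catalog body to <catalog_dir>/<name>.properties.

    The dictionary key becomes the file name, and therefore the catalog name
    that Presto registers. Returns the paths that were written.
    """
    catalog_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, body in catalogs.items():
        path = catalog_dir / f"{name}.properties"
        path.write_text(body.strip() + "\n", encoding="utf-8")
        written.append(path)
    return written


# Same shape as the presto.catalog entry above (region value is illustrative).
catalogs = {
    "glue-iceberg": """
connector.name=iceberg
iceberg.catalog.type=HIVE
hive.metastore=glue
hive.metastore.glue.region=us-east-1
""",
}

with tempfile.TemporaryDirectory() as tmp:
    paths = render_catalog_files(catalogs, Path(tmp) / "catalog")
    names = [p.name for p in paths]
    first_line = paths[0].read_text(encoding="utf-8").splitlines()[0]

print(names)  # ['glue-iceberg.properties']
```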

Although it is harder to manage, you can also define a new catalog by creating a properties file directly in the presto/conf/catalog/ folder of the Embedded MPP deployment, for example presto/conf/catalog/glue_iceberg.properties.

Below is an example of an Iceberg catalog that connects to the AWS Glue Data Catalog:

  connector.name=iceberg
  hive.metastore=glue

  # AWS region of the Glue Catalog
  hive.metastore.glue.region=

  # ID of the Glue Catalog in which the metadata database resides
  hive.metastore.glue.catalogid=

  # The Glue access key, secret key, and core-site.xml are not required when the
  # MPP runs in EKS, because it then uses EKS Pod Identities, IAM Roles for Service
  # Accounts, or the IAM EC2 instance profile, whichever is configured in EKS
  hive.metastore.glue.aws-access-key=
  hive.metastore.glue.aws-secret-key=
  hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml

  hive.parquet-batch-read-optimization-enabled=true
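Before deploying a catalog file like the one above, a quick check can catch two common mistakes: required values left empty, and comments written at the end of a value. In a .properties file, # and ! start a comment only at the beginning of a line, so hive.metastore.glue.region=us-east-1 # e.g. sets the region to the whole string "us-east-1 # e.g.". The Python sketch below is not Denodo tooling. Its list of required keys is an assumption drawn from the example, it treats the access and secret keys as required outside EKS (a simplification of the comment in the example), and its parser only handles simple key=value lines.

```python
# Assumed minimum for a Glue-backed catalog (illustrative, not an official list).
REQUIRED = ("connector.name", "hive.metastore", "hive.metastore.glue.region")
# Per the example's comment, these can be omitted when the MPP runs in EKS.
CREDENTIALS = ("hive.metastore.glue.aws-access-key", "hive.metastore.glue.aws-secret-key")


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines.

    Lines whose first non-blank character is '#' or '!' are comments; a '#'
    later in the line is kept as part of the value, as in Java properties.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_glue_catalog(text: str, running_in_eks: bool) -> list[str]:
    """Return a list of problems; an empty list means the file passed the check."""
    props = parse_properties(text)
    problems = []
    required = REQUIRED if running_in_eks else REQUIRED + CREDENTIALS
    for key in required:
        if not props.get(key):
            problems.append(f"{key} is missing or empty")
    # Flag values containing ' #', which usually means an intended inline comment.
    for key, value in props.items():
        if " #" in value:
            problems.append(f"{key}: '#' inside a value is not a comment in .properties files")
    return problems


example = """
connector.name=iceberg
hive.metastore=glue
# AWS region of the Glue Catalog
hive.metastore.glue.region=us-east-1 # e.g.
"""

# Reports one problem: the region value includes the trailing "# e.g.".
print(check_glue_catalog(example, running_in_eks=True))
```

Run with running_in_eks=False, the same file also reports the two missing credential keys.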