
Catalogs

Catalogs in the Denodo Embedded MPP are the equivalent of data sources in Denodo Virtual DataPort.

The Denodo Embedded MPP is distributed with three predefined catalogs: hive, delta, and iceberg.

It is possible to define new catalogs. The recommended way is to use the presto.catalog property in values.yaml. Examples of glue-hive, glue-iceberg and glue-delta catalogs can be found in this section.

  # -- Additional catalogs, as an example the jmx catalog provides JMX information from all nodes in the cluster.
  catalog:
    #jmx: |-
    #  connector.name=jmx
    #
    #glue-hive: |-
    #  connector.name=hive-hadoop2
    #  hive.metastore=glue
    #  hive.metastore.glue.region=
    #  hive.metastore.glue.catalogid=
    #  hive.metastore.glue.aws-access-key=
    #  hive.metastore.glue.aws-secret-key=
    #  hive.config.resources=core-site.xml
    #  hive.parquet.use-column-names=true
    #
    #glue-iceberg: |-
    #  connector.name=iceberg
    #  iceberg.catalog.type=HIVE
    #  hive.metastore=glue
    #  hive.metastore.glue.region=
    #  hive.metastore.glue.catalogid=
    #  hive.metastore.glue.aws-access-key=
    #  hive.metastore.glue.aws-secret-key=
    #  hive.config.resources=core-site.xml
    #  hive.parquet.use-column-names=true
    #
    #glue-delta: |-
    #  connector.name=delta
    #  hive.metastore=glue
    #  hive.metastore.glue.region=
    #  hive.metastore.glue.catalogid=
    #  hive.metastore.glue.aws-access-key=
    #  hive.metastore.glue.aws-secret-key=
    #  hive.config.resources=core-site.xml
    #  hive.parquet.use-column-names=true
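
Once uncommented and deployed, an additional catalog is queried like any other. As an illustrative sketch for the jmx catalog above, following the naming convention of the Presto JMX connector (the jmx.current schema exposes the current value of every MBean on each node):

```sql
-- List the JVM runtime MBean of every node in the cluster
SELECT node, vmname, vmversion
FROM jmx.current."java.lang:type=runtime";
```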

You can also define new catalogs by creating a catalog properties file in presto/conf/catalog/, e.g., presto/conf/catalog/glue_hive.properties. The file name, glue_hive, is the catalog name. Then set the catalog type in the connector.name property (hive-hadoop2, delta and iceberg are supported) and add any other properties required by that catalog type.

Below is an example of a Hive catalog properties file that reads from the AWS Glue Data Catalog:

connector.name=hive-hadoop2
hive.metastore=glue

# AWS region of the Glue Catalog
hive.metastore.glue.region=

# The ID of the Glue Catalog in which the metadata database resides
hive.metastore.glue.catalogid=

# Access Key and Secret Key for Glue.
# Credentials and core-site.xml are not required when the MPP
# runs in EKS, because it will use EKS Pod Identities,
# IAM Roles for Service Accounts,
# or the IAM EC2 instance profile, whichever is configured in EKS
hive.metastore.glue.aws-access-key=
hive.metastore.glue.aws-secret-key=
hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml

# For bulk data load
hive.allow-drop-table=true
hive.non-managed-table-writes-enabled=true

hive.parquet.use-column-names=true
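
With this file saved as glue_hive.properties, Glue tables become addressable with the catalog.schema.table notation. A minimal sketch, where sales and orders are hypothetical Glue database and table names:

```sql
-- Count the rows of a Glue table through the glue_hive catalog
-- (sales and orders are placeholder names)
SELECT COUNT(*)
FROM glue_hive.sales.orders;
```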

Kerberos

If the Embedded MPP must authenticate to HDFS using Kerberos, you will need to configure additional properties in the Hive catalog by adding them to the additionalConfig property in values.yaml:

HDFS Kerberos configuration in values.yaml
presto:
  hive:
    additionalConfig: [
      hive.hdfs.authentication.type=KERBEROS,
      hive.hdfs.presto.principal=xxxx@REALM,
      hive.hdfs.presto.keytab=/opt/secrets/xxx.keytab
    ]

This way, the Embedded MPP connects to HDFS as the Kerberos principal specified in hive.hdfs.presto.principal, using the keytab specified in hive.hdfs.presto.keytab.

You need to place the keytab file in the presto/secrets folder and the krb5.conf file in the presto/conf/catalog/ folder. Then add the following configuration property to values.yaml:

krb5.conf configuration
presto:
  jvm:
    additionalJVMConfig: [
      -Djava.security.krb5.conf=/opt/presto-server/etc/catalog/krb5.conf
    ]

If hadoop.rpc.protection=privacy is required by the Hadoop cluster, one more property must be added to the catalog configuration:

Enable HDFS wire encryption in values.yaml
hive.hdfs.wire-encryption.enabled=true
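
In the values.yaml layout used for the Kerberos properties above, this setting can be appended to the same list. A sketch, assuming the same placeholder principal and keytab:

```yaml
presto:
  hive:
    additionalConfig: [
      hive.hdfs.authentication.type=KERBEROS,
      hive.hdfs.presto.principal=xxxx@REALM,
      hive.hdfs.presto.keytab=/opt/secrets/xxx.keytab,
      # Required when the Hadoop cluster enforces hadoop.rpc.protection=privacy
      hive.hdfs.wire-encryption.enabled=true
    ]
```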

You can find more information in Hive Security Configuration — Kerberos Support.
