Catalogs
Catalogs in the Denodo Embedded MPP are the equivalent of data sources in Denodo Virtual DataPort.
The Denodo Embedded MPP is distributed with three predefined catalogs:
a hive catalog for accessing Hive tables of Parquet files from the Embedded Hive Metastore.
a delta catalog for accessing Delta Lake tables from the Embedded Hive Metastore.
an iceberg catalog for accessing Iceberg tables from the Embedded Hive Metastore.
It is possible to define new catalogs. The recommended way is to use the presto.catalog property in values.yaml.
Examples of glue-hive, glue-iceberg and glue-delta catalogs can be found, commented out, in that section of values.yaml:
# -- Additional catalogs, as an example the jmx catalog provides JMX information from all nodes in the cluster.
catalog:
  #jmx: |-
  #  connector.name=jmx
  #
  #glue-hive: |-
  #  connector.name=hive-hadoop2
  #  hive.metastore=glue
  #  hive.metastore.glue.region=
  #  hive.metastore.glue.catalogid=
  #  hive.metastore.glue.aws-access-key=
  #  hive.metastore.glue.aws-secret-key=
  #  hive.config.resources=core-site.xml
  #  hive.parquet.use-column-names=true
  #
  #glue-iceberg: |-
  #  connector.name=iceberg
  #  iceberg.catalog.type=HIVE
  #  hive.metastore=glue
  #  hive.metastore.glue.region=
  #  hive.metastore.glue.catalogid=
  #  hive.metastore.glue.aws-access-key=
  #  hive.metastore.glue.aws-secret-key=
  #  hive.config.resources=core-site.xml
  #  hive.parquet.use-column-names=true
  #
  #glue-delta: |-
  #  connector.name=delta
  #  hive.metastore=glue
  #  hive.metastore.glue.region=
  #  hive.metastore.glue.catalogid=
  #  hive.metastore.glue.aws-access-key=
  #  hive.metastore.glue.aws-secret-key=
  #  hive.config.resources=core-site.xml
  #  hive.parquet.use-column-names=true
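As an illustration of how one of these entries might look once enabled, the following sketch uncomments the glue-hive example and nests it under presto.catalog. The region, catalog ID and credential values are hypothetical placeholders, and the note that the key becomes the catalog name is an assumption based on how catalog properties files are named.

```yaml
presto:
  catalog:
    # Assumption: the key (glue-hive) is used as the catalog name.
    glue-hive: |-
      connector.name=hive-hadoop2
      hive.metastore=glue
      # Hypothetical placeholder values; replace with your own.
      hive.metastore.glue.region=eu-west-1
      hive.metastore.glue.catalogid=123456789012
      hive.metastore.glue.aws-access-key=<your-access-key>
      hive.metastore.glue.aws-secret-key=<your-secret-key>
      hive.config.resources=core-site.xml
      hive.parquet.use-column-names=true
```

The `|-` block scalar keeps each property on its own line, so the value is passed through as the body of a properties file.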
You can also define new catalogs by creating a catalog properties file in presto/conf/catalog/, e.g., presto/conf/catalog/glue_hive.properties. The file name, glue_hive, becomes the catalog name.
Then set the connector.name property to the type of catalog (hive-hadoop2, delta and iceberg are supported) and add any other properties required by that catalog type.
Below is an example of a Hive catalog properties file that reads from the AWS Glue Data Catalog:
connector.name=hive-hadoop2
hive.metastore=glue
# AWS region of the Glue Catalog
hive.metastore.glue.region=
# The ID of the Glue Catalog in which the metadata database resides
hive.metastore.glue.catalogid=
# Access Key and Secret Key for Glue
# Credentials and core-site.xml are not required when the MPP
# runs in EKS because it will use the EKS Pod Identities,
# IAM Roles for Service Accounts
# or the IAM EC2 instance profile, whichever is configured in EKS
hive.metastore.glue.aws-access-key=
hive.metastore.glue.aws-secret-key=
hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
# For Bulk Data load
hive.allow-drop-table=true
hive.non-managed-table-writes-enabled=true
hive.parquet.use-column-names=true
Kerberos
If the Embedded MPP must authenticate to HDFS using Kerberos, you need to configure additional properties in the Hive catalog by adding them to the additionalConfig property in values.yaml:
presto:
  hive:
    additionalConfig: [
      hive.hdfs.authentication.type=KERBEROS,
      hive.hdfs.presto.principal=xxxx@REALM,
      hive.hdfs.presto.keytab=/opt/secrets/xxx.keytab
    ]
This way the Embedded MPP connects to HDFS as the Kerberos principal specified in hive.hdfs.presto.principal, using the keytab specified in hive.hdfs.presto.keytab.
You need to place the keytab file in the presto/secrets folder.
You also need to place the krb5.conf file in the presto/conf/catalog/ folder and add the following configuration property to values.yaml:
presto:
  jvm:
    additionalJVMConfig: [
      -Djava.security.krb5.conf=/opt/presto-server/etc/catalog/krb5.conf
    ]
If the Hadoop cluster requires hadoop.rpc.protection=privacy, one more property must be added to the catalog configuration:
hive.hdfs.wire-encryption.enabled=true
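Putting the Kerberos settings above together, a values.yaml fragment might look like the sketch below. It assumes the wire encryption property is added through the same additionalConfig list as the other Hive catalog properties. The principal and keytab names are the same placeholders used above.

```yaml
# Hive catalog: Kerberos authentication to HDFS, plus wire encryption
# (only needed when the Hadoop cluster sets hadoop.rpc.protection=privacy).
presto:
  hive:
    additionalConfig: [
      hive.hdfs.authentication.type=KERBEROS,
      hive.hdfs.presto.principal=xxxx@REALM,
      hive.hdfs.presto.keytab=/opt/secrets/xxx.keytab,
      hive.hdfs.wire-encryption.enabled=true
    ]
  # JVM option pointing at the krb5.conf placed in presto/conf/catalog/.
  jvm:
    additionalJVMConfig: [
      -Djava.security.krb5.conf=/opt/presto-server/etc/catalog/krb5.conf
    ]
```

Both blocks sit under the same presto key, so they can be merged into a single presto section of an existing values.yaml.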
You can find more information in Hive Security Configuration — Kerberos Support.