
External Hive Metastore

If you already have an existing Hive Metastore that contains table definitions for Hive, Iceberg, or Delta Lake tables, and you wish to access these definitions directly from the Denodo Lakehouse Accelerator, you can configure your Denodo Lakehouse Accelerator to use that Hive Metastore as an external Metastore. This allows you to leverage your existing data catalog without having to recreate table definitions.

To connect to an external Hive Metastore, you must define a new catalog within your Denodo Lakehouse Accelerator configuration.

The recommended method for defining new catalogs is by using the presto.catalog property in your values.yaml file. This approach simplifies management and upgrades. Once configured, this new catalog will become accessible from the From MPP Catalogs tab in the Denodo Lakehouse Accelerator data source.

Create external Hive Metastore Views From MPP Catalog

Here is an example of a Hive catalog named external_hivems configured to connect to an external Hive Metastore within your values.yaml:

  catalog:
    external_hivems: |-
      connector.name=hive-hadoop2
      hive.metastore.uri=thrift://<external Hive Metastore host>:<external Hive Metastore port>
      hive.config.resources=/opt/presto-server/etc/catalog/core-site-external.xml
      hive.copy-on-first-write-configuration-enabled=false
      hive.parquet-batch-read-optimization-enabled=true
      hive.parquet.use-column-names=true
      hive.pushdown-filter-enabled=true
      hive.quick-stats.enabled=true
      hive.skip-empty-files=true
      hive.allow-drop-table=true
      hive.non-managed-table-writes-enabled=true
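Note that the catalog key shown above sits under the presto section of values.yaml. An abbreviated outline of the full nesting:

  presto:
    catalog:
      external_hivems: |-
        connector.name=hive-hadoop2
        hive.metastore.uri=thrift://<external Hive Metastore host>:<external Hive Metastore port>
        ...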

Properties

  hive.metastore.uri
    URI of the external Hive Metastore: thrift://<external Hive Metastore host>:<external Hive Metastore port>

  hive.config.resources
    The path to your Hadoop configuration files. Place the XML files hdfs-site.xml and core-site.xml in the presto/conf/catalog folder of your Denodo Lakehouse Accelerator Helm chart, and then reference their paths here, e.g.: /opt/presto-server/etc/catalog/core-site-external.xml,/opt/presto-server/etc/catalog/hdfs-site-external.xml.
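For reference, a core-site-external.xml typically carries at least the default filesystem endpoint. A minimal sketch (the namenode host and port below are placeholders, not values from this document):

  <configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://<namenode host>:<namenode port></value>
    </property>
  </configuration>

Copy the real core-site.xml and hdfs-site.xml from your Hadoop cluster whenever they are available, rather than writing them by hand.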

While defining catalogs in values.yaml is preferred for easier upgrades and centralized management, you can also define new catalogs by creating a separate .properties file directly in the presto/conf/catalog/ folder of your Denodo Lakehouse Accelerator Helm chart (e.g., presto/conf/catalog/external_hivems.properties).

As an example:

connector.name=hive-hadoop2
hive.metastore.uri=thrift://<external Hive Metastore host>:<external Hive Metastore port>
hive.config.resources=/opt/presto-server/etc/catalog/core-site-external.xml,/opt/presto-server/etc/catalog/hdfs-site-external.xml

# Write operations
hive.allow-drop-table=true
hive.non-managed-table-writes-enabled=true

# Performance tuning
hive.copy-on-first-write-configuration-enabled=false
hive.parquet-batch-read-optimization-enabled=true
hive.parquet.use-column-names=true
hive.pushdown-filter-enabled=true
hive.quick-stats.enabled=true
hive.skip-empty-files=true

Supported Operations by Format

The following table summarizes the external Hive Metastore operations supported by the Denodo Lakehouse Accelerator for various table formats:

  Operation       Hive      Iceberg   Delta
  Read            Yes       Yes       Yes
  Create/Insert   Yes (*)   No        No
  Update          No        No        No
  Merge           No        No        No
  Delete          No        No        No

(*) To support write operations into Hive tables, ensure that your Hive catalog configuration includes the following property:

hive.non-managed-table-writes-enabled=true

Kerberos

If the external Hive Metastore or the underlying HDFS uses Kerberos authentication, you must configure additional properties to enable secure connections from the Denodo Lakehouse Accelerator.

  1. Configure Kerberos properties for the external Hive Metastore in values.yaml. Add the following properties to the presto.catalog section:

    catalog:
      external_hivems: |-
        connector.name=hive-hadoop2
        hive.metastore.uri=thrift://<external Hive Metastore host>:<external Hive Metastore port>
        ...
        hive.metastore.authentication.type=KERBEROS
        hive.metastore.service.principal=hive/_HOST@REALM
        hive.metastore.client.principal=primary@REALM
        hive.metastore.client.keytab=/opt/secrets/xxx.keytab
    

    This configuration ensures that the Denodo Lakehouse Accelerator authenticates to the external Hive Metastore using the Kerberos principal specified by hive.metastore.client.principal and its corresponding keytab file. It also verifies the identity of the Hive Metastore service against hive.metastore.service.principal.

  2. Configure Kerberos properties for HDFS access in values.yaml (if required).

    If, in addition to the Hive Metastore, the Denodo Lakehouse Accelerator must also authenticate to HDFS using Kerberos (where your data files reside), configure these properties in the presto.catalog section:

    catalog:
      external_hivems: |-
        connector.name=hive-hadoop2
        hive.metastore.uri=thrift://<external Hive Metastore host>:<external Hive Metastore port>
        ...
        hive.hdfs.authentication.type=KERBEROS
        hive.hdfs.presto.principal=primary@REALM
        hive.hdfs.presto.keytab=/opt/secrets/xxx.keytab
    

    This ensures the Denodo Lakehouse Accelerator connects to HDFS using the Kerberos principal hive.hdfs.presto.principal, using the keytab hive.hdfs.presto.keytab.

    Note: If you use the same principal for both hive.metastore.client.principal and hive.hdfs.presto.principal, ensure that this principal has the necessary permissions to access both the external Hive Metastore and the HDFS filesystem. Otherwise, you may get Permission denied errors.
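    Putting steps 1 and 2 together, a catalog that authenticates to both the Metastore and HDFS with the same principal and keytab would combine the two property sets (the principal and keytab names remain placeholders):

    catalog:
      external_hivems: |-
        connector.name=hive-hadoop2
        hive.metastore.uri=thrift://<external Hive Metastore host>:<external Hive Metastore port>
        hive.config.resources=/opt/presto-server/etc/catalog/core-site-external.xml
        hive.metastore.authentication.type=KERBEROS
        hive.metastore.service.principal=hive/_HOST@REALM
        hive.metastore.client.principal=primary@REALM
        hive.metastore.client.keytab=/opt/secrets/xxx.keytab
        hive.hdfs.authentication.type=KERBEROS
        hive.hdfs.presto.principal=primary@REALM
        hive.hdfs.presto.keytab=/opt/secrets/xxx.keytab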

  3. Kerberos Files:

    • Place the keytab file in the presto/secrets folder within your Denodo Lakehouse Accelerator Helm chart.

    • Place the krb5.conf file in the presto/conf/catalog/ folder.

    • Add the following property to the values.yaml to inform Java about the krb5.conf location:

    presto:
      jvm:
        additionalJVMConfig: [
          -Djava.security.krb5.conf=/opt/presto-server/etc/catalog/krb5.conf
        ]
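
    If you need to write the krb5.conf yourself, a minimal sketch follows (the realm name and KDC host are placeholders, not values from this document; in most deployments you should copy the krb5.conf already used by your Kerberos-enabled cluster):

    [libdefaults]
      default_realm = REALM

    [realms]
      REALM = {
        kdc = <kdc host>
        admin_server = <kdc host>
      }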
    
  4. Enable HDFS Wire Encryption (if required):

    If the HDFS cluster has HDFS wire encryption enabled, you must add one more property to the presto.catalog section in values.yaml:

    catalog:
      external_hivems: |-
        connector.name=hive-hadoop2
        hive.metastore.uri=thrift://<external Hive Metastore host>:<external Hive Metastore port>
        ...
        hive.hdfs.wire-encryption.enabled=true

For more detailed information on Hive Security Configuration and Kerberos support, you can refer to Hive Security Configuration — Kerberos Support.
