
AWS Glue Data Catalog

If you already have an AWS Glue Data Catalog containing table definitions that you want to access from the Denodo Embedded MPP, you can use it as an external Metastore.

The recommended way to do this is to define the catalog with the presto.catalog property in values.yaml. Examples for glue-hive, glue-iceberg and glue-delta are shown below:

  # -- Additional catalogs, as an example the jmx catalog provides JMX information from all nodes in the cluster.
  catalog:
    #jmx: |-
    #  connector.name=jmx
    #
    #glue-hive: |-
    #  connector.name=hive-hadoop2
    #  hive.metastore=glue
    #  hive.metastore.glue.region=
    #  hive.metastore.glue.catalogid=
    #  hive.metastore.glue.aws-access-key=
    #  hive.metastore.glue.aws-secret-key=
    #  hive.config.resources=core-site.xml
    #  hive.parquet.use-column-names=true
    #
    #glue-iceberg: |-
    #  connector.name=iceberg
    #  iceberg.catalog.type=HIVE
    #  hive.metastore=glue
    #  hive.metastore.glue.region=
    #  hive.metastore.glue.catalogid=
    #  hive.metastore.glue.aws-access-key=
    #  hive.metastore.glue.aws-secret-key=
    #  hive.config.resources=core-site.xml
    #  hive.parquet.use-column-names=true
    #
    #glue-delta: |-
    #  connector.name=delta
    #  hive.metastore=glue
    #  hive.metastore.glue.region=
    #  hive.metastore.glue.catalogid=
    #  hive.metastore.glue.aws-access-key=
    #  hive.metastore.glue.aws-secret-key=
    #  hive.config.resources=core-site.xml
    #  hive.parquet.use-column-names=true
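
To enable one of these catalogs, uncomment its block and fill in the empty values. The following is a minimal sketch, not a definitive configuration. It assumes that the catalog key is nested under presto in values.yaml (as the property name presto.catalog suggests) and that the cluster uses one of the credential-free options described in the note below, so the credential properties and core-site.xml are left out. The region and catalog ID are placeholder values.

```yaml
presto:
  catalog:
    glue-hive: |-
      connector.name=hive-hadoop2
      hive.metastore=glue
      # Placeholder region: replace with the AWS region of your Glue Catalog
      hive.metastore.glue.region=us-east-1
      # Placeholder catalog ID: the Glue catalog ID is the AWS account ID
      hive.metastore.glue.catalogid=123456789012
      hive.parquet.use-column-names=true
```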

You can also define a new catalog by creating a properties file in presto/conf/catalog/, for example presto/conf/catalog/glue_hive.properties. The file name without the extension, glue_hive, becomes the catalog name.

Note: The recommended way to connect to the AWS Glue Data Catalog is without providing AWS credentials, using EKS Pod Identities, IAM Roles for Service Accounts or an IAM EC2 instance profile.

If none of these three options is applicable, you can use AWS credentials instead. In this case you must provide them, as an access key and secret key or as an IAM role, in the catalogs of the Embedded MPP.

The Embedded MPP configuration must specify both the Glue and the S3 credential properties (although their values can be the same). The S3 credentials are needed because the MPP reads the data files from S3:

  1. Access and secret key:

    • hive.metastore.glue.aws-access-key and hive.metastore.glue.aws-secret-key

    • hive.s3.aws-access-key and hive.s3.aws-secret-key

  2. IAM Role:

    • hive.metastore.glue.iam-role

    • hive.s3.iam-role
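
As an illustration of the two options above, the following sketch shows a glue-hive catalog in values.yaml that sets both the Glue and the S3 properties. It is an example under stated assumptions rather than a definitive configuration: the nesting under presto, the region, the catalog ID, the key placeholders in angle brackets and the role ARN are all hypothetical values that you must replace with your own.

```yaml
presto:
  catalog:
    # Option 1: access key and secret key for both Glue and S3 (placeholder values)
    glue-hive: |-
      connector.name=hive-hadoop2
      hive.metastore=glue
      hive.metastore.glue.region=us-east-1
      hive.metastore.glue.catalogid=123456789012
      hive.metastore.glue.aws-access-key=<glue-access-key>
      hive.metastore.glue.aws-secret-key=<glue-secret-key>
      hive.s3.aws-access-key=<s3-access-key>
      hive.s3.aws-secret-key=<s3-secret-key>
      hive.config.resources=core-site.xml
      hive.parquet.use-column-names=true

    # Option 2: IAM role for both Glue and S3, used instead of the key properties.
    # The role ARN below is hypothetical.
    #glue-hive: |-
    #  connector.name=hive-hadoop2
    #  hive.metastore=glue
    #  hive.metastore.glue.region=us-east-1
    #  hive.metastore.glue.catalogid=123456789012
    #  hive.metastore.glue.iam-role=arn:aws:iam::123456789012:role/example-mpp-role
    #  hive.s3.iam-role=arn:aws:iam::123456789012:role/example-mpp-role
    #  hive.config.resources=core-site.xml
    #  hive.parquet.use-column-names=true
```

Only one of the two blocks should be active at a time, because both define the same catalog name.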

Hive Tables

Hive catalog to read Parquet files from AWS Glue Data Catalog
connector.name=hive-hadoop2

hive.metastore=glue

# AWS region of the Glue Catalog
hive.metastore.glue.region=

# The ID of the Glue Catalog in which the metadata database resides
hive.metastore.glue.catalogid=

# Access Key and Secret Key for Glue
# Credentials and core-site.xml are not required when the MPP
# runs in EKS because it will use the EKS Pod Identities,
# IAM Roles for Service Accounts or the IAM EC2 instance profile,
# whichever is configured in EKS
hive.metastore.glue.aws-access-key=
hive.metastore.glue.aws-secret-key=
hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml


# For Bulk Data load
hive.allow-drop-table=true
hive.non-managed-table-writes-enabled=true

hive.parquet.use-column-names=true

Delta Lake Tables

Delta catalog to read Delta Lake tables from AWS Glue Data Catalog
connector.name=delta

hive.metastore=glue

# AWS region of the Glue Catalog
hive.metastore.glue.region=

# The ID of the Glue Catalog in which the metadata database resides
hive.metastore.glue.catalogid=

# Access Key and Secret Key for Glue
# Credentials and core-site.xml are not required when the MPP
# runs in EKS because it will use the EKS Pod Identities,
# IAM Roles for Service Accounts or the IAM EC2 instance profile,
# whichever is configured in EKS
hive.metastore.glue.aws-access-key=
hive.metastore.glue.aws-secret-key=
hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml

hive.parquet.use-column-names=true

Iceberg Tables

Iceberg catalog to read Iceberg tables from AWS Glue Data Catalog
connector.name=iceberg

iceberg.catalog.type=HIVE

hive.metastore=glue

# AWS region of the Glue Catalog
hive.metastore.glue.region=

# The ID of the Glue Catalog in which the metadata database resides
hive.metastore.glue.catalogid=

# Access Key and Secret Key for Glue
# Credentials and core-site.xml are not required when the MPP
# runs in EKS because it will use the EKS Pod Identities,
# IAM Roles for Service Accounts or the IAM EC2 instance profile,
# whichever is configured in EKS
hive.metastore.glue.aws-access-key=
hive.metastore.glue.aws-secret-key=
hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml

hive.parquet.use-column-names=true

AWS Privileges for AWS Glue Data Catalog

The AWS privileges required by the IAM role of the Denodo Embedded MPP to access the AWS Glue Data Catalog are:

  • Reading from AWS Glue:

    • glue:GetDatabases

    • glue:GetDatabase

    • glue:GetTables

    • glue:GetTable

    • glue:GetPartitions

    • glue:GetPartition

    • glue:BatchGetPartition

  • Writing to AWS Glue (the same privileges as for reading, plus the following):

    • glue:CreateTable

    • glue:DeleteTable

    • glue:UpdateTable

    • glue:BatchCreatePartition

    • glue:UpdatePartition

    • glue:DeletePartition
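
The privileges listed above could be grouped into an IAM policy along the following lines. This is a sketch only: the statement IDs (Sid) are hypothetical names, and the wildcard resource is an assumption made for brevity. The policy document is written in YAML, as accepted by tools such as AWS CloudFormation; IAM itself expects the same structure in JSON.

```yaml
Version: "2012-10-17"
Statement:
  # Privileges for reading from AWS Glue
  - Sid: GlueCatalogRead        # hypothetical statement ID
    Effect: Allow
    Action:
      - "glue:GetDatabases"
      - "glue:GetDatabase"
      - "glue:GetTables"
      - "glue:GetTable"
      - "glue:GetPartitions"
      - "glue:GetPartition"
      - "glue:BatchGetPartition"
    # Assumption: "*" for brevity; restrict to your catalog, database and table ARNs
    Resource: "*"
  # Additional privileges for writing to AWS Glue
  - Sid: GlueCatalogWrite       # hypothetical statement ID
    Effect: Allow
    Action:
      - "glue:CreateTable"
      - "glue:DeleteTable"
      - "glue:UpdateTable"
      - "glue:BatchCreatePartition"
      - "glue:UpdatePartition"
      - "glue:DeletePartition"
    Resource: "*"
```

If the Embedded MPP only reads from the catalog, the second statement can be omitted.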
