AWS Glue Data Catalog

If you already manage your tables within AWS Glue Data Catalog, the Denodo Embedded MPP can connect to it and use it as an external Metastore. This allows you to leverage your existing tables and access your Hive, Iceberg, and Delta Lake tables directly.

To connect to AWS Glue Data Catalog, you need to define a new catalog within your Denodo Embedded MPP configuration. The recommended method is to use the presto.catalog property in your values.yaml file. This approach simplifies management and version upgrades.

Once configured, the new catalog appears in the "From MPP Catalogs" tab of the Denodo Embedded MPP data source, where you can graphically explore your AWS Glue Data Catalog tables and create base views over them.

Create AWS Glue Views From MPP Catalogs

Here are examples of how to define glue-hive, glue-iceberg, and glue-delta catalogs that connect to AWS Glue Data Catalog, placed within the catalog section of your values.yaml:

  # -- Additional catalogs. Uncomment and configure as needed
  catalog:
    # Example: Hive Catalog for AWS Glue
    #glue-hive: |-
    #  connector.name=hive-hadoop2
    #  hive.metastore=glue
    #  # AWS region, e.g., us-east-1
    #  hive.metastore.glue.region=your_aws_region
    #  # Your AWS account ID
    #  hive.metastore.glue.catalogid=your_aws_account_id
    #  hive.metastore.glue.aws-access-key=YOUR_ACCESS_KEY
    #  hive.metastore.glue.aws-secret-key=YOUR_SECRET_KEY
    #  hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
    #  hive.parquet.use-column-names=true
    #  hive.parquet-batch-read-optimization-enabled=true
    #  hive.pushdown-filter-enabled=true
    #  hive.quick-stats.enabled=true
    #  hive.skip-empty-files=true

    # Example: Iceberg Catalog for AWS Glue
    #glue-iceberg: |-
    #  connector.name=iceberg
    #  iceberg.catalog.type=HIVE
    #  hive.metastore=glue
    #  # AWS region, e.g., us-east-1
    #  hive.metastore.glue.region=your_aws_region
    #  # Your AWS account ID
    #  hive.metastore.glue.catalogid=your_aws_account_id
    #  hive.metastore.glue.aws-access-key=YOUR_ACCESS_KEY
    #  hive.metastore.glue.aws-secret-key=YOUR_SECRET_KEY
    #  hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
    #  hive.parquet-batch-read-optimization-enabled=true
    #  hive.pushdown-filter-enabled=true

    # Example: Delta Lake Catalog for AWS Glue
    #glue-delta: |-
    #  connector.name=delta
    #  hive.metastore=glue
    #  # AWS region, e.g., us-east-1
    #  hive.metastore.glue.region=your_aws_region
    #  # Your AWS account ID
    #  hive.metastore.glue.catalogid=your_aws_account_id
    #  hive.metastore.glue.aws-access-key=YOUR_ACCESS_KEY
    #  hive.metastore.glue.aws-secret-key=YOUR_SECRET_KEY
    #  hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
    #  hive.parquet-batch-read-optimization-enabled=true
    #  hive.pushdown-filter-enabled=true

As an alternative to using values.yaml, you can define a new catalog by creating a separate properties file directly in the presto/conf/catalog/ folder of the Embedded MPP Helm chart (e.g., presto/conf/catalog/glue_iceberg.properties). The file name without the .properties extension (here, glue_iceberg) becomes the catalog name in the Embedded MPP.

Supported Operations by Format

The following table summarizes the AWS Glue Data Catalog operations supported by the Denodo Embedded MPP for various table formats:

  Operation        Hive       Iceberg    Delta
  Read             Yes        Yes        Yes
  Create/Insert    Yes (*)    No         No
  Update           No         No         No
  Delete           No         No         No

(*) To support write operations into Hive tables, ensure that your Hive catalog configuration includes the following property:

hive.non-managed-table-writes-enabled=true
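
As a minimal sketch, the property can be appended to the Hive catalog definition in values.yaml. The example reuses the glue-hive catalog shown earlier with placeholder values. It assumes that the catalog section sits under a top-level presto key (inferred from the presto.catalog property name) and that credentials come from an IAM role-based method, so no keys are listed.

```yaml
presto:
  catalog:
    glue-hive: |-
      connector.name=hive-hadoop2
      hive.metastore=glue
      hive.metastore.glue.region=your_aws_region
      hive.metastore.glue.catalogid=your_aws_account_id
      hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
      # Enables Create/Insert on Hive tables registered in AWS Glue
      hive.non-managed-table-writes-enabled=true
```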

Authentication and AWS Credentials

The recommended way to connect to AWS Glue Data Catalog from Denodo Embedded MPP is without explicitly providing AWS access and secret keys. This is achieved by leveraging AWS IAM authentication methods:

  • EKS Pod Identities

  • IAM Roles for Service Accounts (IRSA)

  • IAM EC2 Instance Profile
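
With any of these methods, the catalog definition simply omits the access and secret key properties. The following is a sketch only: it assumes the IAM role is already associated with the Embedded MPP pods or nodes, and that the catalog section sits under a top-level presto key in values.yaml.

```yaml
presto:
  catalog:
    glue-iceberg: |-
      connector.name=iceberg
      iceberg.catalog.type=HIVE
      hive.metastore=glue
      hive.metastore.glue.region=your_aws_region
      hive.metastore.glue.catalogid=your_aws_account_id
      hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
      # No aws-access-key / aws-secret-key properties: credentials are
      # expected to come from the IAM role attached to the pod or node.
```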

In scenarios where the above IAM role-based authentication methods are not applicable (e.g., outside EKS or in specific custom setups), you can explicitly provide AWS credentials. In this case, you will need to provide both Glue and S3 credentials because the Embedded MPP needs access to the S3 files where the actual data resides, in addition to the Glue metadata.

You can provide these credentials in either of two ways:

  1. Access and secret key:

    • hive.metastore.glue.aws-access-key and hive.metastore.glue.aws-secret-key, for Glue access.

    • hive.s3.aws-access-key and hive.s3.aws-secret-key, for S3 file access.

  2. IAM Role (Assumed Role):

    • hive.metastore.glue.iam-role, for Glue access.

    • hive.s3.iam-role, for S3 file access.

These properties need to be added to your catalog definition, in values.yaml or a .properties file.
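
The two options can be sketched as follows. The catalog names glue-hive-keys and glue-hive-role are hypothetical, all values are placeholders, and the role properties are assumed to take a role ARN. Each catalog sets both the Glue and the S3 variant of the credential properties. Comments are kept on their own lines because a trailing # on a property line is read as part of the value.

```yaml
presto:
  catalog:
    # Option 1: explicit access and secret keys (hypothetical catalog name)
    glue-hive-keys: |-
      connector.name=hive-hadoop2
      hive.metastore=glue
      hive.metastore.glue.region=your_aws_region
      hive.metastore.glue.aws-access-key=YOUR_ACCESS_KEY
      hive.metastore.glue.aws-secret-key=YOUR_SECRET_KEY
      hive.s3.aws-access-key=YOUR_ACCESS_KEY
      hive.s3.aws-secret-key=YOUR_SECRET_KEY

    # Option 2: assumed IAM role (hypothetical catalog name)
    glue-hive-role: |-
      connector.name=hive-hadoop2
      hive.metastore=glue
      hive.metastore.glue.region=your_aws_region
      hive.metastore.glue.iam-role=arn:aws:iam::your_aws_account_id:role/your_role_name
      hive.s3.iam-role=arn:aws:iam::your_aws_account_id:role/your_role_name
```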

AWS Privileges for AWS Glue Data Catalog

The IAM role or AWS credentials used by the Denodo Embedded MPP must have the appropriate AWS privileges to access the AWS Glue Data Catalog.

  • Reading from AWS Glue. The IAM role/user needs the following minimum permissions:

    • glue:GetDatabases

    • glue:GetDatabase

    • glue:GetTables

    • glue:GetTable

    • glue:GetPartitions

    • glue:GetPartition

    • glue:BatchGetPartition

  • Writing to AWS Glue. In addition to the read permissions, the IAM role/user will also require these permissions:

    • glue:CreateTable

    • glue:DeleteTable

    • glue:UpdateTable

    • glue:BatchCreatePartition

    • glue:UpdatePartition

    • glue:DeletePartition

Note: Ensure that the associated IAM role/user also has the necessary S3 permissions (s3:GetObject, s3:PutObject, etc.) for the specific S3 buckets where your data files are stored.
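
One way to express these permissions is an IAM managed policy declared in an AWS CloudFormation template. This is an illustrative sketch, not a policy shipped with the product. The logical name, Sid values, and bucket name are hypothetical. The s3:ListBucket action and the use of "*" as the Glue resource are assumptions that you should tighten to your own catalog, database, and table ARNs.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  DenodoMppGluePolicy:            # hypothetical logical name
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          # Minimum permissions for reading from AWS Glue
          - Sid: GlueRead
            Effect: Allow
            Action:
              - glue:GetDatabases
              - glue:GetDatabase
              - glue:GetTables
              - glue:GetTable
              - glue:GetPartitions
              - glue:GetPartition
              - glue:BatchGetPartition
            Resource: "*"         # assumption: narrow to specific Glue ARNs
          # Additional permissions for writing to AWS Glue
          - Sid: GlueWrite
            Effect: Allow
            Action:
              - glue:CreateTable
              - glue:DeleteTable
              - glue:UpdateTable
              - glue:BatchCreatePartition
              - glue:UpdatePartition
              - glue:DeletePartition
            Resource: "*"         # assumption: narrow to specific Glue ARNs
          # S3 access to the data files (bucket name is a placeholder)
          - Sid: S3Data
            Effect: Allow
            Action:
              - s3:GetObject
              - s3:PutObject
              - s3:ListBucket     # assumption: not listed in this manual
            Resource:
              - arn:aws:s3:::your-data-bucket
              - arn:aws:s3:::your-data-bucket/*
```

For a read-only setup, drop the GlueWrite statement and s3:PutObject.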