AWS Glue Data Catalog
If you already have an AWS Glue Data Catalog containing table definitions that you want to access from the Denodo Embedded MPP, you can use the AWS Glue Data Catalog as an external Metastore.
The recommended way to do this is to use the presto.catalog property in values.yaml.
Examples named glue-hive, glue-iceberg and glue-delta can be found in that section:
# -- Additional catalogs, as an example the jmx catalog provides JMX information from all nodes in the cluster.
catalog:
  #jmx: |-
  #  connector.name=jmx
  #
  #glue-hive: |-
  #  connector.name=hive-hadoop2
  #  hive.metastore=glue
  #  hive.metastore.glue.region=
  #  hive.metastore.glue.catalogid=
  #  hive.metastore.glue.aws-access-key=
  #  hive.metastore.glue.aws-secret-key=
  #  hive.config.resources=core-site.xml
  #  hive.parquet.use-column-names=true
  #
  #glue-iceberg: |-
  #  connector.name=iceberg
  #  iceberg.catalog.type=HIVE
  #  hive.metastore=glue
  #  hive.metastore.glue.region=
  #  hive.metastore.glue.catalogid=
  #  hive.metastore.glue.aws-access-key=
  #  hive.metastore.glue.aws-secret-key=
  #  hive.config.resources=core-site.xml
  #  hive.parquet.use-column-names=true
  #
  #glue-delta: |-
  #  connector.name=delta
  #  hive.metastore=glue
  #  hive.metastore.glue.region=
  #  hive.metastore.glue.catalogid=
  #  hive.metastore.glue.aws-access-key=
  #  hive.metastore.glue.aws-secret-key=
  #  hive.config.resources=core-site.xml
  #  hive.parquet.use-column-names=true
You can also define a new catalog by creating a properties file in presto/conf/catalog/, e.g., presto/conf/catalog/glue_hive.properties. The file name, glue_hive, becomes the catalog name.
Note: The recommended way to connect to the AWS Glue Data Catalog is without providing AWS credentials, using EKS Pod Identities, IAM Roles for Service Accounts or an IAM EC2 instance profile.
If none of these three options is applicable, you can use AWS credentials instead. In that case, you must provide them in the catalogs of the Embedded MPP, either as an access key and secret key or as an IAM role.
In the Embedded MPP configuration you need to specify both the Glue and the S3 credential properties (although their values can be the same). The S3 credentials are needed because the MPP needs access to the files in S3:
Access and secret key:
hive.metastore.glue.aws-access-key and hive.metastore.glue.aws-secret-key
hive.s3.aws-access-key and hive.s3.aws-secret-key
IAM Role:
hive.metastore.glue.iam-role
hive.s3.iam-role
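The pairing of Glue and S3 credential properties described above can be sketched as a small script that renders the contents of a catalog file such as presto/conf/catalog/glue_hive.properties. This is only an illustration under stated assumptions: the helper name glue_catalog_properties, the region and the role ARN are hypothetical placeholders, and the sketch treats access keys and an IAM role as alternatives, as listed above. Only the property names come from this page.

```python
from typing import Optional


def glue_catalog_properties(
    region: str,
    access_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    iam_role: Optional[str] = None,
) -> str:
    """Render a Hive-on-Glue catalog file (hypothetical helper, not part of the MPP)."""
    if (access_key is None) != (secret_key is None):
        raise ValueError("access_key and secret_key must be given together")
    if iam_role is not None and access_key is not None:
        # Assumption of this sketch: keys and role are treated as alternatives.
        raise ValueError("give either an access/secret key pair or an IAM role")

    lines = [
        "connector.name=hive-hadoop2",
        "hive.metastore=glue",
        f"hive.metastore.glue.region={region}",
    ]
    if iam_role is not None:
        # The role is set for Glue and for S3, since the MPP accesses both.
        lines += [
            f"hive.metastore.glue.iam-role={iam_role}",
            f"hive.s3.iam-role={iam_role}",
        ]
    elif access_key is not None:
        # The same key pair is reused for Glue and S3; they may also differ.
        lines += [
            f"hive.metastore.glue.aws-access-key={access_key}",
            f"hive.metastore.glue.aws-secret-key={secret_key}",
            f"hive.s3.aws-access-key={access_key}",
            f"hive.s3.aws-secret-key={secret_key}",
        ]
    # With no credentials given, no credential properties are emitted
    # (the credential-free setup recommended in the note above).
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder region and role ARN, for illustration only.
    print(glue_catalog_properties(
        "eu-west-1",
        iam_role="arn:aws:iam::123456789012:role/example-mpp-role",
    ))
```

Calling the helper with only a region produces a file without any credential properties, which corresponds to the recommended setup with EKS Pod Identities, IAM Roles for Service Accounts or an instance profile.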
Hive Tables
connector.name=hive-hadoop2
hive.metastore=glue
# AWS region of the Glue Catalog
hive.metastore.glue.region=
# The ID of the Glue Catalog in which the metadata database resides
hive.metastore.glue.catalogid=
# Access Key and Secret Key for Glue
# Credentials and core-site.xml are not required when the MPP
# runs in EKS because it will use the EKS Pod Identities,
# IAM Roles for Service Accounts or the IAM EC2 instance profile,
# whichever is configured in EKS
hive.metastore.glue.aws-access-key=
hive.metastore.glue.aws-secret-key=
hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
# For Bulk Data load
hive.allow-drop-table=true
hive.non-managed-table-writes-enabled=true
hive.parquet.use-column-names=true
Delta Lake Tables
connector.name=delta
hive.metastore=glue
# AWS region of the Glue Catalog
hive.metastore.glue.region=
# The ID of the Glue Catalog in which the metadata database resides
hive.metastore.glue.catalogid=
# Access Key and Secret Key for Glue
# Credentials and core-site.xml are not required when the MPP
# runs in EKS because it will use the EKS Pod Identities,
# IAM Roles for Service Accounts or the IAM EC2 instance profile,
# whichever is configured in EKS
hive.metastore.glue.aws-access-key=
hive.metastore.glue.aws-secret-key=
hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
#
hive.parquet.use-column-names=true
Iceberg Tables
connector.name=iceberg
iceberg.catalog.type=HIVE
hive.metastore=glue
# AWS region of the Glue Catalog
hive.metastore.glue.region=
# The ID of the Glue Catalog in which the metadata database resides
hive.metastore.glue.catalogid=
# Access Key and Secret Key for Glue
# Credentials and core-site.xml are not required when the MPP
# runs in EKS because it will use the EKS Pod Identities,
# IAM Roles for Service Accounts or the IAM EC2 instance profile,
# whichever is configured in EKS
hive.metastore.glue.aws-access-key=
hive.metastore.glue.aws-secret-key=
hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
#
hive.parquet.use-column-names=true
AWS Privileges for AWS Glue Data Catalog
The AWS privileges required by the IAM role of the Denodo Embedded MPP to access the AWS Glue Data Catalog are:
Reading from AWS Glue:
glue:GetDatabases
glue:GetDatabase
glue:GetTables
glue:GetTable
glue:GetPartitions
glue:GetPartition
glue:BatchGetPartition
Writing to AWS Glue (the same privileges as for reading, plus):
glue:CreateTable
glue:DeleteTable
glue:UpdateTable
glue:BatchCreatePartition
glue:UpdatePartition
glue:DeletePartition
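As an illustration, the privileges listed above could be assembled into an IAM policy document as in the following sketch. The function name glue_policy and the "Resource": "*" default are assumptions made for this example; in practice you would likely restrict the resource to specific catalog, database and table ARNs. The policy covers only Glue; the S3 permissions the MPP also needs are not included.

```python
import json

# Actions taken from the "Reading from AWS Glue" list above.
GLUE_READ_ACTIONS = [
    "glue:GetDatabases",
    "glue:GetDatabase",
    "glue:GetTables",
    "glue:GetTable",
    "glue:GetPartitions",
    "glue:GetPartition",
    "glue:BatchGetPartition",
]

# Additional actions taken from the "Writing to AWS Glue" list above.
GLUE_WRITE_ACTIONS = [
    "glue:CreateTable",
    "glue:DeleteTable",
    "glue:UpdateTable",
    "glue:BatchCreatePartition",
    "glue:UpdatePartition",
    "glue:DeletePartition",
]


def glue_policy(allow_writes: bool = False, resource: str = "*") -> dict:
    """Build an IAM policy document for Glue access (hypothetical helper)."""
    actions = list(GLUE_READ_ACTIONS)
    if allow_writes:
        # Writing requires the read actions plus the write actions.
        actions += GLUE_WRITE_ACTIONS
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                # Assumption: "*" keeps the sketch short; narrow it in practice.
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print the read/write policy as JSON, ready to review before use.
    print(json.dumps(glue_policy(allow_writes=True), indent=2))
```

Keeping the read and write actions in separate lists makes it straightforward to grant a read-only policy to deployments that never write to the catalog.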