Hive¶
The Denodo Embedded MPP is distributed with a predefined catalog named hive
connected to the Embedded Hive Metastore for accessing Hive tables
from Parquet files.
connector.name=hive-hadoop2
# Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083
hive.metastore-timeout=20s
hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
hive.parquet.use-column-names=true
hive.skip-empty-files=true
hive.parquet-batch-read-optimization-enabled=true
hive.pushdown-filter-enabled=true
hive.allow-drop-table=true
hive.non-managed-table-writes-enabled=true
Note
The hive catalog is used only by Denodo to create tables as a result of graphically exploring Parquet datasets on the From object storage tab of the Embedded MPP data source.
Therefore, hive is a restricted catalog and is not listed on the From MPP Catalogs tab of the Embedded MPP data source.
Parquet¶
The Denodo Embedded MPP data source allows you to graphically explore Parquet datasets, including those using Hive-style partitioning, create tables in the Embedded MPP and create base views in Denodo.

Explore Parquet files¶
Important
To graphically explore Parquet datasets you need the Denodo Enterprise Plus subscription bundle and update 8.0u20230301.
Features¶
The Denodo Embedded MPP data source provides the following features when working with Parquet files:
- Graphically explore Parquet datasets, create tables in the MPP and base views in Denodo (From object storage tab of the Embedded MPP data source).
- Create base views over existing Hive tables in an External Metastore (From MPP Catalogs tab of the Embedded MPP data source). To do this, create a new catalog in the Embedded MPP, because the predefined catalog named hive is restricted and cannot be accessed from the From MPP Catalogs tab.
- Querying
Limitations¶
Inserts: it is not possible to insert data using bulk load into views created from Parquet files using the From Object Storage or From MPP Catalogs tabs of the Embedded MPP data source. However, it is possible to add single rows by executing the SQL statement INSERT, or multiple rows using direct load (see the sketch after the note below).
Note
Bulk Load: a method for efficiently moving large amounts of data into a database when the source view is not in the Embedded MPP. It first generates Parquet files containing the data to be inserted, then uploads those files to the specific path, HDFS URI, configured in the Write settings section of the Embedded MPP data source.
Direct Load: a method for loading data directly into the database by executing an INSERT INTO table <SELECT QUERY> query that is delegable to the Embedded MPP data source. That is, the source and target views are both in the Embedded MPP.
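As a rough sketch of both options, the statements below use hypothetical view names (customer_parquet and staging_customer are assumptions for illustration, not part of the product):
-- Hypothetical single-row insert into a view created from Parquet files
INSERT INTO customer_parquet (id, name) VALUES (1, 'Alice');

-- Hypothetical direct load: source and target views are both in the Embedded MPP,
-- so the whole INSERT INTO ... SELECT query is delegated to it
INSERT INTO customer_parquet
SELECT id, name
FROM staging_customer;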
Supported Operation by Metastore Type¶
Operation | Embedded Hive Metastore | External Hive Metastore | AWS Glue Data Catalog |
---|---|---|---|
Read | Yes | Yes | Yes |
Create/Insert (*) | Yes | Yes | Yes |
Update | No | No | No |
Delete | No | No | No |
(*) To support write operations, make sure that Hive-type catalogs have the following property:
hive.non-managed-table-writes-enabled=true
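As a hedged sketch of what write support looks like once that property is set, the statements below create an external Parquet table and insert a row into it through the predefined hive catalog and the default schema; the table name, columns and storage path are made up for illustration only:
-- Hypothetical external (non-managed) Parquet table; writing to it requires
-- hive.non-managed-table-writes-enabled=true on the Hive catalog
CREATE TABLE hive.default.sales_parquet (
    id BIGINT,
    amount DOUBLE
)
WITH (
    format = 'PARQUET',
    external_location = 's3a://example-bucket/sales/'
);

INSERT INTO hive.default.sales_parquet VALUES (1, 10.5);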
Kerberos¶
If the Parquet files are located in an HDFS with Kerberos, you need to configure Kerberos-related properties in the hive catalog, adding them to the additionalConfig property in values.yaml:
presto:
  hive:
    additionalConfig: [
      hive.hdfs.authentication.type=KERBEROS,
      hive.hdfs.presto.principal=xxxx@REALM,
      hive.hdfs.presto.keytab=/opt/secrets/xxx.keytab
    ]
This way the Embedded MPP will connect to HDFS as the Kerberos principal hive.hdfs.presto.principal, using the keytab hive.hdfs.presto.keytab.
You also need to place the keytab file in the presto/secrets folder and the krb5.conf in the presto/conf/catalog/ folder, and add the following property to values.yaml:
presto:
  jvm:
    additionalJVMConfig: [
      -Djava.security.krb5.conf=/opt/presto-server/etc/catalog/krb5.conf
    ]
If hadoop.rpc.protection=privacy is required by the Hadoop cluster, then one more property must be added to the catalog configuration:
hive.hdfs.wire-encryption.enabled=true
You can find more information in Hive Security Configuration — Kerberos Support.
Avro, CSV, ORC¶
The Denodo Embedded MPP data source does not allow you to graphically explore datasets in file formats other than Parquet, such as Avro, CSV or ORC. But you can do it manually.
For this you have to create the Hive tables in the Denodo Embedded MPP using a JDBC client, and then create the Denodo base views over those Hive tables using the From MPP Catalogs tab of the Embedded MPP data source.
The steps to follow are:
1. Create a new catalog in the Embedded MPP, because the predefined catalog named hive is not accessible from the From MPP Catalogs tab of the Embedded MPP data source. The recommended way is using the presto.catalog property in values.yaml. This new catalog will be accessed from the From MPP Catalogs tab. Below there is an example of a Hive catalog properties file, hive-orc, connected to the Embedded Hive Metastore to read Avro, CSV or ORC datasets:
# -- Additional catalogs
catalog:
  #hive-orc: |-
    #connector.name=hive-hadoop2
    #hive.metastore.uri=thrift://hive-metastore:9083
    #hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
    #hive.allow-drop-table=true
    #hive.non-managed-table-writes-enabled=true
2. Create the Hive table in the Embedded MPP with a JDBC client. For simple SQL queries like this you can use the Presto UI, as shown below. This SQL query editor has the limitation of returning a maximum of 100 records per query.
Create Hive table from ORC datasets¶
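For reference, a minimal sketch of the kind of statement you could run in that editor, assuming the hive-orc catalog defined above and a hypothetical ORC dataset at s3a://example-bucket/orders_orc/ (table name, columns and path are illustrative only):
-- Hypothetical external Hive table over existing ORC files, created through the hive-orc catalog
CREATE TABLE "hive-orc".default.store_orders (
    order_id BIGINT,
    order_date VARCHAR,
    total DOUBLE
)
WITH (
    format = 'ORC',
    external_location = 's3a://example-bucket/orders_orc/'
);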
3. Create the Denodo base view on the previously created Hive table using the From MPP Catalogs tab of the Embedded MPP data source.