Hive
The Denodo Embedded MPP is distributed with a predefined catalog named hive
connected to the Embedded Hive Metastore for accessing Hive tables
from Parquet files.
connector.name=hive-hadoop2
# Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083
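The catalog definition above uses the Java properties format: one key=value pair per line, with # starting a comment line. connector.name selects the connector, and hive.metastore.uri points to the Thrift endpoint of the Embedded Hive Metastore (host hive-metastore, port 9083). The following Python is a minimal sketch that reads such a catalog definition and splits the metastore URI into its parts. The helper name parse_catalog_properties is hypothetical, and it handles only the simple key=value form used on this page.

```python
from urllib.parse import urlsplit


def parse_catalog_properties(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and # or ! comments.

    Simplified sketch: line continuations, ':' separators and escapes
    from the full Java properties syntax are not handled.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


# Contents of the predefined hive catalog, as listed above.
HIVE_CATALOG = """\
connector.name=hive-hadoop2
# Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083
"""

props = parse_catalog_properties(HIVE_CATALOG)
metastore = urlsplit(props["hive.metastore.uri"])
print(props["connector.name"])                               # hive-hadoop2
print(metastore.scheme, metastore.hostname, metastore.port)  # thrift hive-metastore 9083
```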
Parquet
To query Parquet datasets, you have to create Hive tables in the Denodo Embedded MPP. To do this, you can use the Embedded MPP data source in Denodo to graphically explore Parquet datasets (including those that use Hive-style partitioning), create the tables in the MPP, and create the base views in Denodo.
![Explore Parquet files](/docs/html/img/9.1/mpp_explore_parquet.png)
Explore Parquet files
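In Hive-style partitioning, each partition is stored in a directory named column=value. The partition values are therefore encoded in the object path rather than stored in the Parquet files. The following Python is a minimal sketch that extracts those pairs from the path of a data file. The helper name hive_partition_values and the example bucket path are hypothetical. Hive escapes special characters in partition values as %XX, so the sketch URL-decodes each value.

```python
from urllib.parse import unquote


def hive_partition_values(path: str) -> dict[str, str]:
    """Return the column=value pairs found in the directories of a file path.

    The last path segment is assumed to be the data file name and is ignored.
    Values are URL-decoded because Hive escapes special characters as %XX.
    """
    directories = path.rstrip("/").split("/")[:-1]  # drop the file name
    values: dict[str, str] = {}
    for segment in directories:
        key, sep, value = segment.partition("=")
        if sep and key:  # only segments shaped like column=value
            values[key] = unquote(value)
    return values


path = "s3a://my-bucket/sales/year=2023/month=01/part-00000.parquet"
print(hive_partition_values(path))  # {'year': '2023', 'month': '01'}
```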
Important
To graphically explore Parquet datasets, you need the Denodo subscription bundle Enterprise Plus. See Object Storage Data in Parquet, Delta and Iceberg format for more details on how to connect to an object storage graphically.
Note
The hive catalog is a restricted catalog, so it is not listed on the From MPP Catalogs
tab of the Embedded MPP data source.
Features
The Denodo Embedded MPP provides the following features when working with Parquet files:
- Graphically explore Parquet datasets, create tables in the MPP and create base views in Denodo (From Object Storage tab of the Embedded MPP data source)
- Create base views over existing Hive tables in an External Metastore (From MPP Catalogs tab of the Embedded MPP data source)
- Querying
Limitations
Inserts: it is not possible to insert data into views created from Parquet files using the
From Object Storage
tab of the Embedded MPP data source. See Manage Views Created from Object Storage for details.
Avro, CSV, ORC
To query datasets in file formats other than Parquet, such as Avro, CSV or ORC, you have to create the Hive tables in the Denodo Embedded MPP yourself.
You have to use a JDBC client for this, because the Embedded MPP data source cannot explore these formats graphically.
Then create the Denodo base views over those Hive tables using the From MPP Catalogs
tab of the Embedded MPP data source.
The steps to follow are:
1. Create a new catalog in the Embedded MPP, because the predefined catalog named hive is not accessible from the From MPP Catalogs tab of the Embedded MPP data source.

   Add a new catalog properties file, e.g., presto/conf/catalog/hive_orc_formats.properties (Presto names the catalog after the file, here hive_orc_formats). Set the connector.name property to the catalog type, hive-hadoop2, and add any other properties that this catalog type requires. Below is an example of a Hive catalog properties file connected to the Embedded Hive Metastore to read Avro, CSV or ORC datasets:

   connector.name=hive-hadoop2
   hive.metastore.uri=thrift://hive-metastore:9083
   hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml
   hive.allow-drop-table=true
   hive.non-managed-table-writes-enabled=true
2. Create the Hive table in the Embedded MPP with a JDBC client. For simple SQL statements like this one, you can also use the Presto UI, as shown below. Note that this SQL query editor returns at most 100 records per query.

   Create Hive table from ORC datasets
3. Create the Denodo base view over the previously created Hive table using the From MPP Catalogs tab of the Embedded MPP data source.
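The CREATE TABLE statement used in the second step depends on the dataset. The following Python is a minimal sketch that builds one for a table over existing ORC, Avro or CSV files in the catalog created in the first step (hive_orc_formats). It assumes the Presto Hive connector table properties format and external_location. The helper name create_external_table_sql, the schema default, the table nation_orc, its columns and the s3a://my-bucket/nation_orc/ location are hypothetical placeholders. The resulting string can be run from a JDBC client or pasted into the Presto UI.

```python
import re

SUPPORTED_FORMATS = {"AVRO", "CSV", "ORC"}
# Conservative identifier check used instead of quoting (sketch only).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_external_table_sql(
    catalog: str,
    schema: str,
    table: str,
    columns: list[tuple[str, str]],
    file_format: str,
    location: str,
) -> str:
    """Build a Presto CREATE TABLE statement over files that already exist.

    columns is a list of (name, presto_type) pairs. Single quotes in the
    location are doubled, as required inside SQL string literals.
    """
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {file_format}")
    for name in (catalog, schema, table, *(c for c, _ in columns)):
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    cols = ",\n".join(f"    {name} {ctype}" for name, ctype in columns)
    loc = location.replace("'", "''")
    # No trailing semicolon: statements sent through JDBC omit it.
    return (
        f"CREATE TABLE {catalog}.{schema}.{table} (\n"
        f"{cols}\n"
        ")\n"
        "WITH (\n"
        f"    format = '{fmt}',\n"
        f"    external_location = '{loc}'\n"
        ")"
    )


ddl = create_external_table_sql(
    catalog="hive_orc_formats",  # catalog created in the first step
    schema="default",
    table="nation_orc",
    columns=[("nationkey", "bigint"), ("name", "varchar"), ("regionkey", "bigint")],
    file_format="orc",
    location="s3a://my-bucket/nation_orc/",
)
print(ddl)
```

Validating identifiers against a simple pattern, instead of quoting them, keeps the sketch short. The column types still have to match the schema of the existing files.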