Hive

The Denodo Embedded MPP is distributed with a predefined catalog named hive, connected to the Embedded Hive Metastore, for accessing Hive tables backed by Parquet files.

hive.properties
connector.name=hive-hadoop2

# Embedded Hive Metastore
hive.metastore.uri=thrift://hive-metastore:9083
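
Tables registered in the Embedded Hive Metastore are addressed through this catalog using the catalog.schema.table notation. A minimal illustration, assuming a hypothetical customer table in the default schema:

-- List the schemas exposed by the predefined hive catalog
SHOW SCHEMAS FROM hive;

-- Query a table registered in the Embedded Hive Metastore
SELECT * FROM hive.default.customer LIMIT 10;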

Parquet

To query Parquet datasets you have to create Hive tables in the Denodo Embedded MPP. For this you can use the Embedded MPP data source in Denodo to graphically explore Parquet datasets (including those using Hive-style partitioning), create the corresponding tables in the MPP, and create the base views in Denodo.
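
A table over a Parquet dataset created in the hive catalog corresponds, roughly, to a CREATE TABLE statement like the sketch below. The schema, columns and the s3a://my-bucket/store_sales/ location are hypothetical, shown only to illustrate the shape of such a table:

CREATE TABLE hive.default.store_sales (
    ss_item_sk BIGINT,
    ss_quantity INTEGER,
    ss_sales_price DOUBLE,
    ss_sold_date DATE
)
WITH (
    format = 'PARQUET',
    external_location = 's3a://my-bucket/store_sales/',
    -- Hive-style partitioning: partition columns are declared last in the column list
    partitioned_by = ARRAY['ss_sold_date']
);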

Explore Parquet files

Important

To graphically explore Parquet datasets you need the Denodo subscription bundle Enterprise Plus. See Object Storage Data in Parquet, Delta and Iceberg format for more details on how to connect to an object storage graphically.

Note

hive is a restricted catalog, so it is not listed on the From MPP Catalogs tab of the Embedded MPP data source.

Features

The Denodo Embedded MPP provides the following features when working with Parquet files:

Limitations

  • Inserts: it is not possible to insert data into views created from Parquet files using the From Object Storage tab of the Embedded MPP data source. See Manage Views Created from Object Storage for details.

Avro, CSV, ORC

To query datasets in file formats other than Parquet, such as Avro, CSV or ORC, you have to create the Hive tables in the Denodo Embedded MPP using a JDBC client, since the graphical option in the Embedded MPP data source is not available for these formats. Then create the Denodo base views over those Hive tables using the From MPP Catalogs tab of the Embedded MPP data source.

The steps to follow are:

  1. Create a new catalog in the Embedded MPP because the predefined catalog named hive is not accessible from the From MPP Catalogs tab of the Embedded MPP data source.

    Add a new catalog properties file, e.g., presto/conf/catalog/hive_orc_formats. Then set the connector.name property to the catalog type, hive-hadoop2, and add any other properties required by that catalog type.

    Below is an example of a Hive catalog properties file connected to the Embedded Hive Metastore for reading Avro, CSV or ORC datasets:

     connector.name=hive-hadoop2
     hive.metastore.uri=thrift://hive-metastore:9083

     # Hadoop configuration files used by the connector
     hive.config.resources=/opt/presto-server/etc/catalog/core-site.xml

     # Allow dropping tables and writing to non-managed (external) tables
     hive.allow-drop-table=true
     hive.non-managed-table-writes-enabled=true
    
  2. Create the Hive table in the Embedded MPP with a JDBC client; an illustrative CREATE TABLE statement is sketched after these steps. For simple SQL queries like this you can use the Presto UI, as shown below. Note that this SQL query editor returns at most 100 records per query.

    Create Hive table from ORC datasets

  3. Create the Denodo base view on the previously created Hive table using the From MPP Catalogs tab of the Embedded MPP data source.
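
For reference, the CREATE TABLE statement mentioned in step 2 could look roughly like the sketch below, assuming a hypothetical orders dataset stored as ORC under s3a://my-bucket/orders/ and the hive_orc_formats catalog created in step 1:

CREATE TABLE hive_orc_formats.default.orders (
    order_id BIGINT,
    customer_id BIGINT,
    total_price DOUBLE,
    order_date DATE
)
WITH (
    format = 'ORC',
    external_location = 's3a://my-bucket/orders/'
);

For Avro or CSV datasets, the format property would change accordingly (e.g., 'AVRO' or 'CSV').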
