Embedded MPP at Denodo

Since update 8.0u20230301, Denodo has customized Presto to act as the Denodo Platform’s embedded Massively Parallel Processing (MPP) engine and accelerate queries. See the What Is New in Denodo 8.0u20230301 section.

Important

The Denodo Embedded MPP requires the subscription bundle Enterprise Plus and the update 8.0u20230301.

MPP Architecture

To use the Denodo Embedded MPP you have to register it in Denodo. The registration process can be executed:

  • Within the deployment process, with the --register option:

    cluster.sh deploy --register
    
  • After the deployment process, with the register command:

    cluster.sh register --register-user --register-password [--presto-password]
    

The registration process consists of:

  1. Creation of a new database admin_denodo_mpp, a new user denodo_mpp_user and a special data source in Denodo called embedded_mpp:

    embedded_mpp data source

    The Denodo Embedded MPP data source allows you to connect to an Object Storage (S3, HDFS) graphically, explore its data and create base views on Parquet files and the corresponding Hive tables in the Denodo Embedded MPP. This functionality is similar to the Metadata Discovery Tool for Parquet files, but it is now integrated in the Denodo Platform.

    For the embedded_mpp data source to connect to the Object Storage, you have to manually complete the Object storage configuration in the Read & Write tab: select the file system you want to access and provide the credential information. The file systems available graphically are S3 and HDFS. You can also use other systems that are compatible with the Hadoop API, such as Azure Data Lake Storage. To do this, select HDFS and provide the required Hadoop properties (see Support for Hadoop-compatible routes section).

    embedded_mpp Object Storage configuration

    You can then add the paths you want to browse from that object storage. Once you have saved the necessary credentials and paths, you can click on Create Base View to browse these paths and select the ones you want to import. Denodo automatically detects the folders corresponding to tables in Parquet format (including those using the Hive style partitioning).

    embedded_mpp base view creation

    Select the tables to import and click on Create selected to create the base view. Denodo will create the base view and a table in the embedded_mpp data source to access the data. Denodo will create the table in the Hive catalog of the Embedded MPP and the schema of your choice. You can select the schema from those available in the Target schema drop-down at the bottom of the Create Base View dialog.

    For each table Denodo will automatically calculate its statistics by calling the stored procedure COMPUTE_SOURCE_TABLE_STATS. This stored procedure collects statistical information about the data that the Denodo Embedded MPP optimizer uses to plan the query based on cost strategies.

  2. Configuration of the Denodo query optimizer to consider this Denodo Embedded MPP for query acceleration.

    Denodo Embedded MPP acceleration

    This is useful in scenarios where a query combines large amounts of Parquet data stored in an Object Storage such as HDFS, S3 or Azure Data Lake with data in a different data source. In these cases, the Denodo query optimizer may decide to send the query to the Denodo Embedded MPP. The Denodo Embedded MPP reads the data in the Object Storage with its own engine and receives the data that lives outside the Object Storage as a stream through Denodo, without the need to create temporary tables or files.

    In this way, the Denodo query engine can combine its powerful optimization techniques, federation and security capabilities with parallel processing on big data.
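
As an illustration of the Hive-style partitioning that Denodo detects when browsing paths in step 1, a partitioned Parquet table keeps each partition in a subfolder named column=value. The following sketch is not part of Denodo: the helper name and the sample path are hypothetical, and it only shows how the partition columns can be read from such a path.

```shell
# Hypothetical helper: print the partition columns encoded in a Hive-style path.
# Every path segment of the form key=value is treated as one partition column.
partition_columns() {
  # Split the path into one segment per line and keep only key=value segments.
  # "|| true" keeps the exit status at 0 when the path has no partitions.
  printf '%s\n' "$1" | tr '/' '\n' | grep -E '^[^=]+=.+$' || true
}

# Made-up layout of a table partitioned by year and month.
# Prints "year=2023" and "month=03" on separate lines.
partition_columns "s3a://my-bucket/sales/year=2023/month=03/part-0000.parquet"
```

In this made-up layout, the folder that corresponds to the table is sales, and year and month are its partition columns.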

After registering the Denodo Embedded MPP in Denodo, it is recommended to invoke the VALIDATE_MPP_LICENSE stored procedure to check that the registration completed correctly. This stored procedure is available since update 8.0u20240306.

Important

Requirements to connect from embedded_mpp data source to Denodo Embedded MPP:

Note that if the certificate used by the Denodo Embedded MPP is signed by a private authority, or it is self-signed, you have to import the Denodo Embedded MPP certificate into the Denodo server truststore.
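
One possible way to do the import is the JDK keytool utility. The following is a minimal sketch, not an official procedure: the installation path, the truststore location, its password and the alias are assumptions that you must adapt to your environment. With DRY_RUN=1 (the default in this sketch) the function only prints the command instead of running it.

```shell
# Sketch: import the Denodo Embedded MPP certificate into a Java truststore.
# All values below are assumptions; adjust them to your installation.
DENODO_HOME="${DENODO_HOME:-/opt/denodo}"                          # assumed installation path
TRUSTSTORE="${TRUSTSTORE:-$DENODO_HOME/jre/lib/security/cacerts}"  # assumed truststore location
STOREPASS="${STOREPASS:-changeit}"                                 # default password of a JRE cacerts file
DRY_RUN="${DRY_RUN:-1}"                                            # 1 = print the command only

import_mpp_certificate() {
  cert_file="$1"
  alias_name="${2:-denodo-embedded-mpp}"   # hypothetical alias
  # Build the keytool command as the positional parameters of this function.
  set -- keytool -importcert -noprompt -alias "$alias_name" -file "$cert_file" \
         -keystore "$TRUSTSTORE" -storepass "$STOREPASS"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"     # show what would be executed
  else
    "$@"          # run keytool for real
  fi
}

import_mpp_certificate certs/certificate.crt
```

Run it with DRY_RUN=0 once the printed command looks right. A running Denodo server typically has to be restarted before it picks up a changed truststore.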

The certs/certificate.crt certificate is distributed for testing purposes ONLY.

This certificate accepts presto-denodo as the Denodo Embedded MPP hostname. If you use it, you have to add an entry to the hosts file of the machine where the Denodo server is running, mapping presto-denodo to the IP that appears as the EXTERNAL-IP of the presto service.

Kubernetes services status

Ping External IP

hosts file
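
The hosts entry described above can be built as in the following sketch. It assumes that the service is named presto (as in the text above), that it is a LoadBalancer service exposing an IP rather than a hostname, and that kubectl already points at the right cluster and namespace. The helper name is hypothetical.

```shell
# Hypothetical helper: build the hosts-file line that maps presto-denodo to an IP.
hosts_entry() {
  ip="$1"
  # Simple sanity check: accept only a dotted IPv4 address.
  if ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi
  printf '%s presto-denodo\n' "$ip"
}

# Reading the EXTERNAL-IP from the cluster (left commented out, it needs cluster access):
# external_ip="$(kubectl get svc presto -o jsonpath='{.status.loadBalancer.ingress[0].ip}')"
# hosts_entry "$external_ip" | sudo tee -a /etc/hosts

# Example with a documentation-range IP; prints "203.0.113.10 presto-denodo".
hosts_entry 203.0.113.10
```

On Linux the hosts file is /etc/hosts. On Windows it is C:\Windows\System32\drivers\etc\hosts and has to be edited with administrator rights.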

Note

If you have a cluster of Denodo servers, configure it to store its metadata in an external database to take full advantage of the Denodo Embedded MPP functionality, as explained in Storing the Metadata on an External Database.
