Embedded MPP at Denodo¶
Since update 8.0u20230301, Denodo includes a customized version of Presto that acts as the Denodo Platform’s embedded Massively Parallel Processing (MPP) engine to accelerate queries (see the What Is New in Denodo 8.0u20230301 section).
Important
The Denodo Embedded MPP requires the subscription bundle Enterprise Plus and the update 8.0u20230301.

To use the Denodo Embedded MPP, you have to register it in Denodo. The registration process can be executed:
Within the deployment process, with the --register option: cluster.sh deploy --register
After the deployment process, with the register command: cluster.sh register --register-user --register-password [--presto-password]
The registration process consists of:
Creation of a new database, admin_denodo_mpp, a new user, denodo_mpp_user, and a special data source, the Denodo Embedded MPP data source, called embedded_mpp:
embedded_mpp data source¶
The Denodo Embedded MPP data source allows you to graphically explore an Object Storage, such as Amazon S3, Azure Data Lake Storage or HDFS, and to create base views over data stored in Parquet file format (see Object Storage data in Parquet Format).
For the embedded_mpp data source to connect to the Object Storage, you have to manually fill in the Object storage configuration in the Read & Write tab. You must select the file system you want to access and provide the credential information. The file systems available graphically are S3 and HDFS. You can also use other systems that are compatible with the Hadoop API, such as Azure Data Lake Storage. To do this, select HDFS and provide the required Hadoop properties (see the Support for Hadoop-compatible routes section).
embedded_mpp Object Storage configuration¶
You can then add the paths you want to browse from that object storage. Once you have saved the necessary credentials and paths, you can click on Create Base View to browse these paths and select the ones you want to import. Denodo automatically detects the folders corresponding to tables in Parquet format (including those using Hive-style partitioning).
embedded_mpp base view creation¶
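For readers unfamiliar with Hive-style partitioning, the sketch below illustrates the directory naming convention only: each partition column appears as a key=value folder in the path of the Parquet files. It does not show how Denodo performs the detection. The bucket, table folder and helper name (partition_segments) are hypothetical.

```shell
#!/bin/sh
# Hypothetical layout of a Parquet table partitioned by year and month:
#   sales/
#     year=2023/month=02/part-0000.parquet
#     year=2023/month=03/part-0000.parquet

# Print the Hive-style "key=value" segments of a path, one per line.
# (Helper name is an assumption made for this illustration.)
partition_segments() {
    printf '%s\n' "$1" | tr '/' '\n' | grep -E '^[^=]+=[^=]+$'
}

# Hypothetical object path, used only to illustrate the layout.
partition_segments "s3a://example-bucket/sales/year=2023/month=03/part-0000.parquet"
# prints:
#   year=2023
#   month=03
```

In a layout like this one, the folder you would expect to import as a table is sales, with year and month as its partition columns.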
Select the tables to import and click on Create selected to create the base view. Denodo will create the base view and a table in the embedded_mpp data source to access the data. Denodo will create the table in the Hive catalog of the Embedded MPP and the schema of your choice. You can select the schema from those available in the Target schema drop-down at the bottom of the Create Base View dialog.
For each table, Denodo will automatically calculate its statistics by calling the stored procedure COMPUTE_SOURCE_TABLE_STATS. This stored procedure collects statistical information about the data, which the Denodo Embedded MPP optimizer uses to plan the query based on cost strategies.
In many scenarios, new partitions are added after the Hive table and the Denodo base view have been created. These new partitions are not reflected in query results until the Embedded MPP’s Metastore is updated with all the partitions added since the Hive table was created or last synchronized. Therefore, when you need to synchronize the partitions in the Object Storage with those in the Embedded MPP’s Metastore, run the REFRESH_SOURCE_TABLE_METADATA stored procedure.
Configuration of the Denodo query optimizer to consider this Denodo Embedded MPP for query acceleration.
Denodo Embedded MPP acceleration¶
This is useful in scenarios where a query combines large amounts of Parquet data stored in an Object Storage such as HDFS, S3 or Azure Data Lake with data in a different data source. In these cases, the Denodo query optimizer may decide to send the query to the Denodo Embedded MPP. The Denodo Embedded MPP can access the data in Object Storage using its own engine and can access the data outside the Object Storage in streaming through Denodo, without the need to create temporary tables or files.
In this way, the Denodo query engine can combine its powerful optimization techniques, federation and security capabilities with parallel processing on big data.
After registering the Denodo Embedded MPP in Denodo, it is recommended to invoke the VALIDATE_MPP_LICENSE stored procedure to validate that the registration has been done correctly. This stored procedure is included since the 8.0u20240306 update.
Important
Requirements to connect from the embedded_mpp data source to the Denodo Embedded MPP:
If the certificate used by the Denodo Embedded MPP is signed by a private authority, or it is self-signed, you have to import the Denodo Embedded MPP certificate into the Denodo server truststore.
The certs/certificate.crt certificate is distributed for testing purposes ONLY. This certificate accepts presto-denodo as the Denodo Embedded MPP hostname. In this case, you have to add an entry in the hosts file of the machine where the Denodo server is running, with presto-denodo and the IP that appears as the EXTERNAL-IP of the presto service.

Kubernetes services status¶

Ping External IP¶

hosts file¶
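The two requirements above (importing the certificate and mapping presto-denodo in the hosts file) can be sketched as a pair of shell helpers. This is a minimal sketch under stated assumptions, not an official procedure: the function names, the certificate alias, the truststore location (<DENODO_HOME>/jre/lib/security/cacerts) and the example IP are assumptions; only keytool -importcert, kubectl get svc, certs/certificate.crt, presto-denodo and the presto service name come from standard tooling or from this page.

```shell
#!/bin/sh
# Sketch only: helper names, the alias and the default paths are assumptions.

# Import the Embedded MPP certificate into a Java truststore with keytool.
#   $1 = certificate file, e.g. certs/certificate.crt
#   $2 = truststore path (assumed: <DENODO_HOME>/jre/lib/security/cacerts)
#   $3 = truststore password
import_mpp_certificate() {
    keytool -importcert -noprompt \
        -alias denodo-embedded-mpp \
        -file "$1" \
        -keystore "$2" \
        -storepass "$3"
}

# Append "<ip> <hostname>" to a hosts file unless the hostname is already mapped.
#   $1 = EXTERNAL-IP of the presto service
#   $2 = hostname (presto-denodo for the test certificate)
#   $3 = hosts file (defaults to /etc/hosts; writing there usually needs root)
add_hosts_entry() {
    hosts_file="${3:-/etc/hosts}"
    if grep -qE "[[:space:]]$2([[:space:]]|\$)" "$hosts_file" 2>/dev/null; then
        echo "$2 is already mapped in $hosts_file"
        return 0
    fi
    printf '%s %s\n' "$1" "$2" >> "$hosts_file"
}

# Example usage (not executed here; 203.0.113.10 is a placeholder address):
#   kubectl get svc presto        # read the EXTERNAL-IP column
#   import_mpp_certificate certs/certificate.crt \
#       "$DENODO_HOME/jre/lib/security/cacerts" "$TRUSTSTORE_PASSWORD"
#   add_hosts_entry 203.0.113.10 presto-denodo
```

The hosts helper is idempotent, so running it again after a redeployment does not duplicate the entry. If the EXTERNAL-IP changes, the existing line has to be edited by hand.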
Note
If you have a cluster of Denodo servers, the cluster needs to be configured to store its metadata in an external database to take advantage of the Embedded MPP Acceleration technique. In environments with just one Virtual DataPort server, it is possible to avoid this restriction by executing:
SET 'queryOptimization.parallelProcessing.denodoConnector.enableUsingSharedMetadataOnly'='false';