Embedded MPP at Denodo
Denodo has customized Presto to act as the Denodo Platform’s embedded Massively Parallel Processing (MPP) engine and accelerate queries.
Important
The Denodo Embedded MPP requires the subscription bundle Enterprise Plus.
To use the Denodo Embedded MPP you have to register it in Denodo. The registration process can be executed:
Within the deployment process, with the --register option: cluster.sh deploy --register
After the deployment process, with the register command: cluster.sh register --register-user --register-password [--presto-password]
The registration process consists of:
Creation of a new database admin_denodo_mpp, a new user denodo_mpp_user and a special data source, the Denodo Embedded MPP data source, called embedded_mpp.
The Denodo Embedded MPP data source lets you graphically explore an Object Storage such as Amazon S3, Azure Data Lake Storage or HDFS, and create base views over data stored in Parquet file or Delta Lake table format (see Object Storage Data in Parquet, Delta and Iceberg format).
For the embedded_mpp data source to connect to the Object Storage, you have to manually complete the Object storage configuration in the Read Write tab. Select the file system you want to access and provide the credential information. The file systems available graphically are S3, ADLS and HDFS. You can also use other systems that are compatible with the Hadoop API, such as Google Cloud Storage. To do this, select HDFS and provide the required Hadoop properties (see the Support for Hadoop-compatible routes section).
You can then add the routes you want to browse from that object storage. Once you have saved the necessary credentials and routes, click Create Base View to browse these routes and select the ones you want to import. Denodo automatically detects the folders corresponding to tables in Parquet format (including those using Hive-style partitioning) and Delta Lake format. Each table format is identified with the logo of the corresponding technology.
Select the tables to import and click Create selected to create the base views. For each selected table, Denodo creates the base view and a table in the embedded_mpp data source to access the data. The table is created in the Hive catalog or the Delta catalog of the Embedded MPP, in the schema of your choice. You can select the schema from those available in the Target schema drop-down at the bottom of the Create Base View dialog.
For each table, Denodo automatically calculates its statistics by calling the stored procedure COMPUTE_SOURCE_TABLE_STATS. This stored procedure collects statistical information about the data, which the Denodo Embedded MPP optimizer uses for cost-based query planning.
In many scenarios, new partitions are added after the Hive table and its Denodo base view have been created. These new partitions are not reflected in query results until the Embedded MPP’s Metastore is updated with all the partitions added since the Hive table was created or last synchronized. Therefore, when you need to synchronize the partitions in the Object Storage with those in the Embedded MPP’s Metastore, run the REFRESH_SOURCE_TABLE_METADATA stored procedure.
Configuration of the Denodo query optimizer to consider this Denodo Embedded MPP for query acceleration.
This is useful in scenarios where a query combines large amounts of Parquet data stored in an Object Storage such as S3, Azure Data Lake or HDFS with data in a different data source. In these cases, the Denodo query optimizer may decide to send the query to the Denodo Embedded MPP. The Denodo Embedded MPP accesses the data in the Object Storage with its own engine, and receives the data that lives outside the Object Storage as a stream through Denodo, without the need to create temporary tables or files.
In this way, the Denodo query engine can combine its powerful optimization techniques, federation and security capabilities with parallel processing on big data.
It is recommended that, after registering the Denodo Embedded MPP in Denodo, you click the Validate MPP License button to verify that the registration has been done correctly.
Important
Requirements to connect from embedded_mpp data source to Denodo Embedded MPP:
Note that if the certificate used by the Denodo Embedded MPP is signed by a private authority, or it is self-signed, you have to import the Denodo Embedded MPP certificate into the Denodo server truststore.
The certs/certificate.crt is distributed for testing purposes ONLY. This certificate accepts presto-denodo as the Denodo Embedded MPP hostname. In this case, you have to add an entry in the hosts file where the Denodo server is running, with presto-denodo and the IP that appears as the EXTERNAL-IP of the presto service.
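The two requirements above can be sketched as a short shell script. This is only an illustration under stated assumptions: the address 203.0.113.10 is a placeholder for the EXTERNAL-IP of the presto service, the alias denodo-mpp is an arbitrary name, and the truststore path and password are passed as arguments because the location of the Denodo server truststore depends on your installation. The helper names hosts_line and import_mpp_cert are hypothetical; keytool is the standard JDK tool for importing certificates.

```shell
#!/bin/sh
# Sketch only. MPP_IP is a placeholder; replace it with the EXTERNAL-IP
# reported for the presto service in your cluster.
MPP_IP="203.0.113.10"
MPP_HOST="presto-denodo"   # hostname accepted by the test certificate

# Print the hosts-file line that maps the hostname to the IP.
hosts_line() {
    printf '%s %s\n' "$1" "$2"
}

# Import a certificate into a Java truststore with the JDK keytool.
# Arguments (all supplied by you, none taken from the documentation):
#   $1 certificate file, $2 truststore path, $3 truststore password
import_mpp_cert() {
    keytool -importcert -noprompt \
        -alias denodo-mpp \
        -file "$1" \
        -keystore "$2" \
        -storepass "$3"
}

# Show the line to add; appending it to the hosts file needs admin rights.
hosts_line "$MPP_IP" "$MPP_HOST"
```

To apply it, append the printed line to the hosts file of the machine where the Denodo server runs (for example /etc/hosts on Linux), and call import_mpp_cert with certs/certificate.crt, the path of your Denodo server truststore and its password. The function is defined but not invoked here, so running the script as written does not modify any truststore.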
Note
If you have a cluster of Denodo servers, the cluster must be configured to store its metadata in an external database to take advantage of the Embedded MPP Acceleration technique. In environments with just one Virtual DataPort server, you can lift this restriction by executing:
SET 'queryOptimization.parallelProcessing.denodoConnector.enableUsingSharedMetadataOnly'='false';