Embedded Parallel Processing
This feature is only available with the Enterprise Plus subscription bundle. To find out which bundle you have, open the About dialog of the Design Studio or the Administration Tool. For more information, see the section Denodo Platform - Subscription Bundles.
Starting with update 8.0u20230301, Denodo includes embedded Massively Parallel Processing (MPP) capabilities to improve performance in environments where data resides in object storage. For this purpose, Denodo now embeds a customized version of Presto, an open-source parallel SQL query engine that excels at accessing data lake content.
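To illustrate what "massively parallel" means for a query such as `SUM(amount) GROUP BY region`, the following is a minimal, hypothetical sketch of the general scatter-gather aggregation pattern used by parallel SQL engines. It is not Denodo's or Presto's actual implementation. The in-memory `SPLITS` stand in for Parquet files in object storage, threads stand in for cluster worker nodes, and all names are illustrative assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the rows of one Parquet
# file (a "split") in object storage, as (region, amount) pairs.
SPLITS = [
    [("EU", 10), ("US", 5), ("EU", 7)],
    [("US", 3), ("APAC", 8)],
    [("EU", 1), ("APAC", 2), ("US", 4)],
]


def partial_aggregate(rows):
    """Worker stage: SUM(amount) GROUP BY region over a single split."""
    partial = Counter()
    for region, amount in rows:
        partial[region] += amount
    return partial


def final_aggregate(partials):
    """Coordinator stage: merge the per-split partial sums."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts, it does not replace them
    return dict(total)


def parallel_sum_by_region(splits, workers=3):
    # Splits are independent, so each one can be scanned by a separate worker.
    # Threads are only a stand-in here; a real MPP engine uses separate nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_aggregate, splits))
    return final_aggregate(partials)


if __name__ == "__main__":
    print(parallel_sum_by_region(SPLITS))  # {'EU': 18, 'US': 12, 'APAC': 10}
```

Because each split only produces a small partial result, the expensive scan work scales out across workers while the coordinator merges just a handful of rows.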
A Presto cluster can be deployed by following the instructions in the Presto cluster on Kubernetes user manual. Versions of that utility newer than 20221018 include a build of Presto that has been customized to interact with the Denodo Platform. The deployment process also includes a final step that creates a special new data source in Denodo called “embedded_mpp”.
This step also configures the Denodo query optimizer to consider the embedded MPP engine for query acceleration.
The data source “embedded_mpp” is located in a new database, “admin_denodo_mpp”. It serves two purposes. First, it lets you explore object storage such as Amazon S3 or HDFS and create base views over data stored in Parquet format (see Object Storage data in Parquet format). Second, it allows the query optimizer to apply new Embedded MPP Acceleration techniques designed specifically for queries that access this kind of data.