Overview
Denodo includes its own massively parallel processing (MPP) engine, the Denodo Embedded MPP, which is based on Presto.
Important
The Denodo Embedded MPP requires the subscription bundle Enterprise Plus.
Presto is a high-performance, distributed SQL query engine for big data. It was originally created by Facebook to provide self-service analytics on top of their massive data sets, and is now open source.
The main goal of the Denodo Embedded MPP is to provide efficient access to Data Lake content (Parquet files, Delta Lake tables and Apache Iceberg tables) using only SQL.
In addition, the Denodo Embedded MPP removes the need for additional external engines. Combining an MPP engine with Object Storage provides out-of-the-box options for the Denodo capabilities that require storage, such as caching, query acceleration, remote tables and summaries.
The main steps to use the Denodo Embedded MPP are:
Store your dataset in the Object Storage. The Denodo Embedded MPP can read data from many distributed storage systems, such as:
Amazon S3
S3-compatible storage
Azure Data Lake Storage Gen2
Google Cloud Storage
Hadoop Distributed File System (HDFS)
Deploy the Denodo Embedded MPP on Kubernetes using the Helm chart, registering it as a data source in the Denodo Platform.
Graphically explore the data stored in Parquet file or Delta Lake table format in your Object Storage to create the corresponding tables in the Embedded MPP and the base views in Denodo.
Query your Data Lake in Denodo using MPP acceleration.
Load data into your Data Lake, in Parquet file or Iceberg table format, using Denodo capabilities such as caching, remote tables or summaries.
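As an illustration of the querying step above, here is a minimal sketch of an aggregation over a base view created from a Parquet dataset. The view name `sales_parquet` and the columns `region` and `amount` are hypothetical and stand in for whatever base view and fields you created during graphical exploration. Whether a particular query is actually accelerated by the Embedded MPP depends on the optimizer and on your configuration.

```sql
-- Hypothetical base view created over a Parquet dataset in Object Storage.
-- The view name and column names are assumptions for illustration only.
SELECT region,
       SUM(amount) AS total_amount
FROM sales_parquet
GROUP BY region
ORDER BY total_amount DESC;
```

From the client's point of view this is an ordinary SQL query against a Denodo view. Nothing in the statement refers to the MPP engine.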
Note
If you have a cluster of Denodo servers, the cluster must be configured to store its metadata in an external database to take advantage of the Embedded MPP Acceleration technique. In environments with just one Virtual DataPort server, you can lift this restriction by executing:
SET 'queryOptimization.parallelProcessing.denodoConnector.enableUsingSharedMetadataOnly'='false';