Starting with the update 8.0u20230301, Denodo has embedded its own MPP engine, the Denodo Embedded MPP, based on Presto.

Embedded MPP at Denodo


The Denodo Embedded MPP requires the subscription bundle Enterprise Plus and the update 8.0u20230301.

Presto is a high-performance, distributed SQL query engine for big data. It was originally created by Facebook to provide self-service analytics on top of their massive data sets, and is now open source.

The main goal of the Denodo Embedded MPP is to provide efficient access to Data Lake content: Parquet files, Delta Lake tables and Apache Iceberg tables, in an easy way, using only SQL.

Also, with the Denodo Embedded MPP there is no need for additional external engines. The use of an MPP engine and Object Storage brings out-of-the-box options for those Denodo capabilities that require storage, such as caching, query acceleration, remote tables and summaries.

The main steps to use the Denodo Embedded MPP are:

  1. Store your dataset in the Object Storage. The Denodo Embedded MPP can read data from many distributed storage systems, such as:

    • Amazon S3

    • S3-compatible storage

    • Azure Data Lake Storage Gen2

    • Google Cloud Storage

    • Hadoop Distributed File System (HDFS)

  2. Deploy the Denodo Embedded MPP on Kubernetes using the Helm chart, and register it as a data source in the Denodo Platform.

  3. Graphically explore your storage from Denodo to introspect the structure of the Parquet files and their data types. Create the corresponding Hive tables in the Embedded MPP and the base views in Denodo.

  4. Query your Data Lake in Denodo using MPP acceleration.

  5. Load data in your Data Lake using Denodo capabilities like caching, remote tables or summaries.
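
As a rough illustration of step 1, the sketch below maps a dataset location to one of the storage systems listed above by looking at its URI scheme. It assumes the conventional Hadoop-compatible schemes (`s3a`, `abfs`/`abfss`, `gs`, `hdfs`); whether the Denodo Embedded MPP expects exactly these schemes is an assumption to confirm against the product documentation. The function name `storage_system` and the example URIs are hypothetical.

```python
from urllib.parse import urlparse

# Assumption: conventional Hadoop-compatible URI schemes for each storage
# system. Check the schemes your Denodo Embedded MPP deployment accepts.
SCHEME_TO_STORAGE = {
    "s3a": "Amazon S3 or S3-compatible storage",
    "abfs": "Azure Data Lake Storage Gen2",
    "abfss": "Azure Data Lake Storage Gen2",
    "gs": "Google Cloud Storage",
    "hdfs": "Hadoop Distributed File System (HDFS)",
}


def storage_system(location: str) -> str:
    """Return the storage system implied by the URI scheme of `location`.

    Raises ValueError when the scheme is not one of the known schemes.
    """
    scheme = urlparse(location).scheme.lower()
    if scheme not in SCHEME_TO_STORAGE:
        raise ValueError(f"Unrecognized storage scheme in {location!r}")
    return SCHEME_TO_STORAGE[scheme]


if __name__ == "__main__":
    # Hypothetical dataset locations, one per listed storage system.
    for uri in (
        "s3a://my-bucket/sales/",
        "abfss://container@account.dfs.core.windows.net/sales/",
        "gs://my-bucket/sales/",
        "hdfs://namenode:8020/data/sales/",
    ):
        print(uri, "->", storage_system(uri))
```

A check like this can catch a mistyped location before you browse the storage from Denodo in step 3.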


If you have a cluster of Denodo servers, configure it to store its metadata in an external database to take full advantage of the Denodo Embedded MPP functionality, as explained in Storing the Metadata on an External Database.
