Overview
Denodo embeds a massively parallel processing (MPP) engine optimized for data lake scenarios: the Denodo Lakehouse Accelerator (formerly known as Denodo Embedded MPP). The Denodo Lakehouse Accelerator is based on Presto, a high-performance, distributed SQL query engine for big data. Originally created by Facebook for massive-scale analytics, Presto is now a leading open-source project.
Denodo Lakehouse Accelerator includes two engine options:
Presto on Velox (also known as Prestissimo): A high-performance C++ implementation of the Presto worker, built on the Velox execution library and designed for maximum efficiency. This is the recommended engine, provided your specific use case is not impacted by current feature gaps.
Presto Java: The classic, stable distributed SQL engine.
Next-Generation Performance with Presto on Velox
Presto on Velox offers a dramatic step forward in query performance and efficiency. Velox is a high-performance C++ execution library that powers the next-generation engine.
By utilizing Velox-based workers, the Denodo Lakehouse Accelerator provides:
Optimized Resource Usage: Achieves more efficient memory handling and lower resource consumption compared to traditional engines.
Vectorized Execution: Processes data in columnar batches and uses SIMD instructions, so a single CPU instruction operates on several values at once.
Scalability: The efficiency gains of C++ workers are most evident under high levels of concurrency, ensuring stable performance for modern data workloads at scale.
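The idea behind vectorized execution can be illustrated with a minimal sketch. It is plain Python, so it does not use SIMD itself and it is not how Velox is implemented. It only contrasts row-at-a-time evaluation with evaluation over contiguous, typed column batches, which is the data layout that lets a C++ engine apply SIMD instructions. The data, function names, and batch size are hypothetical.

```python
from array import array

# Hypothetical toy data set: three order lines.
rows = [
    {"price": 10.0, "qty": 2},
    {"price": 5.5, "qty": 4},
    {"price": 1.25, "qty": 8},
]


def total_row_at_a_time(rows):
    """Evaluate price * qty once per row, looking up each field by name."""
    total = 0.0
    for row in rows:
        total += row["price"] * row["qty"]
    return total


def to_columns(rows):
    """Pivot the rows into contiguous, typed column buffers."""
    prices = array("d", (r["price"] for r in rows))  # 64-bit floats
    qtys = array("q", (r["qty"] for r in rows))      # 64-bit integers
    return prices, qtys


def total_columnar(prices, qtys, batch_size=2):
    """Evaluate price * qty one batch of column values at a time."""
    total = 0.0
    for start in range(0, len(prices), batch_size):
        price_batch = prices[start:start + batch_size]
        qty_batch = qtys[start:start + batch_size]
        # A native engine would run this inner loop as a tight,
        # SIMD-friendly kernel over the two buffers.
        total += sum(p * q for p, q in zip(price_batch, qty_batch))
    return total


prices, qtys = to_columns(rows)
print(total_row_at_a_time(rows), total_columnar(prices, qtys))
```

Both functions compute the same result. The difference is that the columnar version touches homogeneous, densely packed values, which is what makes hardware-level optimization possible in a native engine.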
Key Capabilities
The main goal of the Denodo Lakehouse Accelerator is to provide efficient access to data lake content (such as Parquet files, Iceberg, and Delta Lake tables) using only standard SQL.
With the Denodo Lakehouse Accelerator, there is no need for additional external engines. The combination of a parallel engine and Object Storage enables out-of-the-box support for Denodo capabilities that require high-performance storage:
Caching
Query Acceleration
Remote Tables
Summaries
Main Steps to Use the Lakehouse Accelerator
Store your dataset in Object Storage. The Denodo Lakehouse Accelerator supports:
* Amazon S3 and S3-compatible storage
* Azure Data Lake Storage Gen2
* Google Cloud Storage
* Hadoop Distributed File System (HDFS)
Deploy the Lakehouse Accelerator on Kubernetes using the provided Helm chart. During deployment, choose between the Velox (C++) and Java worker types, then register the cluster as a data source in the Denodo Platform.
Graphically explore the data in your Object Storage to create the corresponding tables in the Lakehouse Accelerator and base views in Denodo.
Query your data lake, leveraging the massively parallel processing capabilities of the engine.
Load data (Parquet or Iceberg) using Denodo features like caching or remote tables to optimize your Lakehouse performance.
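To make the table-creation step more concrete, the sketch below builds the kind of statement that open-source Presto's Hive connector accepts for an external table over existing Parquet files. In Denodo this step is performed graphically, so the sketch is illustrative only: the catalog, schema, table, column, and bucket names are hypothetical, and it is an assumption that the tables created by the Lakehouse Accelerator take this form.

```python
def external_parquet_table_ddl(catalog, schema, table, columns, location):
    """Build a Presto Hive-connector CREATE TABLE statement for an
    external table over Parquet files that already exist at `location`.

    Sketch only: identifiers are not quoted or validated, so the inputs
    must come from a trusted source.
    """
    column_defs = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {catalog}.{schema}.{table} (\n  {column_defs}\n)\n"
        f"WITH (format = 'PARQUET', external_location = '{location}')"
    )


# Hypothetical names throughout; replace with your own catalog and bucket.
ddl = external_parquet_table_ddl(
    "hive",
    "sales",
    "orders",
    [("order_id", "bigint"), ("amount", "double")],
    "s3a://example-bucket/orders/",
)
print(ddl)
```

Once such a table exists, it can be queried with standard SQL like any other table exposed by the engine.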
Important
The Denodo Lakehouse Accelerator requires the Enterprise Plus subscription bundle. For production environments where performance and concurrency are critical, Presto on Velox is the recommended engine, provided your specific use case is not impacted by current feature gaps.
