Presto on Velox

Starting with Denodo 9.4, the Denodo Lakehouse Accelerator 2.0.0 includes both the Presto C++ (recommended) and the Presto Java engines. The Presto C++ engine, referred to as Presto on Velox, is designed to deliver a significant performance boost for intensive workloads.

Features

The initial release of the Presto on Velox engine includes support for the following connectors and storage systems:

Connectors

Connector              Read   Create/Insert (*)                   Update   Merge   Delete
Hive (Parquet files)   Yes    Yes                                 No       No      No
Iceberg                Yes    Yes (only unpartitioned datasets)   No       No      No
Delta                  No     No                                  No       No      No

(*) To support write operations, ensure that the Denodo Lakehouse Accelerator Hive catalog has the following property enabled:

hive.non-managed-table-writes-enabled=true

Be aware of the insert limitations that apply to Hive tables within the Denodo Lakehouse Accelerator.

Important

S3 Endpoint Configuration for Presto on Velox

When using the Presto on Velox engine with Amazon S3, you must explicitly configure the s3Endpoint if your bucket is located outside of the default us-east-1 region.

Failure to match the endpoint with the bucket’s physical region will result in a VeloxRuntimeError (HTTP 301 Moved Permanently). Ensure your values.yaml uses the correct regional format: s3.<region>.amazonaws.com.

Object Storage

  • AWS S3 (and S3-compatible storage): Supported.

  • HDFS: Supported.

  • Azure: Currently NOT supported.

  • GCS: Currently NOT supported.

Supported Catalogs

The following catalogs are supported for Presto on Velox:

Hive Metastore
  • Supported Tables: Hive and Iceberg.

AWS Glue Data Catalog
  • Supported Tables: Hive and Iceberg.

Unity Catalog
  • Supported Tables: Delta Lake tables via Uniform and Iceberg.

Nessie Catalog
  • Supported Tables: Iceberg.

Snowflake Open Catalog
  • Currently NOT supported.

Kubernetes Platforms

The Denodo Lakehouse Accelerator is supported on the following platforms:

  • Amazon EKS

  • Red Hat OpenShift

  • Azure AKS

  • Google GKE

Helm Chart Configuration

To use the next-generation engine, you must first enable it in the Helm chart. This switches the Lakehouse Accelerator from the traditional Java workers to the high-performance C++ implementation.

Enabling Presto on Velox Engine

Set the following property to true in your values.yaml to activate Presto on Velox:

presto:
  prestoOnVelox:
    enabled: true

Tip

Enabling this property is required for the denodoOnVeloxConnector settings below to take effect.

Denodo Connection Details

To enable the Embedded MPP Acceleration, configure the connection to the Denodo server using the Arrow Flight SQL protocol via the presto.denodoOnVeloxConnector section:

presto.denodoOnVeloxConnector
  Section for the Denodo connection details. Denodo will use Presto on Velox to accelerate queries.

...arrowFlightHost
  Defines the Denodo host address.

...arrowFlightPort
  Defines the Denodo Arrow Flight port. The default Arrow Flight port is 9994.

...ssl
  Boolean. Defines whether SSL is enabled on the Denodo server.

...pemCertificate
  PEM certificate filename for Denodo VDP. The file must reside in presto/secrets/.

...user
  Denodo username.

...password
  Denodo password (must comply with Denodo password policies).
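As an illustration, the dotted property names above map onto a values.yaml block roughly as follows. The host, certificate name, and credentials are placeholders, not values from this manual:

```yaml
presto:
  denodoOnVeloxConnector:
    # Hypothetical Denodo VDP host; replace with your server address.
    arrowFlightHost: denodo-vdp.example.com
    # 9994 is the default Arrow Flight port.
    arrowFlightPort: 9994
    # Set to true if SSL is enabled on the Denodo server.
    ssl: true
    # Certificate file placed under presto/secrets/.
    pemCertificate: denodo-vdp.pem
    user: admin
    password: my-denodo-password
```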

Worker Memory Configuration

For stability, you must tune memory limits in presto.worker.additionalConfig.

The values below are calibrated for r6a.8xlarge instances (256 GiB RAM each). If your cluster uses different instance types, scale these memory parameters accordingly.

presto:
  worker:
    additionalConfig: [
      system-memory-gb: "231",
      query-memory-gb: "219",
      query.max-memory-per-node: "219GB",
      system-mem-limit-gb: "241"
    ]
  • system-memory-gb

    Memory allocation limit enforced by an internal memory allocator. It consists of two parts:

    1. Memory used by queries as specified in query-memory-gb.

    2. Memory used by the system, such as disk spilling and cache prefetch.

    • Default value: 57

    • Recommendation: Set to approximately 90% of the available machine memory. This provides a buffer to handle unaccounted memory and prevent out-of-memory (OOM) conditions. (e.g., the default 57 GB is optimized for a 64 GB machine).

  • query-memory-gb

    Specifies the total memory in GB available for all concurrent queries on a worker node. System usage, such as disk spilling and cache prefetch, is not counted toward this limit.

    • Default value: 38

    • Recommendation: Set to approximately 85% of the available machine memory.

  • query.max-memory-per-node

    Maximum memory usage allowed for a single query.

    • Default value: 4 GB

    • Recommendation: Set this to the same value as query-memory-gb.

  • system-mem-limit-gb

    Specifies the system memory threshold that triggers memory pushback or a heap dump if server usage exceeds this limit.

    • Default value: 60

    • Recommendation: Set this to be greater than or equal to system-memory-gb, but do not exceed the total physical machine memory.
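Applying the percentages above to a hypothetical 128 GiB worker node gives roughly the following values. The figures are illustrative, derived from the 90%/85% guidance rather than taken from this manual:

```yaml
presto:
  worker:
    additionalConfig: [
      # ~90% of 128 GiB
      system-memory-gb: "115",
      # ~85% of 128 GiB
      query-memory-gb: "108",
      # same value as query-memory-gb, per the recommendation above
      query.max-memory-per-node: "108GB",
      # >= system-memory-gb, but below the total physical memory
      system-mem-limit-gb: "120"
    ]
```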

Troubleshooting

system-memory-gb is greater than the node memory

system-memory-gb is greater than the node memory
 Line: /prestissimo/presto_cpp/main/LinuxMemoryChecker.cpp:102, Function:start, Expression: config_.systemMemLimitBytes <= availableMemoryOfDeployment (64424509440 vs. 33367973888) system memory limit = 64424509440 bytes is higher than the available machine memory of deployment = 33367973888 bytes., Source: RUNTIME, ErrorCode: INVALID_STATE
Cause

The configured system-memory-gb exceeds the physical memory available on the deployment node. Presto on Velox performs a check at startup to ensure the allocation is safe.

Solution

Adjust the system-memory-gb value to be approximately 90% of the node's total RAM. The error log shows the exact comparison in bytes.

query-memory-gb is greater than system-memory-gb

query-memory-gb is greater than system-memory-gb crash
 Reason: (38 vs. 23) Query memory capacity must not be larger than system memory capacity
 Retriable: False
 Expression: queryMemoryGb <= memoryGb
Cause

The memory allocated for all queries, query-memory-gb, cannot exceed the total memory limit managed by the system, system-memory-gb.

Solution

Update the configuration so that query-memory-gb is less than or equal to system-memory-gb. We recommend setting query-memory-gb to approximately 85% of the machine memory, and setting system-mem-limit-gb to a value greater than or equal to system-memory-gb without exceeding the total physical machine memory.

system-mem-limit-gb is higher than the node memory

system-mem-limit-gb is higher than available machine memory crash
 Line: /prestissimo/presto_cpp/main/LinuxMemoryChecker.cpp:102, Function:start, Expression: config_.systemMemLimitBytes <= availableMemoryOfDeployment (64424509440 vs. 33367973888) system memory limit = 64424509440 bytes is higher than the available machine memory of deployment = 33367973888 bytes., Source: RUNTIME, ErrorCode: INVALID_STATE
 terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
   what():  Exception: VeloxRuntimeError
 Error Source: RUNTIME
 Error Code: INVALID_STATE
 Reason: (64424509440 vs. 33367973888) system memory limit = 64424509440 bytes is higher than the available machine memory of deployment = 33367973888 bytes.
 Retriable: False
 Expression: config_.systemMemLimitBytes <= availableMemoryOfDeployment
Cause

The threshold for memory pushbacks or heap dumps, system-mem-limit-gb, is set higher than the actual RAM available on the worker node.

Solution

Reduce system-mem-limit-gb so it is equal to or slightly lower than the total machine memory. It should be greater than or equal to system-memory-gb.

The query fails because it exceeds the local memory limit

The query fails because it exceeds the local memory limit
 VeloxRuntimeError:  Local arbitration failure. Reclaimable used capacity 0B is less than min reclaim bytes 128.00MB
 ARBITRATOR[SHARED CAPACITY[6.00GB] STATS[numRequests 36 numRunning 24 numSucceded 0 numAborted 0 numFailures 1 numNonReclaimableAttempts 0 reclaimedFreeCapacity 0B reclaimedUsedCapacity 0B maxCapacity 6.00GB freeCapacity 4.00GB freeReservedCapacity 4.00GB] CONFIG[kind=SHARED;capacity=6.00GB;arbitrationStateCheckCb=(set);global-arbitration-abort-time-ratio=0.5;global-arbitration-memory-reclaim-pct=10;memory-reclaim-threads-hw-multiplier=0.5;memory-pool-min-reclaim-bytes=128MB;check-usage-leak=1;fast-exponential-growth-capacity-limit=512MB;slow-capacity-grow-pct=0.25;global-arbitration-enabled=false;memory-pool-min-free-capacity-pct=0.25;max-memory-arbitration-time=5m;memory-pool-reserved-capacity=64MB;global-arbitration-without-spill=false;memory-pool-min-free-capacity=128MB;memory-pool-initial-capacity=128MB;memory-pool-abort-capacity-limit=8GB;reserved-capacity=4GB;]]
 Memory Pool[20260209_115728_00001_2nr29_0 AGGREGATE root[20260209_115728_00001_2nr29_0] parent[null] MMAP track-usage thread-safe]<max capacity 4.00GB capacity 2.00GB used 1.84GB available 0B reservation [used 0B, reserved 2.00GB, min 0B] counters [allocs 0, frees 0, reserves 0, releases 0, collisions 0])>
Cause

The query has requested more memory than allowed by the local node limit or the total query memory pool. This usually occurs with complex joins or large aggregations.

Solution

Review the query plan and the configured limits. You may need to increase query.max-memory-per-node or query-memory-gb if the hardware allows, or optimize the query to reduce its memory footprint.
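If the hardware allows, the caps can be raised in the same presto.worker.additionalConfig block shown earlier. The numbers below are an illustrative sketch for a 64 GiB node (about 85% of machine memory), not prescriptive values:

```yaml
presto:
  worker:
    additionalConfig: [
      # Illustrative: raise the per-query cap from the 4 GB default;
      # keep it <= query-memory-gb (set equal, per the recommendation above).
      query-memory-gb: "54",
      query.max-memory-per-node: "54GB"
    ]
```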

S3 Query fails with HTTP 301 error

S3 Region Mismatch Error (HTTP 301)
VeloxRuntimeError:  Failed to get metadata for S3 object due to: 'Unknown error'. Path:'s3://your-bucket-name/data.parquet', SDK Error Type:100, HTTP Status Code:301, S3 Service:'AmazonS3', Message:'No response body.', Source: RUNTIME, ErrorCode: INVALID_STATE
Cause

The Presto on Velox engine is attempting to access an S3 bucket located in a different region than the default endpoint, s3.us-east-1.amazonaws.com. While table creation and metadata exploration may work, the native Velox S3 client fails with an HTTP 301 (Moved Permanently) error when the endpoint does not point to the specific region where the bucket resides.

Solution

Update the s3Endpoint property in your values.yaml to include the correct region matching your S3 bucket. For example, use the structure s3.<region>.amazonaws.com.
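For instance, for a bucket located in eu-west-1 the endpoint value would be:

```yaml
# Illustrative value only; locate the existing s3Endpoint entry in your
# values.yaml rather than adding a new top-level key.
s3Endpoint: s3.eu-west-1.amazonaws.com
```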