Configuring Cluster Nodes: Workers, CPU and Memory¶
This section details how to configure the underlying Kubernetes cluster nodes to ensure optimal performance for the Denodo Embedded MPP.
These settings are defined within the presto section of the values.yaml file.
To run it efficiently, it is strongly recommended to deploy the Embedded MPP on a Kubernetes cluster where no other applications are running on the nodes.
The total number of nodes in the Kubernetes cluster is calculated from the numWorkers setting, plus two dedicated nodes for
the Coordinator and the Embedded Hive Metastore components.
presto:
  # The total number of nodes in the cluster will be: numWorkers + 2
  numWorkers: 4
  cpusPerNode: 16
  memoryPerNode: 64
numWorkers: The number of MPP Worker nodes in the Kubernetes cluster.
The recommended cluster topology for the Denodo Embedded MPP is N + 2 nodes, where N is your numWorkers value. This ensures dedicated resources for:
- one Coordinator node
- one node for the Embedded Hive Metastore and the Embedded PostgreSQL
- N Worker nodes
For detailed sizing guidelines refer to Sizing Recommendations for the Denodo Embedded MPP.
cpusPerNode: The number of CPU cores allocated to each individual node within the Kubernetes cluster.
The Denodo Embedded MPP is a CPU-intensive application, especially when processing complex analytical queries. It is recommended to start with nodes that have at least 16-32 cores. For Amazon Elastic Kubernetes Service (EKS), you can consider instance types like m6a.8xlarge (32 cores) or r6a.4xlarge (16 cores) as starting points.
Generally, doubling the CPU resources across the cluster (while keeping memory consistent) can significantly reduce query execution times (e.g., a query might take half the time). More CPU cores directly contribute to shorter query durations.
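As an illustration, the node pool for such a cluster could be declared with eksctl. This is a hedged sketch, not a Denodo-provided configuration: the cluster name, region, and node-group name are placeholders, and only the instance type and node count follow the guidance above.

```yaml
# Hypothetical eksctl ClusterConfig sketch for a dedicated Embedded MPP cluster.
# Names and region are placeholders; adjust to your environment.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: denodo-mpp-cluster   # placeholder cluster name
  region: us-east-1          # placeholder region
managedNodeGroups:
  - name: mpp-nodes
    instanceType: m6a.8xlarge  # 32 vCPUs, 128 GB RAM (a suggested starting point)
    desiredCapacity: 6         # numWorkers (4) + Coordinator + Metastore nodes
```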
memoryPerNode: The total memory in Gigabytes (GB) available on each node of the Kubernetes cluster.
This setting previously defined the memory available to the MPP Coordinator and each MPP Worker: 80% of the total memoryPerNode was allocated as the JVM heap size of the MPP Coordinator and each MPP Worker.
Important
Starting with version 20250703 of the Denodo Embedded MPP, the memoryPerNode property is deprecated. This means it is no longer the primary mechanism for controlling the JVM heap size for the MPP Coordinator and MPP Workers.
The Denodo Embedded MPP now fully leverages Java container awareness: the Java Virtual Machine (JVM) running the MPP Coordinator and the MPP Workers automatically detects and adjusts its heap size based on the memory limits configured for its Kubernetes container.
If memory limits are explicitly set for the Denodo Embedded MPP containers in your Kubernetes pod definition (configured in the presto.coordinator.resources and presto.worker.resources sections in values.yaml), the JVM's actual heap size will dynamically adapt to these limits, allocating 80% of the configured memory limit for the JVM heap.
If no memory limits are explicitly set for the Denodo Embedded MPP containers, the JVM will default to allocating 80% of the total physical RAM available on the underlying Kubernetes node.
Memory is critical for query processing, particularly for operations involving JOIN, GROUP BY, and ORDER BY, which often require holding large datasets in memory. It is recommended to start with nodes that have at least 64-128 GB of memory. For Amazon Elastic Kubernetes Service (EKS), you can consider instance types like m6a.8xlarge (128 GB) or r6a.4xlarge (64 GB) as starting points.
The amount of available memory directly influences the maximum number of concurrent queries that the Denodo Embedded MPP can efficiently process without performance degradation: more memory allows for higher concurrency.
Memory Settings for Query Performance¶
Adjusting the Denodo Embedded MPP’s memory settings is critical for balancing the performance of individual queries with the overall concurrency the system can handle. This section explains how memory is utilized and how to configure it for optimal query performance.
The Denodo Embedded MPP operates with several layers of memory limits. It is important to understand the default calculations to effectively tune your environment:
Node Memory (memoryPerNode): This is the total physical RAM available on each Kubernetes cluster node, configured via memoryPerNode in values.yaml. Starting with version 20250703 of the Denodo Embedded MPP, the memoryPerNode property is deprecated:
- If Kubernetes memory limits are explicitly set (in the resources.limits.memory section of the Coordinator and Worker pod definitions in values.yaml), the JVM of the Coordinator and Workers will adhere to these limits.
- If Kubernetes memory limits are not explicitly set, the JVM of the Coordinator and Workers will default to allocating memory based on the total physical RAM available on the underlying node.
JVM Maximum Memory: 80% of memoryPerNode is automatically allocated as the maximum JVM heap size for the Denodo Embedded MPP processes (Coordinator and Workers).
Starting with version 20250703 of the Denodo Embedded MPP:
- If Kubernetes memory limits are explicitly set, the JVM of the Coordinator and Workers will allocate 80% of these configured limits.
- If Kubernetes memory limits are not explicitly set, the JVM of the Coordinator and Workers will allocate 80% of the total physical RAM available on the underlying node.
Default Query User Memory per Worker (query.max-memory-per-node): By default, a single query can utilize up to 10% of the JVM's maximum memory on an individual MPP Worker node:

query.max-memory-per-node (default) = (detected_memory * 0.8) * 0.1

This means a query starts with a relatively small portion of the node's total memory for its individual execution on a worker.
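To make the default calculation concrete, here is a worked example for a node (or container memory limit) of 64 GB. The numbers are illustrative arithmetic derived from the formula above, not settings you need to configure:

```yaml
# Worked example of the default memory calculation for a 64 GB node/limit:
# detected_memory           = 64 GB
# JVM max heap              = 64 GB * 0.8  = 51.2 GB
# query.max-memory-per-node = 51.2 GB * 0.1 ≈ 5.1 GB per query, per Worker
```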
When large or complex queries are executed, the Denodo Embedded MPP might encounter memory errors, such as:
Query exceeded per-node user memory limit of xGB
Query exceeded per-node total memory limit of xGB
Query exceeded distributed user memory limit of xGB
These errors indicate that the query requires more memory than currently allocated by the default settings to complete its operations.
When these occur, you need to adjust the memory settings in values.yaml to override the default values, by adding specific properties
to the additionalConfig sections for both the coordinator and the worker in the values.yaml file:
presto:
  coordinator:
    additionalConfig: [
      query.max-memory=zGB
    ]
...
presto:
  worker:
    additionalConfig: [
      query.max-memory-per-node=xGB,
      query.max-total-memory-per-node=yGB,
      query.max-memory=zGB
    ]
The most important memory properties you can configure are:
query.max-memory-per-node: the maximum amount of user memory a single query can consume on an individual MPP Worker node. The default value is JVM max memory * 0.1 (10% of the JVM's assigned memory on the node).
If you are encountering Query exceeded per-node user memory limit errors, a good starting point for increasing this value is to set query.max-memory-per-node = JVM max memory * 0.5 (50% of the JVM's assigned memory on the node).
Increasing the default query.max-memory-per-node can improve the performance and success rate of large queries, but it can also reduce the memory available for other concurrent queries.
query.max-total-memory-per-node: the maximum amount of user and system memory that a query can use on an individual MPP Worker node. The default value is query.max-memory-per-node * 2.
If you are encountering Query exceeded per-node total memory limit errors, a recommended starting point for increasing this value is to set query.max-total-memory-per-node = JVM max memory * 0.6 (60% of the JVM's assigned memory on the node).
Increasing the default query.max-total-memory-per-node can improve the performance of large queries, but it may also reduce the memory available for other queries in highly concurrent scenarios.
query.max-memory: the maximum amount of user memory that a query can consume across all MPP Workers in the entire cluster. The default value is 20GB.
If the cluster needs to handle large queries, you will need to increase query.max-memory. A good initial recommendation is to set query.max-memory = query.max-memory-per-node * numWorkers.
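Putting these recommendations together, the placeholders x, y, and z in the earlier additionalConfig snippets could be filled in as follows for a hypothetical cluster with numWorkers: 4 and a JVM max memory of roughly 50 GB per node. The values are illustrative, derived from the formulas above; they are not defaults:

```yaml
# Illustrative values for a hypothetical cluster:
# JVM max memory ≈ 50 GB per node, numWorkers = 4
presto:
  coordinator:
    additionalConfig: [
      query.max-memory=100GB
    ]
  worker:
    additionalConfig: [
      query.max-memory-per-node=25GB,       # JVM max memory (50 GB) * 0.5
      query.max-total-memory-per-node=30GB, # JVM max memory (50 GB) * 0.6
      query.max-memory=100GB                # 25 GB * numWorkers (4)
    ]
```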
Sometimes adjusting memory settings is not enough. Consider these additional strategies:
Query Optimization: Analyze the plans of the problematic queries with EXPLAIN ANALYZE (or EXPLAIN (TYPE DISTRIBUTED) if EXPLAIN ANALYZE fails with a memory error), using an external JDBC client like DBeaver. The query plan provides invaluable insights, highlighting stages that consume significant resources and offering clues for optimization:
- Gather statistics for each view involved in the queries. Accurate statistics allow the query optimizer to create more efficient execution plans.
- Add filter conditions (WHERE clauses) as early as possible in the queries to reduce the amount of data processed by the Embedded MPP engine.
Infrastructure Scaling: If individual queries consistently hit per-node memory limits even after tuning, the underlying cluster nodes might simply not have enough physical RAM. In such cases, consider upgrading to Kubernetes cluster nodes with more memory, or adding more worker nodes (increasing numWorkers) to the Kubernetes cluster to distribute the processing load and increase the overall cluster memory capacity.
CPU and Memory Management in Kubernetes¶
Kubernetes uses resource requests and resource limits to efficiently schedule pods across the cluster nodes.
Resource Requests: This specifies the minimum amount of a resource (CPU or Memory) that a container requires to function correctly. The Kubernetes scheduler will only place a Denodo Embedded MPP pod on a node that can guarantee the availability of the requested resources.
Resource Limits: This specifies the maximum amount of a resource (CPU or Memory) that a container is allowed to consume. Limits prevent a single pod from consuming all available resources on a node.
CPU Limits: If a pod tries to use more CPU than its limit, Kubernetes will throttle its CPU usage.
Memory Limits: If a pod tries to use more memory than its limit, Kubernetes will terminate (kill) the pod to prevent it from impacting the node. This often results in an “Out-Of-Memory” (OOMKilled) error.
The CPU and Memory resource requests and limits for the Denodo Embedded MPP’s Coordinator and Worker pods can be configured within the
presto section of the values.yaml file:
presto:
  coordinator:
    resources:
      limits:
        cpu: 25
        memory: 64Gi
      requests:
        cpu: 25
        memory: 64Gi
  ...
  worker:
    resources:
      limits:
        cpu: 25
        memory: 64Gi
      requests:
        cpu: 25
        memory: 64Gi
CPU units:
1.0 represents one full CPU core (or vCPU in cloud environments).
0.1 or 100m (100 millicores) represents one-tenth of a CPU core.
Memory units:
Gi (Gibibytes) is the standard Kubernetes unit for memory. 1Gi = 1024Mi.
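The following fragment illustrates these units with standard Kubernetes syntax. The values are purely illustrative, not recommendations for the Embedded MPP:

```yaml
# Illustrative Kubernetes resources fragment showing CPU and memory units:
resources:
  requests:
    cpu: 500m      # 500 millicores = half a CPU core
    memory: 512Mi  # 512 mebibytes; 1Gi = 1024Mi
  limits:
    cpu: 2         # two full CPU cores
    memory: 4Gi    # four gibibytes
```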
Notice that the resources section for both coordinator and worker is commented out by default in the provided values.yaml.
We leave these settings to the Kubernetes cluster administrator, as the optimal CPU and Memory values depend heavily on
the instance types of the Kubernetes nodes, the workload patterns of the Denodo Embedded MPP, etc.
