USER MANUALS


Workers, CPU and Memory

The following properties of the values.yaml define the characteristics of the Kubernetes cluster nodes:

presto:
  # The cluster has numNodes = numWorkers + 2
  numWorkers: 4
  cpusPerNode: 16
  memoryPerNode: 64
  • numWorkers: The number of the MPP Workers in the Kubernetes cluster.

    The recommendation for running the Denodo Embedded MPP is to create a Kubernetes cluster with N + 2 nodes, with no other applications running on the nodes:

    • one for the Coordinator

    • one for the Embedded Hive Metastore and the Embedded PostgreSQL

    • N for the Workers

    then numWorkers must be N.

    See also Sizing Recommendations for the Denodo Embedded MPP.

  • cpusPerNode: The number of cores of each node of the Kubernetes cluster.

    The Denodo Embedded MPP requires a certain amount of CPU to process the data for a given query: it is recommended to start with nodes with at least 16-32 cores. For example, in Amazon Elastic Kubernetes Service you can start with m6a.8xlarge or r6a.4xlarge nodes.

    In general, if you double the CPU in the cluster, keeping the same memory, the query will take half the time. More CPUs mean shorter queries.

  • memoryPerNode: The total memory in GB of each node of the Kubernetes cluster. This setting determines the memory available to the MPP Coordinator and each MPP Worker, it will be the 80% of the total memory of the cluster node.

    The Denodo Embedded MPP requires a certain amount of memory to process queries with JOIN, GROUP BY, and ORDER BY operations: it is recommended to start with nodes with at least 64-128 GB of memory. For example, in Amazon Elastic Kubernetes Service you can start with m6a.8xlarge or r6a.4xlarge nodes.

    The memory available determines the maximum number of concurrent queries that can be run in the Denodo Embedded MPP.

Memory Settings for Queries

It is important to adjust the Denodo Embedded MPP memory settings for query performance, finding a balance between maximum memory per query and the maximum number of concurrent queries that can be run in the Denodo Embedded MPP.

The maximum amount of user memory a query can use on a MPP Worker is query.max-memory-per-node= JVM max memory * 0.1 by default. The JVM maximum memory is the 80% of the total memory of the cluster node. Therefore, the memory available for executing queries in a MPP Worker is memoryPerNode * 0.8 * 0.1.

With very large queries the Embedded MPP could throw Query exceeded per-node user memory limit of xGB, Query exceeded per-node total memory limit of xGB or Query exceeded distributed user memory limit of xGB, meaning that it needs more memory to handle queries. In this case you can configure the memory settings in the values.yaml to override the default values:

Memory configuration for queries
presto:
  coordinator:
    additionalConfig: [
      query.max-memory-per-node=xGB,
      query.max-total-memory-per-node=yGB,
      query.max-memory=zGB
    ]


presto:
  worker:
    additionalConfig: [
      query.max-memory-per-node=xGB,
      query.max-total-memory-per-node=yGB,
      query.max-memory=zGB
    ]

Most important memory properties are:

  • query.max-memory-per-node: the maximum amount of user memory a query can use on a MPP Worker. The default value is JVM max memory * 0.1.

    If this is not enough, our recommendation as a starting point is to set query.max-memory-per-node = JVM max memory * 0.5.

    Increasing the default query.max-memory-per-node can improve the performance of large queries, but also may reduce the available memory for other queries in highly concurrent scenarios.

  • query.max-total-memory-per-node: the maximum amount of user and system memory a query can use on a MPP Worker. The default values is query.max-memory-per-node * 2.

    If this is not enough, our recommendation as a starting point is to set query.max-total-memory-per-node = JVM max memory * 0.6.

    Increasing the default query.max-total-memory-per-node can improve the performance of large queries, but also may reduce the available memory for other queries in highly concurrent scenarios.

  • query.max-memory: the maximum amount of user memory that a query can use across all MPP Workers in the cluster. The default value is 20GB.

    If the cluster needs to handle bigger queries, you need to increase query.max-memory, our recommendation is query.max-memory = query.max-memory-per-node * numWorkers.

    You can also run the EXPLAIN ANALYZE (or EXPLAIN (TYPE DISTRIBUTED) if EXPLAIN ANALYZE throws any Query exceeded error), with these big queries, from an external JDBC client like DBeaver, to examine the query plan and try to optimize it by gathering statistics for each view involved in the queries, by adding filter conditions to reduce the amount of data to be processed, etc..

    In addition to adjusting the memory settings, sometimes, the only solution to handle large queries is to use cluster nodes with more memory or adding more nodes to the cluster.

CPU and Memory Management in Kubernetes

Kubernetes schedules pods across nodes based on the resource requests and limits for CPU and Memory. If a container pod requests certain CPU and/or memory values, Kubernetes will only schedule it on a node that can guarantee those resources. Limits, on the other hand, ensure that a container pod never exceeds a certain value.

presto:
  coordinator:
    resources:
      limits:
        cpu: 12.8
        memory: 56Gi
      requests:
        cpu: 12.8
        memory: 56Gi

   ...

  worker:
    resources:
      limits:
        cpu: 12.8
        memory: 56Gi
      requests:
        cpu: 12.8
        memory: 56Gi

Note that resources are commented out, as we leave this setting as a choice for the Kubernetes cluster administrator.

Add feedback