Embedded PostgreSQL

Denodo Embedded MPP is shipped with an Embedded PostgreSQL that stores the metadata of the Embedded Hive Metastore.

The Embedded PostgreSQL uses a Kubernetes Persistent Volume to ensure the persistence of the metadata.

The postgresql section of the values.yaml configures the persistence options:

postgresql:
  enabled: true

  pvClaim:
    annotations:
      # Add the following annotation if you want to preserve
      # Embedded MPP metadata after cluster removal
      "helm.sh/resource-policy": keep

    storage: 5Gi
    storageClassName: ""

  • pvClaim.annotations: Annotations for the Persistent Volume Claim used by the Embedded PostgreSQL.

    To preserve the Denodo Embedded MPP metadata after cluster removal you need to add the following annotation:

    "helm.sh/resource-policy": keep

  • pvClaim.storage: Storage available for the Embedded PostgreSQL.

    Default size is 5Gi, but it must be configured according to the scenario and data volumes to be queried, as this influences the size of the metadata.

  • pvClaim.storageClassName: See Persistent Volume section.
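After deploying, you can check that the claim was bound to a Persistent Volume. The claim name used below is illustrative; the actual name depends on the chart release.

```shell
# List Persistent Volume Claims in the deployment namespace;
# STATUS should show "Bound" once a Persistent Volume has been provisioned
kubectl get pvc

# Inspect the claim in detail (annotations, storage class, capacity);
# "postgresql-pv-claim" is an assumed name, take the real one from the
# output of the previous command
kubectl describe pvc postgresql-pv-claim
```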

Persistent Volume

The Embedded PostgreSQL uses a Persistent Volume to store Postgres data and to ensure persistence.

However, the Denodo Embedded MPP deployment does not include a Persistent Volume object, as the user instantiating it may not have permission to create Persistent Volumes. Instead, it includes a Persistent Volume Claim that, used in conjunction with a Storage Class, dynamically requests the Persistent Volume. Therefore, at least one Storage Class must be defined in your cluster.

To configure the Denodo Embedded MPP Storage Class, there are two options:

  1. If the cluster has a default Storage Class installed, leave the storageClassName empty, pvClaim.storageClassName: "". A Persistent Volume is then automatically provisioned for the cluster using the default Storage Class.

  2. Provide a Storage Class name in the pvClaim.storageClassName field.
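For the second option, the storage class name goes into the postgresql section of the values.yaml. The name "gold" below is a placeholder for a StorageClass that actually exists in your cluster:

```yaml
postgresql:
  pvClaim:
    storage: 5Gi
    # Use the "gold" StorageClass (illustrative name)
    # instead of the cluster default
    storageClassName: "gold"
```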

Use kubectl to check for StorageClass objects. The default StorageClass is marked with (default):

# sc is an acronym for StorageClass
kubectl get sc

NAME                        PROVISIONER               AGE
standard (default)          kubernetes.io/gce-pd      1d
gold                        kubernetes.io/gce-pd      1d

Amazon Elastic Kubernetes Service (EKS)

Starting with version 1.30, Amazon EKS no longer adds the default annotation to the gp2 StorageClass of newly created clusters. This has no impact if you reference this storage class by name. But if you were relying on a default StorageClass in the cluster, you now have to configure one in pvClaim.storageClassName in the values.yaml file.
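Alternatively, if you prefer to keep relying on a default StorageClass, you can mark an existing class as the default yourself, for example gp2:

```shell
# Annotate the gp2 StorageClass so Kubernetes treats it as the default
kubectl patch storageclass gp2 -p \
  '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
```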

Since Amazon recommends using gp3 in any scenario where gp2 might be employed, we strongly recommend creating a StorageClass that provisions EBS gp3 volumes. Amazon EBS gp3 volumes are the latest generation of general-purpose SSD-based EBS volumes that enable customers to provision performance independently of storage capacity, while providing up to 20% lower price per GB than existing gp2 volumes.

The following example defines a Kubernetes StorageClass that provisions Amazon EBS volumes using the gp3 type.

First, create a file named gp3-def-sc.yaml:

 apiVersion: storage.k8s.io/v1
 kind: StorageClass
 metadata:
   annotations:
     storageclass.kubernetes.io/is-default-class: "true"
   name: gp3
 parameters:
   type: gp3
 provisioner: ebs.csi.aws.com
 reclaimPolicy: Delete
 volumeBindingMode: WaitForFirstConsumer
 allowVolumeExpansion: true

Second, apply the storage class to your cluster.

 $ kubectl apply -f gp3-def-sc.yaml
 storageclass.storage.k8s.io/gp3 created
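You can then verify that gp3 is now the default and, to avoid having two classes marked as default, remove the annotation from gp2 if it still carries it:

```shell
# gp3 should now appear marked as (default)
kubectl get sc

# If gp2 is also marked as default, unset its default annotation
kubectl patch storageclass gp2 -p \
  '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'
```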

Backup

You need to define a backup strategy for the Persistent Volume of the Embedded PostgreSQL, so that you do not lose the metadata (the table definitions) that the Denodo Embedded MPP relies on.

To do this, you can choose between several methods depending on your storage provider.

Otherwise, you can perform a manual backup with a dump of the Embedded PostgreSQL data. This dump generates a text file with SQL commands that, when fed back to the Embedded PostgreSQL, will recreate the database in the same state as it was at the time of the dump.

Dump of the Embedded PostgreSQL:

kubectl exec <PostgreSQL Pod> -- bash -c "PGPASSWORD=hive pg_dump -c -U hive -h localhost metastore" > database.sql

Restore the Embedded PostgreSQL:

cat database.sql | kubectl exec -i <PostgreSQL Pod> -- bash -c "PGPASSWORD=hive psql -U hive -h localhost -d metastore"
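The dump command can be wrapped in a small script that locates the PostgreSQL pod and writes a timestamped dump file. The label selector app=postgresql is an assumption; check the labels of your deployment with kubectl get pods --show-labels and adjust accordingly.

```shell
#!/bin/sh
# Backup sketch for the Embedded PostgreSQL metadata.
# Assumes the pod carries the label app=postgresql; adjust the selector
# to match your deployment.
set -e

NAMESPACE="${1:-default}"

# Find the name of the PostgreSQL pod
POD=$(kubectl get pods -n "$NAMESPACE" -l app=postgresql \
      -o jsonpath='{.items[0].metadata.name}')

# Dump the metastore database to a timestamped SQL file
FILE="metastore-$(date +%Y%m%d-%H%M%S).sql"
kubectl exec -n "$NAMESPACE" "$POD" -- \
  bash -c "PGPASSWORD=hive pg_dump -c -U hive -h localhost metastore" > "$FILE"

echo "Backup written to $FILE"
```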

CPU and Memory Management in Kubernetes

Kubernetes uses resource requests and resource limits to efficiently schedule pods across the cluster nodes.

  • Resource Requests: This specifies the minimum amount of a resource (CPU or Memory) that a container requires to function correctly. The Kubernetes scheduler will only place a Denodo Embedded MPP pod on a node that can guarantee the availability of the requested resources.

  • Resource Limits: This specifies the maximum amount of a resource (CPU or Memory) that a container is allowed to consume. Limits prevent a single pod from consuming all available resources on a node.

    • CPU Limits: If a pod tries to use more CPU than its limit, Kubernetes will throttle its CPU usage.

    • Memory Limits: If a pod tries to use more memory than its limit, Kubernetes will terminate (kill) the pod to prevent it from impacting the node. This often results in an “Out-Of-Memory” (OOMKilled) error.
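When a pod has been killed for exceeding its memory limit, the reason is visible in the pod status. The pod name below is illustrative:

```shell
# A pod restarted after exceeding its memory limit reports OOMKilled;
# look for "Reason: OOMKilled" under "Last State"
kubectl describe pod postgresql-0 | grep -A 5 "Last State"

# Current CPU and memory consumption per pod (requires the
# metrics-server add-on to be installed in the cluster)
kubectl top pod
```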

The CPU and Memory resource requests and limits for the Denodo Embedded PostgreSQL pod can be configured within the postgresql section of the values.yaml file:

postgresql:
 resources:
   limits:
     cpu: 1
     memory: 1Gi
   requests:
     cpu: 1
     memory: 1Gi

  • CPU units:

    • 1.0 represents one full CPU core (or vCPU in cloud environments).

    • 0.1 or 100m (100 millicores) represents one-tenth of a CPU core.

  • Memory units:

    • Gi (Gibibytes) is the standard Kubernetes unit for memory. 1Gi = 1024Mi.

Notice that the resources section for postgresql is commented out by default in the provided values.yaml. We leave these settings as a choice for the Kubernetes cluster administrator, as the optimal CPU and Memory values are highly dependent on the instance types of the Kubernetes nodes, the workload patterns of the Denodo Embedded MPP, etc.
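After editing the resources section, the change can be rolled out with helm upgrade. The release and chart names below are placeholders for those used in your installation:

```shell
# Roll out the updated resource requests/limits; <release> and <chart>
# are placeholders for your Helm release and the Denodo Embedded MPP chart
helm upgrade <release> <chart> -f values.yaml
```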
