Embedded PostgreSQL¶
Denodo Embedded MPP is shipped with an Embedded PostgreSQL that stores the metadata of the Embedded Hive Metastore.
The Embedded PostgreSQL uses a Kubernetes Persistent Volume to ensure the persistence of the metadata.
The postgresql section of the values.yaml configures the persistence options:
```yaml
postgresql:
  enabled: true
  pvClaim:
    annotations:
      # Add the following annotation if you want to preserve
      # Embedded MPP metadata after cluster removal
      "helm.sh/resource-policy": keep
    storage: 5Gi
    storageClassName: ""
```
- pvClaim.annotations: Annotations for the Persistent Volume Claim used by the Embedded PostgreSQL. To preserve the Denodo Embedded MPP metadata after cluster removal, you need to add the following annotation: "helm.sh/resource-policy": keep
- pvClaim.storage: Storage available for the Embedded PostgreSQL. The default size is 5Gi, but it must be configured according to the scenario and the data volumes to be queried, as this influences the size of the metadata.
- pvClaim.storageClassName: See the Persistent Volume section.
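These persistence settings can also be overridden at install or upgrade time with Helm's --set flags. The sketch below only builds and prints the command so it can be reviewed first; the release name (denodo-mpp) and chart path are placeholders for your own deployment, while the --set paths match the values.yaml keys described above.

```shell
# Build the override command for the PostgreSQL persistence settings.
# "denodo-mpp" and "./denodo-mpp-chart" are placeholders; the storage
# size and class name are example values, not recommendations.
HELM_CMD="helm upgrade --install denodo-mpp ./denodo-mpp-chart \
  --set postgresql.pvClaim.storage=20Gi \
  --set postgresql.pvClaim.storageClassName=gp3"

# Review the command, then run it against your cluster.
echo "$HELM_CMD"
```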
Persistent Volume¶
The Embedded PostgreSQL uses a Persistent Volume to store Postgres data and to ensure persistence.
However, the Denodo Embedded MPP deployment does not include a Persistent Volume object, as the user instantiating it may not have permission to create Persistent Volumes. Instead, it includes a Persistent Volume Claim that, used in conjunction with a Storage Class, dynamically requests the Persistent Volume. Therefore, at least one Storage Class has to be defined in your cluster.
To configure the Denodo Embedded MPP Storage Class, there are two options:
- If the cluster has a default Storage Class installed, leave the storageClassName empty, pvClaim.storageClassName: "". This causes a Persistent Volume to be automatically provisioned for the cluster with the default Storage Class.
- Provide a Storage Class name in the pvClaim.storageClassName field.
Use kubectl to check for StorageClass objects. The default StorageClass is marked with (default):

```shell
# sc is an acronym for StorageClass
kubectl get sc
NAME                 PROVISIONER            AGE
standard (default)   kubernetes.io/gce-pd   1d
gold                 kubernetes.io/gce-pd   1d
```
Amazon Elastic Kubernetes Service (EKS)¶
Starting with version 1.30, Amazon EKS no longer includes the default annotation on the gp2 StorageClass resource of newly created clusters. This has no impact if you reference this storage class by name.
But if you were relying on a default StorageClass being present in the cluster, you now have to configure it in pvClaim.storageClassName in the values.yaml file.
Since Amazon recommends using gp3 in any scenario where gp2 might be employed, we strongly recommend creating a StorageClass that provisions EBS gp3 volumes.
Amazon EBS gp3 volumes are the latest generation of general-purpose SSD-based EBS volumes that enable customers to provision performance independent of storage capacity, while providing up to 20% lower price per GB than existing gp2 volumes.
The following example defines a Kubernetes StorageClass that provisions Amazon EBS volumes using the gp3 type.
First, create a file named gp3-def-sc.yaml:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: gp3
parameters:
  type: gp3
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```
Second, apply the storage class to your cluster:

```shell
$ kubectl apply -f gp3-def-sc.yaml
storageclass.storage.k8s.io/gp3 created
```
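After applying the manifest, you can confirm that gp3 is now the default StorageClass by reading the is-default-class annotation back from the cluster. This is a sketch that degrades gracefully when no cluster is reachable:

```shell
# Read the default-class annotation of a StorageClass.
# The jsonpath escapes the dots inside the annotation key.
is_default() {
  kubectl get sc "$1" -o \
    jsonpath='{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}' \
    2>/dev/null
}

if [ "$(is_default gp3 || true)" = "true" ]; then
  echo "gp3 is the default StorageClass"
else
  echo "gp3 not marked default (or no cluster reachable)"
fi
```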
Backup¶
You need to define a backup strategy for the Persistent Volume of the Embedded PostgreSQL, so that you do not lose the metadata (that is, the table definitions) that the Denodo Embedded MPP relies upon.
To do this, you can choose between several methods depending on your storage provider.
Alternatively, you can perform a manual backup with a dump of the Embedded PostgreSQL data. This dump generates a text file with SQL commands that, when fed back to the Embedded PostgreSQL, will recreate the database in the same state as it was at the time of the dump.
Backup:

```shell
kubectl exec <PostgreSQL Pod> -- bash -c "PGPASSWORD=hive pg_dump -c -U hive -h localhost metastore" > database.sql
```

Restore:

```shell
cat database.sql | kubectl exec -i <PostgreSQL Pod> -- bash -c "PGPASSWORD=hive psql -U hive -h localhost -d metastore"
```
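For recurring backups, the dump command can be wrapped in a small script that timestamps each file, so older dumps are not overwritten. A minimal sketch, assuming the default hive/hive/metastore credentials; the pod name postgresql-0 is a placeholder for your actual PostgreSQL pod:

```shell
#!/bin/sh
# Timestamped dump of the Embedded PostgreSQL metastore.
# POD is a placeholder pod name; override via the POSTGRES_POD variable.
POD="${POSTGRES_POD:-postgresql-0}"
BACKUP_FILE="metastore-$(date +%Y%m%d-%H%M%S).sql"

dump_metastore() {
  kubectl exec "$1" -- bash -c \
    "PGPASSWORD=hive pg_dump -c -U hive -h localhost metastore" > "$2"
}

# Only attempt the dump when the pod is actually reachable.
if kubectl get pod "$POD" >/dev/null 2>&1; then
  dump_metastore "$POD" "$BACKUP_FILE" && echo "wrote $BACKUP_FILE"
else
  echo "cluster not reachable; would write $BACKUP_FILE"
fi
```

A scheduler (for example, a Kubernetes CronJob) can run such a script periodically and ship the resulting files to durable storage.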
CPU and Memory Management in Kubernetes¶
Kubernetes uses resource requests and resource limits to efficiently schedule pods across the cluster nodes.
Resource Requests: This specifies the minimum amount of a resource (CPU or Memory) that a container requires to function correctly. The Kubernetes scheduler will only place a Denodo Embedded MPP pod on a node that can guarantee the availability of the requested resources.
Resource Limits: This specifies the maximum amount of a resource (CPU or Memory) that a container is allowed to consume. Limits prevent a single pod from consuming all available resources on a node.
CPU Limits: If a pod tries to use more CPU than its limit, Kubernetes will throttle its CPU usage.
Memory Limits: If a pod tries to use more memory than its limit, Kubernetes will terminate (kill) the pod to prevent it from impacting the node. This often results in an “Out-Of-Memory” (OOMKilled) error.
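When diagnosing such terminations, one way to check whether a container was OOM-killed is to read its last termination reason from the pod status. A sketch, assuming a hypothetical pod named postgresql-0:

```shell
# Read the last termination reason of the pod's first container.
check_oom() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}' \
    2>/dev/null
}

reason="$(check_oom postgresql-0 || true)"
if [ "$reason" = "OOMKilled" ]; then
  echo "pod was OOM-killed; consider raising its memory limit"
else
  echo "no OOMKilled record found (or no cluster reachable)"
fi
```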
The CPU and Memory resource requests and limits for the Denodo Embedded PostgreSQL pod can be configured within the postgresql section of the values.yaml file:

```yaml
postgresql:
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 1
      memory: 1Gi
```
CPU units:
1.0 represents one full CPU core (or vCPU in cloud environments).
0.1 or 100m (100 millicores) represents one-tenth of a CPU core.
Memory units:
Gi (Gibibytes) is the standard Kubernetes unit for memory. 1Gi = 1024Mi.
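For illustration, the unit conversions above can be sketched as small shell helpers. These are simplified: the CPU helper handles whole-core values and millicore suffixes only, and the memory helper handles Gi values only.

```shell
# Convert a Kubernetes CPU quantity to millicores (whole cores or "m" suffix).
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;           # e.g. 100m -> 100
    *)  echo $(( $1 * 1000 )) ;;   # e.g. 1 -> 1000
  esac
}

# Convert a Gi memory quantity to Mi (1Gi = 1024Mi).
gi_to_mi() {
  echo $(( ${1%Gi} * 1024 ))       # e.g. 1Gi -> 1024
}

cpu_to_millicores 100m   # -> 100
cpu_to_millicores 1      # -> 1000
gi_to_mi 1Gi             # -> 1024
```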
Notice that this resources section is commented out by default in the provided values.yaml.
We leave these settings as a choice for the Kubernetes cluster administrator, as the optimal CPU and Memory values are highly dependent on the instance types of the Kubernetes nodes, the workload patterns of the Denodo Embedded MPP, etc.
