Embedded PostgreSQL
Denodo Embedded MPP is shipped with an Embedded PostgreSQL that stores the metadata of the Embedded Hive Metastore.
The Embedded PostgreSQL uses a Kubernetes Persistent Volume to ensure the persistence of the metadata.
The postgresql section of the values.yaml configures the persistence options:
postgresql:
  enabled: true
  pvClaim:
    annotations:
      # Add the following annotation if you want to preserve
      # Embedded MPP metadata after cluster removal
      "helm.sh/resource-policy": keep
    storage: 5Gi
    storageClassName: ""
pvClaim.annotations: Annotations for the Persistent Volume Claim used by the Embedded PostgreSQL. To preserve the Denodo Embedded MPP metadata after cluster removal, add the following annotation:
"helm.sh/resource-policy": keep
pvClaim.storage: Storage available for the Embedded PostgreSQL. The default size is 5Gi, but it must be sized according to the scenario and the data volumes to be queried, as these influence the size of the metadata.
pvClaim.storageClassName: See the Persistent Volume section.
Persistent Volume
The Embedded PostgreSQL uses a Persistent Volume to store Postgres data and to ensure persistence.
However, the Denodo Embedded MPP deployment does not include a Persistent Volume object, as the user instantiating it may not have permission to create Persistent Volumes. Instead, it includes a Persistent Volume Claim that, in conjunction with a Storage Class, dynamically requests the Persistent Volume. Therefore, at least one Storage Class must be defined in your cluster.
To configure the Denodo Embedded MPP Storage Class, there are two options:
Use the current definition, pvClaim.storageClassName: "", which causes a Persistent Volume to be automatically provisioned for the cluster with the default Storage Class. Many cluster environments have a default Storage Class installed, or Kubernetes administrators can create one.
Provide a Storage Class name in the pvClaim.storageClassName field.
Use kubectl to check for StorageClass objects:
# sc is the kubectl short name for StorageClass
kubectl get sc
NAME                 PROVISIONER            AGE
standard (default)   kubernetes.io/gce-pd   1d
gold                 kubernetes.io/gce-pd   1d
The default StorageClass is marked with (default).
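For the second option, and assuming the cluster exposes the gold Storage Class shown in the sample output above, the corresponding values.yaml fragment might look like the following sketch. The class name is only an example; substitute one that exists in your cluster.

```yaml
postgresql:
  pvClaim:
    storage: 5Gi
    # Example only: "gold" is taken from the sample kubectl output,
    # replace it with a Storage Class available in your cluster
    storageClassName: "gold"
```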
Backup
You need to define a backup strategy for the Persistent Volume of the Embedded PostgreSQL so that you do not lose the metadata, that is, the table definitions that the Denodo Embedded MPP relies upon.
Several methods are available, depending on your storage provider.
Alternatively, you can perform a manual backup by dumping the Embedded PostgreSQL data. The dump produces a text file of SQL commands that, when fed back to the Embedded PostgreSQL, recreates the database in the state it was in at the time of the dump.
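# Back up: dump the metastore database to a local SQL file
kubectl exec <PostgreSQL Pod> -- bash -c "PGPASSWORD=hive pg_dump -c -U hive -h localhost metastore" > database.sql
# Restore: feed the dump back into the Embedded PostgreSQL
cat database.sql | kubectl exec -i <PostgreSQL Pod> -- bash -c "PGPASSWORD=hive psql -U hive -h localhost -d metastore"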
CPU and Memory management in Kubernetes
Kubernetes uses resource requests and resource limits to efficiently schedule pods across the cluster nodes.
Resource Requests: This specifies the minimum amount of a resource (CPU or Memory) that a container requires to function correctly. The Kubernetes scheduler will only place a Denodo Embedded MPP pod on a node that can guarantee the availability of the requested resources.
Resource Limits: This specifies the maximum amount of a resource (CPU or Memory) that a container is allowed to consume. Limits prevent a single pod from consuming all available resources on a node.
CPU Limits: If a pod tries to use more CPU than its limit, Kubernetes will throttle its CPU usage.
Memory Limits: If a container tries to use more memory than its limit, it is terminated (killed) so that it cannot impact the node. The termination reason is reported as OOMKilled (out of memory).
The CPU and Memory resource requests and limits for the Denodo Embedded PostgreSQL pod can be configured within the postgresql section of the values.yaml file:
postgresql:
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 1
      memory: 1Gi
CPU units:
1.0 represents one full CPU core (or vCPU in cloud environments).
0.1 or 100m (100 millicores) represents one-tenth of a CPU core.
Memory units:
Gi (Gibibytes) is the standard Kubernetes unit for memory. 1Gi = 1024Mi.
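As an illustration of the unit notation, a values.yaml fragment that uses millicores and Mi-based memory might look like the following sketch. The numbers are placeholder assumptions chosen to show the syntax, not sizing recommendations.

```yaml
postgresql:
  resources:
    requests:
      cpu: 500m       # half a CPU core, equivalent to 0.5
      memory: 512Mi   # half of 1Gi
    limits:
      cpu: 1          # one full CPU core, equivalent to 1000m
      memory: 1Gi     # 1024Mi
```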
Notice that the resources section for postgresql is commented out by default in the provided values.yaml.
We leave these settings as a choice for the Kubernetes cluster administrator, as the optimal CPU and Memory values are highly dependent on the instance types of the Kubernetes nodes, the workload patterns for the Denodo Embedded MPP, etc.
