Embedded PostgreSQL¶
Denodo Embedded MPP is shipped with an Embedded PostgreSQL that stores the metadata of the Embedded Hive Metastore.
The Embedded PostgreSQL uses a Kubernetes Persistent Volume to ensure the persistence of the metadata.
The postgresql section of the values.yaml configures the persistence options:
postgresql:
  enabled: true
  pvClaim:
    annotations:
      # Add the following annotation if you want to preserve
      # Embedded MPP metadata after cluster removal
      "helm.sh/resource-policy": keep
    storage: 5Gi
    storageClassName: ""
pvClaim.annotations: Annotations for the Persistent Volume Claim used by the Embedded PostgreSQL.
To preserve the Denodo Embedded MPP metadata after cluster removal you need to add the following annotation:
"helm.sh/resource-policy": keep
pvClaim.storage: Storage available for the Embedded PostgreSQL. The default size is 5Gi, but it must be configured according to the scenario and data volumes to be queried, as this influences the size of the metadata.
pvClaim.storageClassName: See the Persistent Volume section.
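The options above can also be kept in a separate override file instead of editing the values.yaml directly. The following is a minimal sketch, not the documented procedure: the file name postgresql-overrides.yaml and the 10Gi size are illustrative assumptions, and the release and chart names in the commented Helm command are placeholders.

```shell
# Write a hypothetical override file that uses only the keys
# described in the postgresql section above.
cat > postgresql-overrides.yaml <<'EOF'
postgresql:
  pvClaim:
    annotations:
      # Keep the Persistent Volume Claim after cluster removal
      "helm.sh/resource-policy": keep
    # Illustrative size; choose it according to your metadata volume
    storage: 10Gi
    storageClassName: ""
EOF

# Placeholder release and chart names; adjust them to your installation.
# helm upgrade --install <release name> <chart path> -f postgresql-overrides.yaml
```

Keeping the overrides in their own file leaves the shipped values.yaml untouched, which can make later chart upgrades easier to compare.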
Persistent Volume¶
The Embedded PostgreSQL uses a Persistent Volume to store Postgres data and to ensure persistence.
However, the Denodo Embedded MPP deployment does not include a Persistent Volume object, as the user instantiating it may not have permission to create Persistent Volumes. Instead, it includes a Persistent Volume Claim that, used in conjunction with a Storage Class, dynamically requests the Persistent Volume. Therefore, at least one Storage Class has to be defined in your cluster.
To configure the Denodo Embedded MPP Storage Class, there are two options:
Use the current definition, pvClaim.storageClassName: "", which causes a Persistent Volume to be automatically provisioned for the cluster with the default Storage Class. Many cluster environments have a default Storage Class installed, or Kubernetes administrators can create one.
Provide a Storage Class name in the pvClaim.storageClassName field.
Use kubectl to check for StorageClass objects:
# sc is the short name for StorageClass
kubectl get sc
NAME                 PROVISIONER            AGE
standard (default)   kubernetes.io/gce-pd   1d
gold                 kubernetes.io/gce-pd   1d
The default StorageClass is marked with (default).
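When the check needs to be scripted, the (default) marker can be picked out of the listing with a small filter. This is a sketch only: the helper name default_storage_class is hypothetical, and it assumes the plain-text table layout shown above, where the marker follows the name as a separate word.

```shell
# Print the name of the StorageClass flagged "(default)" in `kubectl get sc` output.
# The table is read from standard input, so the helper can be tried without a cluster.
default_storage_class() {
  awk '$2 == "(default)" { print $1 }'
}

# Sample rows taken from the listing above.
printf '%s\n' \
  'NAME                 PROVISIONER            AGE' \
  'standard (default)   kubernetes.io/gce-pd   1d' \
  'gold                 kubernetes.io/gce-pd   1d' \
  | default_storage_class
```

Against a real cluster the helper would be used as kubectl get sc | default_storage_class. If it prints nothing, no default Storage Class is defined, so a name has to be provided in pvClaim.storageClassName.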
Backup¶
You need to define a backup strategy for the Persistent Volume of the Embedded PostgreSQL so that you do not lose the metadata, that is, the table definitions that the Denodo Embedded MPP relies upon.
To do this, you can choose between several methods depending on your storage provider.
An alternative way to back up the Embedded PostgreSQL data is to dump it. The dump generates a text file with SQL commands that, when fed back to the Embedded PostgreSQL, recreate the database in the same state as it was at the time of the dump.
# Dump the metastore database to a local file
kubectl exec <PostgreSQL Pod> -- bash -c "PGPASSWORD=hive pg_dump -c -U hive -h localhost metastore" > database.sql
# Restore the dump into the Embedded PostgreSQL
cat database.sql | kubectl exec -i <PostgreSQL Pod> -- bash -c "PGPASSWORD=hive psql -U hive -h localhost -d metastore"
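For recurring backups, the dump command can be wrapped so that each run writes to its own timestamped file and a failed dump does not leave an empty file behind. The following is a sketch under assumptions: the function names backup_file_name and backup_metastore and the metastore-<timestamp>.sql naming scheme are hypothetical, and the pg_dump invocation is the one shown above.

```shell
# Build a timestamped file name of the form metastore-<UTC timestamp>.sql.
backup_file_name() {
  printf 'metastore-%s.sql\n' "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Dump the metastore database from the given pod into a timestamped file.
# Usage: backup_metastore <PostgreSQL Pod>
backup_metastore() {
  pod="$1"
  out="$(backup_file_name)"
  # Same pg_dump invocation as in this section, redirected to the new file.
  if kubectl exec "$pod" -- bash -c "PGPASSWORD=hive pg_dump -c -U hive -h localhost metastore" > "$out" \
      && [ -s "$out" ]; then
    echo "$out"
  else
    # Remove the empty or partial file so a failed dump is not mistaken for a backup.
    rm -f "$out"
    return 1
  fi
}
```

The function prints the name of the file it created, so a scheduler or wrapper script can copy that file to external storage afterwards.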
CPU and Memory Management in Kubernetes¶
Kubernetes uses the resource requests and limits defined for CPU and memory to schedule and constrain pods. If a container requests certain CPU and/or memory values, Kubernetes will only schedule its pod on a node that can guarantee those resources. Limits, on the other hand, ensure that a container never exceeds a certain value.
postgresql:
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 1
      memory: 1Gi
Note that resources is commented out in the values.yaml, as we leave this setting as a choice for the Kubernetes cluster administrator.