Embedded PostgreSQL

Denodo Embedded MPP ships with an Embedded PostgreSQL database that stores the metadata of the Embedded Hive Metastore.

The Embedded PostgreSQL uses a Kubernetes Persistent Volume to ensure the persistence of the metadata.

The postgresql section of the values.yaml configures the persistence options:

postgresql:
  enabled: true

  pvClaim:
    annotations:
      # Add the following annotation if you want to preserve
      # Embedded MPP metadata after cluster removal
      "helm.sh/resource-policy": keep

    storage: 5Gi
    storageClassName: ""
  • pvClaim.annotations: Annotations for the Persistent Volume Claim used by the Embedded PostgreSQL.

    To preserve the Denodo Embedded MPP metadata after cluster removal, add the following annotation:

    "helm.sh/resource-policy": keep

  • pvClaim.storage: Storage available for the Embedded PostgreSQL.

    The default size is 5Gi, but you should size it according to your scenario and the data volumes to be queried, as these influence the size of the metadata.

  • pvClaim.storageClassName: See Persistent Volume section.
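As a minimal sketch of how these options could be overridden without editing the shipped values.yaml, the following writes a separate values file. The file name postgresql-override.yaml and the 10Gi size are illustrative assumptions, not values from the product; only the keys come from the section above.

```shell
# Write a values override file (hypothetical name) for the Embedded PostgreSQL claim.
cat > postgresql-override.yaml <<'EOF'
postgresql:
  enabled: true
  pvClaim:
    annotations:
      # Keep the claim, and with it the metadata, after cluster removal
      "helm.sh/resource-policy": keep
    # Illustrative size only; choose one that fits your own metadata volume
    storage: 10Gi
    # Empty string: use the default Storage Class of the cluster
    storageClassName: ""
EOF
```

A file like this can be passed to helm with the -f option when installing or upgrading the chart.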

Persistent Volume

The Embedded PostgreSQL uses a Persistent Volume to store Postgres data and to ensure persistence.

However, the Denodo Embedded MPP deployment does not include a Persistent Volume object, because the user who deploys it may not have permission to create Persistent Volumes. Instead, it includes a Persistent Volume Claim that, in conjunction with a Storage Class, dynamically requests the Persistent Volume. Therefore, at least one Storage Class has to be defined in your cluster.

To configure the Denodo Embedded MPP Storage Class, there are two options:

  1. Keep the current definition, pvClaim.storageClassName: "", which causes a Persistent Volume to be automatically provisioned for the cluster with the default Storage Class. Many cluster environments have a default Storage Class installed; otherwise, a Kubernetes administrator can create one.

  2. Provide a Storage Class name in the pvClaim.storageClassName field.

Use kubectl to check for StorageClass objects:

# sc is the short name for StorageClass
kubectl get sc

NAME                         PROVISIONER               AGE
standard (default)           kubernetes.io/gce-pd      1d
gold                         kubernetes.io/gce-pd      1d

The default StorageClass is marked with (default).
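When scripting a deployment, it can help to read the default Storage Class name from that listing. The helper below is a sketch, not part of the product: the function name default_storage_class is hypothetical, and it assumes the listing marks the default with (default) in the second column, as in the output above.

```shell
# Print the name of the StorageClass marked "(default)"; print nothing if there is none.
default_storage_class() {
  # --no-headers drops the column header row so only StorageClass rows are parsed
  kubectl get sc --no-headers | awk '$2 == "(default)" { print $1 }'
}
```

If the function prints nothing, the cluster has no default Storage Class, and pvClaim.storageClassName needs an explicit name.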

Backup

You need to define a backup strategy for the Persistent Volume of the Embedded PostgreSQL so that you do not lose the metadata, that is, the table definitions that the Denodo Embedded MPP relies upon.

To do this, you can choose between several methods depending on your storage provider.

An alternative way to back up the Embedded PostgreSQL data is to dump it. The dump generates a text file with SQL commands that, when fed back to the Embedded PostgreSQL, recreate the database in the state it was in at the time of the dump.

Dump of the Embedded PostgreSQL
kubectl exec <PostgreSQL Pod> -- bash -c "PGPASSWORD=hive pg_dump -c -U hive -h localhost metastore" > database.sql
Restore the Embedded PostgreSQL
cat database.sql | kubectl exec -i <PostgreSQL Pod> -- bash -c "PGPASSWORD=hive psql -U hive -h localhost -d metastore"
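The two commands above can be wrapped in small helpers for a scheduled backup. This is a sketch under assumptions: the function names and the timestamped file naming are hypothetical, and the pod name has to be supplied by the caller. The pg_dump and psql invocations are the ones shown above.

```shell
# Dump the metastore database from the given pod into a timestamped file.
# $1: name of the Embedded PostgreSQL pod
backup_metastore() {
  pod="$1"
  out="metastore-$(date +%Y%m%d-%H%M%S).sql"
  kubectl exec "$pod" -- bash -c "PGPASSWORD=hive pg_dump -c -U hive -h localhost metastore" > "$out" \
    || { rm -f "$out"; return 1; }
  # An empty file means the dump produced nothing usable
  [ -s "$out" ] || { echo "dump is empty: $out" >&2; return 1; }
  echo "$out"
}

# Feed a dump file back into the metastore database of the given pod.
# $1: name of the Embedded PostgreSQL pod, $2: path to the dump file
restore_metastore() {
  kubectl exec -i "$1" -- bash -c "PGPASSWORD=hive psql -U hive -h localhost -d metastore" < "$2"
}
```

For example, backup_metastore "<PostgreSQL Pod>" prints the name of the file it wrote, which can later be passed as the second argument to restore_metastore.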

CPU and Memory Management in Kubernetes

Kubernetes schedules pods across nodes based on their resource requests and limits for CPU and memory. If a container requests certain CPU and/or memory values, Kubernetes will only schedule its pod on a node that can guarantee those resources. Limits, on the other hand, ensure that a container never exceeds a certain value.

postgresql:
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 1
      memory: 1Gi

Note that the resources settings are commented out in the values.yaml, as this choice is left to the Kubernetes cluster administrator.
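After enabling the resources settings, one way to confirm what was actually applied is to read the requests and limits back from the running pod. The helper name show_pod_resources below is hypothetical, and the pod name must be supplied by the caller; this is a sketch rather than a product command.

```shell
# Print the requests and limits of every container in the given pod.
# $1: name of the Embedded PostgreSQL pod
show_pod_resources() {
  kubectl get pod "$1" -o jsonpath='{.spec.containers[*].resources}'
}
```

An empty result for a container means no requests or limits are set for it.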
