
Introduction

Container technology is gaining popularity due to the many benefits it provides. The efficiency, portability, and agility of containers are transforming the IT departments of many companies. However, this technology requires significant changes in how software operates within it, so it is necessary to review how this new infrastructure interacts with the software it supports.

One of the biggest differences between containers and older software deployments on physical machines or Virtual Machines is that containers do not persist data by default. Their ephemeral nature means that once the container dies and the stopped container is removed, the data inside the container is gone as well. In some cases this is not a problem, for instance, when no changes to the application are expected or when the Denodo metadata is stored in an external database. On the other hand, if an internal metadata database is used, any time a user, role, view, or any other element is created or changed, the files stored in the container will change, and additional configuration is necessary to persist these changes between containers.

Recommended method

To persist configuration and elements between Virtual DataPort servers, it is recommended to leverage the Denodo Docker container configuration scripts and to use an external metadata database. This method works by performing the following:

  • Configuration of the Denodo Platform can be encoded into environment variables that are pulled into the startup scripts to configure the container each time on startup. This supports defining infrastructure as code, and allows for changes to configuration to be quickly updated. The connection to an external metadata database can also be defined with these variables.
  • An external metadata database persists Denodo elements between containers, as each container will automatically retrieve elements from the database after starting up. This also allows for multiple instances of Denodo to be quickly deployed referencing the same metadata, since the database supports multiple Denodo Platform instances accessing the metadata simultaneously.
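As an illustration, both points above can be combined in a single docker run invocation. The environment variable names below are placeholders, not the actual variables; the real names are listed in the Denodo Docker container configuration article:

```shell
# Sketch only: variable names are illustrative placeholders; consult the
# Denodo Docker container configuration article for the actual variables.
docker run -d \
  -e "DENODO_METADATA_DB_URL=jdbc:postgresql://metadata-db:5432/denodo" \
  -e "DENODO_METADATA_DB_USER=denodo_admin" \
  -e "DENODO_METADATA_DB_PASSWORD=********" \
  denodo-platform:9.0-latest --vdpserver
```

Because all configuration travels in environment variables, the same image can be promoted unchanged between environments, with only the variable values differing.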

This is the recommended method for persistence as it has low configuration overhead, does not increase the size of the image, and still allows for deployments of metadata from the Solution Manager without downtime (in Denodo 8 update 20230301.1 and later). Note that in order for deployments to work without downtime, the Denodo ping script with the “-r” option should be used to prevent containers in PROMOTION MODE from being shut down:

  • The ping script with the “-r” option will verify that the container is ready to accept new connections, so it can be targeted by a load balancer.
  • The ping script without the “-r” option can be used to keep Virtual DataPort servers alive while they are removed from the load balancing groups: servers in PROMOTION MODE are still healthy, but they do not accept new connections while they are executing changes.

This happens automatically in the Denodo Helm charts; more information about using these charts can be found in the Denodo Helm Charts Quick Start Guide.
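As a sketch, the two probe variants above map onto a readiness check and a liveness check. The script path and exact argument order shown here are assumptions; check the ping script reference in the Denodo manuals:

```shell
# Readiness: succeeds only when the server accepts new connections, so a
# container in PROMOTION MODE is removed from load balancing but kept running.
/opt/denodo/bin/ping.sh -r localhost 9999

# Liveness: succeeds while the server process is healthy, even in PROMOTION
# MODE, so the container is not shut down during a deployment.
/opt/denodo/bin/ping.sh localhost 9999
```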

Other methods of persisting data in Denodo containers

In some deployments, the use of an external database is not necessary. This is usually the case if the metadata of the Denodo Platform does not change often, or if reducing the number of involved components and complexity is a priority.

To persist Denodo Platform configuration between containers, one of the following methods can be used (sorted by most recommended to least):

  1. Elements in the Denodo Platform can be imported by loading export files into the “/container-entrypoint-init” directory, and configuration can be defined in environment variables.
  2. Volumes can be mounted from the host (or another volume provider) into the container so the data is persisted (however, note that only one server should access the files at one time).
  3. A new container image can be created from the current modified container, using docker container commit.

These methods will be explained in more detail below.

Loading metadata on startup

In order to start a Denodo container with a specific set of metadata, exports from other Denodo installations can be imported into the container on startup by the entrypoint scripts in the container. This allows for customization of the container without requiring management of an external database, and also makes the construction of the Denodo instance strictly reproducible.

When the Denodo container is started, it will execute or import .sh, .vql and specific .zip files located under the /container-entrypoint-init directory.

For metadata imports, the files are imported into a temporary Denodo instance that is started in the container before the main process, using the --singleuser flag. After the scripts and VQL files are executed, the temporary Denodo instance is stopped and the main Denodo processes are started.

For .zip files to be imported, they must match the following naming patterns:

  • dc-metadata-*.zip: to import the Data Catalog metadata.
  • denodo-scheduler-*.zip: to import Scheduler metadata.

For more information about this configuration, see the Denodo Docker container configuration article.
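For illustration, a populated init directory might look like the listing below. File names other than the documented dc-metadata-*.zip and denodo-scheduler-*.zip patterns are arbitrary examples:

```text
/container-entrypoint-init/
├── 01-create-datasources.vql     # imported into the temporary --singleuser instance
├── 02-create-views.vql
├── 10-custom-setup.sh            # executed as a shell script
├── dc-metadata-prod.zip          # Data Catalog metadata import
└── denodo-scheduler-prod.zip     # Scheduler metadata import
```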

The above means that a custom Denodo instance can be deployed either by loading the files into the image with a Dockerfile (note that the COPY source path is relative to the Docker build context):

FROM denodo-platform:9.0-latest
COPY exports /container-entrypoint-init

Or by mounting the files when starting the container:

docker run -v "/opt/Denodo/exports:/container-entrypoint-init" … denodo-platform:9.0-latest --vdpserver

Using volumes to persist changes

In the case that changes in the Denodo Platform should be propagated between instances of the container and the implementation team has decided to avoid an external metadata database, the metadata database of the Denodo Platform can be mounted in each container.

In Docker, volumes can be added in the docker run command with the option “-v”. For instance, in order to persist the metadata folder, the following command can be executed:

docker run -v "/opt/Denodo/Metadata:/opt/denodo/metadata" … denodo-platform:9.0-latest --vdpserver

Note that configuration changes that can be made using the startup scripts and environment variables do not need to be persisted in a volume, since they are reapplied at each startup. Additionally, many other configuration changes not directly available in the scripts can still be imported by copying the configuration files into the “/denodo/conf” directory and using the “DENODO_MERGE_CONF” property if necessary. More information can be found on the Denodo Docker container configuration page.

Additionally, please note that only one Denodo container can reference the mounted data at a time. In the case that multiple Denodo containers should reference the same metadata, a copy of the metadata should be taken and separately mounted into the other container.
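For example, reusing the host path from the command above, a second container would get its own copy of the metadata (paths are illustrative):

```shell
# The embedded metadata database only supports one server per copy,
# so each container must mount its own copy of the metadata folder.
cp -R /opt/Denodo/Metadata /opt/Denodo/Metadata-copy

docker run -v "/opt/Denodo/Metadata:/opt/denodo/metadata" … denodo-platform:9.0-latest --vdpserver
docker run -v "/opt/Denodo/Metadata-copy:/opt/denodo/metadata" … denodo-platform:9.0-latest --vdpserver
```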

When persisting metadata, it is necessary to review which folders should be persisted. If the folder contents will not change in the organization’s specific usage of the Denodo Platform, then that folder does not need to be persisted. Only folders whose content might change should be persisted.

For instance, the following folders are subject to change in some scenarios and may need to be persisted:

  • /opt/denodo/bin: the Denodo scripts can be regenerated after changing the JVM configuration.
  • /opt/denodo/conf: the configuration files of Denodo are stored in this folder.
  • /opt/denodo/lib/extensions: contains JDBC drivers not distributed with Denodo.
  • /opt/denodo/lib/data-catalog-extensions: contains Jar libraries used by the Data Catalog.
  • /opt/denodo/lib/scheduler-extensions: contains Jar libraries used by the Scheduler.
  • /opt/denodo/lib/solution-manager-extensions: contains Jar libraries used by the Solution Manager (used in Solution Manager containers only).
  • /opt/denodo/logs: the logs are stored in this folder (check the section below for more information).
  • /opt/denodo/metadata: the metadata of the Denodo Platform is saved in this folder.
  • /opt/denodo/resources/apache-tomcat: includes the configuration of the embedded Tomcat.
  • /opt/denodo/extensions/thirdparty/sap-jco: contains libraries needed by SAP BW and SAP BI data sources.

Note that this is not a complete list for all cases, and to mount the volumes it may be necessary to change the ownership of the mounted folders. For more information see Mounting volumes to persist data.
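As a sketch of that ownership adjustment, assuming the Denodo container runs as a non-root user with UID/GID 1000 (an assumption; see Mounting volumes to persist data for the actual values used by the image):

```shell
# Give the in-container Denodo user ownership of the host folder before mounting.
# UID/GID 1000:1000 is an assumption; verify against the image documentation.
sudo chown -R 1000:1000 /opt/Denodo/Metadata
```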

Initialization of volumes in Kubernetes

Volumes in Kubernetes work differently than in Docker, and have additional subtleties because Kubernetes can manage multiple replicas of an application. When a volume is created in Kubernetes, it is created as an empty folder. In many cases, the directory where the volume will be mounted already has content in the container image, so the general expectation is for this default content to appear in the volume. However, in Kubernetes, if a volume is mounted at the /opt/denodo/metadata path of the Denodo container, the folder will be empty instead of containing the default metadata distributed with the Denodo Platform.

In order to achieve the same behavior as Docker volumes with Kubernetes volumes, the volumes must be initialized explicitly before launching pods that will include that volume. Additionally, note that Persistent Volumes must be used to propagate data between successive restarts of the Denodo container.

The YAML below defines a StatefulSet whose Denodo container mounts a volume at /opt/denodo/metadata. Notice that without the initContainers section, the pod will not start, since an empty metadata folder causes the Denodo application to crash:

apiVersion: v1
kind: Service
metadata:
  name: denodo-service
spec:
  selector:
    app: denodo-app
  ports:
  - name: svc-denodo
    protocol: "TCP"
    port: 9999
    targetPort: denodo-port
  - name: svc-web
    protocol: "TCP"
    port: 9090
    targetPort: web-container
  type: LoadBalancer
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: denodo
spec:
  selector:
    matchLabels:
      app: denodo-app
  replicas: 1
  template:
    metadata:
      labels:
        app: denodo-app
    spec:
      hostname: denodo-hostname
      initContainers:
      - name: init-volume
        image: denodo-platform:latest
        command: ["/bin/sh"]
        args:
          - "-ec"
          - |
            if [ ! -d /vol/denodo-metadata ]; then mkdir /vol/denodo-metadata; fi
            if [ -z "$(ls -A /vol/denodo-metadata)" ]; then cp -R /opt/denodo/metadata/* /vol/denodo-metadata/; fi
        volumeMounts:
          - name: denodo-platform-pvc
            mountPath: /vol
      containers:
      - name: denodo-container
        image: denodo-platform:latest
        args: ["--vdpserver"]
        ports:
        - name: denodo-port
          containerPort: 9999
        - name: web-container
          containerPort: 9090
        volumeMounts:
        - name: config-volume
          mountPath: /opt/denodo/conf/denodo.lic
          subPath: denodo.lic
        - name: denodo-platform-pvc
          mountPath: /opt/denodo/metadata
          subPath: denodo-metadata # Folder of the volume containing the metadata
      volumes:
      - name: config-volume
        configMap:
          name: denodo-license
  volumeClaimTemplates: # Each pod gets a separate, initialized volume; the Derby metadata database can only be read by one container at a time.
  - metadata:
      name: denodo-platform-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: "1Gi"

In short, the YAML above starts an init container to initialize the volume. If the “denodo-metadata” folder of the volume has already been initialized, the “cp” command is not executed; otherwise, the content of the “/opt/denodo/metadata” directory is copied into the volume. At that point the metadata is guaranteed to be initialized, and the main container can start.
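To confirm that the init container seeded the volume, the metadata folder can be inspected in the running pod; the pod and container names below match the example StatefulSet:

```shell
# List the seeded metadata inside the first pod of the StatefulSet.
kubectl exec denodo-0 -c denodo-container -- ls /opt/denodo/metadata
```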

Persisting changes in the image

If the Denodo deployment will only have changes to its metadata during the deployment of new features then it may be more effective to persist changes as part of the image. These strategies benefit from the rollout and rollback functionality provided by the container management infrastructure, such as the deployment mechanisms provided by Kubernetes.

Note that data for some parts of the deployment will still reside outside of the Denodo container, like the user authentication database (IdP or LDAP server) or cache database if these modules are in use.

With Docker a new container image can be created from a running container with the following commands:

docker stop <my-container>
docker commit <my-container> <my-image>:<my-tag>
docker start <my-container>

The docker commit command creates a new Docker image that includes the changes made in the container. New containers created from this image will run with the same metadata as the container that was stopped. Although it is preferable to generate images with Dockerfiles, this is still a valid option for generating images in a development environment that can later be deployed into Production.
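Once committed, the image can be published so that other environments can pull it. The registry host below is a placeholder:

```shell
# Tag the committed image for a private registry and push it.
docker tag <my-image>:<my-tag> registry.example.com/denodo/<my-image>:<my-tag>
docker push registry.example.com/denodo/<my-image>:<my-tag>
```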

Persisting Denodo logs

If a containerized application ends unexpectedly, it is very important to review the container’s logs. However, without additional configuration, the logs disappear when the container is removed.

To solve this issue, an appropriate solution should be chosen to manage the Denodo application’s logs. Some examples of methods to do this are the following:

  • Use volumes to persist logs. In this scenario, a persistent volume must be created for the /opt/denodo/logs directory, but please consider that if multiple containers are logging to this volume there may be conflicts in writing to the files.

  • Change the log configuration (log4j2.xml files) to output data to the standard output stream by using a ConsoleAppender. Note that the Denodo provided entrypoint scripts perform this configuration by default. If multiple logs are redirected to standard output it is recommended to modify the PatternLayout to include a reference to the component that has produced the log entry. This is the default log mechanism supported by container engines and implies that it will be necessary to use the container engine’s tools to check the application logs (for example, using “docker logs -f <container>” or “kubectl logs -f <pod> -c <container>”).

For instance, in order to redirect the vdp.log content to the standard output, so that docker logs displays the container logs, the Root logger in the Log4j configuration file /opt/denodo/conf/vdp/log4j2.xml:

...
<Root level="error">
    <AppenderRef ref="FILEOUT" />
</Root>
...

Can be replaced with the following:

...
<Root level="error">
    <AppenderRef ref="STDOUT" />
</Root>
...

  • Use dedicated sidecar containers for log management. Although the Denodo logging system is flexible, some infrastructure needs can be solved effectively by using sidecar containers with a logging agent. More about this method can be found in the Kubernetes Logging Architecture document.

Disclaimer
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments we recommend you to contact your Denodo Customer Success Manager.
