Introduction
Container technology is gaining popularity due to the benefits it provides: its efficiency, portability, and agility are revolutionizing the IT departments of many companies. However, this technology requires significant changes in how software operates within it, so it is necessary to review how this new infrastructure interacts with the software it supports.
One of the largest differences between containers and older software deployments on physical machines or Virtual Machines is that containers do not persist data by default. Their ephemeral nature means that once the container dies and the stopped container is removed, the data inside the container is also gone. In some cases this is not a problem, for instance, when changes in the application are not expected, or when the Denodo metadata is stored in an external database. On the other hand, if an internal metadata database is used, the files stored in the container change any time a user, role, view, or any other element is created or modified, and additional configuration is necessary in order to persist these changes between containers.
Recommended method
To persist configuration and elements between Virtual DataPort servers, it is recommended to leverage the Denodo Docker container configuration scripts and an external metadata database. This method works as follows:
- Configuration of the Denodo Platform can be encoded into environment variables that are pulled into the startup scripts to configure the container each time on startup. This supports defining infrastructure as code, and allows for changes to configuration to be quickly updated. The connection to an external metadata database can also be defined with these variables.
- An external metadata database persists Denodo elements between containers, as each container will automatically retrieve elements from the database after starting up. This also allows for multiple instances of Denodo to be quickly deployed referencing the same metadata, since the database supports multiple Denodo Platform instances accessing the metadata simultaneously.
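As a sketch of this approach, configuration values can be passed as environment variables when the container starts (the variable names below are illustrative placeholders, not official names; the real variables are listed in the Denodo Docker container configuration article):

```shell
# Hypothetical example: pass configuration and the external metadata database
# connection as environment variables (variable names are placeholders).
docker run -d \
  -e EXAMPLE_METADATA_DB_URI="jdbc:postgresql://metadata-db:5432/denodo" \
  -e EXAMPLE_METADATA_DB_USER="denodo" \
  denodo-platform:9.0-latest --vdpserver
```

Because the whole configuration lives in these variables, the same command can be stored in version control and reproduced identically for every new container.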
This is the recommended method for persistence as it has low configuration overhead, does not increase the size of the image, and still allows for deployments of metadata from the Solution Manager without downtime (in Denodo 8 update 20230301.1 and later). Note that in order for deployments to work without downtime, the Denodo ping script with the “-r” option should be used to prevent containers in PROMOTION MODE from being shut down:
- The ping script with the “-r” option ensures that the container is ready to accept new connections, so it can be targeted by a load balancer.
- The ping script without the “-r” option can be used to remove the Virtual DataPort servers from the load balancing groups without shutting them down (they are still healthy, but not accepting connections while executing changes in PROMOTION MODE).
This happens automatically in the Denodo Helm charts; more information about using these charts can be found in the Denodo Helm Charts Quick Start Guide.
Other methods of persisting data in Denodo containers
In some deployments, the use of an external database is not necessary. This is usually the case if the metadata of the Denodo Platform does not change often, or if reducing the number of involved components and complexity is a priority.
To persist Denodo Platform configuration between containers, one of the following methods can be used (sorted by most recommended to least):
- Elements in the Denodo Platform can be imported by loading export files into the “/container-entrypoint-init” directory, and configuration can be defined in environment variables.
- Volumes can be mounted from the host (or another volume provider) into the container so the data is persisted (however, note that only one server should access the files at one time).
- A new container image can be created from the current modified container, using docker container commit.
These methods will be explained in more detail below.
Loading metadata on startup
In order to start a Denodo container with a specific set of metadata, exports from other Denodo installations can be imported into the container on startup by the entrypoint scripts in the container. This allows for customization of the container without requiring management of an external database, and also allows the enforcement of strict reproducibility in the construction of the Denodo instance.
When the Denodo container is started, it will execute or import .sh, .vql and specific .zip files located under the /container-entrypoint-init directory.
For metadata imports, the files are imported into a temporary Denodo instance that is started in the container before the main process, using the --singleuser flag. After the scripts and VQL files are executed, the temporary Denodo instance is stopped and the main Denodo processes are started.
For .zip files to be imported, they must match the following naming patterns:
- dc-metadata-*.zip: to import the Data Catalog metadata.
- denodo-scheduler-*.zip: to import Scheduler metadata.
For more information about this configuration, see the Denodo Docker container configuration article.
The above means that a custom Denodo instance can be deployed either by loading the files into the image in a Dockerfile:
FROM denodo-platform:9.0-latest
COPY C:/Denodo/exports /container-entrypoint-init
Or by mounting the files when starting the container:
docker run -v "/opt/Denodo/exports:/container-entrypoint-init" … denodo-platform:9.0-latest --vdpserver
Using volumes to persist changes
If changes in the Denodo Platform should be propagated between instances of the container, and the implementation team has decided against an external metadata database, the metadata folder of the Denodo Platform can be mounted into each container.
In Docker, volumes can be added in the docker run command with the option “-v”. For instance, in order to persist the metadata folder the following command can be executed:
docker run -v "/opt/Denodo/Metadata:/opt/denodo/metadata" … denodo-platform:9.0-latest --vdpserver
Note that configuration changes that can be made using the startup scripts and environment variables do not need to be persisted in a volume. Additionally, many other configuration changes not directly available in the scripts can still be imported by copying the configuration files into the “/denodo/conf” directory and using the “DENODO_MERGE_CONF” property if necessary. More information can be found on the Denodo Docker container configuration page.
Additionally, please note that only one Denodo container can reference the mounted data at a time. In the case that multiple Denodo containers should reference the same metadata, a copy of the metadata should be taken and separately mounted into the other container.
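For example, a second container can be started from its own copy of the metadata folder (the host paths shown are illustrative):

```shell
# The embedded metadata database supports only one server at a time,
# so give the second container a separate copy of the folder.
cp -R /opt/Denodo/Metadata /opt/Denodo/Metadata-copy

docker run -v "/opt/Denodo/Metadata-copy:/opt/denodo/metadata" \
  denodo-platform:9.0-latest --vdpserver
```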
When persisting metadata, it is necessary to review which folders should be persisted. If the folder contents will not change in the organization’s specific usage of the Denodo Platform, then that folder does not need to be persisted. Only folders whose content might change should be persisted.
For instance, the following folders are subject to change in some scenarios and may need to be persisted:
| Folder | Reason |
| --- | --- |
| /opt/denodo/bin | The Denodo scripts can be regenerated after changing the JVM configuration |
| /opt/denodo/conf | The configuration files of Denodo are stored in this folder |
| /opt/denodo/lib/extensions | Contains JDBC drivers not distributed with Denodo |
| /opt/denodo/lib/data-catalog-extensions | Contains Jar libraries used by the Data Catalog |
| /opt/denodo/lib/scheduler-extensions | Contains Jar libraries used by the Scheduler |
| /opt/denodo/lib/solution-manager-extensions | Contains Jar libraries used by the Solution Manager (used in Solution Manager containers only) |
| /opt/denodo/logs | The logs are stored in this folder (check the section below for more information on this) |
| /opt/denodo/metadata | The metadata of the Denodo Platform is saved in this folder |
| /opt/denodo/resources/apache-tomcat | Includes the configuration of the embedded Tomcat |
| /opt/denodo/extensions/thirdparty/sap-jco | Contains libraries needed by SAP BW and SAP BI data sources |
Note that this is not a complete list for all cases, and to mount the volumes it may be necessary to change the ownership of the mounted folders. For more information see Mounting volumes to persist data.
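For instance, a host folder can be handed over to the user running the Denodo process inside the container before mounting it (the UID/GID 1000:1000 below is an assumption, not the documented value; check the actual user of the Denodo image as described in Mounting volumes to persist data):

```shell
# Align ownership of the host folder with the in-container user
# (1000:1000 is a placeholder, not the real Denodo UID/GID).
sudo chown -R 1000:1000 /opt/Denodo/Metadata
docker run -v "/opt/Denodo/Metadata:/opt/denodo/metadata" \
  denodo-platform:9.0-latest --vdpserver
```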
Initialization of volumes in Kubernetes
Volumes in Kubernetes work in a different way than in Docker, and have additional subtleties due to the fact that Kubernetes can manage multiple replicas of an application. When a volume is created in Kubernetes, the volume will be created as an empty folder. In many cases, the directory where the volume should be mounted has some content in the container image, so the general expectation is for this default content to be included in the volume. However, in Kubernetes, if a volume is mounted in the /opt/denodo/metadata path of the Denodo container the folder will be empty instead of containing the default metadata distributed with the Denodo Platform.
In order to achieve the same behavior as Docker volumes with Kubernetes volumes, the volumes must be initialized explicitly before launching pods that will include that volume. Additionally, note that Persistent Volumes must be used to propagate data between successive restarts of the Denodo container.
The below YAML shows a deployment of a Denodo container that is mounting the folder /opt/denodo/metadata in a volume. Notice that without the initContainers section of the deployment, the pod will not start since an empty metadata folder causes the Denodo application to crash:
apiVersion: v1
kind: Service
metadata:
  name: denodo-service
spec:
  selector:
    app: denodo-app
  ports:
    - name: svc-denodo
      protocol: "TCP"
      port: 9999
      targetPort: denodo-port
    - name: svc-web
      protocol: "TCP"
      port: 9090
      targetPort: web-container
  type: LoadBalancer
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: denodo
spec:
  selector:
    matchLabels:
      app: denodo-app
  replicas: 1
  template:
    metadata:
      labels:
        app: denodo-app
    spec:
      hostname: denodo-hostname
      initContainers:
        - name: init-volume
          image: denodo-platform:latest
          command: ["/bin/sh"]
          args:
            - "-ec"
            - |
              if [ ! -d /vol/denodo-metadata ]; then mkdir /vol/denodo-metadata; fi
              if [ -z "$(ls -A /vol/denodo-metadata)" ]; then cp -R /opt/denodo/metadata/* /vol/denodo-metadata/; fi
          volumeMounts:
            - name: denodo-platform-pvc
              mountPath: /vol
      containers:
        - name: denodo-container
          image: denodo-platform:latest
          args: ["--vdpserver"]
          ports:
            - name: denodo-port
              containerPort: 9999
            - name: web-container
              containerPort: 9090
          volumeMounts:
            - name: config-volume
              mountPath: /opt/denodo/conf/denodo.lic
              subPath: denodo.lic
            - name: denodo-platform-pvc
              mountPath: /opt/denodo/metadata
              subPath: denodo-metadata # Folder of the volume containing the metadata
      volumes:
        - name: config-volume
          configMap:
            name: denodo-license
  # volumeClaimTemplates ensures that each pod will have a separate volume
  # initialized; the Derby metadata database can only be read by one container
  # at a time.
  volumeClaimTemplates:
    - metadata:
        name: denodo-platform-pvc
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "1Gi"
In short, the above YAML document starts an init container to initialize the volume. If the metadata folder of the volume has already been initialized, the “cp” command is not executed; otherwise, the content of the “/opt/denodo/metadata” directory is copied into the volume. At that point, since the metadata has been initialized, the main container can start.
Persisting changes in the image
If the Denodo deployment will only change its metadata during the deployment of new features, it may be more effective to persist the changes as part of the image. This strategy benefits from the rollout and rollback functionality provided by the container management infrastructure, such as the deployment mechanisms provided by Kubernetes.
Note that data for some parts of the deployment will still reside outside of the Denodo container, like the user authentication database (IdP or LDAP server) or cache database if these modules are in use.
With Docker a new container image can be created from a running container with the following commands:
docker stop <my-container>
docker commit <my-container> <my-image>:<my-tag>
docker start <my-container>
The docker commit command creates a new Docker image including the changes that were made in the running container. This will allow new containers referencing the image to run with the same metadata as the container that was previously stopped. Although it is preferable to generate images with Dockerfiles, this is still a valid option to generate images in a development environment that can be later deployed into Production.
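Once committed, the image can be distributed through a registry like any other image so that other environments can pull it (the registry and image names below are illustrative):

```shell
# Publish the committed image so other environments can deploy it.
docker tag my-image:my-tag registry.example.com/denodo/my-image:my-tag
docker push registry.example.com/denodo/my-image:my-tag
```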
Persisting Denodo logs
If a containerized application ends unexpectedly, it is very important to review the container’s logs. However, without additional configuration, the logs disappear when the container is removed.
To solve this issue, an appropriate solution should be chosen to manage the Denodo application’s logs. Some example approaches are the following:
- Push the Denodo logs to a log aggregation system. For an example in AWS, see the use Amazon CloudWatch to monitor Denodo article.
- Store the logs in an external system. For example, store Denodo logs in Amazon S3.
- Use volumes to persist logs. In this scenario, a persistent volume must be created for the /opt/denodo/logs directory, but please consider that if multiple containers are logging to this volume there may be conflicts in writing to the files.
- Change the log configuration (log4j2.xml files) to output data to the standard output stream by using a ConsoleAppender. Note that the Denodo provided entrypoint scripts perform this configuration by default. If multiple logs are redirected to standard output it is recommended to modify the PatternLayout to include a reference to the component that has produced the log entry. This is the default log mechanism supported by container engines and implies that it will be necessary to use the container engine’s tools to check the application logs (for example, using “docker logs -f <container>” or “kubectl logs -f <pod> -c <container>”).
For instance, in order to redirect the vdp.log content to the standard output, so that docker logs displays the container logs, the Root logger in the Log4j configuration file /opt/denodo/conf/vdp/log4j2.xml:
...
<Root level="error">
  <AppenderRef ref="FILEOUT" />
</Root>
...
Can be replaced with the following:
...
<Root level="error">
  <AppenderRef ref="STDOUT" />
</Root>
...
- Use dedicated sidecar containers for log management. Although the Denodo logging system is flexible, some infrastructure needs can be solved effectively by using sidecar containers with a logging agent. More about this method can be found in the Kubernetes Logging Architecture document.
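As a concrete sketch of the volume-based option above, the logs directory can be kept on the host and the standard output followed with the container engine’s tools (the host path is illustrative):

```shell
# Persist the logs directory on the host so it survives container removal.
docker run -v "/opt/Denodo/logs:/opt/denodo/logs" \
  denodo-platform:9.0-latest --vdpserver

# Alternatively, when logging to standard output, follow the logs with:
docker logs -f <container>
```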
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments, we recommend contacting your Denodo Customer Success Manager.