Storing the Metadata on an External Database

The catalog of Virtual DataPort consists of its settings and the metadata created by users: data sources, views, web services, users, roles, etc. By default, the catalog uses the Apache Derby database that is embedded in Virtual DataPort.

You can configure the catalog to use an external database instead of the embedded one. The benefits are:

  1. Automatic replication: if the Virtual DataPort servers of a cluster share the same external database, you only need to make a change on one of the servers and it is automatically propagated to the others. Otherwise, you have to make the same change on each server of the cluster.

    In this scenario, all the servers of the cluster are “master nodes”. That is, if any Virtual DataPort server shuts down, the others continue working without interruption.

  2. Backups: if the administrator of the external database performs periodic backups of it, the metadata of Virtual DataPort is backed up as well. This is beneficial even if you do not have a cluster.

This feature is meant for environments where changes to elements are not made in a controlled way; that is, users create and modify views, web services, etc. at any moment. These environments are usually:

  • Development environments, where developers create data sources, views, web services, etc.

  • Sandbox environments. These environments usually have the same views as the production servers, but the users (e.g. data scientists) create additional views to perform analyses.

With this feature, you can increase the capacity (i.e. the number of concurrent requests) of these types of environments by setting up a cluster of Virtual DataPort servers that share the same external database.

In environments where changes should be made in a controlled way (i.e. staging and production environments), we do not recommend enabling this feature. Instead, we recommend using the Solution Manager to promote changes from the development environment to staging and production. The reasons are:

  1. When this feature is enabled, you cannot use the features of the Solution Manager that promote changes to each server of a cluster without downtime. That is because, when this feature is enabled, a modification to an element on one server is immediately visible to the other servers of the cluster. This could cause a client application to access a production server during a promotion, while a set of views is being replaced by other views.

  2. This feature adds a small overhead to queries. This overhead is not relevant for analytical queries that take minutes or hours, but it can be an issue for operational queries that are expected to complete in milliseconds.

Virtual DataPort replicates these objects:

  • Any object that can be created from the administration tool or the Design Studio: data sources, views, summaries, remote tables, associations, web services, JMS listeners, stored procedures and databases.

  • Users, roles and the privileges they have been granted, column privileges, row restrictions and custom policies (including their jar extensions).

  • Resources: that is, all the elements uploaded from the dialog Extension management, or created with the commands CREATE JAR or CREATE RESOURCE.

  • All the settings that you may change either from the administration tool, the Design Studio or using the command SET, with the exception of settings related to enabling SSL on Virtual DataPort.

Replication Process

This section explains how, when a server makes a change in the catalog (e.g. a user creates a view), the other servers that share the same database become aware of this change.

The catalog of Virtual DataPort has an in-memory cache that stores objects temporarily (this is not the Cache Module, which stores data of views obtained from data sources). The goal of this in-memory cache is to access objects (data sources, views, etc.) faster than if they had to be read from the database every time. When Virtual DataPort starts, this cache is empty; the first time an object is accessed (i.e. used in a query, opened from the administration tool, etc.), the catalog stores the object in this in-memory cache. If an object is later modified (e.g. the configuration of a data source changes), the catalog updates this cache.

By default, the catalog uses the embedded Apache Derby database. In this case, the in-memory cache is always up to date because only the local server can modify the catalog.

When the catalog uses an external database, it also uses this in-memory cache. However, the catalog has to ensure that the objects retrieved from this cache are up to date, because another server could have modified an object since the catalog read it from the database and stored it in the cache. To ensure an object is up to date, the catalog performs these steps when another component wants to read an object:

  1. If the in-memory cache does not contain the object, the catalog reads it from the external database and stores it in the in-memory cache.

  2. If the in-memory cache already contains the object, the catalog executes a query on the external database to validate that this object has not been modified in the database since it was read. If the object is outdated, the catalog reads it again and updates the cache.
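The two steps above can be sketched as a cache that validates entries against the external store on every read. This is an illustrative model only, assuming each object carries a version (or last-modified) marker in the external database; the class and method names are hypothetical, not Denodo internals.

```python
# Illustrative sketch of the catalog's read path with an external database.
# Assumes each object row carries a version marker; all names are hypothetical.

class MetadataCache:
    """In-memory cache validated against an external store on every read."""

    def __init__(self, store):
        self.store = store   # facade over the external database
        self.cache = {}      # object name -> (version, object)

    def get(self, name):
        cached = self.cache.get(name)
        if cached is None:
            # Step 1: not in the cache -> read from the external database.
            version, obj = self.store.read(name)
            self.cache[name] = (version, obj)
            return obj
        # Step 2: in the cache -> run a lightweight query to check freshness.
        if cached[0] == self.store.read_version(name):
            return cached[1]
        # Outdated: read the object again and update the cache.
        version, obj = self.store.read(name)
        self.cache[name] = (version, obj)
        return obj
```

Note that step 2 still requires one round trip to the database per read; this is the small per-query overhead mentioned earlier.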

Regarding web services, the external database stores the web services and also whether they are deployed. When Virtual DataPort is configured to use an external database, every minute it checks whether each web service is marked as deployed or undeployed in the database. If the status of a web service on the local server differs, the server deploys or undeploys it accordingly. For example, if server #1 deploys the web service “customer”, it records in the database that the status of this web service is deployed. When server #2 checks the status of the web services, it will deploy “customer” if it has not been deployed already.

The same applies to JMS listeners. When you start or stop a JMS listener on one server, within a minute, the other Virtual DataPort servers start or stop it as well.
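The periodic check can be pictured as a small reconciliation pass: compare the desired status recorded in the database with what is deployed locally, and act on the differences. The sketch below is illustrative only; the function and status names are hypothetical, not Denodo internals.

```python
# Illustrative reconciliation pass run once per minute on each server.
# Status values and function names are hypothetical.

def reconcile(desired, deployed):
    """Return the actions a server should take to match the database.

    desired  -- {service_name: "DEPLOYED" | "UNDEPLOYED"} read from the
                external database
    deployed -- set of service names currently deployed on this server
    """
    actions = []
    for name, status in desired.items():
        if status == "DEPLOYED" and name not in deployed:
            actions.append(("deploy", name))
        elif status == "UNDEPLOYED" and name in deployed:
            actions.append(("undeploy", name))
    return actions
```

In the example from the text, server #2 would find “customer” marked as DEPLOYED in the database but absent locally, and would therefore deploy it.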

Avoiding Conflicts

When Virtual DataPort stores, modifies or deletes an object in the external database, it places a lock on the rows of the tables that it is going to modify. This prevents a race condition when two users connected to different Virtual DataPort servers create, modify or delete the same object (e.g. the same view) simultaneously.

If two users try to create an element of the same type with the same name, one of the users will get an error indicating that an element with that name already exists.

If two users try to modify the same element simultaneously, the changes of the second request will overwrite the changes of the first one.

If two users try to delete the same element simultaneously, one of the users will get an error indicating that the element does not exist.
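In generic SQL terms, the locking described above corresponds to a pessimistic row lock of the following kind. The table and column names are purely illustrative and do not reflect Denodo's actual schema:

```sql
BEGIN;
-- Lock the row for this element so that a concurrent transaction from
-- another server blocks until this one commits or rolls back.
SELECT * FROM catalog_elements
 WHERE element_name = 'customer_view'
 FOR UPDATE;

UPDATE catalog_elements
   SET definition = '...'
 WHERE element_name = 'customer_view';
COMMIT;
```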

How to Configure Virtual DataPort to Store the Metadata on an External Database

This section explains how to configure the catalog of Virtual DataPort to use an external database.

Before changing the configuration, consider this:

  • Virtual DataPort does not transfer metadata from one database to the other (not even from the embedded Apache Derby). Therefore, when you perform this process, you have to re-create the metadata you had, including its configuration.

  • You will have to enable SSL again.

The steps below take into account these situations.

  1. Create a catalog/schema on the external database that will store the metadata.

  2. If you already created objects (users, data sources, views…), export all the metadata. That is, click the menu File > Export. In this dialog, select Replace existing elements and select all the check boxes below this option.

  3. Click the menu Administration > Metadata database configuration.

  4. In Database adapter, select the database you are going to use and in Driver class, the driver.

    If Driver class is empty, it is because the driver is not included in Virtual DataPort and has not been uploaded to this server yet. To upload the driver to this server, close this window and follow these steps:

    1. Click the menu File > Extension management.

    2. In this dialog, click Libraries and then, Import.

    3. In the new dialog, in Resource type, select jdbc.

    4. In Version, select the driver you are going to upload. For example, if you are going to upload the driver of IBM DB2 11, select db2-11.

    5. Click Add and select the file(s). To select more than one file, hold the key Ctrl and select each file.

    6. Click Ok to upload the file(s) to Virtual DataPort.

    7. Go back to the Metadata database configuration dialog.

  5. Enter the Database URI, Login and Password.

  6. Select Specify custom catalog and schema.

  7. Important: click Test configuration to ensure that the connection details, user name and password are correct, and that this account has privileges to create tables in this database.

  8. Click Ok to save.

    The moment you click Ok, Virtual DataPort creates the tables in the external database that will store the catalog, but it does not start using these tables until you restart the server.

    If the tables already exist, it does not re-create them. That is because, in the scenario where you configure several Virtual DataPort servers to use an external database, only the first server where you make this change has to create these tables.

  9. Restart Virtual DataPort.

    At this point, Virtual DataPort begins reading all the metadata from the new database.

  10. Log in to Virtual DataPort using the default user account. That is, the user name is admin and the password is admin. This is because, when you switch the database of the catalog, the metadata is not transferred to the new database, so there are no users created yet.

  11. If in step 2 you exported the metadata, import the VQL file.

  12. If in the past you had removed the user “admin”, delete it again. To check whether you had removed this user, see if the VQL file contains the statement “CREATE OR REPLACE USER admin” (if it does not, you had removed this user).

  13. If you enabled TLS on the server, repeat the process. That is, execute the script <DENODO_HOME>/bin/denodo_tls_configurator. See more about this in the section Denodo SSL/TLS Configurator Script of the Installation Guide.

  14. If you are building a cluster of Virtual DataPort servers that will all share the same database, repeat all the steps above on each server, with one exception: you only need to export and import the VQL file on the first server where you configure the external database. The metadata is propagated automatically to the other servers once you configure them to use the external database.

Some Considerations About This Feature

  • To use this feature, the database of the catalog should be “as close as possible” to the Virtual DataPort servers. That is, the database should be on the same network or at least, on the same data center as the servers. If this condition is met, the overhead of using an external database for the catalog is minimal. Otherwise, there may be some delays in the execution of queries.

  • All the Virtual DataPort servers that share the same external database need to have the same update of the Denodo Platform installed. That is because the metadata stored by a Virtual DataPort server may not be readable by another Virtual DataPort server with an older update. Therefore, having several Virtual DataPort servers with different updates sharing the same database is an unsupported configuration.

  • The communication from Virtual DataPort to the external database has to be stable. If at some point the communication with the external database is not possible because of, for example, a network problem, the execution engine cannot start processing any queries until the communication is re-established.

  • When Virtual DataPort is starting and detects that the configuration of the catalog changed (e.g. the administrator changed the database to store the catalog), it replaces the file <DENODO_HOME>/conf/vdp/ with <DENODO_HOME>/setup/vdp/templates/conf/VDBConfiguration.template. This file is only replaced when Virtual DataPort starts after this configuration has changed, not after every restart.

  • When you configure the catalog to use an external database, the metadata is not transferred with the possible exception of the JDBC driver required to connect to this database. If this driver is not included with Denodo, when you click Ok in the Metadata database configuration dialog, Virtual DataPort does the following:

    • If the driver is already stored in the new database, Virtual DataPort computes the MD5 hash of the driver stored in the current database (e.g. the local Derby) and of the driver stored in the new database. If the MD5 values are different, Virtual DataPort replaces the driver in the new database with the driver in the current database.

    • If the driver is not stored in the new database, it stores it.
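The comparison amounts to hashing both copies of the driver and checking whether the digests differ. A minimal sketch, assuming each driver is available as bytes (the function name is illustrative, not a Denodo internal):

```python
import hashlib

def drivers_differ(current_driver: bytes, new_driver: bytes) -> bool:
    """Compare two driver jars by their MD5 hashes, as described above."""
    return (hashlib.md5(current_driver).hexdigest()
            != hashlib.md5(new_driver).hexdigest())
```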

  • When you use an external database, do not change a configuration property by modifying the file <DENODO_HOME>/conf/vdp/. Instead, change the property using the command SET. This changes the configuration property on the server where you execute it and propagates the change to the other servers that share the same database.
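For example, a property change executed with SET (e.g. from the VQL Shell) would look like this. The property name below is purely illustrative; use the actual property you need to change:

```sql
-- Changes the property on this server and propagates it to the other
-- servers that share the same external database.
SET 'com.denodo.vdb.example.property' = 'new value';
```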

  • When you use an external database, Apache Derby is not available for use as the cache engine.

Databases Supported

Virtual DataPort can store its catalog in these databases:

  • IBM DB2 10 and 11

  • Microsoft SQL Server 2012, 2014, 2016 and 2017

  • MySQL 5

  • Oracle 11g, Oracle 12c, Oracle 18c, Oracle 19c

  • PostgreSQL 9 and 10

To use MySQL, launch the MySQL server with the parameter --max-allowed-packet=500M to ensure MySQL can receive and transfer the resources that Virtual DataPort stores. The documentation of MySQL explains how to set this parameter.
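Alternatively, if you manage the MySQL server through its configuration file, the equivalent setting would be the following (assuming a standard my.cnf layout):

```ini
# my.cnf -- equivalent to launching mysqld with --max-allowed-packet=500M
[mysqld]
max_allowed_packet=500M
```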