Storing the Metadata on an External Database

The catalog of Virtual DataPort consists of its settings and the metadata created by users: data sources, views, web services, users, roles, etc. By default, the catalog uses the Apache Derby database that is embedded in Virtual DataPort.

You can configure the catalog to use an external database instead of the embedded one. The benefits are:

  1. Automatic replication: if the Virtual DataPort servers of a cluster, share the same external database, you only need to perform a change in one of the servers and it will be propagated to the others, automatically. Otherwise, you have to do the same change on each server of the cluster.

    In this scenario, all the servers of the cluster are “master nodes”. That is, if any Virtual DataPort shuts down, the others can still work without interruption.

  2. Backups: if the administrator of this external database is doing periodic backups of this database, the metadata of Virtual DataPort will also be backed up. This is beneficial even if you do not have a cluster.


This feature is meant to be used in environments where changes in elements are not done in a controlled way. That is, users create and modify views, web services… at any moment. These environments usually are:

  • The development environments, where the developers create data sources, views, web services, etc.

  • Sandbox environments. These environments usually have the same views as the production servers but the users (e.g. data scientists) create additional views to perform analysis.

With this feature you can increase the capacity (i.e. the number of concurrent requests) for this type of environments by setting up a cluster of Virtual DataPort servers that share the same external database.

In the environments where the changes should be done in a controlled way (i.e. staging and production environments), we do not recommend enabling this feature. Instead, we recommend using the Solution Manager to promote changes from the development environment to staging and production. The reasons are:

  1. When this feature is enabled, you cannot use the features of the Solution Manager that promote the changes to each server of a cluster without downtime. That is because, when this feature is enabled, a modification on an element in one server is immediately available to the other servers of the cluster. This can provoke that a client application access a production server during a promotion, when a set of views is replaced by other views.

  2. This feature adds a small overhead to the queries. This overhead is not relevant for analytical queries that last minutes or hours but it can be an issue in operational queries that are expected to complete in milliseconds.


Virtual DataPort replicates these objects:

  • Any object that can be created from the administration tool or the Design Studio: data sources, views, summary, remote tables, associations, web services, JMS listeners, stored procedures and databases.

  • Users, roles and the privileges they have granted, column privileges, row restrictions and custom policies (including their jar extensions).

  • Resources: that is, all the elements uploaded from the dialog Extension management, or created with the commands CREATE JAR or CREATE RESOURCE.

  • All the settings that you may change either from the administration tool, the Design Studio or using the command SET, with the exception of settings related to enabling SSL on Virtual DataPort.

Replication Process

This section explains how, when a server does a change in the catalog (e.g. a user creates a view), the other servers that share the same database become aware of this change.

The catalog of Virtual DataPort has an in-memory cache that stores objects temporarily (this is not the Cache Module that stores data of views obtained from data sources). The goal of this in-memory cache is to be able to access objects (data sources, views…) quicker than if they had to be read from the database every time. When Virtual DataPort starts, this cache is empty and the first time an object is accessed (i.e. used in a query, opened from the administration tool…), the catalog stores this object in this in-memory cache. If later, an object is modified (e.g. the configuration of a data source changes), the catalog updates this cache.

By default, the catalog uses the local database Apache Derby (default option). In this case, the in-memory cache is always updated because only the local server can modify it.

When the catalog uses an external database, it also uses this in-memory cache. However, the catalog has to ensure that the objects retrieved from this cache are up to date. That is because another server could have modified an object since the catalog read it from the database and stored in the cache. To ensure an objects is up to date, the catalog does these steps when another component wants to read an object:

  1. If the in-memory cache does not contain the object, the catalog reads it from the external database and stores it in the in-memory cache.

  2. If the in-memory cache already contains the object, the catalog executes a query on the external database to validate that this object has not been modified in the database since it was read. If the object is outdated, the catalog reads it again and updates the cache.

Regarding web services, the external database stores the web services and also, if they are deployed or not. Every minute, Virtual DataPort, if it is configured to use an external database, checks if each web service is marked as deployed or undeployed in the database. If in a server, the status of the web service is different, it deploys it or undeploys it respectively. E.g. if server #1 deploys the web service “customer”, it stores in the database that the status of the web service is deployed. When server #2 checks the status of the web services, it will deploy “customer” if it has not been deployed already.

The same applies to JMS listeners. When you start or stop a JMS listener on one server, in one minute, the other Virtual DataPort servers will start it or stop it respectively.

Avoiding Conflicts

When Virtual DataPort stores, modifies or deletes an object in the external database, it places a lock on the rows of the tables that is going to modify. This prevents a race condition from occurring when two users connected to different Virtual DataPort servers create, modify or delete the same object (e.g. the same view) simultaneously.

If two users try to create an element of the same type with the same name, one of the users will get an error indicating that an element with that name already exists.

If two users try to modify the same element simultaneously, the changes of the second request will override the changes of the first one.

If two users try to delete the same element simultaneously, one of the users will get an error indicating that the element does not exist.

Databases Supported

Virtual DataPort can store its catalog in these databases:

  • Amazon Aurora MySQL.

  • Amazon Aurora PostgreSQL.

  • IBM DB2 10 and 11

  • Microsoft SQL Server 2012, 2014, 2016 and 2017. Also Azure SQL.

  • MySQL 5.

  • Oracle 11g, Oracle 12c, Oracle 18c, Oracle 19c.

  • PostgreSQL 9, 10, 11 and 12.

To use MySQL, launch the MySQL server with the parameter --max-allowed-packet=500M to ensure MySQL can receive and transfer the resources that Virtual DataPort stores. The documentation of MySQL explains how to set this parameter.

Considerations About This Feature

Take this into account before enabling this feature:

  • To use this feature, the database of the catalog should be “as close as possible” to the Virtual DataPort servers. That is, the database should be on the same network or at least, on the same data center as the servers. If this condition is met, the overhead of using an external database for the catalog is minimal. Otherwise, there may be some delays in the execution of queries.

  • All the Virtual DataPort servers that share the same external database need to have the same update of the Denodo Platform installed. That is because the metadata stored by a Virtual DataPort server may not be readable by another Virtual DataPort that has an older update. Therefore, having several Virtual DataPort servers with different updates sharing the same database is an unsupported configuration.

  • The communication from Virtual DataPort to the external database has to be stable. If at some point, the communication to the external database is not possible because of for example, a network problem, the execution engine cannot start to process any query until the communication is re-established.

  • When Virtual DataPort is starting and detects that the configuration of the catalog changed (e.g. the administrator changed the database to store the catalog), it replaces the file <DENODO_HOME>/conf/vdp/VDBConfiguration.properties with <DENODO_HOME>/setup/vdp/templates/conf/VDBConfiguration.template. This file is only replaced when Virtual DataPort starts after this configuration has changed, not after every restart.

  • When you configure the catalog to use an external database, the metadata is not transferred with the possible exception of the JDBC driver required to connect to this database. If this driver is not included with Denodo, when you click Ok in the Metadata database configuration dialog, Virtual DataPort does the following:

    • If the driver is already stored in the new database, it computes the MD5 hash value of the driver stored in the current database (e.g. the local Derby) and the driver stored in the new database. If the MD5 value is different, Virtual DataPort replaces the driver in the new database with the driver in the current database.

    • If the driver is not stored in the new database, it stores it.

  • When you use an external database, to change a configuration property do not modify the file <DENODO_HOME>/conf/vdp/VDBConfiguration.properties. Instead, change the property using the command SET. This will change the configuration property in the server where you execute it and propagate the change to the other servers that share the same database.

  • When you use an external database, the Apache Derby database embedded with the Denodo Platform cannot be used as cache database.

How to Configure Virtual DataPort to Store the Metadata on an External Database

This section explains how to configure the catalog of Virtual DataPort to use an external database.

Before changing the configuration, consider this:

  • Virtual DataPort does not transfer metadata from one database to the other (not even from the embedded Apache Derby). Therefore, when you do this process, you have to re-create the metadata you had, including its configuration.

  • You will have to enable SSL again.

  • The administrator of the database needs to create a user account in this external database that has these privileges:

    • CREATE/DROP/ALTER TABLE. Also for creating INDEXES over those tables.

    • INSERT/DELETE/UPDATE privilege on own tables.

    • LOCK TABLE (only for DB2 and Oracle)

    • CREATE/DROP SEQUENCE (only for Oracle)

    • CREATE/DROP/REPLACE TRIGGER (only for Oracle)

Now follow these steps:

  1. Create a catalog/schema on the external database that will store the metadata.

  2. If you already created objects (users, data sources, views…), export all the metadata. That is, click the menu File > Export. In this dialog, select Replace existing elements and select all the check boxes below this option.

  3. Click the menu Administration > Metadata database configuration.

External database configuration

External database configuration

  1. In Database adapter, select the database you are going to use. The default value of Driver class is correct.

    If Driver class is empty is because it is not included in Virtual DataPort and has not been uploaded yet to this server. To upload the driver to this server, close this window and follow these steps:

    1. Click the menu File > Extension management.

    2. In this dialog, click Libraries and then, Import.

    3. In the new dialog, in Resource type, select jdbc.

    4. In Version, select the driver you are going to upload. For example, if you are going to upload the driver of IBM DB2 11, select db2-11.

    5. Click Add and select the file(s). To select more than one file, hold the key Ctrl and select each file.

    6. Click Ok to upload the file(s) to Virtual DataPort.

    7. Go back to the Metadata database configuration dialog.

  2. Enter the Database URI, Login and Password.

  3. Select Specify custom catalog and schema.

  4. Important: click Test configuration to ensure that the connection details, the user name and password are correct; and that this account has privileges to create tables in this database.

  5. Click Ok to save.

    The moment you click Ok, Virtual DataPort creates the tables in the external database that will store the catalog but does not start using these tables until you restart.

    If the tables, already exist, it does not re-create them. That is because in the scenario where you configure several Virtual DataPort servers to use an external database, only the first server where you do this change has to create these tables.

  6. Restart Virtual DataPort.

    At this point, Virtual DataPort begins reading all the metadata from the new database.

  7. Log in to Virtual DataPort using the default user account. That is, user name is admin and password, admin. This is because when you switch the database of the catalog, the metadata is not transferred to the new database so there are no users created.

  8. If in step 2, you exported the metadata, import the VQL file.

  9. If in the past, you had removed the user “admin”, delete it as well. To check if you had removed the user “admin”, check if the VQL file contains the statement “CREATE OR REPLACE USER admin”.

  10. If you enabled TLS on the server, repeat the process. That is, execute the script <DENODO_HOME>/bin/denodo_tls_configurator. See more about this in the section Denodo SSL/TLS Configurator Script of the Installation Guide.

  11. If you are building a cluster of Virtual DataPort servers that will all share the same database, do this:

    1. In this first installation, click the menu Administration > Metadata database configuration.

    2. In this dialog, click Export settings and enter “metadata_database_configuration.properties”. This will generate a file with the settings of this dialog.

    3. Copy this file to all the computers that will be part of the cluster.

    4. Stop the Virtual DataPort server of all these computers.

    5. On each computer of the cluster, open a command line and execute this command:

      On Linux
      cd <DENODO_HOME>
      ./bin/regenerateMetadata.sh "metadata_database_configuration.properties"
      
      On Windows
      cd <DENODO_HOME>
      .\bin\regenerateMetadata.bat "metadata_database_configuration.properties"
      

Using a Script to Enable this Feature

Instead of enabling this feature using the Administration Tool, you can do it using the script “regenerateMetadata”. This script is located in the directory <DENODO_HOME>/bin.

This script has two modes:

  1. You pass a “properties” file as argument. This file contains the configuration it has to apply.

  2. You pass all the parameters required to configure this feature on the Virtual DataPort of this installation.

regenerateMetadata -f <input-file> [-reset] [-y]

regenerateMetadata --adapter <adapter-name>
                   --version <adapter-version>
                   --driver <driver-classname>
                   [--driverProperties <driver-properties>]
                   --classPath <driver-classpath>
                   --databaseUri <uri>
                   --user <user>
                   --password <password>
                   [--catalog <catalog>]
                   [--schema <schema>]
                   [--initialSize <pool-initial-size>]
                   [--maxActive <pool-max-active>]
                   [--testConnections]
                   [--pingQuery <pool-validation-query>]
                   [--reset]
                   [--yes]
Parameters of the regenerateMetadata script

Parameter Name

Description

-f

--file <input-file>

Input file with the external database configuration. Generate this input file with the option Export settings of the dialog Metadata database configuration.

-a

--adapter <adapter-name>

Adapter name (one of the supported engines). This value is the same used at VQL property DATABASENAME

E.g. --adapter oracle

-v

--version <adapter-version>

Adapter version (one of the supported engines). This value is the same used at VQL property DATABASEVERSION

E.g. --version 19c

-d

--driver <driver-classname>

Driver class name

E.g. --driver "oracle.jdbc.OracleDriver"

-dp

--driverProperties <driver-properties>

Driver properties with JSON format.

E.g. -dp "{\"prop1\":\"value1\", \"prop2\":\"value2\"}"

-cp

--classPath <driver-classpath>

Driver class path. This value is the same used at VQL property CLASSPATH

E.g. --classPath "oracle-19c"

-uri

--databaseUri <uri>

Database URI.

E.g. --databaseUri "jdbc:oracle:thin:@acme:1521:orcl"

-u

--user <user>

Database user name.

E.g. --user denodo_user

-p

--password <password>

Database user password. The script “encrypt_password” can be used for obtaining an encrypted value.

E.g. --password "my-s3cr3t" or with an encrypted value --password "encrypted:passwordEncrypted"

-c

--catalog <catalog>

Database custom catalog.

E.g. --catalog my_catalog

-s

--schema <schema>

Database custom schema.

E.g. --schema my_schema

-pi

--initialSize <pool-initial-size>

Pool initial size.

E.g. --initialSize 4

-pma

--maxActive <pool-max-active>

Pool max active.

E.g. --maxActive 100

-pt

--testConnections

Pool test connections. It requires a validation query value. (–pingQuery)

E.g. --testConnections

-pq

--pingQuery <pool-validation-query>

Pool validation query for connections.

E.g. --pingQuery "SELECT COUNT(*) FROM SYS.DUAL"

--reset

Reset metadata if exists (disabled by default, reusing metadata if exists.

-y

--yes

Confirm changes automatically (disabled by default)

Examples

Example 1:

Configure the catalog to store the metadata on an external database, using a configuration file generated with the option Export settings of the dialog Metadata database configuration.

regenerateMetadata --file configuration-file.properties

The script will ask for confirmation before applying changes.

Example 2:

Configure the catalog to store the metadata on an external database, using a configuration file. With the parameter --reset, if the tables that store the metadata already exist in the database, they will be deleted and created again.

regenerateMetadata --file configuration-file.properties --reset

Example 3:

Configure the catalog to store the metadata in a PostgreSQL database, in the schema “denodo_metadata”.

regenerateMetadata --adapter postgresql
                   --version 12
                   --classPath "postgresql-12"
                   --databaseUri "jdbc:postgresql://psql-db:5432/vdpcatalog"
                   --driver "org.postgresql.Driver"
                   --user "denodo_user"
                   --password "encrypted:passwordEncrypted"
                   --schema "denodo_metadata"
                   --testConnections
                   --pingQuery "SELECT 1"
                   --driverProperties "{\"prop1\":\"value1\", \"prop2\":\"value2\"}"

In this example, the password (parameter --password) is encrypted. To encrypt this password, execute the script <DENODO_HOME>/bin/encrypt_password and provide the password of this user account in PostgreSQL. The argument for --password is encrypted: followed by the result of this script.

Note that Virtual DataPort will store this password encrypted regardless of if you pass it encrypted to this script or not.

Changing user password

Execute this command to update the password used to access the external database.

regenerateMetadata --password "encrypted:aqlgsesdkloqwepiouimzzkd=="

Note that in this example, the password is also encrypted.

You can also change the password in the dialog Metadata database configuration of the Administration Tool.

Important

Do not modify this password directly or any of the configuration parameters of this feature in the configuration file of Virtual DataPort.