Storing the Metadata on an External Database¶
The catalog of Virtual DataPort consists of its settings and the metadata created by users: data sources, views, web services, users, roles, etc. By default, the catalog uses the Apache Derby database that is embedded in Virtual DataPort.
Note
We only recommend using this feature on some environments. Take into account the considerations below.
Note
This feature is disabled when accessing Design Studio from Agora since Agora automatically configures an external database.
You can configure the catalog to use an external database instead of the embedded one. The benefits are:
Automatic replication: if the Virtual DataPort servers of a cluster, share the same external database, you only need to perform a change in one of the servers and it will be propagated to the others, automatically. Otherwise, you have to do the same change on each server of the cluster.
In this scenario, all the servers of the cluster are “master nodes”. That is, if any Virtual DataPort shuts down, the others can still work without interruption.
Backups: if the administrator of this external database is doing periodic backups of this database, the metadata of Virtual DataPort will also be backed up. This is beneficial even if you do not have a cluster.
This feature is meant to be used in environments where changes in elements are not done in a controlled way. That is, users create and modify views, web services… at any moment. These environments usually are:
The development environments, where the developers create data sources, views, web services, etc.
Sandbox environments. These environments usually have the same views as the production servers but the users (e.g. data scientists) create additional views to perform analysis.
With this feature you can increase the capacity (i.e. the number of concurrent requests) for this type of environments by setting up a cluster of Virtual DataPort servers that share the same external database.
Take into account the following before deciding to use an external database for metadata:
All the nodes must be configured with the same encryption key. See Installation Encryption Key for more details.
When this feature is enabled, you cannot use the features of the Solution Manager that promote the changes to each server of a cluster without downtime. That is because, when this feature is enabled, a modification on an element in one server is immediately available to the other servers of the cluster. This can provoke that a client application access a production server during a promotion, when a set of views is replaced by other views.
This feature adds a small overhead to the queries. This overhead is not relevant for analytical queries that last minutes or hours but it can be an issue in operational queries that are expected to complete in milliseconds.
Important
Using this feature without the same encryption key in all nodes will make the VDP servers to not work properly and a metadata reset will be required.
Virtual DataPort replicates these objects:
Any object that can be created from the administration tool or the Design Studio: data sources, views, summary, remote tables, associations, web services, JMS listeners, stored procedures and databases.
Users, roles and the privileges they have granted, column privileges, row restrictions and custom policies (including their jar extensions).
Resources: that is, all the elements uploaded from the dialog Extension management, or created with the commands
CREATE JAR
orCREATE RESOURCE
.All the settings that you may change either from the administration tool, the Design Studio or using the command SET, with the exception of settings related to enabling SSL on Virtual DataPort.
Replication Process¶
This section explains how, when a server does a change in the catalog (e.g. a user creates a view), the other servers that share the same database become aware of this change.
The catalog of Virtual DataPort has an in-memory cache that stores objects temporarily (this is not the Cache Module that stores data of views obtained from data sources). The goal of this in-memory cache is to be able to access objects (data sources, views…) quicker than if they had to be read from the database every time. When Virtual DataPort starts, this cache is empty and the first time an object is accessed (i.e. used in a query, opened from the administration tool…), the catalog stores this object in this in-memory cache. If later, an object is modified (e.g. the configuration of a data source changes), the catalog updates this cache.
By default, the catalog uses the local database Apache Derby (default option). In this case, the in-memory cache is always updated because only the local server can modify it.
When the catalog uses an external database, it also uses this in-memory cache. However, the catalog has to ensure that the objects retrieved from this cache are up to date. That is because another server could have modified an object since the catalog read it from the database and stored in the cache. To ensure an objects is up to date, the catalog does these steps when another component wants to read an object:
If the in-memory cache does not contain the object, the catalog reads it from the external database and stores it in the in-memory cache.
If the in-memory cache already contains the object, the catalog executes a query on the external database to validate that this object has not been modified in the database since it was read. If the object is outdated, the catalog reads it again and updates the cache.
Regarding web services, the external database stores the configuration of the web services and their status (i.e. if they are deployed or not). Every minute, Virtual DataPort, if it is configured to use an external database, checks if each web service is marked as deployed or undeployed in the database. If in a server, the status of the web service is different, it deploys it or undeploys it respectively. E.g. if server #1 deploys the web service “customer”, it stores in the database that the status of the web service is deployed. When server #2 checks the status of the web services, it will deploy “customer” if it has not been deployed already.
The same applies to JMS listeners. When you start or stop a JMS listener on one server, in one minute, the other Virtual DataPort servers will start it or stop it respectively.
Avoiding Conflicts¶
When Virtual DataPort stores, modifies or deletes an object in the external database, it places a lock on the rows of the tables that is going to modify. This prevents a race condition from occurring when two users connected to different Virtual DataPort servers create, modify or delete the same object (e.g. the same view) simultaneously.
If two users try to create an element of the same type with the same name, one of the users will get an error indicating that an element with that name already exists.
If two users try to modify the same element simultaneously, the changes of the second request will override the changes of the first one.
If two users try to delete the same element simultaneously, one of the users will get an error indicating that the element does not exist.
storing_catalog_on_external_database
Databases Supported¶
Virtual DataPort can store its catalog in these databases:
Amazon Aurora MySQL.
Amazon Aurora PostgreSQL.
IBM DB2 10, 11 and higher.
Microsoft SQL Server 2014, 2016, 2017, 2019 and higher. Also Azure SQL.
MySQL 5.7, 8 and higher and GCP Cloud SQL for MySQL.
Oracle 12c, Oracle 19c and higher.
PostgreSQL 11, 12 and higher and GCP Cloud SQL for PostgreSQL.
To use MySQL, launch the MySQL server with the parameter --max-allowed-packet=500M
to ensure MySQL can receive and transfer the resources that Virtual DataPort stores. The documentation of MySQL explains how to set this parameter.
Note
For SQL Server, we recommend enabling snapshots isolation and read committed snapshots for the database where Virtual DataPort metadata is stored, so Virtual DataPort can benefit from read committed isolation during Solution Manager deployments.
Considerations About This Feature¶
Take this into account before enabling this feature:
To use this feature, the database of the catalog should be “as close as possible” to the Virtual DataPort servers. That is, the database should be on the same network or at least, on the same data center as the servers. If this condition is met, the overhead of using an external database for the catalog is minimal. Otherwise, there may be some delays in the execution of queries.
Virtual DataPort manages its internal tables by itself. Making manual modifications on them could cause different types of malfunctioning on the system.
All the Virtual DataPort servers that share the same external database need to have the same update of the Denodo Platform installed. That is because the metadata stored by a Virtual DataPort server may not be readable by another Virtual DataPort that has an older update. Therefore, having several Virtual DataPort servers with different updates sharing the same database is an unsupported configuration.
When you install a Denodo Platform update in the Virtual DataPort servers of a cluster that share the same external database, do not execute any queries or do actions that modify the metadata in any of the servers of the cluster until the update process is completed in all the servers. That is, do not create/modify/delete views, data sources…, do not publish web services, etc. until the update is installed on all the servers.
The communication from Virtual DataPort to the external database has to be stable. If at some point, the communication to the external database is not possible because of for example, a network problem, the execution engine cannot start to process any query until the communication is re-established.
When Virtual DataPort is starting and detects that the configuration of the catalog changed (e.g. the administrator changed the database to store the catalog), it replaces the file
<DENODO_HOME>/conf/vdp/VDBConfiguration.properties
with<DENODO_HOME>/setup/vdp/templates/conf/VDBConfiguration.template
. This file is only replaced when Virtual DataPort starts after this configuration has changed, not after every restart.When you configure the catalog to use an external database, the metadata is not transferred with the possible exception of the JDBC driver required to connect to this database. If this driver is not included with Denodo, when you click Ok in the Metadata database configuration dialog, Virtual DataPort does the following:
If the driver is already stored in the new database, it computes the MD5 hash value of the driver stored in the current database (e.g. the local Derby) and the driver stored in the new database. If the MD5 value is different, Virtual DataPort replaces the driver in the new database with the driver in the current database.
If the driver is not stored in the new database, it stores it.
When you use an external database, to change a configuration property do not modify the file
<DENODO_HOME>/conf/vdp/VDBConfiguration.properties
. Instead, change the property using the command SET. This will change the configuration property in the server where you execute it and propagate the change to the other servers that share the same database.If you have a cluster of Denodo servers, when an application executes the command ENTER SINGLE USER MODE, only the Denodo server that receives this command switches to single user mode; the others continue operating normally. This occurs regardless of if Virtual DataPort stores the metadata on an external database or not.
When you use an external database, the Apache Derby database embedded with the Denodo Platform cannot be used as cache database.
When you use Microsoft SQL Server or Azure SQL as external database catalog, by default, Virtual DataPort configures the text columns in the metadata catalog with
VARCHAR
type withLatin1_General_BIN
as collation. If you are using Non-Unicode characters for Virtual DataPort catalog elements it is necessary you to configure those columns withNVARCHAR
type. To do so, execute the following before the metadata catalog creation:Execute this command before configuring Virtual DataPort to store the metadata on an external database:
SET 'com.denodo.vdb.catalog.storage.sqlserver.textColumnAsNVarchar' = 'true';
Configure Virtual DataPort to store the metadata on an external database. If you had already done this process, you have to use an empty catalog/schema so in the tables that will store the metadata, the columns for text values have the data type
NVARCHAR
.Restart Virtual DataPort for the changes to be applied.
To go back to the default behavior, execute this:
SET 'com.denodo.vdb.catalog.storage.sqlserver.textColumnAsNVarchar' = NULL;
Then, you have to perform again a catalog change to a clean schema in order to the columns to be created as
VARCHAR
type.
How to Configure Virtual DataPort to Store the Metadata on an External Database¶
This section explains how to configure the catalog of Virtual DataPort to use an external database.
Before changing the configuration, consider this:
Virtual DataPort does not transfer metadata from one database to the other (not even from the embedded Apache Derby). Therefore, when you do this process, you have to re-create the metadata you had, including its configuration.
You will have to enable SSL again.
The administrator of the database needs to create a user account in this external database that has these privileges:
CREATE/DROP/ALTER TABLE. Also for creating INDEXES over those tables.
INSERT/DELETE/UPDATE privilege on own tables.
LOCK TABLE (only for DB2 and Oracle)
CREATE/DROP SEQUENCE (only for Oracle)
CREATE/DROP/REPLACE TRIGGER (only for Oracle)
Now follow these steps:
Create a catalog/schema in the external database that will store the metadata.
Consider enabling the high availability features of this database to meet higher uptime requirements.
If you already created objects (users, data sources, views…), export all the metadata. That is, click the menu File > Export. In this dialog, select Replace existing elements and select all the check boxes below this option.
Click the menu Administration > Metadata database configuration.
In Database adapter, select the database you are going to use. The default value of Driver class is correct.
If Driver class is empty is because it is not included in Virtual DataPort and has not been uploaded yet to this server. To upload the driver to this server, close this window and follow these steps:
Click the menu File > Extension management.
In this dialog, click Libraries and then, Import.
In the new dialog, in Resource type, select jdbc.
In Version, select the driver you are going to upload. For example, if you are going to upload the driver of IBM DB2 11, select db2-11.
Click Add and select the file(s). To select more than one file, hold the key Ctrl and select each file.
Click Ok to upload the file(s) to Virtual DataPort.
Go back to the Metadata database configuration dialog.
Enter the Database URI, Login and Password.
Select Specify custom catalog and schema.
Important: click Test configuration to ensure that the connection details, the user name and password are correct; and that this account has privileges to create tables in this database.
Click Ok to save.
The moment you click Ok, Virtual DataPort creates the tables in the external database that will store the catalog but does not start using these tables until you restart.
If the tables, already exist, it does not re-create them. That is because in the scenario where you configure several Virtual DataPort servers to use an external database, only the first server where you do this change has to create these tables.
Restart Virtual DataPort.
At this point, Virtual DataPort begins reading all the metadata from the new database.
Log in to Virtual DataPort using the default user account. That is, user name is
admin
and password,admin
. This is because when you switch the database of the catalog, the metadata is not transferred to the new database so there are no users created.If in step 2, you exported the metadata, import the VQL file.
If in the past, you had removed the user “admin”, delete it as well. To check if you had removed the user “admin”, check if the VQL file contains the statement “CREATE OR REPLACE USER admin”.
If you enabled TLS on the server, repeat the process. That is, execute the script
<DENODO_HOME>/bin/denodo_tls_configurator
. See more about this in the section Denodo SSL/TLS Configurator Script of the Installation Guide.If you are building a cluster of Virtual DataPort servers that will all share the same database, do this:
In this first installation, click the menu Administration > Metadata database configuration.
In this dialog, click Export settings and enter “metadata_database_configuration.properties”. This will generate a file with the settings of this dialog.
Copy this file to all the computers that will be part of the cluster.
Stop the Virtual DataPort server of all these computers.
On each computer of the cluster, open a command line and execute this command:
cd <DENODO_HOME> ./bin/regenerateMetadata.sh -f "metadata_database_configuration.properties"
cd <DENODO_HOME> .\bin\regenerateMetadata.bat -f "metadata_database_configuration.properties"
Using a Script to Enable this Feature¶
Instead of enabling this feature using the Administration Tool, you can do it using the script “regenerateMetadata”. This script is located in the directory <DENODO_HOME>/bin
.
This script has two modes:
You pass a “properties” file as argument. This file contains the configuration it has to apply.
You pass all the parameters required to configure this feature on the Virtual DataPort of this installation.
regenerateMetadata -f <input-file> [--externalDriver <path-to-external-driver>] [-reset] [-y]
regenerateMetadata --adapter <adapter-name>
--version <adapter-version>
--driver <driver-classname>
[--driverProperties <driver-properties>]
--classPath <driver-classpath>
[--externalDriver <path-to-external-driver>]
--databaseUri <uri>
--user <user>
--password <password>
[--useKerberos]
[--useKeyTab]
[--keyTabFile <path-to-keytab-file>]
[--useKrbCache]
[--krbCacheFile <path-to-krb-cache-file>]
[--catalog <catalog>]
[--schema <schema>]
[--initialSize <pool-initial-size>]
[--maxActive <pool-max-active>]
[--testConnections]
[--pingQuery <pool-validation-query>]
[--reset]
[--yes]
Parameter Name |
Description |
---|---|
-f --file <input-file> |
Input file with the external database configuration. Generate this input file with the option Export settings of the dialog Metadata database configuration. |
-a --adapter <adapter-name> |
Adapter name (one of the supported engines).
This value is the same used at VQL
property E.g. |
-v --version <adapter-version> |
Adapter version (one of the supported engines).
This value is the same used at VQL
property E.g. |
-d --driver <driver-classname> |
Driver class name E.g. |
-dp --driverProperties <path-to-external-driver> |
Driver properties with JSON format. E.g.
|
-cp --classPath <driver-classpath> |
Driver class path.
This value is the same used at VQL
property E.g. |
-e --externalDriver <driver-classpath> |
External location to driver file or files. This value is the path to a file or directory. E.g. |
-uri --databaseUri <uri> |
Database URI. E.g. |
-u --user <user> |
Database user name. E.g. |
-p --password <password> |
Database user password. The script “encrypt_password” can be used for obtaining an encrypted value. E.g. |
-krb --useKerberos |
Use kerberos authentication. E.g. |
-uktab --useKeyTab |
Use keytab. E.g. |
-ktab --keyTabFile <path-to-keytab-file> |
Keytab file. E.g. |
-ukcache --useKrbCache |
Use kerberos credentials cache. E.g. |
-kcache --krbCacheFile <path-to-krb-cache-file> |
Kerberos credentials cache file. E.g. |
-c --catalog <catalog> |
Database custom catalog. E.g. |
-s --schema <schema> |
Database custom schema. E.g. |
-pi --initialSize <pool-initial-size> |
Pool initial size. E.g. |
-pma --maxActive <pool-max-active> |
Pool max active. E.g. |
-pt --testConnections |
Pool test connections. It requires a validation query value. (–pingQuery) E.g. |
-pq --pingQuery <pool-validation-query> |
Pool validation query for connections. E.g. |
--reset |
Reset metadata if exists (disabled by default, reusing metadata if exists). |
-y --yes |
Confirm changes automatically (disabled by default) |
Examples
Example 1:
Configure the catalog to store the metadata on an external database, using a configuration file generated with the option Export settings of the dialog Metadata database configuration.
regenerateMetadata --file configuration-file.properties
The script will ask for confirmation before applying changes.
Example 2:
Configure the catalog to store the metadata on an external database, using a configuration file. With the parameter --reset
, if the tables that store the metadata already exist in the database, they will be deleted and created again.
regenerateMetadata --file configuration-file.properties --reset
Example 3:
Configure the catalog to store the metadata in a PostgreSQL database, in the schema “denodo_metadata”.
regenerateMetadata --adapter postgresql
--version 12
--classPath "postgresql-12"
--databaseUri "jdbc:postgresql://psql-db:5432/vdpcatalog"
--driver "org.postgresql.Driver"
--user "denodo_user"
--password "encrypted:passwordEncrypted"
--schema "denodo_metadata"
--testConnections
--pingQuery "SELECT 1"
--driverProperties "{\"prop1\":\"value1\", \"prop2\":\"value2\"}"
In this example, the password (parameter --password
) is encrypted. To encrypt this password, execute the script <DENODO_HOME>/bin/encrypt_password
and provide the password of this user account in PostgreSQL. The argument for --password
is encrypted:
followed by the result of this script.
Note that Virtual DataPort will store this password encrypted regardless of if you pass it encrypted to this script or not.
Example 4:
Configure the catalog to store the metadata in a PostgreSQL database, in the schema “denodo_metadata”.
regenerateMetadata --adapter mysql
--version 5
--classPath "mysql-5"
--externalDriver "/my-drivers/mysql5"
--databaseUri "jdbc:mysql://mysql-db:3306/vdpcatalog"
--driver "com.mysql.jdbc.Driver"
--user "denodo_user"
--password "encrypted:passwordEncrypted"
--schema "denodo_metadata"
--testConnections
--pingQuery "SELECT 1"
--driverProperties "{\"prop1\":\"value1\", \"prop2\":\"value2\"}"
In this example, we use parameter --externalDriver
indicating an external location to Denodo where the MySQL driver
is present. The script will use that location for loading the driver and opening the connection. Also, the driver will
be copied to internal Denodo class path location indicated by parameter --classPath
, allowing the server to load
the driver when Virtual DataPort is started.
Note that --externalDriver
parameter value can be a file or a directory. In this case, all files present at the directory
are used to load the driver.
Finally, the password (parameter --password
) is encrypted. To encrypt this password, execute the
script <DENODO_HOME>/bin/encrypt_password
and provide the password of this user account in MySQL. The
argument for --password
is encrypted:
followed by the result of this script.
Note that Virtual DataPort will store this password encrypted regardless of if you pass it encrypted to this script or not.
Example 5:
Configure the catalog to store the metadata in a Oracle database, using Kerberos for authentication.
regenerateMetadata --adapter oracle
--version 19c
--classPath "oracle-19c"
--databaseUri "jdbc:oracle:thin:@oracle-db:1521/orcl"
--driver "oracle.jdbc.OracleDriver"
--user "denodo_user"
--password ""
--useKerberos
--useKeyTab
--keyTabFile /my_files/krb.keytab
--schema "denodo_metadata"
--testConnections
--pingQuery "SELECT COUNT(*) FROM SYS.DUAL"
--driverProperties "{\"prop1\":\"value1\", \"prop2\":\"value2\"}"
In this example, we use parameter --useKerberos
indicating that Kerberos is going to be used for authentication.
Also, the --useKeyTab
parameter indicates that the authentication will be done using a keytab file, which path is
established through the --keyTabFile
parameter.
Example 6:
Configure the catalog to store the metadata in a Oracle database, using Kerberos for authentication.
regenerateMetadata --adapter oracle
--version 19c
--classPath "oracle-19c"
--databaseUri "jdbc:oracle:thin:@oracle-db:1521/orcl"
--driver "oracle.jdbc.OracleDriver"
--user ""
--password ""
--useKerberos
--useKrbCache
--krbCacheFile /my_files/krbCacheFile
--schema "denodo_metadata"
--testConnections
--pingQuery "SELECT COUNT(*) FROM SYS.DUAL"
--driverProperties "{\"prop1\":\"value1\", \"prop2\":\"value2\"}"
In this example, we use parameter --useKerberos
indicating that Kerberos is going to be used for authentication.
With the parameter --useKrbCache
we specify the use of the Kerberos credentials file, whose path is indicated
using the parameter --krbCacheFile
. Notice that the parameter --krbCacheFile
is optional and it only must
be used when you want to specify a Kerberos credentials file different than the default one.
Changing user password
In order to update the user and/or password used to access the external database, you have to execute this script giving also the rest of the configuration parameters. An easy way is using a configuration file exported with the option Export settings of the dialog Metadata database configuration, overriding the user credentials. See Example 1.
Important
Do not modify this password directly or any of the configuration parameters of this feature in the configuration file of Virtual DataPort.