From the Server set-up dialog you can configure all the settings of the Data Catalog server. It is divided into the following tabs:
Authentication. To enable single sign-on with Kerberos in the Data Catalog.
Database. To configure an external database to store the metadata of the Data Catalog.
Permissions. To grant power users the privileges to do additional tasks.
AI Integration. To configure the assisted queries functionality available in the view details tab.
Configure the Virtual DataPort Servers¶
In the Servers section of the VDP servers tab you can configure the Virtual DataPort servers that are listed in the login dialog.
To add a new Virtual DataPort server, click the Add server button and enter the following values:
Name. The name that will be shown in the login page for this server.
URL. The connection URL of the server, which follows the pattern //<host>:<port>/<database>. Take the following considerations into account:
Only users granted the CONNECT privilege on the database will be able to connect to the Data Catalog.
If the server is configured with LDAP authentication for a specific database, use this database in the URL.
Description. A description for the server.
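As an illustration of the URL pattern above (the host, port and database names are hypothetical placeholders, not values from this document):

```text
//denodo-vdp.example.com:9999/admin
```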
Saved queries are stored per user and per Virtual DataPort server. Servers are identified by an internal identifier, so if a server is edited, it keeps the queries associated with it. However, if the server is removed, its queries are removed too.
Configure the Connection Settings to the Virtual DataPort Servers¶
In the Connection settings section of the VDP servers tab you can configure the global parameters of the connections created to execute queries against the Virtual DataPort Servers.
These parameters are:
Query timeout. Maximum time in milliseconds that the query will wait for the statement to complete. If not indicated (or you enter 0 as a value), it waits until the execution is complete. The default is 900,000 milliseconds.
Chunk timeout. Maximum time in milliseconds that the query will wait for a new chunk of results to arrive. When this time is exceeded, Virtual DataPort returns the results extracted up to that moment. If not specified (or you enter 0 as a value), Virtual DataPort returns all the results together at the end of the statement execution. The default is 90,000 milliseconds.
Chunk size. Number of results that make up a chunk of results. When Virtual DataPort obtains this number of results, it returns them to the Data Catalog, even if the Chunk timeout has not been reached. The default is 1,000 rows.
Enable Single Sign-On with Kerberos¶
In the Authentication tab you can enable single sign-on with Kerberos in the Data Catalog.
Follow these steps:
Enable the Use Kerberos option.
Enter the Service Principal Name of the Data Catalog in the Server Principal field. Enter the same value used to generate the keytab file.
Drag and drop the keytab file onto the Keytab file field. Alternatively, click the field and select it from the file browser.
Consider adding a krb5 file to the Configuration file field if one of the following conditions is met and there is no krb5 file in the default location of your system:
The host where the Data Catalog runs does not belong to a Kerberos realm (e.g., a Windows Active Directory domain).
The host where the Data Catalog runs is Linux/Unix.
The service account in Active Directory configured for the Data Catalog has the option constrained delegation enabled.
You are in a cross-domain scenario. That is, the organization has several domains.
Otherwise, you can leave the field empty.
For more details on the krb5 file, go to Providing a Krb5 File for Kerberos Authentication.
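As a rough sketch, a minimal krb5 file for a single realm could look like the following. The realm EXAMPLE.COM and the KDC host name are placeholders, not values from this document; adapt them to your environment:

```ini
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM
```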
If you run into any issues, select the Activate Kerberos debug mode option. Otherwise, leave it disabled.
For more details on debugging Kerberos issues, go to How to Debug Kerberos in Web Applications.
Click the Save button to confirm the Kerberos configuration. This configuration will take effect immediately. There is no need to restart the Data Catalog.
If no user can log in to the Data Catalog due to a misconfiguration of the Kerberos authentication, remember that you can still connect to the Data Catalog using local authentication or authentication with login and password. Then you can modify the Kerberos settings or enable its debug mode.
Use an External Database for the Data Catalog¶
This section explains how to configure the Data Catalog to store its settings on an external database. This is necessary if you want to set up a cluster of Data Catalog servers so all the servers share the same settings and metadata.
By default, the Data Catalog stores the global settings and certain per-user settings in an embedded database (Apache Derby): for example, the saved queries of a user, the fields selected by the administrator to display by default in a view search, etc. You can configure the Data Catalog to store all this in an external database.
The Data Catalog supports the following databases:
Make sure that MySQL (or the database in MySQL you are going to use) is configured with the options Default Charset = utf8 and Collation = utf8_unicode_ci.
Amazon Aurora for MySQL and PostgreSQL
Azure SQL Server
The minimum version required of each database is:
SQL Server 2014
Amazon Aurora for MySQL 5.7 and Amazon Aurora for PostgreSQL 9.5
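For MySQL, the charset requirement mentioned above can be satisfied when creating the database. A sketch (the database name datacatalog is a placeholder):

```sql
-- Create the Data Catalog database with the required charset and collation
CREATE DATABASE datacatalog
  CHARACTER SET utf8
  COLLATE utf8_unicode_ci;
```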
Follow these steps to store the metadata of the Data Catalog on an external database:
In the external database, create a catalog or a schema for the metadata of the Data Catalog.
Although you can use an existing schema, we suggest creating a new one to keep these tables separate from the tables of other applications. We suggest reserving 5 gigabytes of space for this schema. In most cases less space will be needed, but a high value avoids issues caused by lack of space in the database.
In this database, create a service account that has create, read and write privileges on that database.
Consider enabling the high availability features of this database to meet higher uptime requirements.
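As an illustrative sketch of the first two steps for PostgreSQL (the schema name, user name and password are placeholders; adapt the syntax to your database):

```sql
-- Create a dedicated schema for the Data Catalog metadata
CREATE SCHEMA datacatalog;

-- Create a service account with create, read and write privileges
CREATE USER datacatalog_user WITH PASSWORD 'changeme';
GRANT USAGE, CREATE ON SCHEMA datacatalog TO datacatalog_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA datacatalog
  GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO datacatalog_user;
```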
Copy the JDBC driver of the database to the directory <DENODO_HOME>/lib/data-catalog-extensions.
Take this into account:
To use Oracle 12, only copy the files
To use Oracle 18, only copy the files
To use Oracle 19, only copy the files
To use Oracle 21, only copy the files
To use SQL Server, two different drivers are available:
For any version, use the Microsoft driver. This is the recommended option. To use it, only copy the file
For the 2014 and 2016 versions, the jTDS driver is also available. To use it, copy the file
To use Amazon Aurora for MySQL 5.7, only copy the file
To use Amazon Aurora for PostgreSQL 9.5, only copy the file
To use Azure SQL Server, only copy the file
To use Azure SQL Server via Active Directory, copy the files
Starting with update 8.0u20220815, to use Azure SQL Server as the external metadata database with the ActiveDirectoryPassword authentication method, copy the corresponding file from <DENODO_HOME>/lib/contrib to the directory <DENODO_HOME>/lib/data-catalog-extensions, in addition to the files already specified.
You may find the JDBC drivers of other databases in
Log in to the Data Catalog with an administrator account and export the metadata.
This is necessary because the metadata of the Data Catalog is not transferred automatically from the current database to the new one. You can skip this step if this is a fresh installation and you have not made any other changes to this Data Catalog.
Stop all the components of the Denodo Platform. Then, execute <DENODO_HOME>/bin/webcontainer_shutdown to make sure the web container is stopped.
Start Data Catalog and the other components of the Denodo Platform.
Log in to the Data Catalog with an administrator account and go to Administration > Set-up and management > Server. In the Database tab you will find the dialog to configure the external database.
Provide the following information:
Database. Select the database you want to use.
Driver class. The name of the Java class of the JDBC driver to be used. The default value is usually correct.
URL. The connection URL to the database.
For SQL Server using the Microsoft driver 10.x and above, it is necessary to set the encrypt property to false. To do so, add encrypt=false at the end of the database URL. For example:
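For instance, a URL with the encrypt property disabled could look like the following (the host and database name are hypothetical placeholders):

```text
jdbc:sqlserver://sqlserver.example.com:1433;databaseName=datacatalog;encrypt=false
```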
Authentication. Authentication method for accessing the external database. It is only available for Oracle and SQL Server 2014 databases. Select one of these methods:
Login and Password: The default authentication method. Uses the credentials of the account used to connect to the database: Username (optional) and Password (optional).
Kerberos with password: Use Kerberos authentication, with the provided Username and Password (from the Active Directory account).
Kerberos with keytab: Use Kerberos authentication, with the provided Username (in this case, the Service Principal Name - SPN -) and Keytab file (no password needed). Drag and drop the keytab file onto the Keytab file field or, alternatively, click the field and select it from the file browser.
Kerberos with Windows User: Use single sign-on (SSO) with Kerberos, passing through the user that launched the server (no user name or password required).
When the selected database is Derby Embedded, the fields Driver class, URL, Username and Password are not editable.
To configure SQL Server 2014 with the jTDS driver, use the value net.sourceforge.jtds.jdbc.Driver as the Driver class and enter a URL that follows the pattern jdbc:jtds:sqlserver://<host>:<port>/<database>.
To speed up the queries, the Data Catalog creates a pool with connections to the database. You can configure this pool with the following optional settings:
Maximum pool size. The maximum number of actual connections to the database, including both idle and in-use connections.
Minimum idle connections. The minimum number of idle connections that the Data Catalog tries to maintain in the pool. If the idle connections dip below this value and total connections in the pool are less than Maximum pool size, the Data Catalog will make a best effort to add additional connections.
Connection timeout. The maximum number of milliseconds that the Data Catalog will wait for a connection from the pool. If this time is exceeded without a connection becoming available, an error will be thrown.
Ping query. The query executed just before using a connection from the pool, to validate that the connection is still alive. This is required only for legacy drivers that do not support the JDBC 4.0 Connection.isValid() API. If your driver supports JDBC 4.0, we strongly recommend not setting this property.
The jTDS driver for Microsoft SQL Server is considered legacy since it only supports JDBC 3.0. To use it with the Data Catalog you need to provide a value for the Ping query field.
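For example, a lightweight statement such as the following is commonly used as a ping query with legacy drivers (this is a generic suggestion, not a value mandated by this document):

```sql
SELECT 1
```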
Click the Save button.
The Data Catalog will check that it can reach the database and that the necessary tables exist:
If the JDBC driver for the selected database cannot be loaded, you will see the following warning.
The configuration can still be saved, but the JDBC driver has to be available before restarting the Data Catalog.
If for some reason the database cannot be reached, you will see the following warning.
Again, the configuration can be saved, but you have to fix the error before restarting the Data Catalog. Otherwise, saving it is not recommended.
If the database does not have the necessary tables, you will see the following dialog.
Confirm if you want the Data Catalog to create the tables for you. It will use the user account you configured for the database, so make sure it has DDL privileges on the database. Alternatively, you can specify another user account with the right privileges.
Click the Cancel button if you want to run the scripts manually. You will find them in the folder
If the scripts for creating tables are executed manually, the Data Catalog will not be able to update the database schema automatically. In that case, after each future platform update and before starting the Data Catalog again, you will need to manually execute any new scripts that update the database schema.
If you still want to run the scripts manually, go to the folder <DENODO_HOME>/scripts/data-catalog/sql/db_scripts/<DATABASE_TYPE> and run these scripts in the schema created for this purpose and configured in the Data Catalog:
If you are starting from an empty schema, execute all the scripts present in the indicated folder, in ascending version order.
If this is a platform update, execute, in ascending version order, all the scripts present in the indicated folder that were not executed in previous updates, starting from the one after the last script that was executed.
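The ascending-version-order requirement can be sketched with a version-aware sort. The snippet below is a hypothetical illustration, not an official Denodo script: it simulates the scripts folder with dummy files and only prints the execution order. In practice you would point it at <DENODO_HOME>/scripts/data-catalog/sql/db_scripts/<DATABASE_TYPE> and replace echo with your SQL client (psql, mysql, sqlcmd, etc.) connected to the Data Catalog schema.

```shell
# Simulate the scripts folder with placeholder files
SCRIPTS_DIR=$(mktemp -d)
touch "$SCRIPTS_DIR/1.sql" "$SCRIPTS_DIR/2.sql" "$SCRIPTS_DIR/10.sql"

# sort -V orders versions numerically (1, 2, 10), not lexicographically (1, 10, 2)
for script in $(ls "$SCRIPTS_DIR" | sort -V); do
    echo "would run: $script"
done
```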
If you are setting up a cluster of Data Catalog servers, all the servers will share the same database. Therefore, you only need to create the tables in the database once.
Restart Data Catalog to apply the changes.
If the Data Catalog does not start due to an error in the database configuration, you can restore the default database configuration manually. Edit the file <DENODO_HOME>/conf/data-catalog/datasource.properties and modify its content to match the following (replace <DENODO_HOME> with the path to the Denodo installation). The property datasource.url.default will contain the path to the database, so you can copy its content into the spring.datasource.url property.
datasource.type=derby
datasource.url.default=jdbc:derby:<DENODO_HOME>/metadata/data-catalog-database;create=true
spring.datasource.driver-class-name=org.apache.derby.jdbc.EmbeddedDriver
spring.datasource.url=jdbc:derby:<DENODO_HOME>/metadata/data-catalog-database;create=true
spring.datasource.username=
spring.datasource.password=
datasource.password.encrypted=false
Restart Data Catalog again to apply the changes.
Import the metadata you exported in step #3.
If you are building a cluster of Data Catalogs, configure the load balancer to redirect all the requests of the same web session to the same node of the cluster (i.e., sticky sessions).
Configure the Permissions¶
In the Permissions tab you can configure the privileges granted to a role. The privileges are divided into three categories: Management, Administration and Collaboration. Select a category and check all the privileges you want to assign to a role. All the users with that role will be granted those privileges in the Data Catalog.
To assign privileges in the Permissions dialog, a user needs:
The PERMISSIONS privilege in the Data Catalog, which gives access to the Permissions dialog.
The ADMIN privilege on a database in Virtual DataPort, which is the minimum requirement to access all the roles available in the Virtual DataPort server.
The workflow requests have a specific category to assign permissions.
To assign request privileges in the Permissions dialog, the user needs the Manage request permission in the Data Catalog.
Configure the Assisted Queries¶
In the API Configuration section of the AI Integration tab you can configure the Assisted Query feature.
The general API configuration parameters are the following ones:
Enable query generation. This enables or disables the feature.
Language options. Allows using multiple language configurations for the query explanation.
User locale. The language that corresponds to the user locale configured in the Data Catalog is used.
Language. The language selected from a list of languages is used.
Browser language. The language configured in the web browser is used.
Execution mode. Choose whether you execute the generated query yourself once the generation is done or it is executed automatically.
API. The API which the Data Catalog will use to generate the queries.
Enabling this feature will allow the users of the Data Catalog to send metadata to OpenAI or Azure OpenAI Service when using it. Make sure your organization is aware of this and agrees to it.
Enabling this feature will incur expenses for requests made to an external API. Make sure that the organization is fully aware of this financial implication before enabling this option.
The automatic execution mode extends the duration of the assisted query generation process: after the query is generated, it is executed immediately, delaying the presentation of the final result. Take this into account when deciding whether you want to verify the query's correctness against your requirements before execution; in that case, the manual mode is strongly recommended.
Currently, there are two options when configuring a remote API: OpenAI or Azure OpenAI Service.
This section explains the OpenAI API parameters.
API Key. The OpenAI API key. It is mandatory if you want to configure OpenAI as your API.
Organization ID. If configured, a header to specify which organization is used for an API request will be sent. This is useful for users who belong to multiple organizations.
Model. The model that will be used to generate the query. The dropdown values are the ones tested by Denodo. However, if you want to try an untested OpenAI model, you can configure it by pressing the edit icon.
For more information on the OpenAI API parameters go to the OpenAI API reference.
The models shown in the Model field may not work, depending on your organization’s OpenAI account: you may or may not have access to them.
Configuring an untested OpenAI model can result in erroneous behavior of the functionality.
Azure OpenAI Service API¶
This section describes the Azure OpenAI Service API parameters.
Azure Resource Name. The name of your Azure resource.
Azure Deployment Name. The deployment name you chose when you deployed the model.
Api Version. The API version to use for this operation. This follows the YYYY-MM-DD format.
Api Key. The API Key.
Max Tokens. The maximum number of tokens allowed by the model you deployed.
For more information on the Azure API parameters go to the Azure OpenAI Service REST API reference.