Server Set-Up¶
From the Server set-up dialog you can configure all the settings of the Data Catalog server. It is divided into the following tabs:
VDP servers. To configure the Virtual DataPort servers available from the login page and the connection settings used in every query.
Authentication. To enable the single sign-on with Kerberos in the Data Catalog.
Database. To configure an external database to store the metadata of the Data Catalog.
Permissions. To grant power users the privileges to do additional tasks.
AI Integration. To configure the assisted queries functionality available in the view details tab.
Configure the Virtual DataPort Servers¶
In the Servers section of the VDP servers tab you can configure the Virtual DataPort servers that are listed in the login dialog.
Note
VDP Server configuration is disabled in Agora.
To add a new Virtual DataPort server click on the Add server button and enter the following values:
Name. The name that will be shown in the login page for this server.
URL. The connection URL of the server, which follows the pattern //<host>:<port>/<database> (see the example after this list). Take into account the following considerations:
Only users granted the CONNECT privilege on the database will be able to connect to the Data Catalog.
If the server is configured with LDAP authentication for a specific database, use this database in the URL.
Description. A description for the server.
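For example, for a Virtual DataPort server running on the hypothetical host vdp.example.com with the default port 9999 and a database named admin, the URL would look like this:
//vdp.example.com:9999/admin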
Note
Saved queries are stored per user and Virtual DataPort server. The servers are identified by an internal identifier, so if a server is edited, the queries associated with it are preserved. However, if the server is removed, its queries are removed too.
Note
For Automated Cloud Mode you can configure the default server by editing the following properties in the file $DENODO_HOME/conf/data-catalog/DataCatalogBackend.properties
com.denodo.dc.vdp.port=<port>
com.denodo.dc.vdp.host=<hostname>
com.denodo.dc.vdp.servername=<servername>
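For instance, with hypothetical values (a server named denodo-vdp listening on the default port 9999 of host vdp.example.com), the properties would look like this:
com.denodo.dc.vdp.port=9999
com.denodo.dc.vdp.host=vdp.example.com
com.denodo.dc.vdp.servername=denodo-vdp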
Configure the Connection Settings to the Virtual DataPort Servers¶
In the Connection settings section of the VDP servers tab you can configure the global parameters of the connections created to execute queries against the Virtual DataPort Servers.
These parameters are:
Query timeout. Maximum time in milliseconds that the query will wait for the statement to complete. If not indicated (or you enter 0 as the value), it will wait until the execution is complete. By default it is 900,000 milliseconds.
Chunk timeout. Maximum time in milliseconds that the query will wait for a new chunk of results to arrive. When this time is exceeded, Virtual DataPort returns the results extracted up to that moment. If not specified (or you enter 0 as the value), Virtual DataPort returns all the results together at the end of the statement execution. By default it is 90,000 milliseconds.
Chunk size. Number of rows that make up a chunk of results. When Virtual DataPort obtains this number of rows, it returns them to the Data Catalog, even if the Chunk timeout has not been reached. By default it is 1,000 rows.
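For example, with the default values (Chunk size = 1,000 rows, Chunk timeout = 90,000 milliseconds and Query timeout = 900,000 milliseconds), Virtual DataPort sends rows to the Data Catalog as soon as it has accumulated 1,000 rows or 90 seconds have passed since the last chunk, whichever happens first, and the statement times out if it has not completed after 15 minutes.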
Enable Single Sign-On with Kerberos¶
In the Authentication tab you can enable single sign-on with Kerberos in the Data Catalog.
Note
Kerberos configuration is disabled in Agora since SSO is automatically configured.
Follow these steps:
Enable the Use Kerberos option.
Enter the Service Principal Name of the Data Catalog in the Server Principal field. You should enter the same value used to generate the keytab file.
Drag-and-drop the keytab file to the Keytab file field. As an alternative, you can click the field and select it from the file browser.
Consider adding a krb5 file to the Configuration file field if one of the following conditions is met and there is no krb5 file in the default location of your system:
The host where the Data Catalog runs does not belong to a Kerberos realm (e.g., a Windows Active Directory domain).
The host where the Data Catalog runs is Linux/Unix.
The service account in Active Directory configured for the Data Catalog has the constrained delegation option enabled.
You are in a cross-domain scenario. That is, the organization has several domains.
Otherwise, you can leave the field empty.
See also
For more details on the krb5 file, go to Providing a Krb5 File for Kerberos Authentication.
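As an illustration only, a minimal krb5 file for a hypothetical realm EXAMPLE.COM could look like the following; the realm name, KDC host and domain mappings are placeholders that must be replaced with the values of your environment:
[libdefaults]
    default_realm = EXAMPLE.COM
[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }
[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM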
In case you run into any issues, select the Activate Kerberos debug mode option. Otherwise, disable it.
See also
For more details on debugging Kerberos issues, go to How to Debug Kerberos in Web Applications.
Click the Save button to confirm the Kerberos configuration. This configuration will take effect immediately. There is no need to restart the Data Catalog.
Important
If no user is able to log in to the Data Catalog due to an incorrect configuration of the Kerberos authentication, remember that you can still connect to the Data Catalog using local authentication or authentication with login and password. Then you can modify the Kerberos settings or enable its debug mode.
Use an External Database for the Data Catalog¶
This section explains how to configure the Data Catalog to store its settings on an external database. This is necessary if you want to set up a cluster of Data Catalog servers so all the servers share the same settings and metadata.
Note
This option is disabled when accessing Data Catalog from Agora since Agora automatically configures an external database.
By default, the Data Catalog stores the global settings and certain settings for each user in an embedded database (Apache Derby). For example, the saved queries for a user, the fields selected by the administrator to display by default on a view search, etc. You can configure Data Catalog to store all this in an external database.
The Data Catalog supports the following databases:
MySQL
Note
Make sure that MySQL (or the database in MySQL you are going to use) is configured with the options Default Charset = utf8 and Collation = utf8_unicode_ci.
Oracle
PostgreSQL
SQL Server
Amazon Aurora for MySQL and PostgreSQL
Azure SQL Server
Note
The minimum version required of each database is:
MySQL 5.7
Oracle 12c
PostgreSQL 11
SQL Server 2014
Amazon Aurora for MySQL 5.7 and Amazon Aurora for PostgreSQL 11
Follow these steps to store the metadata of the Data Catalog on an external database:
In the external database, create a catalog or a schema for the metadata of the Data Catalog.
Although you can use an existing schema, we suggest creating one to keep the tables separate from the tables of other applications. We suggest reserving 5 gigabytes of space for this schema. In most cases, less space will be required. However, we recommend a high value to avoid issues due to the lack of space in the database.
In this database, create a service account that has create, read and write privileges on that database.
Consider enabling the high availability features of this database to meet higher uptime requirements.
Copy the JDBC driver of the database to the directory <DENODO_HOME>/lib/data-catalog-extensions. Take this into account:
To use Oracle 12, only copy the files ojdbc8.jar and orai18n.jar from <DENODO_HOME>/lib/extensions/jdbc-drivers/oracle-12c.
To use Oracle 18, only copy the files ojdbc8.jar and orai18n.jar from <DENODO_HOME>/lib/extensions/jdbc-drivers/oracle-18c.
To use Oracle 19, only copy the files ojdbc8.jar and orai18n.jar from <DENODO_HOME>/lib/extensions/jdbc-drivers/oracle-19c.
To use Oracle 21, only copy the files ojdbc8.jar and orai18n.jar from <DENODO_HOME>/lib/extensions/jdbc-drivers/oracle-21c.
To use SQL Server, you have two different drivers available:
For any version, use the Microsoft driver. This is the recommended option. To use it, only copy the file mssql-jdbc-10.2.0.jre8.jar from <DENODO_HOME>/lib/extensions/jdbc-drivers/mssql-jdbc-10.x.
For the 2014 and 2016 versions, the jTDS driver is also available. To use it, copy the file denodo-jtds-1.3.1.jar from <DENODO_HOME>/lib/extensions/jdbc-drivers/denodo-jtds-1.3.1.
To use Amazon Aurora for MySQL 5.7, only copy the file mariadb-java-client-2.7.1.jar from <DENODO_HOME>/lib/extensions/jdbc-drivers/mariadb-2.7.
To use Amazon Aurora for PostgreSQL 11, only copy the file postgresql-42.7.2 from <DENODO_HOME>/lib/extensions/jdbc-drivers/postgresql-11.
To use Azure SQL Server, only copy the file mssql-jdbc-7.2.2.jre8.jar from <DENODO_HOME>/lib/extensions/jdbc-drivers/mssql-jdbc-7.x.
To use Azure SQL Server via Active Directory, copy the files mssql-jdbc-7.2.2.jre8.jar, adal4j-1.6.7.jar, gson-2.9.0.jar and oauth2-oidc-sdk-8.36.2.jar from <DENODO_HOME>/lib/extensions/jdbc-drivers/mssql-jdbc-7.x; and accessors-smart.jar, json-smart.jar, javax.mail.jar and nimbus-jose-jwt.jar from <DENODO_HOME>/lib/contrib.
To use Azure SQL Server as the external metadata database with the ActiveDirectoryPassword authentication method, copy the file <DENODO_HOME>/lib/contrib/content-type.jar to the directory <DENODO_HOME>/lib/data-catalog-extensions, in addition to the files already specified.
You may find the JDBC drivers of other databases in <DENODO_HOME>/lib/extensions/jdbc-drivers.
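For example, on Linux, if you were going to use SQL Server with the Microsoft driver, the copy could be done as follows (replace <DENODO_HOME> with the path to your Denodo installation and adapt the folder and file name to your database):
cp <DENODO_HOME>/lib/extensions/jdbc-drivers/mssql-jdbc-10.x/mssql-jdbc-10.2.0.jre8.jar <DENODO_HOME>/lib/data-catalog-extensions/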
Log in to the Data Catalog with an administrator account and export the metadata.
This is necessary because the metadata of the Data Catalog is not transferred automatically from the current database to the new one. You can skip this step if this is a fresh installation and you have not made any other changes to this Data Catalog.
Stop all the components of the Denodo Platform. Then, execute <DENODO_HOME>/bin/webcontainer_shutdown to make sure the web container is stopped.
Start Data Catalog and the other components of the Denodo Platform.
Log in to the Data Catalog with an administrator account and go to Administration > Set-up and management > Server. In the Database tab you will find the dialog to configure the external database.
Provide the following information:
Database. Select the database you want to use.
Driver class. The name of the Java class of the JDBC driver to be used. The default value is usually correct.
URL. The connection URL to the database (see the example URLs after the notes below).
Important
For SQL Server using the Microsoft driver 10.x and above, it is necessary to set the encrypt property to false. To do so, in the database URL, add encrypt=false at the end. For example:
jdbc:sqlserver://host:port;databaseName=database;encrypt=false
Authentication. Authentication method for accessing the external database. It is only available for Oracle and SQL Server 2014 databases. Select one of these methods:
Login and Password: This is the default authentication method. It uses the credentials of the account used to connect to the database: Username (optional) and Password (optional).
Kerberos with password: Use Kerberos authentication, with the provided Username and Password (from the Active Directory account).
Kerberos with keytab: Use Kerberos authentication, with the provided Username (in this case, the Service Principal Name - SPN -) and Keytab file (no password needed). Drag-and-drop the keytab file to the Keytab file field or, as an alternative, click the field and select it from the file browser.
Kerberos with Windows User: Use Single Sign-On (SSO) with Kerberos, doing pass-through with the user that launched the server (no user name nor password required).
Note
When the selected database is Derby Embedded, the fields Driver class, URL, Username and Password are not editable.
Note
To configure SQL Server 2014 with the jTDS driver, use the value net.sourceforge.jtds.jdbc.Driver as the Driver class and enter a URL that follows the pattern jdbc:jtds:sqlserver://<host>:<port>/<database>
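As a rough reference, and depending on the driver version you copied, the connection URLs for the other supported databases typically follow patterns like these (hosts, ports and database names are placeholders):
PostgreSQL: jdbc:postgresql://<host>:<port>/<database>
MySQL / Amazon Aurora for MySQL (MariaDB driver): jdbc:mariadb://<host>:<port>/<database>
Oracle: jdbc:oracle:thin:@//<host>:<port>/<service_name>
SQL Server: jdbc:sqlserver://<host>:<port>;databaseName=<database>;encrypt=false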
To speed up the queries, the Data Catalog creates a pool with connections to the database. You can configure this pool with the following optional settings:
Maximum pool size. The maximum number of actual connections to the database, including both idle and in-use connections.
Minimum idle connections. The minimum number of idle connections that the Data Catalog tries to maintain in the pool. If the idle connections dip below this value and total connections in the pool are less than Maximum pool size, the Data Catalog will make a best effort to add additional connections.
Connection timeout. The maximum number of milliseconds that the Data Catalog will wait for a connection from the pool. If this time is exceeded without a connection becoming available, an error will be thrown.
Ping query. The query that will be executed just before using a connection from the pool to validate that it is still alive. This is required only for legacy drivers that do not support the JDBC 4.0 Connection.isValid() API. If your driver supports JDBC 4.0 we strongly recommend not setting this property.
Note
The jTDS driver for Microsoft SQL Server is considered legacy since it only supports JDBC 3.0. To use it with the Data Catalog you need to provide a value for the Ping query field.
Click the Save button.
The Data Catalog will check that it can reach the database and that the necessary tables exist:
If the JDBC driver for the selected database cannot be loaded, you will see the following warning.
The configuration can still be saved, but the JDBC driver has to be available before restarting the Data Catalog.
If for some reason the database cannot be reached, you will see the following warning.
Again, the configuration can be saved, but you have to fix the error before restarting the Data Catalog; otherwise, saving it is not recommended.
If the database does not have the necessary tables, you will see the following dialog.
Confirm if you want the Data Catalog to create the tables for you. It will use the user account you configured for the database, so make sure it has DDL privileges on the database. Alternatively, you can specify another user account with the right privileges.
Click the Cancel button if you want to run the script manually. You will find it in the folder <DENODO_HOME>/scripts/data-catalog/sql/db_scripts.
Important
If the scripts for creating tables are executed manually, then in each future platform update, and before starting the Data Catalog again, you will also need to manually execute any new scripts that update the database schema, since in this case the Data Catalog cannot do it automatically.
If you still want to run the scripts manually, go to the folder <DENODO_HOME>/scripts/data-catalog/sql/db_scripts/<DATABASE_TYPE> and run these scripts in the schema created for this purpose and configured in the Data Catalog:
If you are starting from an empty schema, execute all the scripts present in the indicated folder in ascending version order.
If this is a platform update, execute, in ascending version order, all the scripts present in the indicated folder that have not been executed in previous updates, starting from the last script that was executed.
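As an illustration, assuming a PostgreSQL metadata database and the psql command-line client (the actual script names and their number depend on the contents of the folder for your database type), each script could be executed like this:
psql -h <host> -p <port> -U <user> -d <database> -f <DENODO_HOME>/scripts/data-catalog/sql/db_scripts/<DATABASE_TYPE>/<script>.sql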
If you are setting up a cluster of Data Catalog servers, all servers will share the same database. Therefore, you only need to create the tables in the database once.
Restart Data Catalog to apply the changes.
If the Data Catalog does not start due to an error in the database configuration, you can restore the default database configuration manually. Edit the file <DENODO_HOME>/conf/data-catalog/datasource.properties and modify its content to match the following (replace <DENODO_HOME> with the path to the Denodo installation). The property datasource.url.default will contain the path to the database, so you can copy its content into the spring.datasource.url property.
datasource.type=derby
datasource.url.default=jdbc:derby:<DENODO_HOME>/metadata/data-catalog-database;create=true
spring.datasource.driver-class-name=org.apache.derby.jdbc.EmbeddedDriver
spring.datasource.url=jdbc:derby:<DENODO_HOME>/metadata/data-catalog-database;create=true
spring.datasource.username=
spring.datasource.password.secret=
Restart Data Catalog again to apply the changes.
Import the metadata you exported in step #3.
If you are building a cluster of Data Catalogs, configure the load balancer to redirect all the requests of the same web session to the same node of the cluster (i.e., sticky sessions).
Configure the Permissions¶
In the Permissions tab you can configure the privileges granted to a role. The privileges are divided into five categories: Management, Administration, Collaboration, User and Request. Select a category and check all the privileges you want to assign to a role. All the users with that role will be granted those privileges in the Data Catalog.
If a role has been removed from Virtual DataPort it is displayed in red in the permissions table. Click on the icon next to its name to remove it from the Data Catalog too.
Note
To assign privileges in the Permissions dialog, a user needs:
The PERMISSIONS privilege in Data Catalog, which gives access to the Permissions dialog.
The ADMIN privilege for a database in Virtual DataPort, which is the minimum requirement to access all the roles available in the Virtual DataPort server.
The User tab contains the VQL Shell permissions.
The workflow requests have a specific category to assign permissions.
Note
To assign request privileges in the Permissions dialog, the user needs the Manage request permission in Data Catalog.
Configure the AI Integration¶
In the AI Integration tab of the server set-up configuration, you can configure the AI-related features of the Data Catalog. This configuration is divided into the sections explained below.
API Configuration¶
In the API configuration pill of the AI Integration tab you can configure the external API that the Data Catalog will use for LLM services.
Currently, there are four options when configuring an API: Amazon Bedrock, Azure OpenAI Service, OpenAI or Custom.
Amazon Bedrock. This option allows you to configure the official public Amazon Bedrock API.
Azure. This option allows you to configure the official public Azure OpenAI Service API.
OpenAI. This option allows you to configure the official public OpenAI API.
Custom. This option allows you to configure a custom API.
Important
If you are configuring this API for the Assisted Query feature you must take into account that the fixed portion of the prompt provided by Denodo to the model occupies 3000 tokens, with an additional number of tokens reserved for the model’s response.
It’s crucial that the model being used supports a sufficient number of tokens not only for the fixed portion of the prompt and the reserved tokens for the response, but also for the schema (field names and their types) of the views it will interact with. While sending the view schema is essential for system functionality, having a token limit that accommodates additional information, such as field descriptions, associations with other views, example values, etc., is highly recommended.
Models supporting at least 10,000 tokens could be adequate. However, depending on the length of the view metadata (for example, the length of descriptions), this limit may prove insufficient. If the token count falls short for transmitting all necessary information, the Data Catalog automatically trims it down using the token reduction algorithm, prioritizing the view schema and essential function information.
Note
Azure OpenAI is disabled in Agora.
Amazon Bedrock API¶
This section enumerates and explains the Amazon Bedrock API parameters.
AWS access key ID. The identifier of your AWS access key pair.
AWS secret access key. The secret access key of your AWS access key pair.
AWS ARN IAM role. The AWS IAM role.
AWS region. The AWS region where you have access to the Amazon Bedrock service.
Note
Currently, the only Amazon Bedrock base model implemented is Claude 3 Sonnet. When you configure the Amazon Bedrock API, requests are sent to this base model.
See also
For more information on the AWS authentication parameters go to the Amazon AWS security credentials reference.
Azure OpenAI Service API¶
This section enumerates and explains the Azure OpenAI Service API parameters.
Azure resource name. The name of your Azure resource.
Azure deployment name. The deployment name you chose when you deployed the model.
API version. The API version to use for this operation. This follows the YYYY-MM-DD format.
API key. The API key.
Model token limit. The maximum number of tokens allowed by the model you deployed. If you do not enter a value, a default model token limit of 16384 will be assigned.
See also
For more information on the Azure OpenAI Service API parameters go to the Azure OpenAI Service REST API reference.
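For reference, these parameters map onto the Azure OpenAI chat completions endpoint, which typically has the following form (resource name, deployment name and API version are the values configured above):
https://<resource-name>.openai.azure.com/openai/deployments/<deployment-name>/chat/completions?api-version=<api-version>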
OpenAI API¶
This section explains the OpenAI API parameters.
API key. This is the OpenAI API key. This parameter is required.
Organization ID. If configured, a header to specify which organization is used for an API request will be sent. This is useful for users who belong to multiple organizations. This parameter is not required.
Model. The model which is going to be used to generate the query. The drop-down values are the ones tested by Denodo. However, if you want to try an untested OpenAI model, you can configure it by pressing the edit icon.
Important
The models shown in the Model field may not work, depending on your organization's OpenAI account; you may or may not have access to them.
Configuring an untested OpenAI model can potentially result in erroneous behavior within the functionality.
Model token limit. If you click on the edit icon, you can configure the model token limit parameter. If you configure a default model from the list, the Data Catalog will use the official model’s token limit value. If you select a custom model and do not enter a value, a default model token limit of 16384 will be assigned.
See also
For more information on the OpenAI API parameters, go to the OpenAI API reference website.
Custom API¶
The custom API option enables the Denodo Platform to integrate with additional APIs besides the official ones provided by OpenAI and Azure OpenAI Service. As a result, this feature introduces support for custom models, expanding user options beyond the default APIs and their official models. However, not all APIs are compatible with the Data Catalog, due to specific requirements of the Assisted Query feature. These requirements depend on whether the custom API follows the OpenAI chat completions API approach or the Azure OpenAI Service chat completions API approach, and the Custom compatibility mode parameter must be set accordingly. This parameter is explained in detail below.
Custom Compatibility Mode¶
The Custom compatibility mode parameter allows the Data Catalog to send and process requests depending on whether the custom API follows the OpenAI chat completions API approach or the Azure OpenAI Service chat completions API approach (a minimal request and response example follows this list):
OpenAI chat completions API approach. A custom API follows this approach if it implements the official OpenAI chat completions API (see https://platform.openai.com/docs/guides/text-generation/chat-completions-api). In this case, the Custom compatibility mode parameter must be set to OpenAI. Not all the parameters of the OpenAI chat completions API are needed for the custom API to be compatible with the Data Catalog:
Request body. The Data Catalog will make requests with a request body having the following parameters: model, messages and temperature.
Response body. The Data Catalog needs the following parameters in the response body: id, object, created, choices and usage.
Azure OpenAI Service chat completions API approach. A custom API follows this approach if it implements the official Azure OpenAI Service chat completions API (see https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#chat-completions). In this case, the Custom compatibility mode parameter must be set to Azure OpenAI Service. Not all the parameters of the Azure OpenAI Service chat completions API are needed for the custom API to be compatible with the Data Catalog:
Request body. The Data Catalog will make requests with a request body having the following parameters: messages and temperature.
Response body. The Data Catalog needs the following parameters in the response body: id, object, created, choices and usage.
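For illustration, a minimal exchange in the OpenAI compatibility mode might look like the following (all values are hypothetical; the Azure OpenAI Service mode is analogous, but the request omits the model parameter):
Request body:
{
  "model": "my-custom-model",
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."}
  ],
  "temperature": 0
}
Response body:
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1700000000,
  "choices": [
    {"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 100, "completion_tokens": 20, "total_tokens": 120}
}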
Now the parameters of each custom compatibility mode are explained:
OpenAI¶
Authentication. Configure this option depending on whether the custom API requires authentication.
API key. The API key. If the authentication is turned on, this parameter is required.
Organization ID. If configured, a header to specify which organization is used for an API request will be sent. This is useful for users who belong to multiple organizations. Only available when the authentication is turned on. This parameter is not required.
Custom API URI. The URI of your custom API. This parameter is required.
Model. The model which is going to be used to generate the query. This parameter is required.
Model token limit. The maximum number of tokens allowed by your custom model. This parameter is required.
Azure OpenAI Service¶
Authentication. Configure this option depending on whether the custom API requires authentication.
API key. The API key. If the authentication is turned on, this parameter is required.
Custom API URI. The URI of your custom API. This parameter is required.
Model token limit. The maximum number of tokens allowed by your custom model. This parameter is required.
HTTP Proxy Configuration¶
In this section you can specify the HTTP proxy configuration:
Enable proxy. This toggle enables the connection via proxy. If it is enabled, the requests made to the API are sent through the proxy.
Host. The hostname (IP or DNS name) of the proxy. This parameter is mandatory in order to use the proxy.
Port. The port number. This parameter is mandatory in order to use the proxy.
Proxy name. The user name.
Proxy password. The password.
Important
After configuring the proxy the Data Catalog must be restarted.
Assisted Query Configuration¶
In the Assisted Query pill of the AI Integration tab you can configure the Assisted Query feature.
Enable query generation. This enables or disables the feature.
Warning
Enabling this feature allows the Data Catalog to send metadata information to the LLM when using it. If data usage is also enabled, view data is processed and sent to the LLM as well in order to improve the results. Make sure your organization is aware of this and agrees to it.
Enabling this feature will result in incurring expenses for requests made to an external API. Make sure that the organization is fully cognizant of this financial implication prior to enabling this option.
Note
To use the Assisted Query feature the user has to have the data_catalog_assisted_query role. See the Default Roles section for more information.
Use sample data. This enables the Data Catalog to send actual data of the view to the LLM. This can help improve the results when the LLM needs to create queries that contain conditions.
Note
To use the data usage option of the Assisted Query, the VDP cache must be enabled. The section Configuring the Cache explains how to do this.
Sample data size. Configure the number of rows to retrieve for sampling. The sampling process will create a sample selecting a subset of the result rows. This sample will then be used to send data to the LLM. The maximum value is 500 rows. This parameter is only visible when the data usage is enabled.
Important
When the value of the Sample data size is changed, the Data Catalog must be restarted.
Language options. This option lets you choose which language configuration is used for the query explanation:
User locale. The language that corresponds to the user locale configured in the Data Catalog is used.
Language. The language selected from a list of languages is used.
Browser language. The language configured in the web browser is used.
Execution mode. Choose whether you execute the generated query yourself after the generation is done or whether it is executed automatically.
Note
Using the automatic execution mode extends the duration of the assisted query generation process, because after the query is generated it is executed automatically, ultimately delaying the presentation of the final result. Take this into consideration when deciding whether you wish to verify the query’s correctness against your requirements before execution; in that case, opting for the manual mode is strongly recommended.