Server Set-Up¶
From the Server set-up dialog you can configure all the settings of the Data Marketplace server. It is divided into the following tabs:
VDP servers. Configure the Virtual DataPort servers available from the login page and the connection settings used in every query.
Authentication. Enable the single sign-on with Kerberos in the Data Marketplace.
Database. Configure an external database to store the metadata of Data Marketplace.
Permissions. Grant power users the privileges to do additional tasks.
LLM Configuration. Configure the LLM provider, the thinking LLM provider and proxy for Denodo Assistant features.
Vector DB. Configure the vector database integration to index the metadata and optionally data of Data Marketplace for the Global Assisted Query feature.
Configure the Virtual DataPort Servers¶
In the Servers section of the VDP servers tab you can configure the Virtual DataPort servers that are listed in the login dialog.
Note
VDP Server configuration is disabled in Agora.
Dialog to configure the Virtual DataPort servers listed in the login page¶
To add a new Virtual DataPort server click on the Add server button and enter the following values:
Name. The name that will be shown in the login page for this server.
URL. The connection URL of the server, which follows the pattern
//<host>:<port>/<database>. You should take into account the following considerations:Only those users granted with the
CONNECTprivilege on the database will be able to connect to the Data Marketplace.If the server is configured with LDAP authentication for a specific database, use this database in the URL.
Description. A description for the server.
Dialog to add a new Virtual DataPort server¶
Note
Saved queries are stored per user and Virtual DataPort server. The servers are identified by an internal identifier, so if a server is edited, it will maintain the same queries associated to it. However, if the server is removed, the queries are removed too.
Note
For Automated Cloud Mode you can configure the default server by editing the following properties in the file $DENODO_HOME/conf/data-catalog/DataCatalogBackend.properties
com.denodo.dc.vdp.port=<port>
com.denodo.dc.vdp.host=<hostname>
com.denodo.dc.vdp.servername=<servername>
Configure the Connection Settings to the Virtual DataPort Servers¶
In the Connection settings section of the VDP servers tab you can configure the global parameters of the connections created to execute queries against the Virtual DataPort Servers.
Dialog to configure the connection settings to the Virtual DataPort servers¶
These parameters are:
Query timeout. Maximum time in milliseconds that the query will wait for the statement to be completed. If not indicated (or you enter 0 as a value), then it will wait until the execution is complete. By default it is 900,000 milliseconds.
Chunk timeout. Maximum time in milliseconds that the query will wait until it arrives a new chunk of results. Where this time is exceeded, Virtual DataPort returns the results extracted up to that moment. If not specified (or you enter 0 as a value), Virtual DataPort returns all the results together at the end of the statement run. By default it is 90,000 milliseconds.
Chunk size. Number of results that make up a chunk of results. When Virtual DataPort Tool obtains this number of results, it will return them to the Data Marketplace, even though the Chunk timeout has not been reached. By default it is 1,000 rows.
Enable Single Sign-On with Kerberos¶
In the Authentication tab you can enable single sign-on with Kerberos in the Data Marketplace.
Note
Kerberos configuration is disabled in Agora since SSO is automatically configured.
Dialog to configure Kerberos authentication in the Data Marketplace¶
Follow these steps:
Enable the Use Kerberos option.
Enter the Service Principal Name of the Data Marketplace in the Server Principal field. You should enter the same value used to generated the keytab file.
Drag-and-drop the keytab file to the Keytab file field. As an alternative, you can click the field and select it from the files browser.
Consider to add a krb5 file to the Configuration file field if one of the following conditions is met and there is no krb5 file in the default location of your system:
The host where the Data Marketplace runs does not belong to a Kerberos realm (e.g., a Windows Active Directory domain).
The host where the Data Marketplace runs is Linux/Unix.
The service account in Active Directory configured for the Data Marketplace has the option constrained delegation enabled.
You are in a cross-domain scenario. That is, the organization has several domains.
Otherwise, you can leave the field empty.
See also
For more details on the krb5 file, go to Providing a Krb5 File for Kerberos Authentication.
In case you run into any issues, select the Activate Kerberos debug mode option. Otherwise, disable it.
See also
For more details on debugging Kerberos issues, go to How to Debug Kerberos in Web Applications.
Click the Save button to confirm the Kerberos configuration. This configuration will take effect immediately. There is no need to restart the Data Marketplace.
Important
If no user is able to log in the Data Marketplace due to an inappropriate configuration of the Kerberos authentication, remember that you can still connect to the Data Marketplace using the local authentication or the authentication with login and password. Then you can modify the Kerberos settings or enable its debug mode.
Use an External Database for the Data Marketplace¶
This section explains how to configure the Data Marketplace to store its settings on an external database. This is necessary if you want to set up a cluster of Data Marketplace servers so all the servers share the same settings and metadata.
Note
This option is disabled when accessing Data Marketplace from Agora since Agora automatically configures an external database.
By default, the Data Marketplace stores the global settings and certain settings for each user in an embedded database (Apache Derby). For example, the saved queries for a user, the fields selected by the administrator to display by default on a view search, etc. You can configure Data Marketplace to store all this in an external database.
The Data Marketplace supports the following databases:
Amazon Aurora for MySQL and PostgreSQL
Azure SQL Server
GCP Cloud SQL for MySQL and PostgreSQL
MySQL
Note
Make sure that MySQL (or the database in MySQL you are going to use) is configured with the options
Default Charset = utf8_mb4andCollation = utf8mb4_unicode_ciin the case of MySQL 5 orCollation = utf8mb4_0900_ai_ciin the case of MySQL 8.Oracle
Note
When using Oracle as an external database, it is essential to set the
MAX_STRING_SIZEparameter toEXTENDED, as explained in official Oracle documentation. The default setting,STANDARD, limits text fields to 4000 bytes, specifically theVARCHAR2andNVARCHAR2data types. This restriction significantly impacts the number of characters that can be stored, especially when dealing with multibyte characters.PostgreSQL
SQL Server
Note
The minimum version required of each database is:
Amazon Aurora for MySQL 5.7 and Amazon Aurora for PostgreSQL 11
GCP Cloud SQL for MySQL 5.7 and GCP Cloud SQL for PostgreSQL 11
MySQL 5.7
Oracle 12c
PostgreSQL 11
SQL Server 2014
Follow these steps to store the metadata of the Data Marketplace on an external database:
In the external database, create a catalog or a schema for the metadata of the Data Marketplace.
Although you can use an existing schema, we suggest creating one to keep the tables separate from the tables of other applications. We suggest reserving 5 gigabytes of space for this schema. In most cases, less space will be required. However, we recommend a high value to avoid issues due to the lack of space in the database.
In this database, create a service account that has create, read and write privileges on that database.
Consider enabling the high availability features of this database to meet higher uptime requirements.
Copy the JDBC driver of the database to the directory
<DENODO_HOME>/lib/data-catalog-extensions.Take this into account:
To use GCP Cloud SQL for PostgreSQL, only copy the latest driver available from
<DENODO_HOME>/lib/extensions/jdbc-drivers/postgresql.To use Oracle 12, only copy the files
ojdbc8.jarandorai18n.jarfrom<DENODO_HOME>/lib/extensions/jdbc-drivers/oracle-12c.To use Oracle 18, only copy the files
ojdbc8.jarandorai18n.jarfrom<DENODO_HOME>/lib/extensions/jdbc-drivers/oracle-18c.To use Oracle 19, only copy the files
ojdbc8.jarandorai18n.jarfrom<DENODO_HOME>/lib/extensions/jdbc-drivers/oracle-19c.To use Oracle 21, only copy the files
ojdbc8.jarandorai18n.jarfrom<DENODO_HOME>/lib/extensions/jdbc-drivers/oracle-21c.To use Oracle 23 or newer, only copy the files
ojdbc17.jarandorai18n.jarfrom<DENODO_HOME>/lib/extensions/jdbc-drivers/oracle.To use SQL Server, you have available two different drivers:
For any version, use the Microsoft driver. This is the recommended option. To use it, only copy the file
mssql-jdbc-10.2.0.jre8.jarfrom<DENODO_HOME>/lib/extensions/jdbc-drivers/mssql-jdbc-10.x.For the 2014 and 2016 versions, the jTDS driver is also available. To use it, copy the file
denodo-jtds-1.3.1.jarfrom<DENODO_HOME>/lib/extensions/jdbc-drivers/denodo-jtds-1.3.1.
To use Amazon Aurora for MySQL 5.7, only copy the file
mariadb-java-client-2.7.1.jarfrom<DENODO_HOME>/lib/extensions/jdbc-drivers/mariadb-2.7.To use Amazon Aurora for PostgreSQL 11, only copy the file
postgresql-42.7.2from<DENODO_HOME>/lib/extensions/jdbc-drivers/postgresql-11.To use Azure SQL Server, only copy the file
mssql-jdbc-7.2.2.jre8.jarfrom<DENODO_HOME>/lib/extensions/jdbc-drivers/mssql-jdbc-7.x.To use Azure SQL Server via Active Directory, copy the files
mssql-jdbc-7.2.2.jre8.jar,adal4j-1.6.7.jar,gson-2.9.0.jarandoauth2-oidc-sdk-8.36.2.jarfrom<DENODO_HOME>/lib/extensions/jdbc-drivers/mssql-jdbc-7.x; andaccessors-smart.jar,json-smart.jar,javax.mail.jarandnimbus-jose-jwt.jarfrom<DENODO_HOME>/lib/contrib.To use Azure SQL Server as the external metadata database with ActiveDirectoryPassword authentication method, copy the file
<DENODO_HOME>/lib/contrib/content-type.jarto the directory<DENODO_HOME>/lib/data-catalog-extensions, in addition to the files already specified.You may find the JDBC of other databases in
<DENODO_HOME>/lib/extensions/jdbc-drivers.
Log in the Data Marketplace with an administrator account and export the metadata.
This is necessary because the metadata of the Data Marketplace is not transferred automatically from the current database to the new one. You can skip this step if this is a fresh installation and you have not done any other change to this Data Marketplace.
Stop all the components of the Denodo Platform. Then, execute
<DENODO_HOME>/bin/webcontainer_shutdownto make sure the web container is stopped.Start Data Marketplace and the other components of the Denodo Platform.
Log in the Data Marketplace with an administrator account and go to Administration > Set-up and management > Server. In the Database tab you will find the dialog to configure the external database.
Dialog to configure the database where the Data Marketplace stores its metadata¶
Provide the following information:
Database. Select the database you want to use.
Driver class. The name of the Java class of the JDBC driver to be used. The default value is usually correct.
URL. The connection URL to the database.
Important
For SQL Server using the Microsoft driver 10.x and above, it is necessary to set the encrypt property to false. To do so, in the database URL, add encrypt=false at the end. For example:
jdbc:sqlserver://host:port;databaseName=database;encrypt=falseAuthentication. Authentication method for accessing the external database. It is only available for Oracle and SQL Server 2014 databases. Select one of these methods:
Login and Password: It is the authentication method by default. Use the credentials of the account used to connect to the database. Username (optional) and Password (optional).
Kerberos with password: Use Kerberos authentication, with the provided Username and Password (from the Active Directory account).
Kerberos with keytab: Use Kerberos authentication, with the provided Username (in this case, the Service Principal Name - SPN -) and Keytab file (no password needed). Drag-and-drop the keytab file to the Keytab file field or as an alternative, you can click the field and select it from the files browser.
Kerberos with Windows User: Use Single Sign-On (SSO) with Kerberos, doing pass-through with the user that launched the server (no user name nor password required).
Denodo AWS Instance Credentials. Needed parameters for accessing to a Amazon Aurora datasources using instance credentials:
AWS IAM role ARN (optional): An IAM identity that you can create in your account that has specific permissions.
AWS region: AWS region where the Aurora database is deployed.
Database user: Database account that you want to access.
Maximum lifetime of a connection in the pool (minutes) (optional): Connection lifetime should be a little bit less than an AWS authentication token to avoid pool connections drop. Data Marketplace assumes 14 minutes by default since according to IAM database authentication for MariaDB, MySQL, and PostgreSQL the tokens have a default lifetime of 15 minutes.
AWS IAM Credentials. The parameters to authenticate using IAM Credentials are necessary in addition to the parameters for accessing to a Amazon Aurora datasources defined above:
AWS access key id: A unique identifier that specifies the user or entity making the request.
AWS secret access key: A secret string that is used to sign the requests you make to AWS.
Note
AWS IAM Credentials and Denodo AWS Instance Credentials options are only available when the database is Amazon Aurora MySQL or Amazon Aurora PostgreSQL. They require SSL and you have to follow instructions in Adding AWS RDS Certificate to JRE cacerts.
Note
When the selected database is
Derby Embedded, the fields Driver class, URL, Username and Password are not editable.Note
To configure the SQL Server 2014 with the jTDS driver, use the value
net.sourceforge.jtds.jdbc.Driveras the Driver class and enter a URL that follows the patternjdbc:jtds:sqlserver://<host>:<port>/<database>Schema. Select the schema you want to use. If the field is empty, database default schema will be used.
Note
This option is available only for databases that support this concept of schema: PostgreSQL and SQL Server.
To speed up the queries, the Data Marketplace creates a pool with connections to the database. You can configure this pool with the following optional settings:
Maximum pool size. The maximum number of actual connections to the database, including both idle and in-use connections.
Minimum idle connections. The minimum number of idle connections that the Data Marketplace tries to maintain in the pool. If the idle connections dip below this value and total connections in the pool are less than Maximum pool size, the Data Marketplace will make a best effort to add additional connections.
Connection timeout. The maximum number of milliseconds that the Data Marketplace will wait for a connection from the pool. If this time is exceeded without a connection becoming available, an error will be thrown.
Ping query. The query that will be executed just before using a connection from the pool to validate that it is still alive. This is required only for legacy drivers that do not support the JDBC 4.0
Connection.isValid()API. If your driver supports JDBC 4.0 we strongly recommend not setting this property.Note
The jTDS driver for Microsoft SQL Server is considered legacy since it only supports JDBC 3.0. To use it with the Data Marketplace you need to provide a value for the Ping query field.
Click the Save button.
The Data Marketplace will check that it can reach the database and that the necessary tables exist:
If the JDBC driver for the selected database cannot be loaded, you will see the following warning.
Warning when the JDBC driver cannot be loaded¶
The configuration can still be saved, but the JDBC driver has to be available before restarting the Data Marketplace.
If for some reason the database cannot be reached, you will see the following warning.
Warning when cannot connect to the database¶
Again, the configuration can be saved, but you have to fix the error before restarting the Data Marketplace. Otherwise, it is not recommended to save it.
If the database or schema (in PostgreSQL and SQL Server) does not have the necessary tables, you will see the following dialog.
Dialog to create the tables in the database or schema¶
Confirm if you want the Data Marketplace to create the tables for you. It will use the user account you configured for the database, so make sure it has DDL privileges on the database. Alternatively, you can specify another user account with the right privileges.
Click the Cancel button if you want to run the script manually. You will find it on the folder
<DENODO_HOME>/scripts/data-catalog/sql/db_scripts.Important
If the scripts for creating tables are executed manually, it will be necessary in each future platform update and before starting the Data Marketplace again to manually execute the possible new scripts to update the database schema, since in this case the Data Marketplace will not be able to do it automatically.
If you still want to run the scripts manually, go to the folder
<DENODO_HOME>/scripts/data-catalog/sql/db_scripts/<DATABASE_TYPE>and run these scripts in the schema created for this purpose and configured in the Data Marketplace:In the case of PostgreSQL and SQL Server, the table prefix must be replaced with the desired schema or removed to use the default schema.
The oracle folder contains scripts for Oracle 12 to Oracle 21. For Oracle 23 or newer use the scripts in the oracle23 folder.
In the case of starting from an empty schema, all the scripts present in the indicated folder must be executed in ascending version order.
In the case of a platform update, all the scripts present in the indicated folder that have not been executed in previous updates must be executed in ascending version order from the last script that has been executed.
If you are setting-up a cluster of Data Marketplace servers, all servers will share the same database. Therefore, you only need to create the tables in the database once.
Restart Data Marketplace to apply the changes.
If the Data Marketplace does not start due to an error in the database configuration you can restore the default database configuration manually. Edit the file
<DENODO_HOME>/conf/data-catalog/datasource.propertiesand modify its content to match the following (replace<DENODO_HOME>with the path to the Denodo installation). The propertydatasource.url.defaultwill contain the path to the database so you can copy its content into thespring.datasource.urlproperty.datasource.type=derby datasource.url.default=jdbc:derby:<DENODO_HOME>/metadata/data-catalog-database;create=true spring.datasource.driver-class-name=org.apache.derby.jdbc.EmbeddedDriver spring.datasource.url=jdbc:derby:<DENODO_HOME>/metadata/data-catalog-database;create=true spring.datasource.username= spring.datasource.password.secret=
Restart Data Marketplace again to apply the changes.
Import the metadata you exported on step #3.
If you are building a cluster of instances of Data Marketplace, configure the load balancer to redirect all the request of the same web session to the same node of the cluster (i.e. sticky sessions).
Configure the Permissions¶
In the Permissions tab you can configure the privileges granted to a role. The privileges are divided into seven categories: Management, Administration, Collaboration, User, Request, External element and Denodo Assistant. To grant or revoke privileges to a role you have to:
Navigate through the categories in the dialog.
Select or deselect the privileges you want for the role.
Click the Save button to apply changes.
After that, all users with that role will be able to perform those tasks associated to the selected privileges.
You can find a detailed explanation of each privilege in the Privileges on the Data Marketplace section.
Dialog to configure the user privileges¶
In this dialog a check box can have one of three states. Let us see what each one of them means:
The privilege is not granted to the role.
The privilege have been granted to some elements for this role in the advanced permissions dialog.
The privilege is granted to the role.
If a role has been removed from Virtual DataPort it is displayed in red in the
permissions table. Click the icon next to its name to remove it from
the Data Marketplace too.
Note
To configure privileges in the Permissions dialog, a user needs:
The
Permissionsprivilege in Data Marketplace, which gives access to the Permissions dialog.One of the following requirements to access all roles available in the Virtual DataPort server:
The
ADMINprivilege for a database in Virtual DataPort.The
assignprivilegesrole.The
create_rolerole.The
drop_rolerole.The
assign_all_privilegesrole.The
assign_all_rolesrole.The
manage_policiesrole.The
read_all_privilegesrole.
Take into account that the
allusersrole is only available for those users with the roleassign_all_privileges,assign_all_rolesorread_all_privileges.
By default, all privileges granted in the Permissions tab will apply to all elements in the catalog. However, for
the privileges to manage requests you can be more specific and restrict the databases, views or web services that will
be affected by the privilege. For instance, you can grant the Manage access privilege to a role but only for those
requests in which participate an element of a particular database. To access this advanced view of the Permissions
tab, click the icon under the Advanced column for the role you want to configure.
Dialog to grant a privilege to a role but only on specific elements of the catalog¶
To grant a privilege to a view or web service, go to the VDP element section and just select the corresponding check box for that element. Databases work slightly different:
Select a check box for a database and the privilege will be granted to all elements under the database, the current ones and the future ones. If the privilege is applicable for a database, it will be granted to the database too.
If you want to grant the privilege only to the database, then click the
icon next to the check box and select the Database option.
In this dialog a check box can have one of four states. Let us see what each one of them means:
The privilege is not granted to the element.
The privilege is granted only to the database. Its elements will not inherit the privilege.
According to the element, the meaning of this state can be different:
In case of databases, the privilege is granted to the database (if applicable) and all its elements.
In case of views and web services, the privilege is granted to the element.
The privilege is granted to the element and cannot be revoked because it is inherited from the database or from the global privilege. Remember that when a privilege is granted from the main dialog of the Permissions tab it will affect all elements in the catalog.
As you can see in the dialog, folders cannot be the target of a privilege. However, they allow you to grant or revoke
privileges to several elements at once. Click the icon next to the folder name and select one of the
privileges under the Select on all folder elements menu to grant the selected privilege on all current elements
under the folder. If you select a privilege under the Clear on all folder elements menu, the privilege will be
revoked.
The catalog may contain too many elements and finding one in particular could be difficult. Notice that you can filter
your catalog using the tools in the Elements title. Filter by element name typing a keyword in the search bar or by
element type using the button.
To grant a privilege to a external element, go to the External element section and select the corresponding check box for that element.
Dialog to grant a privilege to a role but only on specific external elements of the catalog¶
The list of external elements in the table could be too large. Notice that you can filter
the external elements using the tools in the Elements title. Filter by element name by typing a keyword in the search bar. You can also filter the external elements by tags, categories or external tool servers using the button.
Large Language Models (LLM) Configuration¶
In the LLM Configuration tab you will be able to configure LLM related content of the Data Marketplace. This configuration is separated into three sections: Provider Configuration, Thinking Provider Configuration and Proxy Configuration.
LLM Configuration tab.¶
Use the buttons at the top to perform one of the following actions:
Save. Stores the LLM configuration in the database.
Clear. Removes temporary changes that were made in the form but are not saved in the database.
Test Configuration. Checks whether the LLM configuration specified in the form is valid. An LLM configuration is considered valid if the LLM provider successfully responds to a test request.
Note
To see the HTTP logs for the Test Configuration process, use the same method described for the Assisted Query in the Log Assisted Query HTTP information section.
Now, these three sections of the LLM configuration are going to be explained in detail.
Provider Configuration¶
In the Provider Configuration pill of the LLM Configuration tab you can configure the large language model provider that the Denodo Assistant will use. Currently, there are four options when configuring a provider: Amazon Bedrock, Azure OpenAI Service, OpenAI or Other (OpenAI API Compatible).
Amazon Bedrock (Anthropic Claude). This option allows to configure the official public Amazon Bedrock API. At the moment, only Amazon Bedrock models from the Claude family of model provider Anthropic are supported.
Azure OpenAI Service. This option allows to configure the official public Azure OpenAI Service API or a custom one.
OpenAI. This option allows to configure the official public OpenAI API.
Other (OpenAI API Compatible). This option allows to configure an OpenAI API compatible provider.
Important
If you are configuring a provider for the Assisted Query feature you must take into account that the fixed portion of the prompt provided by Denodo to the model occupies 2300 tokens, with an additional number of tokens reserved for the model’s response.
It’s crucial that the model being used supports a sufficient number of tokens not only for the fixed portion of the prompt and the reserved tokens for the response, but also for the schema (field names and their types) of the views it will interact with. While sending the view schema is essential for system functionality, having a context window token limit that accommodates additional information, such as field descriptions, associations with other views or example values is highly recommended.
Models supporting at least 10,000 tokens of context window could be adequate. However, depending on the length of the view metadata (for example, the length of descriptions), this limit may prove insufficient. If the token count falls short for transmitting all necessary information, the Data Marketplace automatically trims it down using the token reduction algorithm, prioritizing the view schema and essential function information.
Amazon Bedrock (Anthropic Claude) Provider¶
In this section, the Amazon Bedrock (Anthropic Claude) provider parameters are going to be enumerated and explained.
Dialog with the Amazon Bedrock (Anthropic Claude) provider configuration parameters¶
Use custom URL. Enable if you want to specify the URL used for LLM requests.
Endpoint URL. Specify the URL used for LLM requests. Only visible if Use custom URL is enabled.
Authentication. Authentication method for accessing Amazon Bedrock. Choose between AWS IAM credentials or Denodo AWS instance credentials.
AWS IAM credentials. Use this option if you want to specify the AWS credentials.
AWS access key ID. Unique identifier that specifies the user or entity making the request.
AWS secret access key. Secret string that is used to sign the requests you make to AWS.
AWS IAM role ARN (optional). IAM identity with specific permissions.
AWS region. The AWS region where access to the Amazon Bedrock service is available.
Denodo AWS instance credentials. Use this option if the AWS credentials are configured in the instance where Denodo is running.
AWS IAM role ARN (optional). IAM identity with specific permissions.
AWS region. The AWS region where access to the Amazon Bedrock service is available.
Model ID. The identifier used within the Amazon Bedrock service to specify which model should be used. If you enter a custom Model ID, select a model of the “Claude” family of the “Anthropic” model provider.
Context window (tokens). The maximum amount of tokens that the model can process in a single interaction. This includes both the input (Data Marketplace request) and the output (model response). If you configure a model from the list, the Data Marketplace will use the official model’s context window value. If you select a custom model and you do not introduce the context window value, a default value of 16385 tokens will be assigned to this parameter. You can choose the custom option to type the number you want.
Max output tokens. The maximum number of tokens that the model can use to generate the response. If you configure a model from the list, the Data Marketplace will use the official model’s max output tokens. If you select a custom model and you do not introduce a max output tokens value, a default value of 4096 tokens will be assigned to this parameter. You can choose the custom option to type the number you want.
Custom headers (optional). Configure custom headers to be sent with the request to the LLM. For details, see the custom headers configuration section.
Custom temperature. Specify if the model needs a specific setting for its temperature (creativity level). If enabled, the following parameter becomes available:
Temperature. If disabled, the Data Marketplace request will not include temperature. If the custom option is selected, a specific temperature value will be sent. The range for the Amazon Bedrock (Anthropic Claude) Provider is [0, 1].
See also
For more information on the AWS authentication parameters go to the Amazon AWS security credentials reference.
Azure OpenAI Service Provider¶
In this section, the Azure OpenAI Service provider parameters are going to be enumerated and explained.
The Use custom URL parameter allows configuring a custom Chat Completions URL when enabled. With this parameter we define that a specific API is eligible if it implements the official Chat Completions Azure OpenAI Service API (see https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#chat-completions). Not all the parameters of the Chat Completions Azure OpenAI Service API are needed for the custom API to be compatible with the Denodo Assistant:
Request body. The Denodo Assistant will make requests with a request body having the following parameters: messages and temperature.
Response body. The Denodo Assistant needs the following parameters in the response body: id, object, created, choices and usage.
If the Use custom URL parameter is disabled the following parameters are available:
Dialog with the Azure OpenAI Service provider configuration parameters when the Use custom URL parameter is disabled¶
Azure resource name. The name of your Azure resource.
Azure deployment name. The deployment name you chose when you deployed the model.
API version. The API version to use for this operation. This follows the YYYY-MM-DD format.
API key. The API key.
Context window (tokens). The maximum amount of tokens that the model can process in a single interaction. This includes both the input (Data Marketplace request) and the output (model response). The default value is 16385.
Max output tokens. The maximum number of tokens that the model can use to generate the response. The default value is 4096 tokens.
Custom headers (optional). Configure custom headers to be sent with the request to the LLM. For details, see the custom headers configuration section.
Custom temperature. Specify if the model needs a specific setting for its temperature (creativity level). If enabled, the following parameter becomes available:
Temperature. If disabled, the Data Marketplace request will not include temperature. If the custom option is selected, a specific temperature value will be sent. The range for the Azure OpenAI Service Provider is [0, 2].
If the Use custom URL parameter is enabled the following parameters are available:
Dialog with the Azure OpenAI Service provider configuration parameters when the Use custom URL parameter is enabled¶
Authentication. Configure this option depending on whether the custom provider requires authentication.
API key. The API key. If the authentication is turned on, this parameter is required.
Chat Completions URL. The URL of your custom API. This parameter is required.
Context window (tokens). The maximum amount of tokens that the model can process in a single interaction. This includes both the input (Data Marketplace request) and the output (model response). The default value is 16385.
Max output tokens. The maximum number of tokens that the model can use to generate the response. The default value is 4096 tokens.
Custom headers (optional). Configure custom headers to be sent with the request to the LLM. For details, see the custom headers configuration section.
Custom temperature. Specify if the model needs a specific setting for its temperature (creativity level). If enabled, the following parameter becomes available:
Temperature. If disabled, the Data Marketplace request will not include temperature. If the custom option is selected, a specific temperature value will be sent. The range for the Azure OpenAI Service Provider is [0, 2].
See also
For more information on the Azure OpenAI Service provider parameters go to the Azure OpenAI Service REST API reference.
OpenAI Provider¶
In this section, the OpenAI provider parameters are going to be explained.
Dialog with the OpenAI provider configuration parameters.¶
API key. This is the OpenAI API key. This parameter is required.
Organization ID (optional). A header to specify which organization an API request is sent from. This is useful for users who belong to multiple organizations. This parameter is not required.
Model. The model which is going to be used to generate the query. The drop-down values are the ones tested by Denodo. However, if you want to try an untested OpenAI model, you can configure it by pressing the edit icon.
Important
Access to the models listed in the Model field depends on your OpenAI account and organization. You may not have access to all of them.
Configuring an untested OpenAI model can potentially result in erroneous behavior within the functionality.
Context window (tokens). The maximum amount of tokens that the model can process in a single interaction. This includes both the input (Data Marketplace request) and the output (model response). If you configure a model from the list, the Data Marketplace will use the official model’s context window value. If you select a custom model and you do not introduce the context window value, a default value of 16385 tokens will be assigned to this parameter. You can choose the custom option to type the number you want.
Max output tokens. The maximum number of tokens that the model can use to generate the response. If you configure a model from the list, the Data Marketplace will use the official model’s max output tokens. If you select a custom model and you do not introduce a max output tokens value, a default value of 4096 tokens will be assigned to this parameter. You can choose the custom option to type the number you want.
Custom headers (optional). Configure custom headers to be sent with the request to the LLM. For details, see the custom headers configuration section.
Custom temperature. Specify if the model needs a specific setting for its temperature (creativity level). If enabled, the following parameter becomes available:
Temperature. If disabled, the Data Marketplace request will not include temperature. If the custom option is selected, a specific temperature value will be sent. The range for the OpenAI Provider is [0, 2].
See also
For more information on the OpenAI provider parameters, go to the OpenAI provider reference website.
Other (OpenAI API Compatible)¶
In this section, the Other (OpenAI API Compatible) provider parameters are going to be enumerated and explained.
The Other (OpenAI API Compatible) option allows the Denodo Assistant to send and process requests from APIs following the OpenAI Chat Completions API approach. We define that an API follows this approach if it implements the official Chat Completions OpenAI API (see https://platform.openai.com/docs/api-reference/chat). Not all the parameters of the Chat Completions OpenAI API are needed for the API to be compatible with the Denodo Assistant:
Request body. The Denodo Assistant will make requests with a request body having the following parameters: model, messages and temperature.
Response body. The Denodo Assistant needs the following parameters in the response body: id, object, created, choices and usage.
Dialog with the Other (OpenAI API Compatible) provider configuration parameters¶
Authentication. Configure this option depending on whether the custom provider requires authentication.
API key. The API key. If the authentication is turned on, this parameter is required.
Organization ID (optional). A header to specify which organization an API request is sent from. This is useful for users who belong to multiple organizations. Only available when the authentication is turned on. This parameter is not required.
Chat Completions URL. The Chat Completions endpoint. This parameter is required.
Model. The model which is going to be used to generate the query. This parameter is required.
Context window (tokens). The maximum amount of tokens that the model can process in a single interaction. This includes both the input (Data Marketplace request) and the output (model response). The default value is 16385.
Max output tokens. The maximum number of tokens that the model can use to generate the response. The default value is 4096 tokens.
Custom headers (optional). Configure custom headers to be sent with the request to the LLM. For details, see the custom headers configuration section.
Custom temperature. Specify if the model needs a specific setting for its temperature (creativity level). If enabled, the following parameter becomes available:
Temperature. If disabled, the Data Marketplace request will not include temperature. If the custom option is selected, a specific temperature value will be sent. Any value greater than 0 is valid.
In addition to OpenAI models such as gpt-oss-20b, several others support the Chat Completions API, including Google’s Gemini and models that can be served using Ollama, such as Llama 3.3, Mistral, and Phi-4. The following examples demonstrate how to configure both a Gemini model and models served via Ollama.
Vertex AI Platform. Follow these steps to use a deployed Gemini model via the Vertex AI Platform.
Select Other (OpenAI API Compatible) as the Provider.
If authentication is required, enter the API key. You can retrieve a temporary access token using the following command in Google Cloud Shell:
gcloud auth print-access-token
Specify the URI of the chat completions endpoint for the Vertex AI API. This URI must include your specific
{location}and{project}values:https://{location}-aiplatform.googleapis.com/v1beta1/projects/{project}/locations/{location}/endpoints/openapi/chat/completions.In the Model field, enter the full name of your deployed Gemini model, such as
google/gemini-2.0-flash.
By following these steps, Denodo can securely connect to and utilize your deployed Vertex AI model.
Ollama. Follow these steps to use a local LLM hosted by the Ollama server.
Before you begin, ensure you have the following:
The Ollama server is installed and running on your machine.
You have pulled the model you wish to use (e.g.,
gpt-oss:20borllama3.2:3b). The model must be available on your local Ollama server.ollama pull gpt-oss:20b
Follow these steps to complete the configuration:
Select Other (OpenAI API Compatible) as the Provider.
Leave the API Key field blank, as a locally running Ollama server typically does not require authentication.
Specify the URI for the Ollama chat completions endpoint. This will usually be your local machine’s address and the default Ollama port (11434), followed by the API path:
http://localhost:11434/v1/chat/completions.In the Model field, enter the name of the model you have pulled in Ollama, such as
gpt-oss:20borllama3.2:3b.
By following these steps, Denodo can connect to your local Ollama server and utilize the selected model.
Thinking Provider Configuration¶
Thinking models power the DeepQuery functionality within Denodo Assistant. A thinking model is a Large Language Model (LLM) trained to “think longer” and handle complex, multi-step tasks. Before providing a final answer, it generates an internal “chain of thought.”
While this reasoning process is generally not visible to the user, it allows the model to break down problems, consider multiple approaches, and arrive at more accurate and logical conclusions. Configuring a thinking provider is highly recommended to leverage the full potential of DeepQuery.
Thinking Provider Configuration tab.¶
To start using thinking models, activate the Enable thinking configuration toggle. This will reveal the provider selection and its specific parameters.
Unlike the standard provider configuration, thinking models have the following constraints:
Supported Providers: Only Amazon Bedrock (Anthropic Claude), Azure OpenAI Service and OpenAI are supported.
Fixed Temperature: Thinking models do not allow temperature configuration, as the creativity and internal logic are managed by the model’s own reasoning process.
Amazon Bedrock (Anthropic Claude) Thinking Provider¶
The configuration for the Amazon Bedrock (Anthropic Claude) thinking provider uses the same parameters as the standard provider, with the addition of the following parameter:
Thinking budget tokens. The maximum number of tokens that the model can allocate to its internal reasoning process.
Important
The Thinking budget tokens value must be strictly less than the Max output tokens value. Because reasoning tokens are part of the total output generated by the model, the remaining budget (Max output - Thinking budget) must be sufficient for the model to produce the final, visible response.
Azure OpenAI / OpenAI Thinking Provider¶
The configuration for Azure OpenAI Service and OpenAI thinking providers follows the same parameters as their respective standard providers (see Azure OpenAI or OpenAI), but includes the following reasoning-specific parameter:
Reasoning effort. This parameter specifies how much time and computational effort the model should dedicate to reasoning before generating a response. You can choose between:
Low: Provides faster responses with a more concise reasoning process.
Medium: A balanced setting suitable for most complex queries.
High: Instructs the model to perform the most thorough analysis possible, which may increase response time.
HTTP Proxy Configuration¶
In the Proxy Configuration pill of the LLM Configuration tab you can specify the HTTP proxy configuration:
Dialog with the proxy configuration parameters¶
Enable proxy. This toggle enables the connection via proxy. If this is enabled, the requests made to the API are going to be sent to the proxy. This option should be enabled if a proxy has been configured between the Data Marketplace and the provider.
Host. The hostname (IP or DNS name) of the proxy.
Port. The port number. This parameter is mandatory in order to use the proxy.
Proxy name (optional). The username.
Proxy password (optional). The password.
Important
If the proxy configured requires basic authorization, the following property must be added to the JVM properties of the web container:
-Djdk.http.auth.tunneling.disabledSchemes=""
Custom Headers Configuration¶
When the Add custom headers button is clicked, a configuration page is shown to manage the custom headers.
Popup with the custom headers configuration parameters¶
The Custom Headers Configuration page lets you manage headers using a dedicated table and two main actions. To add a new header, click the New button. To remove multiple selected headers, use the Delete button above the table. Within the table, you can enter the header’s name and value, specify if the value should be encrypted (treated as a secret) in the Data Marketplace database, and delete an individual header using the delete action.
Vector Database Configuration¶
In the Vector DB tab, you can configure the vector database for Data Marketplace.
This configuration is divided into four sections: Connection Configuration, Embedding Model Configuration, Index Configuration and Servers Configuration.
First, activate the metadata vectorization process using the Enable the metadata vectorization toggle. This will reveal the connection parameters.
Warning
If Data Marketplace is running in a cluster environment, you must restart all nodes after modifying the configuration to ensure the changes are correctly applied across the entire cluster.
Connection Configuration¶
Use this section to set up the connection details for the vector database.
Note
This option is disabled when accessing Data Marketplace from Agora, as Agora automatically configures the vector database connection.
Denodo supports the following vector database adapters:
PostgreSQL (PGVector)
GCP Cloud SQL for PostgreSQL
Amazon Aurora (PostgreSQL)
Oracle 26ai and higher
SQL Server 2025 or higher
Azure SQL Managed Instance
Important
Driver installation
Before configuring the vector database, copy the appropriate JDBC driver to the <DENODO_HOME>/lib/data-catalog-extensions folder:
PostgreSQL / GCP Cloud SQL / Amazon Aurora: use the PostgreSQL JDBC driver (version 15 or higher).
Oracle: use the Oracle JDBC driver (version 26ai or higher).
SQL Server / Azure SQL: use the Microsoft SQL Server driver (version 2025 or higher).
You must restart the Data Marketplace after adding the driver.
Important
For PostgreSQL-based databases (PostgreSQL, GCP Cloud SQL, and Amazon Aurora):
If the pgvector extension is not installed in the schema you intend to use, you must specify both the target schema and the schema where the extension is located (e.g., public) in the connection URL using the currentSchema parameter.
URL Example:
jdbc:postgresql://<host>:<port>/<database>?currentSchema=<target_schema>,<extension_schema>
PostgreSQL (PGVector)¶
PostgreSQL (PGVector) configuration parameters¶
URL. The database connection URL.
Username. Database username.
Password. Database user password.
Schema (optional). The database schema to use. If this field is empty and no schema is specified in the URL, the public schema will be used.
GCP Cloud SQL for PostgreSQL¶
GCP Cloud SQL for PostgreSQL configuration parameters¶
URL. The database connection URL.
Username. Database username.
Password. Database user password.
Schema (optional). The database schema to use. If this field is empty and no schema is specified in the URL, the public schema will be used.
Amazon Aurora (PostgreSQL)¶
Amazon Aurora (PostgreSQL) configuration parameters¶
URL. The database connection URL.
Authentication. The authentication method for accessing Amazon Aurora. Choose between Login and password, AWS IAM credentials, or Denodo AWS instance credentials.
Login and password. This is the default authentication method.
Username. Database username.
Password. Database user password.
AWS IAM credentials. Use this option to specify explicit AWS credentials.
AWS access key ID. Unique identifier for the user or entity making the request.
AWS secret access key. Secret string used to sign AWS requests.
AWS IAM role ARN (optional). IAM identity with specific permissions.
AWS region. The AWS region where the Amazon Aurora service is available.
Denodo AWS instance credentials. Use this option if the AWS credentials are pre-configured in the instance where Denodo is running.
AWS IAM role ARN (optional). IAM identity with specific permissions.
AWS region. The AWS region where the Amazon Aurora service is available.
Schema (optional). The database schema to use. If this field is empty and no schema is specified in the URL, the public schema will be used.
Oracle 26ai and higher¶
Oracle 26ai and higher configuration parameters¶
URL. The database connection URL.
Username. Database username.
Password. Database user password.
Embedding Model Configuration¶
Denodo supports these embedding model providers: Amazon Bedrock (Titan), Azure OpenAI Service, OpenAI and Other (OpenAI API Compatible).
Amazon Bedrock (Titan)¶
Dialog with the Amazon Bedrock (Titan) configuration parameters¶
Use custom URL. Enable if you want to specify the URL used for LLM requests.
Endpoint URL. Specify the URL used for LLM requests. Only visible if Use custom URL is enabled.
Authentication. Authentication method for accessing Amazon Bedrock. Choose between AWS IAM credentials or Denodo AWS instance credentials.
AWS IAM credentials. Use this option if you want to specify the AWS credentials.
AWS access key ID. Unique identifier that specifies the user or entity making the request.
AWS secret access key. Secret string that is used to sign the requests you make to AWS.
AWS IAM role ARN (optional). IAM identity with specific permissions.
AWS region. The AWS region where access to the Amazon Bedrock service is available.
Denodo AWS instance credentials. Use this option if the AWS credentials are configured in the instance where Denodo is running.
AWS IAM role ARN (optional). IAM identity with specific permissions.
AWS region. The AWS region where access to the Amazon Bedrock service is available.
Model ID. The identifier used within the Amazon Bedrock service to specify which model should be used. If you enter a custom Model ID, select a model of the “Titan” family.
Context window (tokens). The maximum amount of tokens that the model can process in a single interaction. This includes both the input (Data Marketplace request) and the output (model response).
Custom headers (optional). Configure custom headers to be sent with the request to the LLM. For details, see the custom headers configuration section.
For more details on Amazon Titan embedding models, go to the Amazon Titan embedding models reference. For more information on the AWS authentication parameters, go to the Amazon AWS security credentials reference.
Azure OpenAI Service¶
Dialog with the Azure OpenAI Service configuration parameters¶
Azure resource name. The name of your Azure resource.
Azure deployment name. The deployment name you chose when you deployed the model.
API version. The API version to use for this operation. This follows the YYYY-MM-DD format.
API key. The API key.
Context window (tokens). The maximum amount of tokens that the model can process in a single interaction. This includes both the input (Data Marketplace request) and the output (model response). The default value is 16385.
Custom headers (optional). Configure custom headers to be sent with the request to the LLM. For details, see the custom headers configuration section.
Enable proxy. Enable this if there is a proxy between the Data Marketplace and the Azure OpenAI Service.
Host. The hostname (IP or DNS name) of the proxy.
Port. The port number. This parameter is mandatory in order to use the proxy.
Proxy name (optional). The username.
Proxy password (optional). The password.
For more information on the Azure OpenAI Service API parameters, go to the Azure OpenAI Service REST API reference.
OpenAI¶
Dialog with the OpenAI configuration parameters¶
API key. This is the OpenAI API key. This parameter is required.
Organization ID (optional). A header to specify which organization an API request is sent from. This is useful for users who belong to multiple organizations. This parameter is not required.
Model. Embedding model to use. The drop-down values are the ones tested by Denodo. However, if you want to try an untested OpenAI model, you can configure it by pressing the edit icon.
Important
Access to the models listed in the Model field depends on your OpenAI account and organization. You may not have access to all of them.
Configuring an untested OpenAI model can potentially result in erroneous behavior within the functionality.
Context window (tokens). The maximum amount of tokens that the model can process in a single interaction. If you configure a model from the list, the Data Marketplace will use the official model’s context window value. If you select a custom model and you do not introduce the context window value, a default value of 8191 tokens will be assigned to this parameter. You can choose the custom option to type the number you want.
Custom headers (optional). Configure custom headers to be sent with the request to the LLM. For details, see the custom headers configuration section.
For more details, see the OpenAI embedding models documentation https://platform.openai.com/docs/guides/embeddings#embedding-models. For OpenAI and Other (OpenAI API Compatible) embedding models, you can set up a proxy without authentication using the following JVM properties of the Virtual DataPort server and the web container:
-Dhttps.proxyHost=proxyHost.com
-Dhttps.proxyPort=1234
Other (OpenAI API Compatible)¶
Dialog with the Other (OpenAI API Compatible) configuration parameters¶
Authentication. Configure this option depending on whether the custom provider requires authentication.
API key. The API key. If the authentication is turned on, this parameter is required.
Embeddings URL. The embeddings endpoint. This parameter is required.
Model. Embedding model to use. This parameter is required.
Context window (tokens). The maximum amount of tokens that the model can process in a single interaction. The default value is 8191.
Custom headers (optional). Configure custom headers to be sent with the request to the LLM. For details, see the custom headers configuration section.
The Other (OpenAI API Compatible) option allows the Denodo Assistant to send and process requests from APIs following the OpenAI Create Embeddings approach. An API follows this approach if it adheres to the official Create Embeddings API (see https://platform.openai.com/docs/api-reference/embeddings/create).
There are models that adhere to the OpenAI Create Embeddings approach and are not made by OpenAI. These include Ollama models, such as nomic-embed-text, mxbai-embed-large or all-minilm, which can even be run locally. To configure these models, ensure you use the v1 endpoint: /v1.
Index Configuration¶
The Index Configuration allows users to configure how Denodo handles the indexation:
Dialog with the index configuration parameters¶
Batch Size. Number of elements to index at once.
Refresh metadata interval. Interval in seconds at which the system looks for updates and refreshes the necessary metadata.
Servers Configuration¶
The Servers Configuration section lets users enable and configure the vectorization of data for each server. By default, the vectorization is disabled for all servers. The current vectorization state is represented by a table containing the status for each of the servers.
The servers configuration status is represented by a table that shows the current status of the vectorization for each server.¶
You can start the vectorization of the server’s data following these steps:
First, click on the
options button and then click on the
Edit button.
Enable the vectorization using the Enable server metadata vectorization toggle and the following options will be visible:
Dialog with the VDP Server vectorization configuration parameters¶
Configure the following parameters:
Username. VDP server username. This user must have metadata permissions for all the views that need to be vectorized. If the Use sample data is enabled, the user will also need execute permissions in all columns of all the views that need to be vectorized.
Password. VDP server user password.
Use sample data. This enables the Data Marketplace to vectorize actual data from the view. This can help improve the results when the LLM needs to create queries that contain conditions.
Sample data size. Configure the number of rows to retrieve for sampling. This will create a subset of rows to vectorize, which can improve query results. The maximum value is 500 rows.
Important
If you modify the permissions of the user account used for vectorization, a re-vectorization is required to reflect these changes in the vector database.
You can now check your configuration using the Test Configuration button, or save it.
Important
Whenever you save the server or connection configuration the previous vectorization will be overwritten. A dialog will prompt you to confirm this action.
Dialog warning the user of the delete process of the previous vectorization.¶
While the vectorization process is ongoing, the table will refresh automatically every 10 seconds to show you the current status. You can also manually refresh the table via the
Refresh Table button. If you want to see how many elements are left, you can hover the mouse over the pending status:
Hovering the mouse over the pending status will show the number of elements left to vectorize.¶
The vectorization process for the selected server is completed. There are two possible outcomes:
From here, you can choose to delete the current vectorization or recreate it using the options button if needed.
