JDBC Data Sources¶
To create a JDBC data source, use the statement CREATE DATASOURCE JDBC.
CREATE [ OR REPLACE ] DATASOURCE JDBC <name:identifier> [ EMBEDDED_MPP ]
[ ID = <literal> ]
[ FOLDER = <literal> ]
[ VAULT_SECRET = <literal> ]
DRIVERCLASSNAME = <literal>
DATABASEURI = <literal>
[
<credentials>
| USE_KERBEROS ( <kerberos_credentials> )
| <credentials> USE_KERBEROS_AT_RUNTIME ( <kerberos_credentials> )
| <credentials> USE_KERBEROS_AT_INTROSPECTION ( <kerberos_credentials> )
| USE_OAUTH ( <oauth_configuration> )
| <credentials> USE_OAUTH_AT_RUNTIME ( <oauth_configuration> )
| <credentials> USE_OAUTH_AT_INTROSPECTION ( <oauth_configuration> )
| WITH PASS-THROUGH SESSION CREDENTIALS ( <pass_through_options> )
| WITH PASS-THROUGH SESSION CREDENTIALS (
[
PASSTHROUGH_KERBEROS_TARGET_PRINCIPAL = <literal>
| CONSTRAINED_DELEGATION_PROPERTY_NAME = <literal>
]
) [
USE_KERBEROS_AT_INTROSPECTION ( <kerberos_credentials> )
| USE_AWS_IAM_CREDENTIALS_AT_INTROSPECTION ( <aws_iam_credentials> )
]
| WITH PASS-THROUGH SESSION CREDENTIALS (
[ REQUIRE_OAUTH ]
[ USE_OAUTH_AT_INTROSPECTION ]
[
USING_TOKEN_PASS_THROUGH
| USING_TOKEN_EXCHANGE_FLOW
| USING_OBO_FLOW
] ( <oauth_configuration> )
)
| USE_AWS_IAM_CREDENTIALS ( <aws_iam_credentials> )
| <credentials> USE_AWS_IAM_CREDENTIALS_AT_RUNTIME ( <aws_iam_credentials> )
| <credentials> USE_AWS_IAM_CREDENTIALS_AT_INTROSPECTION ( <aws_iam_credentials> )
| <gcp_credentials>
]
[ WITH PROXY_CONNECTIONS ]
[ CLASSPATH = <literal> ]
[ DATABASENAME = <literal> DATABASEVERSION = <literal> ]
[ ISOLATIONLEVEL =
{ TRANSACTION_NONE
| TRANSACTION_READ_COMMITTED
| TRANSACTION_READ_UNCOMMITTED
| TRANSACTION_REPEATABLE_READ
| TRANSACTION_SERIALIZABLE
}
]
[ IGNORETRAILINGSPACES = { TRUE | FALSE } ]
[ FETCHSIZE = <integer> ]
[ <data load configuration> ]
[ CREDENTIALS_VAULT (
STATUS { ON | DEFAULT }
[ PROVIDER
{ CYBERARK (
APPLICATION_ID = <literal>
{ AGENT
| AGENT_LESS (
CLIENT_KEY = <literal> [ ENCRYPTED ]
[ CLIENT_KEY_PASSWORD = <literal> [ ENCRYPTED ] ]
)
}
)
}
]
)
]
[
<pool configuration 1>
| <pool configuration 2>
| <pool configuration 3>
]
[ <external data configuration> ]
[ PROPERTIES ( <literal> = <literal> [ VAR ] [, <literal> = <literal> [ VAR ] ]* ) ]
[ KERBEROSPROPERTIES ( <literal> = <literal> [, <literal> = <literal> ]* ) ]
[ OAUTHPROPERTIES ( <literal> = <literal> [, <literal> = <literal> ]* ) ]
[ TRANSFER_RATE_FACTOR = <double> ]
[ PROCESSING_UNITS = <integer> ]
[ CPUS_PER_PROCESSING_UNIT = <integer> ]
[ INTERNAL_TRANSFER_RATE = <double> ]
[ <data infrastructure> ]
[ DESCRIPTION = <literal> ]
[ SOURCECONFIGURATION ( [ <source configuration property>
[, <source configuration property> ]* ] ) ]
<pass_through_options> ::=
[ <credentials> ]
[ USE_KERBEROS ]
[ PASSTHROUGH_KERBEROS_TARGET_PRINCIPAL = <literal> ]
[ CONSTRAINED_DELEGATION_PROPERTY_NAME = <literal> ]
<credentials> ::=
USERNAME = <literal> USERPASSWORD = <literal> [ ENCRYPTED ]
| <vault_credentials>
<kerberos_credentials> ::=
KRB_USERNAME = <literal> KRB_USERPASSWORD = <literal> [ ENCRYPTED ]
| KRB_USERNAME = <literal> KRB_KEYTAB = <literal>
<aws_iam_credentials> ::=
AWS_ACCESS_KEY_ID = <literal> AWS_SECRET_ACCESS_KEY = <literal> [ ENCRYPTED ] [ AWS_DATABASE_USER = <literal> ]
| AWS_ACCESS_KEY_ID = <literal> AWS_SECRET_ACCESS_KEY = <literal> [ ENCRYPTED ] AWS_IAM_ROLE_ARN = <literal> [ AWS_DATABASE_USER = <literal> ]
| AWS_IAM_ROLE_ARN = <literal> [ AWS_DATABASE_USER = <literal> ]
| [ AWS_DATABASE_USER = <literal> ]
<gcp_credentials> ::=
GCP (
OAUTH_TYPE = <literal>
PRIVATE_KEY = <literal> [ ENCRYPTED ]
PROJECT_ID = <literal>
SERVICE_ACCOUNT_EMAIL = <literal>
)
<vault_credentials> ::=
FROM_VAULT
| FROM_VAULT (VAULT_SECRET = <literal>, FIELD_AT_SECRET = DEFAULT)
<oauth_configuration> ::=
TOKEN_ENDPOINTURL = <literal>
CLIENT_IDENTIFIER = <literal>
CLIENT_SECRET = <literal> [ ENCRYPTED ]
OAUTH_USER = { <literal> | <vault_credentials> }
OAUTH_PASSWORD = { <literal> [ ENCRYPTED ] | <vault_credentials> }
SCOPE = <literal>
[ OAUTH_TOKEN_PROPERTY_NAME = <literal> ]
<pool configuration 1> ::=
VALIDATIONQUERY = <literal>
INITIALSIZE = <integer>
MAXACTIVE = <integer>
<pool configuration 2> ::=
VALIDATIONQUERY = <literal>
INITIALSIZE = <integer>
MAXACTIVE = <integer>
EXHAUSTEDACTION = <integer>
<pool configuration 3> ::=
VALIDATIONQUERY = <literal>
INITIALSIZE = <integer>
MAXIDLE = <integer>
MINIDLE = <integer>
MAXACTIVE = <integer>
EXHAUSTEDACTION = <integer>
TESTONBORROW = <boolean>
TESTONRETURN = <boolean>
TESTWHILEIDLE = <boolean>
[ <pool sub-configuration 1> ]
<pool sub-configuration 1> ::=
TIMEBETWEENEVICTION = <integer>
NUMTESTPEREVICTION = <integer>
MINEVICTABLETIME = <integer>
[ <pool sub-configuration 2>]
<pool sub-configuration 2> ::=
POOLPREPAREDSTATEMENTS = <boolean>
MAXOPENPREPAREDSTATEMENTS = <integer>
<data infrastructure> ::=
DATA_INFRASTRUCTURE (
[ PROVIDER = <literal> ]
[ REGION = <literal> ]
)
<source configuration property> ::=
ALLOWLITERALASPARAMETER = <property value>
| DELEGATE_BINARY_ORDERBY_COLLATION = <property value>
| DELEGATE_ORDERBY_COLLATION_MODIFIER = <property value>
| DELEGATEAGGREGATEFUNCTIONS = {
DEFAULT
| ( <function:identifier> [, <function:identifier> ]* )
}
| DELEGATEALLOPERATORS = <property value>
| DELEGATEANALYTICFUNCTIONSLIST = {
DEFAULT
| ( <function:identifier> [, <function:identifier> ]* )
}
| DELEGATEARRAYLITERAL = <property value>
| DELEGATECOMPOUNDFIELDPROJECTION = <property value>
| DELEGATEGROUPBY = <property value>
| DELEGATEHAVING = <property value>
| DELEGATEINNERJOIN = <property value>
| DELEGATEINTERSECTION = <property value>
| DELEGATEINVALIDNUMBERLITERALSASNULL = <property value>
| DELEGATEJOIN = <property value>
| DELEGATELEFTFUNCTION = <property value>
| DELEGATELEFTLITERAL = <property value>
| DELEGATELITERALEXPRESSION = <property value>
| DELEGATEMIXEDAGGREGATEEXPRESSION = <property value>
| DELEGATENATURALOUTERJOIN = <property value>
| DELEGATENOTCONDITION = <property value>
| DELEGATE_OFFSET_RESTRICTION = <delegate offset restriction value>
| DELEGATEOPERATORSLIST = {
DEFAULT
| ( <operator:identifier> [, <operator:identifier> ]* )
}
| DELEGATEORCONDITION = <property value>
| DELEGATEORDERBY = <property value>
| DELEGATEPROJECTION = <property value>
| DELEGATEREGISTERLITERAL = <property value>
| DELEGATERIGHTFIELD = <property value>
| DELEGATERIGHTFUNCTION = <property value>
| DELEGATERIGHTLITERAL = <property value>
| DELEGATESCALARFUNCTIONS = {
DEFAULT
| ( <function:identifier> [, <function:identifier> ]* )
}
| DELEGATESELECTDISTINCT = <property value>
| DELEGATESELECTION = <property value>
| DELEGATEUNION = <property value>
| SUPPORTSAGGREGATEFUNCTIONSOPTIONS = <property value>
| SUPPORTSBRANCHOUTERJOIN = <property value>
| SUPPORTSEQOUTERJOINOPERATOR = <property value>
| SUPPORTSEXPLICITCROSSJOIN = <property value>
| SUPPORTSFULLEQOUTERJOIN = <property value>
| SUPPORTSFULLNOTEQOUTERJOIN = <property value>
| SUPPORTSFUSINGINUSINGANDNATURALJOIN = <property value>
| SUPPORTSJOINONCONDITION = <property value>
| SUPPORTSNATURALJOIN = <property value>
| SUPPORTSPREPAREDSTATEMENT = <property value>
| SUPPORTSUSINGJOIN = <property value>
<property value> ::=
true
| false
| DEFAULT
<delegate offset restriction value> ::=
DEFAULT
| 'NONE'
| 'FETCH'
| 'ORDER_BY'
| 'FETCH_ORDER_BY'
| 'NO_ORDER_BY'
| 'FETCH_NO_ORDER_BY'
<data load configuration> ::=
DATA_LOAD_CONFIGURATION (
[ USE_FOR_QUERY_OPTIMIZATION = { false | DATA_MOVEMENT | PARALLEL_PROCESSING } ]
[ BATCHINSERTSIZE = <integer> ]
[ BULK_LOAD_CONFIGURATION (
<workdir configuration>
| <athena configuration>
| <redshift configuration 1>
| <redshift configuration 2>
| <hdfs configuration>
| <databricks configuration>
| <oracle configuration>
| <SQL Server configuration>
| <db2 configuration>
| <gcp bulk credentials>
)
]
[ TARGET_CATALOG = <literal> [ ESCAPE ] ]
[ TARGET_SCHEMA = <literal> [ ESCAPE ] ]
[ USE_EXTERNAL_TABLES (
ON_MOVE_READ = <boolean>,
ON_MOVE_WRITE = <boolean>
)
]
[ CREATE_TABLE_TEMPLATES ( <create_table_template> [, <create_table_template> ]* ) ]
[ <default_create_table_templates> ]
)
<external data configuration> ::=
EXTERNAL_DATA_CONFIGURATION (
FILE_SYSTEM { HDFS | S3 | ADLS }
AUTHENTICATION ( <aws_authentication> | <azure_authentication> )
[HADOOP_CUSTOM_PROPERTIES = ( <literal> = <literal> [, <literal> = <literal> ]* ) ]
EXTERNAL_DATA_ROUTES (
<external_data_routes_list>
)
)
<aws_authentication> ::=
[ AWS_ACCESS_KEY_ID = { <literal> | <vault_credentials> }]
[ AWS_SECRET_ACCESS_KEY = { <literal> [ ENCRYPTED ] | <vault_credentials> } ]
[ AWS_IAM_ROLE_ARN = <literal> ]
[ AWS_VAULT_SECRET = <literal> ]
<azure_authentication> ::= <azure_authentication_shared_key> |
<azure_authentication_oauth_client> |
<azure_authentication_oauth_managed_identity>
<azure_authentication_shared_key> ::=
AZURE_AUTH_TYPE = SHARED_KEY
SHARED_KEY = <literal> [ ENCRYPTED ]
<azure_authentication_oauth_client> ::=
AZURE_AUTH_TYPE = OAUTH_CLIENT_CREDENTIALS
OAUTH_CLIENT_ID = <literal> | <vault_credentials>
[ OAUTH_CLIENT_SECRET = <literal> [ ENCRYPTED ] | <vault_credentials> ]
OAUTH_TOKEN_ENDPOINT = <literal>
OAUTH_VAULT_SECRET = <literal>
<azure_authentication_oauth_managed_identity> ::=
AZURE_AUTH_TYPE = OAUTH_MANAGED_IDENTITY
[ OAUTH_CLIENT_ID = <literal> | <vault_credentials> ]
[ OAUTH_TENANT_ID = <literal> ]
[ OAUTH_TOKEN_ENDPOINT = <literal> ]
<external_data_routes_list> ::=
<external_data_routes_template> |
<external_data_routes_list> COMMA <external_data_routes_template>
<external_data_routes_template> ::= <identifier_name> EQ <literal>
<create_table_template> ::=
<template_name:identifier> = <template_definition:literal>
[ DEFAULT( <parameter_default_value> [, <parameter_default_value> ]* ) ]
Note: You can use parameters within the template definition. Syntax: @{parameter_name}
<parameter_default_value> ::= <parameter_name:literal> = <parameter_value:literal>
<default_create_table_templates> ::=
DEFAULT CREATE_TABLE_TEMPLATE <template_name:identifier>
| DEFAULT CREATE_TABLE_TEMPLATES (
[ DATA_MOVEMENT = { DATA_SOURCE_DEFAULT | <data_movement_template_name:identifier> }, ]
[ CACHE = { DATA_SOURCE_DEFAULT | <cache_template_name:identifier> }, ]
[ CACHE_MANAGEMENT = { DATA_SOURCE_DEFAULT | <cache_management_template_name:identifier> }, ]
[ REMOTE_TABLE = { DATA_SOURCE_DEFAULT | <remote_table_template_name:identifier> }, ]
[ SUMMARY = { DATA_SOURCE_DEFAULT | <summary_template_name:identifier> } ]
)
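For example, the following is a minimal sketch of a CREATE DATASOURCE JDBC statement for a MySQL database; the data source name, connection URI and credentials are illustrative placeholders, not values from this guide:
CREATE DATASOURCE JDBC ds_mysql_sales
DRIVERCLASSNAME = 'com.mysql.cj.jdbc.Driver'
DATABASEURI = 'jdbc:mysql://acme-db:3306/sales'
USERNAME = 'sales_user' USERPASSWORD = 'sales_password'
DATABASENAME = 'mysql' DATABASEVERSION = '8'
FETCHSIZE = 1000
DESCRIPTION = 'Sales database (illustrative example)';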
To modify a JDBC data source, use ALTER DATASOURCE JDBC.
ALTER DATASOURCE JDBC <name:identifier>
[ ID = <literal> ]
[ FOLDER = <literal> ]
[ VAULT_SECRET = <literal> ]
[ DRIVERCLASSNAME = <literal> ]
[ DATABASEURI = <literal> ]
[
<credentials>
| USE_KERBEROS ( <kerberos_credentials> )
| <credentials> USE_KERBEROS_AT_RUNTIME ( <kerberos_credentials> )
| <credentials> USE_KERBEROS_AT_INTROSPECTION ( <kerberos_credentials> )
| WITH PASS-THROUGH SESSION CREDENTIALS ( <pass_through_options> )
| WITH PASS-THROUGH SESSION CREDENTIALS ( )
[ USE_KERBEROS_AT_INTROSPECTION ( <kerberos_credentials> )
| USE_AWS_IAM_CREDENTIALS_AT_INTROSPECTION ( <aws_iam_credentials> ) ]
| USE_AWS_IAM_CREDENTIALS ( <aws_iam_credentials> )
| <credentials> USE_AWS_IAM_CREDENTIALS_AT_RUNTIME ( <aws_iam_credentials> )
| <credentials> USE_AWS_IAM_CREDENTIALS_AT_INTROSPECTION ( <aws_iam_credentials> )
| <gcp_credentials>
[ WITH PROXY_CONNECTIONS ]
]
[ CLASSPATH = <literal> ]
[
DATABASENAME = <literal>
DATABASEVERSION = <literal>
]
[ ISOLATIONLEVEL =
TRANSACTION_NONE
| TRANSACTION_READ_COMMITTED
| TRANSACTION_READ_UNCOMMITTED
| TRANSACTION_REPEATABLE_READ
| TRANSACTION_SERIALIZABLE
]
[ IGNORETRAILINGSPACES = { true | false } ]
[ FETCHSIZE = <integer> ]
[ BATCHINSERTSIZE = <integer> ]
[ CREDENTIALS_VAULT (
STATUS { ON | DEFAULT }
[ PROVIDER
{ CYBERARK (
APPLICATION_ID = <literal>
{ AGENT
| AGENT_LESS (
CLIENT_KEY = <literal> [ ENCRYPTED ]
[ CLIENT_KEY_PASSWORD = <literal> [ ENCRYPTED ] ]
)
}
)
}
]
)
]
[
<pool configuration 1>
| <pool configuration 2>
| <pool configuration 3>
]
[ <external data configuration> ]
[ PROPERTIES ( <literal> = <literal> [ VAR ] [, <literal> = <literal> [ VAR ] ]* ) ]
[ KERBEROSPROPERTIES ( <literal> = <literal> [, <literal> = <literal> ]* ) ]
[ USEEXTERNALTABLES (
ONMOVEREAD = <boolean>,
ONMOVEWRITE = <boolean>
)
]
[ DATAMOVEMENT_TARGET = { true | false } ]
[ TARGET_CATALOG = <literal> [ ESCAPE ] ]
[ TARGET_SCHEMA = <literal> [ ESCAPE ] ]
[ TRANSFER_RATE_FACTOR = <double> ]
[ PROCESSING_UNITS = <integer> ]
[ CPUS_PER_PROCESSING_UNIT = <integer> ]
[ INTERNAL_TRANSFER_RATE = <double> ]
[ CREATE_TABLE_TEMPLATES ( <create_table_template> [, <create_table_template> ]* ) ]
[ <default_create_table_templates> ]
[ <data infrastructure> ]
[ DESCRIPTION = <literal> ]
[ SOURCECONFIGURATION ( [ <source configuration property>
[, <source configuration property> ]* ] ) ]
<pool configuration 1> ::= (see CREATE DATASOURCE JDBC)
<pool configuration 2> ::= (see CREATE DATASOURCE JDBC)
<pool configuration 3> ::= (see CREATE DATASOURCE JDBC)
<pool sub-configuration 1> ::= (see CREATE DATASOURCE JDBC)
<pool sub-configuration 2> ::= (see CREATE DATASOURCE JDBC)
<source configuration property> ::= (see CREATE DATASOURCE JDBC)
<credentials> ::= (see CREATE DATASOURCE JDBC)
<create_table_template> ::= (see CREATE DATASOURCE JDBC)
<default_create_table_templates> ::= (see CREATE DATASOURCE JDBC)
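For example, a sketch of an ALTER statement that changes only two of these parameters (the data source name and values are illustrative):
ALTER DATASOURCE JDBC ds_mysql_sales
FETCHSIZE = 500
DESCRIPTION = 'Sales database (tuned fetch size)';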
Explanation of some of the parameters of these statements:
OR REPLACE
: If present and a data source with the same name exists, the current definition is substituted with the new one.
FOLDER
: Name of the folder where the data source will be stored.
DRIVERCLASSNAME
: The driver class to be used for connection to the data source.
DATABASEURI
: The connection URL to the database.
The authentication methods available to connect to a database are the following:
<credentials>
: provide the USERNAME and PASSWORD to connect to the database to execute queries and for the introspection process (i.e. to display the tables/views of the database in the “Create base view” dialog of the data source).
USE_KERBEROS ( <kerberos_credentials> )
: use Kerberos to connect to the database to execute queries and for the introspection process (i.e. to display the tables/views of the database in the “Create base view” dialog of the data source).
<credentials> USE_KERBEROS_AT_RUNTIME ( <kerberos_credentials> )
: use Kerberos to connect to the database to execute queries, but login and password for the introspection process.
<credentials> USE_KERBEROS_AT_INTROSPECTION ( <kerberos_credentials> )
: use login and password to connect to the database to execute queries, but Kerberos for the introspection process.
WITH PASS-THROUGH SESSION CREDENTIALS ( <pass_through_options> )
: use login and password or AWS IAM credentials for the introspection process, and the credentials of the client that connected to the Virtual DataPort server to execute queries. The credentials used to run queries can be Kerberos or login/password, depending on the authentication the client used to connect to the Virtual DataPort server. If <pass_through_options> specifies USE_KERBEROS, the Server will use the login/password to create the Kerberos ticket.
USE_AWS_IAM_CREDENTIALS ( <aws_iam_credentials> )
: use AWS IAM credentials to connect to the database to execute queries and for the introspection process (i.e. to display the tables/views of the database in the “Create base view” dialog of the data source).
<credentials> USE_AWS_IAM_CREDENTIALS_AT_RUNTIME ( <aws_iam_credentials> )
: use AWS IAM credentials to connect to the database to execute queries, but login and password for the introspection process.
<credentials> USE_AWS_IAM_CREDENTIALS_AT_INTROSPECTION ( <aws_iam_credentials> )
: use login and password to connect to the database to execute queries, but AWS IAM credentials for the introspection process.
If you create a data source with the pass-through option but want to query a view of this data source with credentials other than the ones used to connect to the Server, add the parameters USERNAME and PASSWORD to the CONTEXT. These two parameters are only taken into account when the data source has been created with the option WITH PASS-THROUGH SESSION CREDENTIALS.
For example, if view1 has been created with the option WITH PASS-THROUGH SESSION CREDENTIALS and you execute this:
SELECT * FROM view1 CONTEXT( USERNAME = 'admin', PASSWORD = 'd4GvpKA5BiwoGUFrnH92DNq5TTNKWw58I86PVH2tQIs/q1RH9CkCoJj57NnQUlmvgvvVnBvlaH8NFSDM0x5fWCJiAvyia70oxiUWbToKkHl3ztgH1hZLcQiqkpXT/oYd' ENCRYPTED, DOMAIN = 'ACME_DOMAIN')
the Server will connect to the database with the username admin, the password password and the domain acme_domain, ignoring the credentials used by the user to connect to the Server.
It is mandatory to add the token ENCRYPTED and enter the password encrypted. To encrypt the password, execute the statement ENCRYPT_PASSWORD. For example:
ENCRYPT_PASSWORD 'my_secret_password';
WITH PASS-THROUGH SESSION CREDENTIALS ( ) USE_KERBEROS_AT_INTROSPECTION ( <kerberos_credentials> )
: use Kerberos authentication for the introspection process and the credentials of the client that connected to the Virtual DataPort server to execute queries. The credentials used to run queries can be Kerberos or login/password, depending on the authentication the client used to connect to the Virtual DataPort server.
Important
There are important implications of using “pass-through session credentials”. To read about them, search “pass-through credentials” in the section Importing JDBC Sources of the Administration Guide.
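For instance, the following sketch creates a data source that passes through the session credentials at query time and uses a Kerberos keytab for introspection; all names, the URI and the keytab path are illustrative placeholders:
CREATE DATASOURCE JDBC ds_oracle_hr
DRIVERCLASSNAME = 'oracle.jdbc.OracleDriver'
DATABASEURI = 'jdbc:oracle:thin:@hr-db:1521:orcl'
WITH PASS-THROUGH SESSION CREDENTIALS ( )
USE_KERBEROS_AT_INTROSPECTION (
KRB_USERNAME = 'hr_admin@EXAMPLE.COM'
KRB_KEYTAB = '/opt/keytabs/hr_admin.keytab'
);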
WITH PROXY_CONNECTIONS
: if present, the data source will use the “proxy authentication” feature of Oracle. The section How Oracle Proxy Authentication Works of the Administration Guide explains how this feature works.
CLASSPATH
: path to the JAR file containing the JDBC driver for the specified source (optional).
DATABASENAME and DATABASEVERSION
: name and version of the database to be accessed.
ISOLATIONLEVEL
: sets the desired isolation level for the queries and transactions executed in the database. If not present, the data source uses the default isolation level of the database.
IGNORETRAILINGSPACES
: if true, the Server removes the space characters at the end of text-type values of the results returned by this data source’s views.
FETCHSIZE
: gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed.
CREDENTIALS_VAULT
: customizes the Credentials Vault configuration at the data source level, which is particularly useful when you need to manage credentials for multiple databases with varying configurations. Allowing different configurations for different data sources makes credential management more secure and efficient.
BATCHINSERTSIZE
: when the data source has to insert several rows into the database of this data source, it can insert them in batches. This parameter sets the number of queries per batch. This value is used only when inserting rows into the database of this data source as a result of moving data from another data source into this one. See more about this in the section Data Movement of the Administration Guide.
The section The Pool of Connections of the JDBC Data Sources below explains the parameters of “<pool configuration 1>”, “<pool configuration 2>” and “<pool configuration 3>”.
Parameters of the pool of prepared statements:
POOLPREPAREDSTATEMENTS
: if true, the pool of prepared statements is enabled.
MAXOPENPREPAREDSTATEMENTS
: maximum number of open prepared statements. Only taken into account if POOLPREPAREDSTATEMENTS is true.
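For example, a sketch that enables the pool of prepared statements using “<pool configuration 3>” (the data source name and values are illustrative; note that, per the grammar above, POOLPREPAREDSTATEMENTS can only appear after the parameters of “<pool sub-configuration 1>”):
ALTER DATASOURCE JDBC ds_oracle_hr
VALIDATIONQUERY = 'SELECT 1 FROM dual'
INITIALSIZE = 4
MAXIDLE = -1
MINIDLE = 0
MAXACTIVE = 20
EXHAUSTEDACTION = 1
TESTONBORROW = true
TESTONRETURN = false
TESTWHILEIDLE = false
TIMEBETWEENEVICTION = -1
NUMTESTPEREVICTION = 3
MINEVICTABLETIME = 1800000
POOLPREPAREDSTATEMENTS = true
MAXOPENPREPAREDSTATEMENTS = 100;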
USEEXTERNALTABLES
: options regarding the use of the database’s proprietary APIs to read and write data from/to this database. Writing data into a database using its proprietary API is called “bulk data load”.
There are two reasons the Server may write data using the bulk data load APIs of the database:
When performing a data movement. The section Data Movement of the Administration Guide explains what data movements are.
When loading the cache of a table.
The section Bulk Data Load of the Administration Guide explains in detail how this process works. The sub-options are the following (see the sketch after this list):
ONMOVEREAD
: (only taken into account when the database is Netezza) if true, when the Execution Engine reads data from the Netezza database to perform a data movement, it will do so using its “External tables” feature. Setting this to true is equivalent to selecting the check box “Use external tables for data movement” of the “Read settings”, on the “Read & Write” tab of the data source.
ONMOVEWRITE
: if true, when the Execution Engine writes data to this database to perform a data movement, it does so using its proprietary API. Setting this to true is equivalent to selecting the check box “Use Bulk Data Load APIs” of the “Write settings”, on the “Read & Write” tab of the data source.
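A sketch of these sub-options in VQL, assuming a Netezza data source with the illustrative name ds_netezza_dw:
ALTER DATASOURCE JDBC ds_netezza_dw
USEEXTERNALTABLES (
ONMOVEREAD = true,
ONMOVEWRITE = true
);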
DATAMOVEMENT_TARGET
: if true, the data source can be the target of a data movement. Setting this to true is equivalent to selecting the check box “Can be data movement target” of the “Write settings”, on the “Read & Write” tab of the data source.
BULK_LOAD_CONFIGURATION
: settings of the bulk data load API of the database. The settings you can indicate depend on the database adapter, so it is better to change them from the administration tool.
CREATE_TABLE_TEMPLATES
: property to configure the SQL commands that VDP executes to create cache tables, data movement tables, remote tables, summaries, temporary views and materialized views. DATA_SOURCE_DEFAULT is the default table creation command used by Denodo when the user has not set a custom table creation command. See the section Cache Table Creation Templates for more information.
Conventions to modify the table creation templates using an ALTER DATASOURCE command:
If the ALTER DATASOURCE command does not include the CREATE_TABLE_TEMPLATES token, the VDP server will keep the current values for the templates.
If the ALTER DATASOURCE command includes an empty section, CREATE_TABLE_TEMPLATES (), the VDP server will remove all the data source templates, and the CREATE_TABLE_TEMPLATES section will not appear in the data source VQL.
If the ALTER DATASOURCE command includes the CREATE_TABLE_TEMPLATES section specifying one or more templates, those templates will replace the existing ones in the data source.
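For example, the following sketch registers a custom table creation template and makes it the default for cache tables; the template text and the @{...} parameter name are placeholders, not the exact parameters supported by every adapter:
ALTER DATASOURCE JDBC ds_snowflake_dw
CREATE_TABLE_TEMPLATES (
transient_cache = 'CREATE TRANSIENT TABLE @{table_definition}'
)
DEFAULT CREATE_TABLE_TEMPLATES (
DATA_MOVEMENT = DATA_SOURCE_DEFAULT,
CACHE = transient_cache
);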
SOURCECONFIGURATION
: data source configuration parameters. Virtual DataPort allows indicating specific characteristics of the underlying data sources so that they are taken into account when executing statements on them. See the section Data Source Configuration Properties for further details.
PROPERTIES
: list of name/value pairs that will be passed to the JDBC driver when creating a connection with this database. If the token VAR is present, the property is considered an expression instead of a literal.
KERBEROSPROPERTIES
: list of name/value pairs that will be passed to the JDBC driver when creating a connection with this database. The properties on this list are meant to configure the Kerberos authentication mechanism between the Virtual DataPort server and the database. See the section Connecting to a JDBC Source with Kerberos Authentication of the Administration Guide for more details about Kerberos authentication in JDBC data sources.
The following optional parameters specify information about the data source. The cost optimizer uses these values when evaluating the cost of an execution plan. The default values are usually correct, so you should not specify these parameters unless you have a deep knowledge of the cost optimizer.
TRANSFER_RATE_FACTOR
: relative measure of the speed of the network connection between the Denodo server and the data source. Use the default value (1) if the data source is accessible through a conventional 100 Mbps LAN. Use higher values for faster networks and lower values for data sources accessible through a WAN.
PROCESSING_UNITS
: in parallel databases, the number of SPUs.
CPUS_PER_PROCESSING_UNIT
: in parallel databases, the number of CPUs per SPU.
INTERNAL_TRANSFER_RATE
: transfer rate in kilobytes per millisecond.
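For example, a sketch combining driver properties and source configuration properties (the data source name is illustrative; useSSL and connectTimeout are common MySQL Connector/J properties, but check the documentation of your driver):
ALTER DATASOURCE JDBC ds_mysql_sales
PROPERTIES ( 'useSSL' = 'true', 'connectTimeout' = '5000' )
SOURCECONFIGURATION (
DELEGATEORDERBY = false,
DELEGATESCALARFUNCTIONS = DEFAULT
);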
The Pool of Connections of the JDBC Data Sources¶
The JDBC data sources of Denodo are created with a connection pool. A connection pool is a cache of connections to a database. At runtime, when a query sent by an application involves sending a query to this data source, the execution engine requests a connection from the pool of the data source. If there are idle connections in the pool, the pool returns one. If there are no idle connections, the pool creates one. Then, the execution engine executes the query on this connection and, once the query finishes, returns the connection to the pool. This connection is now available for the next query.
Using a connection pool significantly improves the performance of queries that involve a database. The reason is that creating a new connection is costly in terms of time and resources: for each new connection, Denodo has to establish a network connection with the database, the database has to authenticate the client, allocate memory for the new connection, etc. If the connection is already created, this process is avoided.
Connection pools provide the following benefits:
Reduce the number of times a new connection has to be created.
Reduce the time of getting a connection to execute a query.
Provide a way of limiting the amount of connections opened to a database.
For these reasons, new JDBC data sources are created with a connection pool and, in general, it should not be disabled.
The exact behavior of the connection pool depends on its settings.
Parameters of the Connection Pool of a JDBC Data Source¶
The connection pool of a JDBC data source is configured with the following parameters. Most of them cannot be changed from the administration tool, only with the statement ALTER DATASOURCE JDBC.
MAXACTIVE
(default value: 20): maximum number of connections that can be opened at a given time. These are the connections currently used to run queries plus the connections idle in the pool. Note that the MAXACTIVE connection limit is applied to the XA and non-XA connection pools independently, so a default value of 20 results in a total maximum of 40 connections (20 for XA connections and 20 for non-XA connections).
You can change this parameter graphically, in the Connection Pool Configuration dialog, field Maximum number of active connections.
If negative (e.g. -1), there is no limit to the number of connections in the pool at one time.
Enter 0 to disable the connection pool of the data source. In this case, the other parameters of the connection pool are ignored: for each query, the data source opens a new connection to the database and, once the query finishes, it closes the connection. If the query is part of a transaction (the client executed BEGIN earlier but did not execute COMMIT or ROLLBACK yet), the execution engine keeps the connection open until the transaction finishes. That is because, as transactions in Denodo are distributed, the transaction needs to be confirmed or rolled back in all the databases involved in the query. See more about transactions in the section Transactions in Virtual DataPort.
Denodo uses a different connection pool for transactional queries (the XA pool) and offers additional properties to control and limit the XA connection pool specifically. When both the MAXACTIVE parameter and the XA.MAXACTIVE parameter are set, the lower limit applies. You can specify a property to modify this behavior at the data source level:
-- This command changes the maximum number of XA connections to 10
-- for the data source 'dataSourceName' in the database 'databaseName'.
SET 'com.denodo.vdb.misc.datasource.JDBCDataSource.pool.databaseName.dataSourceName.xa.maxActive' = '10';
MAXACTIVE is ignored if EXHAUSTEDACTION is set to 2 (see the explanation below).
MAXIDLE
(default value: -1): maximum number of connections that can sit idle in the pool at any time.
If -1 or lower, there is no limit to the number of connections that may sit idle at one time.
If 0 or higher and the parameters TIMEBETWEENEVICTION and MINEVICTABLETIME are greater than 0, connections that have been idle for more than a certain time will be closed.
MINIDLE
(default value: 0): minimum number of connections that are idle in the pool. Useful to guarantee that there are always idle connections in the pool, so a query never has to wait for the pool to open a new connection. In order to satisfy MINIDLE, the pool will never create more than MAXACTIVE connections. With the XA pool configuration, the value of MINIDLE is the same as INITIALSIZE.
EXHAUSTEDACTION
(default value: 1): specifies the behavior of the pool when the pool is empty (all the connections are running queries). The possible values are:
0
: the data source returns an error and the query fails.
1
: the pool waits until a connection is available or the maximum wait time is reached. If the maximum wait time is reached, the query fails.
The default maximum wait time is 30 seconds, but it can be changed. If the maximum wait time is -1, the data source will wait indefinitely until a connection is available. To change this property, execute the following command from the VQL Shell and restart the Virtual DataPort server.
-- This represents 20 seconds because the value of this property is in milliseconds.
SET 'com.denodo.vdb.misc.datasource.JDBCDataSource.pool.maxWait' = '20000';
After changing the maximum wait time and restarting, the existing JDBC data sources will still have the same maximum wait time, but if they are modified, their maximum wait time will change to the new value.
2
: the data source creates a new connection if there are no idle connections. This makes MAXACTIVE meaningless.
If the XA connection pool is enabled, it only supports the value 1 (EXHAUSTED_BLOCK).
INITIALSIZE
(default value: 4): number of connections with which the pool is initialized. This value is ignored when the data source is created with the authentication option “Pass-through session credentials”. With this option, the Server creates one pool of connections for each user account that connects to this database and, initially, these pools only have one connection regardless of the value of INITIALSIZE. This is to prevent creating too many unnecessary connections.
This property can be changed to specify a different value for XA connections:
-- This command changes the number of XA connections with which the pool is initialized to 1
-- for the data source 'dataSourceName' in the database 'databaseName'.
SET 'com.denodo.vdb.misc.datasource.JDBCDataSource.pool.databaseName.dataSourceName.xa.initialIdle' = '1';
VALIDATIONQUERY
(default value: depends on the adapter): SQL query executed by the connection pool to check if a connection is still valid; also known as the “ping query”. It is only used when at least one of TESTONBORROW, TESTONRETURN or TESTWHILEIDLE is true.
You can change this parameter graphically, in the Connection Pool Configuration dialog, field Ping query.
When you create a JDBC data source using the administration tool and select a database adapter other than Generic, the data source is automatically created with the appropriate validation query for that database. However, if you select Generic, you have to provide a valid ping query. This query is executed often, so its footprint has to be as low as possible. In addition, the table queried by the pool has to exist and the user needs privileges to run the query.
Examples of ping queries:
SELECT 1
SELECT 1 FROM dual
SELECT 1 FROM view LIMIT 0
The ping query can contain the interpolation variable USER_NAME. At runtime, this variable is replaced with the user name of the Virtual DataPort user that runs the query.
For example, let us say that the ping query of a JDBC data source is CALL DBMS_SESSION.SET_IDENTIFIER('@{USER_NAME}'). If the user scott executes a query that involves this data source, Virtual DataPort will execute the query CALL DBMS_SESSION.SET_IDENTIFIER('scott') over the connection returned by the pool, to check that this connection is still valid and not stale.
If TESTONBORROW is true, the ping query is executed every time before the actual query is sent to the database. Being able to put in this query the user name that is currently running the query can be useful to execute a statement on the database or for auditing purposes.
TESTONBORROW
(default value: true): if true and the parameter VALIDATIONQUERY is not empty, the pool executes the VALIDATIONQUERY on the selected connection before returning it to the execution engine. If the VALIDATIONQUERY fails, the pool discards the connection and selects another one. If there are no more idle connections, it creates one.
This property can be changed to specify a different value for XA connections:
-- This command disables the ping query for XA connections
-- for the data source 'dataSourceName' in the database 'databaseName'.
SET 'com.denodo.vdb.misc.datasource.JDBCDataSource.pool.databaseName.dataSourceName.xa.testOnBorrow' = 'false';
TESTONRETURN
(default value: false): if true and the parameter VALIDATIONQUERY is not empty, the pool executes the VALIDATIONQUERY on the connections returned to the pool. If the VALIDATIONQUERY fails, the pool discards the connection. This parameter is not supported in XA connection pools.
The pool can be configured to periodically check that the connections in the pool are still valid and/or to close the connections that have sat idle in the pool for more than a certain period of time.
To check if a connection is still valid, the pool executes the VALIDATIONQUERY. If it fails, the pool discards the connection. The main reasons for the validation query to fail are:
The database has a mechanism that closes connections after a certain period of inactivity.
A firewall placed in between the Denodo server and the database automatically closes any connection after a certain period of inactivity.
The main goals of enabling this option are:
Make sure that the connections of the pool are valid at all times, so that when a connection is needed, the pool returns one that works.
Close connections that have not been used in a while thus freeing resources in the database.
The task of verifying connections and closing the idle ones is performed by the “connection eviction thread” (there is one per pool of connections). Periodically, this thread wakes up and performs these checks.
The parameters that control the connection eviction thread are:
TIMEBETWEENEVICTION
(default value: -1): how long, in milliseconds, the eviction thread should sleep between “runs” of examining idle connections. If negative, the eviction thread is not launched.
Eviction runs contend with the execution engine for access to the pool, so if this thread runs too frequently, performance issues may result.
With XA connections, this parameter has a fixed value of 300000 milliseconds.
MINEVICTABLETIME
(default value: 1800000, i.e. 30 minutes): minimum amount of time, in milliseconds, that a connection may sit idle in the pool before it is eligible for eviction.
When this value is less than 1, no connections are evicted from the pool due to idle time. In this case, the eviction thread only discards connections that return an error when executing the validation query.
Even if a connection has been idle for more than MINEVICTABLETIME, it will not be closed if the number of idle connections would be less than MINIDLE after closing that connection.
This parameter has no effect if TIMEBETWEENEVICTION is less than 1.
With XA connections, this parameter has a fixed value of 600000 milliseconds.
TESTWHILEIDLE
(default value: false): if true and the parameter VALIDATIONQUERY is not empty, the connections are validated by the connection eviction thread. To validate a connection, the thread runs the validation query on the connection. If it fails, the pool drops the connection.
This parameter has no effect if TIMEBETWEENEVICTION is less than 1. This parameter is not supported in XA connection pools.
NUMTESTPEREVICTION
(default value: 3): number of connections examined in each run of the connection eviction thread. The reason for not examining all the connections of the pool in each run is that, while a connection is being examined, it cannot be used by a query.
This parameter has no effect if TIMEBETWEENEVICTION is less than 1.
With the XA configuration, if this parameter is not used, all the objects in the pool are eligible for eviction.
Recommended Settings of the Connection Pool in Environments with a Firewall¶
If there is a firewall between the host where the Denodo server runs and the database, and the firewall is configured to close connections after a certain period of inactivity, you must enable the “connection eviction” thread of that data source. By doing so, the connection pool will periodically execute the validation query, which keeps the connections alive; otherwise, the firewall will close them. When a firewall closes a connection, it does not usually notify the participants of that connection. The problem is that if the pool hands out a connection that has already been dropped by the firewall, the pool still thinks it is a valid connection, and it may take several minutes until the query fails with a timeout. During this time, the execution engine will be waiting for the pool to return a connection. For this reason, it is better to configure the eviction thread to run and test the idle connections.
To enable the connection eviction thread, follow these steps:
In the administration tool, open the VQL Shell.
Execute DESC VQL DATASOURCE JDBC <name of the data source>;.
Copy the part that starts with VALIDATIONQUERY up until MINEVICTABLETIME.
Run ALTER DATASOURCE JDBC <name of the data source> followed by what you copied in the previous step, with these changes:
Set TESTWHILEIDLE to true.
Set TIMEBETWEENEVICTION to 300000 (five minutes).
For example,
ALTER DATASOURCE JDBC ds_jdbc_oracle_finance
VALIDATIONQUERY = 'Select 1'
INITIALSIZE = 4
MAXIDLE = -1
MINIDLE = 0
MAXACTIVE = 20
EXHAUSTEDACTION = 1
TESTONBORROW = true
TESTONRETURN = false
TESTWHILEIDLE = true
TIMEBETWEENEVICTION = 300000
NUMTESTPEREVICTION = 3
MINEVICTABLETIME = 1800000;
With this change, every five minutes (300,000 milliseconds), the pool will examine three idle connections from the pool (NUMTESTPEREVICTION). For each of these three connections, the pool will do the following:
It will close the connection if it has been idle for more than 30 minutes (1,800,000 milliseconds).
If the connection has been idle for less time, it will run the query Select 1 (VALIDATIONQUERY) on this connection. If the validation query fails, it will drop the connection.
Another alternative, when the firewall closes inactive connections, is to disable the pool of connections. However, this has performance implications: instead of reusing connections, the data source will have to open a new connection for each query, which will increase the execution time of all the queries.
To disable the connection pool of a data source, follow the same steps as above, but set the parameter MAXACTIVE of the data source to 0.
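For example, reusing the data source name from the example above (a sketch; “<pool configuration 1>” requires these three parameters together):
ALTER DATASOURCE JDBC ds_jdbc_oracle_finance
VALIDATIONQUERY = 'Select 1'
INITIALSIZE = 0
MAXACTIVE = 0;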