Server Set-up¶
This configuration area is divided into several sections.
Server¶
Server Ports¶
The Scheduler server uses three port numbers for client communications: the server execution port, the server shutdown port, and the auxiliary port.
Important
When the Scheduler server and the Administration Tool or other clients are installed on different machines, you may need to change the interface where the server listens for connections. To change this, open the Denodo Control Center, open the “Configuration” dialog and change the “RMI Host” of the “JVM Options”. See more details in the section Denodo Platform Configuration of the Installation Guide. The reason for this is that in some environments, when listening on ‘localhost’, the server does not allow connections from remote hosts.
Note
Where the connection between clients and the Scheduler server has to go through a firewall, the firewall must be configured to allow access to the execution port and the auxiliary port.
Note
Server Port configuration is disabled in Agora.
The port changes will take effect the next time the Scheduler server restarts.
Threads Pool¶
Note
Threads Pool configuration is disabled in Agora.
The Scheduler server allows you to execute several extraction jobs simultaneously. Additionally, VDP, VDPCache and VDPIndexer jobs can run several queries concurrently against the same data source, varying the query parameters.
You can specify the maximum number of jobs that the server will execute concurrently with the parameter Maximum number of concurrent jobs (by default 20). A change to the number of concurrent jobs will take effect the next time the Scheduler server is restarted.
For VDP, VDPCache and VDPIndexer jobs, the Scheduler server uses a pool of reusable threads to manage the execution of the multiple queries that a single job can generate. The configurable parameters are as follows:
Default number of threads (corePoolSize). The number of threads in the pool from which inactive threads are reused (20 by default). While the pool holds fewer threads than this value, a new thread is created for each request. Once the pool reaches this size, an inactive thread is reused if one exists; otherwise, new threads keep being created up to the limit established by the following parameter. Intuitively, this parameter indicates the number of threads that the system should have active simultaneously under normal load conditions.
Maximum number of threads (maximumPoolSize). The maximum number of threads in the pool (60 by default).
Keep alive timeout (milliseconds). The maximum time in milliseconds that an inactive thread stays in the pool when the number of threads exceeds the Default number of threads (0 by default). If the value is 0, threads created above the default number end as soon as their task completes; otherwise, they end after remaining inactive longer than this timeout.
This thread pool works as follows (see the sketch after this list):
The executor service has an internal queue, used for holding tasks before they are executed.
The Default number of threads parameter is the number of core threads that will be instantiated and kept in the pool.
When a new task is submitted:
If fewer threads than the Default number of threads are running, the executor always prefers starting a new thread rather than queuing the task.
If at least the Default number of threads are running, the executor always prefers queuing the request rather than adding a new thread. So:
If the internal queue is not full, the task is queued.
If a task stays in the queue longer than the allowed time, it is dequeued and its execution is discarded.
If the internal queue is full, a new thread is created unless this would exceed the Maximum number of threads, in which case the task is rejected.
There is a rejection policy that reattempts the enqueue of the task a certain number of times, at different intervals.
Tasks wait in the queue for a thread for a maximum amount of time.
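The behavior described above matches the semantics of Java's standard ThreadPoolExecutor, so it can be illustrated with a minimal sketch. The queue capacity of 100 is an assumption for illustration; the actual capacity of Scheduler's internal queue and its retrying rejection policy are internal details:

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSemanticsSketch {
    public static void main(String[] args) {
        // Defaults described above: corePoolSize=20, maximumPoolSize=60,
        // keepAliveTime=0 ms (excess threads end as soon as they become idle).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                20,                       // Default number of threads (corePoolSize)
                60,                       // Maximum number of threads (maximumPoolSize)
                0, TimeUnit.MILLISECONDS, // Keep alive timeout
                new LinkedBlockingQueue<>(100)); // bounded internal queue (capacity assumed)

        // Submission rules, as described above:
        // 1. Fewer than corePoolSize threads running -> start a new thread.
        // 2. Otherwise, queue the task if the internal queue is not full.
        // 3. Queue full -> start a new thread, up to maximumPoolSize.
        // 4. Queue full and maximumPoolSize reached -> the task is rejected.
        pool.execute(() -> System.out.println("query executed"));
        pool.shutdown();
    }
}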
Mail Settings¶
This section allows you to modify the properties of the outgoing mail server (see Handler Section) used to send reports on the execution of jobs.
You can specify the following information:
SMTP server. Machine name on which the e-mail server runs.
Port. Port number on which the server listens. 25 by default.
Protocol. Protocol used to send the e-mails: SMTP or SMTPS.
From. The e-mail address used by Scheduler to send the mail.
Subject. The subject of the mail. The subject may contain variables, which will be replaced by their corresponding values at execution time (they take their value from the job data). The allowed variables are projectName, projectID, jobName, jobID, jobStartTime and jobResult, with the format @{<var_name>}. It is possible to format the jobStartTime variable by specifying an output date pattern. The syntax is @{jobStartTime,outputDateFormat:yyyy-MM-dd}. As an example: Denodo Scheduler Notification - @{jobResult} [@{projectName} - @{jobName}_@{jobID}] at @{jobStartTime,outputDateFormat:yyyy-MM-dd HH:mm:ss}
Trust servers. If set to “*”, all hosts are trusted. If set to a whitespace-separated list of hosts, those hosts are trusted. Otherwise, trust depends on the certificate the server presents.
Timeout. The maximum time to wait while sending mail to the server, once the connection is established. Set “0” to wait indefinitely.
Connection Timeout. The maximum time to wait to establish a connection to the server. Set “0” to wait indefinitely.
Enable TLS. If true, enables the use of the STARTTLS command (if supported by the server) to switch the connection to a TLS-protected connection before issuing any login commands. Note that an appropriate trust store must be configured so that the client will trust the server’s certificate. Defaults to false.
Enable debug. If true, the mail connection is traced. In addition, it is necessary to add the following logger to the <DENODO_HOME>/conf/scheduler/log4j2.xml file: <Logger name="com.denodo.scheduler.core.commons.utils.MailUtil" level="DEBUG" />
Send as HTML. If true, the mailed reports are transformed to HTML format with the XSLT template <DENODO_HOME>/work/scheduler/mail/XMLtoHTML.xsl. Otherwise, the report is sent as a plain string.
Note
These properties will be stored in the metadata database if they are commented out or deleted from the local configuration file <DENODO_HOME>/conf/scheduler/ConfigurationParameters.properties. This way, they will be shared among the different Scheduler servers in a cluster environment. If you want to override the configuration of any of these properties on one node, you can uncomment it (search for Mail/<property>) in that node's local configuration file.
Note
To customize the mailed reports, edit the XSLT file located at <DENODO_HOME>/work/scheduler/mail/XMLtoHTML.xsl. A copy of the original file is stored at <DENODO_HOME>/setup/scheduler/mail/XMLtoHTML.xsl for reference.
Note
From the View reports section of the context menu, users with global administration privileges can check the appearance of mailed reports for jobs configured with the Mail handler.
Additionally, if the outgoing mail server requires authentication, a user name (Username) and its password (Password) must be specified.
Once the mail properties are configured, the Test mail configuration link allows you to test whether they are correct. Specify a comma-separated list of e-mail addresses that a test mail will be sent to and click Send email. If the configuration is correct, those recipients will receive an e-mail with a body informing them that it is a test message.
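These settings map closely onto the standard JavaMail session properties, which is one common way such a mail handler is implemented. The following is a minimal sketch of sending a test message with equivalent settings, not Scheduler's actual implementation; host, addresses and credentials are placeholders:

import java.util.Properties;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

public class MailSettingsSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("mail.smtp.host", "smtp.example.com");   // SMTP server (placeholder)
        props.put("mail.smtp.port", "25");                 // Port
        props.put("mail.smtp.starttls.enable", "true");    // Enable TLS (STARTTLS)
        props.put("mail.smtp.ssl.trust", "*");             // Trust servers ("*" = trust all)
        props.put("mail.smtp.timeout", "30000");           // Timeout (ms)
        props.put("mail.smtp.connectiontimeout", "30000"); // Connection Timeout (ms)
        props.put("mail.smtp.auth", "true");               // Server requires authentication
        props.put("mail.debug", "true");                   // Enable debug tracing

        Session session = Session.getInstance(props, new javax.mail.Authenticator() {
            @Override
            protected javax.mail.PasswordAuthentication getPasswordAuthentication() {
                // Username and Password (placeholders)
                return new javax.mail.PasswordAuthentication("user", "password");
            }
        });

        Message msg = new MimeMessage(session);
        msg.setFrom(new InternetAddress("scheduler@example.com")); // From
        msg.setRecipients(Message.RecipientType.TO,
                InternetAddress.parse("admin@example.com"));
        msg.setSubject("Denodo Scheduler Notification - test");    // Subject
        msg.setText("This is a test email.");
        Transport.send(msg);
    }
}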
Authentication¶
Virtual DataPort Settings¶
Denodo Scheduler servers can delegate the authentication of users to a Denodo Virtual DataPort server (see section Authentication). Moreover, when Scheduler connects to Virtual DataPort to authenticate a user, it also retrieves the user's roles (which determine the user's access rights in Scheduler).
Scheduler therefore needs information about the Virtual DataPort server used for authentication and role retrieval. This section allows you to specify the following:
Host. Machine name on which the Virtual DataPort server runs.
Port. Port number on which the Virtual DataPort server runs.
Database (optional). Name of the Virtual DataPort server database, which should be used for the authentication (useful when the database uses LDAP authentication).
Changes to any of these parameters require the user to log out and log in again to take effect.
Note
As commented previously in this document, non-local users, together with their roles, must be created in the Virtual DataPort server by an administrator user. VDP-based authentication is only possible if a connection with the configured Virtual DataPort server can be established (otherwise, only local authentication can take place).
Kerberos Settings¶
Important
To enable single sign-on (SSO), you have to enable Kerberos in both the Scheduler administration tool and the Scheduler server. This section covers the steps for the Scheduler server.
Select Use Kerberos.
In the box Server Principal enter the “Service Principal Name” (SPN) used to create the keytab file. That is, the SPN with the Fully Qualified Domain Name (FQDN) of the server where the Scheduler Server is running. For example, “HTTP/denodo-prod.subnet1.contoso.com@CONTOSO.COM”.
Note
It is not necessary for the SPN of the Scheduler server to start with HTTP. This is only mandatory when web browsers request a Kerberos ticket, as they do it for the service “HTTP/<host name of the URL you are accessing>”. Here, the Kerberos ticket is requested by the Scheduler administration tool, not by the browser. However, in the case of the Scheduler administration tool, it is mandatory that the SPN starts with HTTP, since in that case the browser requests the Kerberos ticket. Therefore, on the Scheduler server you could use an SPN such as “SCHED/denodo-prod.subnet1.contoso.com@CONTOSO.COM”, if you wish.
In the box Keytab file enter the path to the keytab file.
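If you have not yet created the keytab for that SPN, the Windows ktpass tool is commonly used for this. A sketch follows, reusing the example SPN above; the service account svc-denodo and the encryption type are assumptions to adapt to your Active Directory setup:

ktpass /princ HTTP/denodo-prod.subnet1.contoso.com@CONTOSO.COM ^
  /mapuser svc-denodo@CONTOSO.COM /pass <password> ^
  /ptype KRB5_NT_PRINCIPAL /crypto AES256-SHA1 /out denodo.keytab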
Leave the Configuration file box empty unless the host where this Scheduler server runs does not belong to a Kerberos realm (e.g., a Windows Active Directory domain). If this host does not belong to a Kerberos realm, do one of the following:
Enter the path to the krb5.conf or krb5.ini file with the Kerberos settings.
Or follow the steps described in the appendix Enabling Kerberos Authentication Without Joining a Kerberos Realm of the Installation Guide.
Note
This is the only setting you have to configure to use Kerberos authentication in the Virtual DataPort data sources.
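For reference, a minimal krb5.conf/krb5.ini with the Kerberos settings could look like the following, reusing the CONTOSO.COM example realm; the KDC host name is an assumption:

[libdefaults]
    default_realm = CONTOSO.COM

[realms]
    CONTOSO.COM = {
        kdc = ad.contoso.com
    }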
We recommend selecting the check box Activate Kerberos debug mode the first time you set up Kerberos in case you run into any issues. Once Kerberos has been set up, disable this.
When this option is enabled, the Java Runtime Environment logs messages related to Kerberos in the standard output but not in the log files. To see these messages, launch the Scheduler server from the command line with the script <DENODO_HOME>/bin/scheduler_startup.
Important
In Windows environments, you must redirect the server’s output to a file. Otherwise, you will not be able to check the log messages related to Kerberos.
To redirect these debug messages to a file, start the Server like this:
For Windows:
cd <DENODO_HOME>
cd bin
scheduler_startup.bat > scheduler_kerberos_logging.log 2>&1
For Linux:
cd <DENODO_HOME>
cd bin
./scheduler_startup.sh > scheduler_kerberos_logging.log 2>&1
Changes to this parameter require the server to be restarted to take effect.
Database¶
Database Metadata Settings¶
Denodo Scheduler comes with an out-of-the-box embedded database (Apache Derby) where it stores all of its metadata (projects, jobs, etc.). Nevertheless, you can specify another database to store the metadata.
Note
This section explains how to configure the metadata database through the Scheduler Web Administration Tool. If you want to configure it manually, see instructions in How to Manually Configure the Metadata Database.
Note
This option is disabled when accessing Scheduler from Agora since Agora automatically configures an external database.
As a first step, we suggest creating a schema in the metadata database dedicated exclusively to storing all the data related to Scheduler. While not mandatory, this is strongly recommended to keep things organized and avoid naming conflicts.
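For example, on PostgreSQL this can be a single statement; the schema name scheduler_metadata is hypothetical:

CREATE SCHEMA scheduler_metadata;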
As for the recommended size for the external database, it is difficult to give an exact number, since it varies greatly depending on the specific metadata and the number of job executions. Each job execution generates a report, so the more executions, the more reports and the larger the database. The size of the reports depends on the number of errors during the job execution, the number of loading and indexing processes, the number of parameters (if the job is parameterized), etc. As a recommendation, reserve 5 GB of space for the database (in most cases less space should be enough, but we recommend a high value to avoid data corruption in case of lack of space).
How To Configure The External Metadata Database¶
There is an automatic mode (you can still follow the old mode if you prefer) for creating the schemas and tables of the external metadata database. Follow the steps below:
Make a backup of the <DENODO_HOME>/conf/scheduler/ConfigurationParameters.properties file, just in case you end up with an unusable Scheduler server and need to restore the default database settings.
Use the Database Metadata Settings form to configure the settings for the new database. You have to provide the following information:
Database Name. Name of the JDBC adapter to be used to access the relational database. The section Extensions discusses the adapters distributed with Denodo Scheduler as well as how new ones can be added. When selecting an adapter, the connection URI, name of the driver class, and classpath fields are filled in automatically. For the connection URI, a connection template for the database will appear, which should be modified according to the remote server that you want to access. You can choose among these databases (which are a subset of the drivers shown in Drivers):
Amazon Aurora (MySQL)
Amazon Aurora (PostgreSQL)
Azure SQL Server
Derby (embedded or server)
MySQL
Oracle
PostgreSQL
SQL Server. Select this when using the jTDS or Microsoft driver for SQL Server.
Note
The minimum required version of each database is: Derby 10.15.1.3, MySQL 5.7, Oracle 12c, PostgreSQL 11 and Microsoft SQL Server 2014.
Connection URI. Database access URI.
Driver class name. Name of the Java class of the JDBC adapter to be used.
Driver class path. Path for the folder that contains the JAR files with the implementation classes needed by the JDBC adapter.
Authentication. Select the authentication method for accessing the external database. You can choose among the following:
Login and Password. Use the provided Username (optional) and Password (optional).
Kerberos with password. Use Kerberos authentication, with the provided Username and Password (from the Active Directory account).
Kerberos with keytab. Use Kerberos authentication, with the provided Username (in this case, the Service Principal Name, SPN) and the uploaded Keytab file (no password needed).
Kerberos with Windows User. Use Single Sign-On (SSO) with Kerberos, doing pass-through with the user that launched the server (no user name or password needed).
Note
Kerberos authentication options are only available when the database is Oracle or SQL Server.
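As an illustration, the form values for a PostgreSQL metadata database could look like the following; the host, database name and driver folder are assumptions to adapt to your environment:

Database Name:     PostgreSQL
Connection URI:    jdbc:postgresql://metadata-db.example.com:5432/scheduler_metadata
Driver class name: org.postgresql.Driver
Driver class path: <DENODO_HOME>/lib/extensions/jdbc-drivers/postgresql-11
Authentication:    Login and Password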
Optionally, export your current metadata (all your projects and elements) by checking the parameter Export metadata, so that you can later import it into the newly configured database. This is recommended if you want to keep your current metadata in the new database (otherwise, it will be lost after the database is changed).
Note
If there is any error (a setting is badly configured or the connection against the new database fails), you will receive an alert message informing you about this situation and warning you that, if you accept the changes, your Scheduler server might not start any more. Nevertheless, if you end up with an unusable Scheduler server, you can restore the default database settings by restoring the backup of the <DENODO_HOME>/conf/scheduler/ConfigurationParameters.properties file from step 1.
Restart the Scheduler server. From now on:
Scheduler runs against the newly configured database.
The metadata tables used by Scheduler are automatically created in the newly configured database.
Note
In case of failure (for instance, the user configured in Scheduler has no permissions to create tables in the configured database), full .sql scripts for each supported DBMS are provided at <DENODO_HOME>/scripts/scheduler/sql.
If you exported the metadata in step 2, import it into the new database (as explained in Import).
Consider enabling the high availability features of the external database to meet higher uptime requirements.
How To Configure The External Metadata Database (deprecated)¶
This way of configuring the external metadata database is deprecated; the mode described in How To Configure The External Metadata Database is the recommended one. Still, you can use this mode by following these steps:
Execute the scripts from <DENODO_HOME>/scripts/scheduler/sql to create the metadata tables used by Scheduler in the new database. There are two script files for each supported database:
First, load tables_<db_name>.sql
Then, load drivers_metadata.sql
For instance, if you want Oracle 12 to be the new database for the Scheduler metadata, you need to connect to the desired Oracle 12 instance and execute the scripts tables_oracle12.sql and drivers_metadata.sql located in <DENODO_HOME>/scripts/scheduler/sql/oracle.
Similar steps should be followed for any other supported database.
Copy the drivers of the new database to <DENODO_HOME>/lib/scheduler-extensions.
For instance, if you want Oracle 12 to be the new database for the Scheduler metadata, copy only the files ojdbc6.jar and orai18n.jar from <DENODO_HOME>/lib/extensions/jdbc-drivers/oracle-12c to <DENODO_HOME>/lib/scheduler-extensions.
Similar steps should be followed for any other supported database.
Make a backup of the <DENODO_HOME>/conf/scheduler/ConfigurationParameters.properties file, just in case you end up with an unusable Scheduler server and need to restore the default database settings.
Restart the Scheduler server.
Use the Database Metadata Settings form to configure the settings for the new database. You have to provide the following information:
Database Name. Select the database name which corresponds with the configured parameters. You can choose among these databases:
Amazon Aurora (MySQL)
Amazon Aurora (PostgreSQL)
Azure SQL Server
Derby (embedded or server)
MySQL
Oracle
PostgreSQL
SQL Server. Select this when using the jTDS or Microsoft driver for SQL Server.
Connection URI. Database access URI.
Driver class name. Name of the Java class of the JDBC adapter to be used.
Authentication. Select the authentication method for accessing the external database. You can choose among the following:
Login and Password. Use the provided Username (optional) and Password (optional).
Kerberos with password. Use Kerberos authentication, with the provided Username and Password (from the Active Directory account).
Kerberos with keytab. Use Kerberos authentication, with the provided Username (in this case, the Service Principal Name, SPN) and the uploaded Keytab file (no password needed).
Kerberos with Windows User. Use Single Sign-On (SSO) with Kerberos, doing pass-through with the user that launched the server (no user name or password needed).
Note
Kerberos authentication options are only available when the database is Oracle or SQL Server.
Optionally, export your current metadata (all your projects and elements) by checking the parameter Export metadata, so that you can later import it into the newly configured database. This is recommended if you want to keep your current metadata in the new database (otherwise, it will be lost after the database is changed).
Note
If there is any error (the jar files are not copied to the right folder, a setting is badly configured, or the metadata tables have not been created in the new database), you will receive an alert message informing you about this situation and warning you that, if you accept the changes, your Scheduler server might not start any more. Nevertheless, if you end up with an unusable Scheduler server, you can restore the default database settings by restoring the backup of the <DENODO_HOME>/conf/scheduler/ConfigurationParameters.properties file from step 3.
Restart the Scheduler server again.
From now on, Scheduler is running against the new configured database.
If you exported the metadata in step 5, import it into the new database (as explained in Import).
Warning
Never start a non-clustered instance (see the next section for how to configure clustering) against the same set of database tables that any other instance is running against. You may get serious data corruption and will experience erratic behavior.
How to Limit the Number of Connections to the Database¶
Denodo Scheduler uses a database connection pool to connect to the metadata database.
For each Scheduler server, it is possible to configure the maximum and the initial number of connections to the external metadata database by manually setting the following properties in the <DENODO_HOME>/conf/scheduler/ConfigurationParameters.properties file:
ComplexDataSource/maxConnections. The maximum number of database connections that can be created in the pool of connections. Its default value is 100. According to Quartz’s documentation, it is recommended to be at least the number of worker threads in the thread pool plus three. You may need additional connections if you are making frequent calls to the Scheduler API.
ComplexDataSource/initialSize. The initial number of database connections that are created in the pool of connections when Scheduler starts. Its default value is 2.
Note
Currently, there is no graphical support to configure these settings.
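For example, to lower the maximum and raise the initial pool size, you would add or edit lines like the following in <DENODO_HOME>/conf/scheduler/ConfigurationParameters.properties (the values shown are illustrative):

ComplexDataSource/maxConnections=50
ComplexDataSource/initialSize=5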
Cluster Settings¶
Clustering works by having each node of the cluster share the same database. So, when enabling this feature, you also have to configure the Scheduler server to use the common database (see Database Metadata Settings). As the warning in the previous section states, you should never configure a non-clustered server to use the same database as any other server, so you must not configure a common database without enabling the clustering features (note that the changes take effect after restarting the server).
You can configure the following parameters:
Cluster. Enable it in order to turn on the clustering features. As stated in the previous subsection, this property must be set to “true” if you have multiple instances of Scheduler using the same set of database tables.
Node identifier. Can be any string, but must be unique for each Scheduler server in the cluster. You may use the value “AUTO” as the node identifier if you wish the identifier to be generated for you.
Note
When the Node Identifier is “AUTO”, the auto-generated name can exceed the maximum number of characters (45) that can be stored in the database, in which case a Data Truncation exception would be thrown. If so, set a fixed name (unique for each server in the cluster) of the appropriate size.
Check interval. Sets the frequency (in milliseconds) at which this instance checks in with the other instances of the cluster. It affects the speed of detecting failed instances. Recommended values are, for instance, 2000 or 3000 milliseconds.
These changes will take effect the next time the Scheduler server restarts.
Warning
If you run clustering on separate machines (i.e., if you are not running all the Scheduler server instances on the same machine), the clocks of all the machines where the instances of the Scheduler server are running must be synchronized using some form of time-sync service (daemon) that runs very regularly. The time-sync service must ensure that the clocks are within a second of each other. Otherwise, various strange behaviors can result, including nodes presuming other nodes have crashed and performing fail-over (duplicate firing) of jobs. To synchronize the clocks of all nodes in the cluster, use a time service such as NTP (see also the NIST Internet time service).
Configuring a cluster of Scheduler servers is the key for a High Availability (HA) scenario and provides the following features:
Load-balancing: occurs automatically, with each node of the cluster firing jobs as quickly as it can. When a trigger’s firing time occurs, the first node to acquire it (by placing a lock on it) is the node that will fire it. Only one node will fire the job for each firing. It will not necessarily be the same node each time.
Fail-over: occurs when one of the nodes fails while in the midst of executing one or more jobs. When a node fails, the other nodes detect the condition, identify the jobs in the database that were in progress within the failed node, and re-execute them. Notice that the complete job is re-executed, so if it has exporters that generate files, these might be duplicated (unless they are configured to be overwritten). For this reason, the recommendation is to export these files to a shared drive so that they are accessible to any server in the cluster.
Server configuration settings (the ones explained in this section) can be synchronized among servers by using the Import/Export feature (see Import/Export Backups and Use of the Import/Export Scripts for Backup) and including the configuration.
How to Add a Scheduler Server to a Cluster¶
If you have a Scheduler server that you want to add to a cluster, you will have to perform the following steps:
On the Scheduler server you want to add to the cluster, go to Cluster Settings and enable clustering.
Stop and start Scheduler server so it starts in cluster mode.
Modify the Scheduler Database Metadata Settings to use the same database as the other servers in the cluster.
All the nodes must be configured with the same encryption key. See Installation Encryption Key for more details.
Note
Always change the database after the server is in cluster mode. You must never configure a non-clustered server to use the same database as any other server.
Important
Enabling clustering without the same encryption key in all nodes will prevent the Scheduler servers from working properly, and a metadata reset will be required.
How to Remove a Scheduler Server from Cluster¶
If you have configured a cluster of several Scheduler servers and you want to remove one of them from the cluster, you will have to perform the following steps:
Stop all the other Scheduler servers.
On the Scheduler server you want to remove from the cluster, go to Cluster Settings and disable clustering.
As the current database is configured like that of the other servers in the cluster, you will have to modify its configuration in order to avoid corrupting the metadata (you must never configure a non-clustered server to use the same database as any other server).
If this server is the last one in the cluster, you can keep the database configuration (as it is supposed to be the only server with that database configuration).
Start all Scheduler servers.
Scheduler HA Best Practices¶
If you have configured a cluster of several Scheduler servers, all of them must have access to the resources needed to execute the jobs (as any of the servers could execute any of the jobs).
Configure the jobs to export their generated files (with the CSV, SQL and/or custom exporters) to a shared drive, so that those files are accessible to any server in the cluster.
Modify the paths for uploaded files to point to a shared folder, so that the uploaded items are accessible from any server in the cluster. In each server, edit the <DENODO_HOME>/conf/scheduler/ConfigurationParameters.properties file to modify the following property:
CSVDataSourceStorer/csvFolder: stores the CSV files uploaded by the users when creating CSV data sources.
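For instance, pointing the property at a network share (the path is illustrative):

CSVDataSourceStorer/csvFolder=//fileserver/denodo/scheduler/csv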