Server Set-up

This configuration area is divided in several sections.

Server

Server Ports

The Scheduler server uses three port numbers for client communications: the server execution port, the server stoppage port, and the auxiliary port.

Important

When the Scheduler server and the Administration Tool or other clients are installed in different machines, you may need to change the interface where the server listens to connections. To change this, open the Denodo Control Center, open the “Configuration” dialog and change the “RMI Host” of the “JVM Options”. See more details in the section Denodo Platform Configuration of the Installation Guide. The reason for this is that in some environments, when listening to ‘localhost’, the server does not allow connections from remote hosts.

Note

Where the connection between customers and the Scheduler server has to go through a firewall, this must be configured to allow for access to the execution port and the auxiliary port.

The port changes will take effect the next time the Scheduler server restarts.


Threads Pool

The Scheduler server allows you to execute various extraction jobs simultaneously. Additionally, the VDP, VDPCache and VDPIndexer jobs allow the same job to run different queries concurrently on the same data source, varying the parameters.

You can specify the maximum number of jobs that the server will execute concurrently with the parameter Maximum number of concurrent jobs (by default 20). A change to the number of concurrent jobs will take effect the next time the Scheduler server is restarted.

With regards to VDP, VDPCache and VDPIndexer jobs, the Scheduler server uses a pool of reusable threads for managing the execution of multiple queries which the same job can generate. The parameters that can be configured are as follows:

  • Default number of threads (corePoolSize). This represents the number of threads in the pool from which the inactive threads are reused (20 by default). Whilst there are fewer threads than these in the pool, new threads will continue to be created. When a thread is requested and the number of threads in the pool is the same as or more than this value, inactive threads are returned, if they exist; otherwise, new threads will continue to be created until obtaining the value established by the following parameter. Intuitively, this parameter indicates the number of threads that the system should have active simultaneously in normal load conditions.

  • Maximum number of threads (maximumPoolSize). Represents the maximum number of pool threads (by default 60).

  • Keep alive timeout (milliseconds). Specifies the maximum time in milliseconds that an inactive thread stays in the pool, if the number of threads exceeds the total indicated in the Default number of threads (by default 0). If the value is 0, the threads created above this value end, once the execution of their job has been completed. Otherwise, the ones that exceed the time specified by this parameter will end.

This thread pool works this way:

  • The executor service has an internal queue, used for holding tasks before they are executed.

  • The Default number of threads parameter is the number of core threads that will be instantiated and kept in the pool.

  • When a new task is submitted,

    • if fewer than the Default number of threads threads are running, the Executor always prefers adding a new thread rather than queuing.

    • if the Default number of threads or more threads are running, the Executor always prefers queuing a request rather than adding a new thread. So:

      • if the internal queue is not full, the task is queued.

        If this task is in the queue more than the allowed time, it is dequeued and its execution is discarded.

      • If internal queue is full, a new thread is created unless this would exceed the Maximum number of threads, in which case, the task will be rejected.

        There is a rejection policy to reattempt the enqueue of the task a certain number of times, at different intervals.

      • The tasks will be waiting in the queue for a thread for a maximum amount of time.

Mail Settings

This section allows you to modify the properties of the outgoing mail server (Handler Section) used to send reports on the execution of jobs.

You can specify the following information:

  • SMTP server. Machine name in which the e-mail server runs.

  • Port. Port number in which the server runs. By default, 25.

  • Protocol. Protocol used to send the e-mails: SMTP or SMTPS.

  • From. The e-mail address used by Scheduler to send the mail.

  • Subject. The subject of the mail. The subject may contain variables, which will be replaced by their corresponding values at execution time (they take their value from the job data). The allowed variables are: projectName, projectID, jobName, jobID, jobStartTime and jobResult, with the format @{<var_name>}. It is possible to format jobStartTime variable by specifying output date pattern. The syntax is @{jobStartTime,outputDateFormat:yyyy-MM-dd}. As an example:

    Denodo Scheduler Notification - @{jobResult} [@{projectName}
    - @{jobName}_@{jobID}] at @{jobStartTime,outputDateFormat:yyyy-MM-dd HH:mm:ss}
    
  • Trust servers. If set to “*”, all hosts are trusted. If set to a whitespace separated list of hosts, those hosts are trusted. Otherwise, it depends on the certificate the server presents.

  • Enable TLS. If true, enables the use of the STARTTLS command (if supported by the server) to switch the connection to a TLS-protected connection before issuing any login commands. Note that an appropriate trust store must be configured so that the client will trust the server’s certificate. Defaults to false.

  • Enable debug. If true, it allows to trace the mail connection. In addition, it is necessary to add the following logger to the <DENODO_HOME>/conf/scheduler/log4j2.xml file:

    <Logger name="com.denodo.scheduler.core.commons.utils.MailUtil" level="DEBUG" />
    

Additionally, if the outgoing mail server requires authentication, user name (Username) and its password (Password) must be specified.

Once configured the mail properties, the link Test mail configuration allows you to test if it is properly configured. You have to specify a list of email addresses (separated by a “,”) a test mail will be sent to and click on Send email. If the configuration is right, those recipients will receive an email with a body informing that it is a test email.

Authentication

Virtual DataPort Settings

Denodo Scheduler servers can delegate the authentication of the users to a Denodo Virtual DataPort server (see section Authentication). Moreover, when Scheduler connects to Virtual DataPort to authenticate a user, it also retrieves its roles (which determine the user access rights in Scheduler).

In this way, Scheduler needs to know information about the Virtual DataPort server used for authentication and roles retrieval. This section allows specifying the following information:

  • Host. Machine name in which the Virtual DataPort server runs.

  • Port. Port number in which the Virtual DataPort server runs.

  • Database (optional). Name of the Virtual DataPort server database, which should be used for the authentication (useful when the database uses LDAP authentication).

Changes in any of these parameters require the user to logout and login again to take effect.

Note

As commented previously in this document, the non-local users, together with its roles, must be created in the Virtual DataPort server by an administrator user, and VDP-based authentication is only possible if connection with the configured Virtual DataPort server is possible (otherwise, only local-based authentication can take place).

Kerberos Settings

Important

To enable single sign-on (SSO), you have to enable Kerberos in both the Scheduler administration tool and the Scheduler server. This section covers the steps for the Scheduler server.

  1. Select Use Kerberos.

  2. In the box Server Principal enter the “Service Principal Name” (SPN) used to create the keytab file. That is, the SPN with the Fully Qualified Domain Name (FQDN) of the server where the Scheduler Server is running. For example, “HTTP/denodo-prod.subnet1.contoso.com@CONTOSO.COM”.

    Note

    It is not necessary for the SPN of the Scheduler server to start with HTTP. This is only mandatory when web browsers request a Kerberos ticket, as they do it for the service “HTTP/<host name of the URL you are accessing>”. Here, the Kerberos ticket is requested by the Scheduler administration tool, not by the browser. However, in the case of the Scheduler administration tool, it is mandatory that the SPN starts with HTTP, since in this case the browser requests the Kerberos ticket. Therefore, on the Scheduler server you could use a SPN such as “SCHED/denodo-prod.subnet1.contoso.com@CONTOSO.COM”, if you wish.

  3. In the box Keytab file enter the path to the keytab file.

  4. Leave the Configuration file box empty unless the host where this Scheduler server runs does not belong to a Kerberos realm (e.g., a Windows Active Directory domain). If this host does not belong to a Kerberos realm, do one of the following:

    1. Enter the path to the krb5.conf or krb5.ini file with the Kerberos settings.

    2. Or follow the steps described in the appendix Enabling Kerberos Authentication Without Joining a Kerberos Realm of the Installation Guide.

    Note

    This is the only setting you have to configure to use Kerberos authentication in the Virtual DataPort data sources

  5. We recommend selecting the check box Activate Kerberos debug mode the first time you set up Kerberos in case you run into any issues. Once Kerberos has been set up, disable this.

    When this option is enabled, the Java Runtime Environment logs messages related to Kerberos in the standard output but not in the log files. To see these messages, launch the Scheduler server from the command line with the script <DENODO_HOME>/bin/scheduler_startup.

    Important

    In Windows environments, you must redirect the server’s output to a file. Otherwise, you will not be able to check the log messages related to Kerberos.

    To redirect these debug messages to a file, start the Server like this:

    For Windows:

    cd <DENODO_HOME>
    cd bin
    scheduler_startup.bat > scheduler_kerberos_logging.log 2>&1
    

    For Linux:

    cd <DENODO_HOME>
    cd bin
    ./scheduler_startup.sh > scheduler_kerberos_logging.log 2>&1
    

Changes in this parameter requires the server to be restarted to take effect.

Database

Database Metadata Settings

Denodo Scheduler comes with an out-of-the-box embedded database (Apache Derby) where stores all of its metadata (projects, jobs, etc.). Neverthless, you can specify another database to store the metadata, by following the steps below:

  1. Create a new schema in the metadata database to store all the data related to the Scheduler. This is not a mandatory step, but it is recommended, to keep things better organized and avoid naming conflicts.

    • As for the recommended size for the external database, it is quite difficult to give an exact number, since it is very variable depending on the specific metadata and the number of job executions. Each job execution generates a report, so the more executions, the more reports and the more size. The size of the reports depends on the number of errors during the job execution, the number of loading and indexing processes, the number of parameters (if the job is parameterized), etc. As a recommendation, reserve 5GB of space for the database (in most cases, it should be enough with less space, but we recommend a high value to avoid data corruption in case of lack of space).

  2. Execute the scripts from <DENODO_HOME>/scripts/scheduler/sql to create the metadata tables used by Scheduler in the new database. There are two script files for each supported database:

    • First, load tables_<db_name>.sql

    • Then, load drivers_metadata.sql

    For instance, if you want Oracle 12 to be the new database for the Scheduler metadata, you need to connect to the desired Oracle 12 instance and execute the scripts tables_oracle12.sql and drivers_metadata.sql located in <DENODO_HOME>/scripts/scheduler/sql/oracle-12c.

    Similar steps should be done with any other of the supported databases.

  3. Copy the drivers of the new database to <DENODO_HOME>/lib/scheduler-extensions.

    For instance, if you want Oracle 12 to be the new database for the Scheduler metadata, copy only the files ojdbc6.jar and orai18n.jar from <DENODO_HOME>/lib/extensions/jdbc-drivers/oracle-12c to <DENODO_HOME>/lib/scheduler-extensions.

    Similar steps should be done with any other of the supported databases.

  4. Make a backup of the <DENODO_HOME>/conf/scheduler/ConfigurationParameters.properties file, just in case you end up with an unusable Scheduler server and you need to restore the default database settings.

  5. Restart the Scheduler server.

  6. Use the Database Metadata Settings form to configure the settings for the new database. You have to provide the following information:

    • Database Name. Select the database name which corresponds with the configured parameters. You can choose among these ones:

      • Amazon Aurora (MySQL)

      • Amazon Aurora (PostgreSQL)

      • Azure SQL Server

      • Derby (embedded or server)

      • MySQL

      • Oracle

      • PostgreSQL

      • SQL Server. Select this when using the JTDS or Microsoft driver for SQL Server.

      Note

      Denodo Scheduler has been tested with the following databases: Derby 10, MySQL 5.6, MySQL 5.7, Oracle 11g, Oracle 12c, PostgreSQL 9.5 and SQL Server 2014.

    • Connection URI. Database access URI.

    • Driver class name. Name of the Java class of the JDBC adapter to be used.

    • Authentication. Select the authentication method for accessing the external database. You can choose among these ones:

      • Login and Password. Use the provided Username (optional) and Password (optional).

      • Kerberos with password. Use Kerberos authentication, with the provided Username and Password (from the Active Directory account).

      • Kerberos with keytab. Use Kerberos authentication, with the provided Username (in this case, the Service Principal Name - SPN -) and the uploaded Keytab file (no password needed).

      • Kerberos with Windows User. Use Single Sign-On (SSO) with Kerberos, doing pass-through with the user that launched the server (no user name or password needed).

      Note

      Kerberos authentication options are only available when the database is Oracle or SQLServer.

    • Optionally, export your current metadata (all your projects and elements) by checking the parameter Export metadata, so that you can later import them in the new configured database. This is recommended if you want to keep your current metadata in the new database (otherwise, it will be lost after the database is changed).

    Note

    If there is any error (the jar files are not copied to the right folder, any setting is bad configured or the metadata tables are not already created in the new database), you will receive an alert message informing you about this situation and warning you that if you accept the changes your Scheduler server could not be started any more. Nevertheless, if you end up with an unusable Scheduler server, you can restore the default database settings by restoring the backup of the <DENODO_HOME>/conf/scheduler/ConfigurationParameters.properties file from step 4.

  7. Restart the Scheduler server again.

    1. From now on, Scheduler is running against the new configured database.

  8. If you exported the metadata in step 6, import it in the new database (as explained in Import).

Warning

Never start a non-clustered instance (see next section to know how to configure clustering) against the same set of database tables that any other instance is running against. You may get serious data corruption, and will experience erratic behavior.

Cluster Settings

Clustering works by having each node of the cluster share the same database. So, when enabling this feature you also have to configure the Scheduler server to use the common database (see Database Metadata Settings). As can be seen in the warning of the previous section, you should never configure a non-clustered server to use the same database as any other server, so you must not configure a common database without enabling the clustering features (note that the changes will be effective after restarting the server).

You can configure the following parameters:

  • Cluster. Enable it in order to turn on clustering features. As stated in the previous subsection, this property must be set to “true” if you are having multiple instances of Scheduler using the same set of database tables.

  • Node identifier. Can be any string, but must be unique for each Scheduler server in the cluster. You may use the value “AUTO” as the node identifier if you wish the identifier to be generated for you.

    Note

    When the Node Identifier is “AUTO”, the auto-generated name can be greater than the allowed number of characters (45) to store in database, so a Data Truncation exception would be thrown in that case. If so, set a fixed name (unique for each server in the cluster) with the appropriate size.

  • Check interval. Sets the frequency (in milliseconds) at which this instance checks-in with the other instances of the cluster. Affects the quickness of detecting failed instances. Recommended values are, for instance, 2000 or 3000 milliseconds.

These changes will take effect the next time the Scheduler server restarts.

Warning

If you run clustering on separate machines (i.e. if you are not running all the Scheduler server instances in the same machine), the clocks of all the machines where the instances of the Scheduler server are running must be synchronized using some form of time-sync service (daemon) that runs very regularly. The time-sync service used must assure that the clocks are within a second of each other. Otherwise, various strange behaviors can result, including nodes presuming other nodes have crashed and performing fail-over (duplicate firing) of jobs. To synchronize the clocks of all nodes in the cluster use a time service such as NTP (see also the NIST Internet time service).

Configuring a cluster of Scheduler servers is the key for a High Availability (HA) scenario and provides the following features:

  • Load-balancing: occurs automatically, with each node of the cluster firing jobs as quickly as it can. When a trigger’s firing time occurs, the first node to acquire it (by placing a lock on it) is the node that will fire it. Only one node will fire the job for each firing. It will not necessarily be the same node each time.

  • Fail-over: occurs when one of the nodes fails while in the midst of executing one or more jobs. When a node fails, the other nodes detect the condition and identify the jobs in the database that were in progress within the failed node and they are re-executed by the remaining nodes. Notice that the complete job would be re-executed, so if it has some exporters that generate files, they might be duplicated (unless they are configured to be overwritten). Besides, the recommendation is to export these files to a shared drive to be accessible to any server in the cluster.

Server configuration settings (the ones explained in this section) can be synchronized among servers by using the Import/Export feature (see Import/Export Backups and Use of the Import/Export Scripts for Backup) and including the configuration.

How to Remove a Scheduler Server from Cluster

If you have configured a cluster of several Scheduler servers and you want to remove one of them from the cluster, you will have to perform the following these steps:

  • Stop all the other Scheduler servers.

  • On the Scheduler server you want to remove from the cluster, go to Cluster Settings and disable clustering.

  • As the current database is configured like the other servers that are in the cluster, you will have to modify its configuration in order to not get corrupted metadata (you should never configure a non-clustered server to use the same database as any other server).

    • If this server is the last one in the cluster, you can keep the database configuration (as it is supposed to be the only server with that database configuration).

  • Start all Scheduler servers.

Scheduler HA Best Practices

If you have configured a cluster of several Scheduler servers, all of them must have access to the resources needed to execute the jobs (as any of the servers could execute any of the jobs).

  • Configure the jobs to export their generated files (in a CSV, SQL and/or custom exporters) to a shared drive, so that those files are accessible to any server in the cluster.

  • Modify the paths for uploaded files to point to a shared folder, so that the uploaded items can be accessible from any server in the cluster. In each server, edit the <DENODO_HOME>/conf/scheduler/ConfigurationParameters.properties to modify the following property:

    • CSVDataSourceStorer/csvFolder: stores the CSV files uploaded by the users when creating CSV data sources.