General Architecture

Denodo Scheduler is a tool for time-based scheduling of automatic data extraction jobs from different data sources. In particular, it lets you define extraction jobs through its Web administration tool, stores their configuration persistently, and schedules the execution of these jobs against the corresponding data servers as desired.

Denodo Scheduler allows extraction jobs to be defined against Denodo Virtual DataPort servers, and the obtained data can be exported in different formats to different repositories. It also allows jobs to preload data regularly in the Virtual DataPort cache.

On a general level and for all jobs, it is possible to configure the time-based scheduling (when and how often the job should be executed) and, for extraction jobs, the way in which the results obtained by the job will be exported. The available exporters are:

  - Dumping the final results in a database.

  - Indexing the final results in an indexing server.

  - Dumping the final results in a CSV-type file (it can also be used to generate MS-Excel compliant files).

Scheduler also allows programmers to create new exporters for ad-hoc needs.
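To make the exporter idea concrete, the following minimal Python sketch shows what a CSV-style exporter does conceptually: it takes the rows produced by a job and serializes them as delimited text. It does not use the Scheduler exporter API; the function name `export_to_csv` and the sample columns are assumptions made for illustration.

```python
import csv
import io


def export_to_csv(column_names, rows):
    """Serialize job results as CSV text (hypothetical helper, not a Scheduler API)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(column_names)  # header line with the column names
    writer.writerows(rows)         # one line per result row
    return buffer.getvalue()


# Illustrative result rows, as an extraction job might produce them.
csv_text = export_to_csv(["tax_id", "name"], [("A001", "Acme Corp")])
```

A real exporter would write to a file or repository rather than to an in-memory string; the in-memory buffer only keeps the sketch self-contained.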

The figure Denodo Scheduler Architecture shows the server’s basic architecture. In addition to the jobs, Scheduler lets users define the data sources to be used by the jobs and the exporters. Denodo Scheduler allows data sources to be defined for Denodo Virtual DataPort servers, relational databases, delimited files, Scheduler Index servers and Elasticsearch servers.

When defining a job, it is possible to specify a query parameterized by a series of variables, along with the possible values for these variables, so that several queries are executed against the corresponding server.
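The idea of expanding one parameterized query into several concrete queries can be sketched in Python as follows. This is only an illustration: the `$tax_id` placeholder is the syntax of Python's `string.Template`, not the variable syntax used by Scheduler, and the `expand_queries` function, the `customer` view name and the sample values are assumptions.

```python
from string import Template


def expand_queries(template, bindings):
    """Return one concrete query per set of variable values (hypothetical helper)."""
    query = Template(template)
    # substitute() raises KeyError if a variable has no value, which
    # surfaces incomplete bindings instead of sending a broken query.
    return [query.substitute(values) for values in bindings]


queries = expand_queries(
    "SELECT * FROM customer WHERE tax_id = '$tax_id'",
    [{"tax_id": "A001"}, {"tax_id": "B002"}],
)
```

Each entry in `queries` corresponds to one execution against the server. Plain string substitution is used here only for readability; code that builds real SQL from untrusted values should use bound parameters instead.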

Denodo Scheduler Architecture


Two typical uses of Denodo Scheduler are briefly described below.

Example: data extraction and exportation

Suppose you want to periodically extract customer information that is accessible via a Virtual DataPort view. The view only allows querying one customer at a time, by specifying the customer's Tax ID, and returns information of interest about that customer. The list of all the Tax IDs to be queried is available in an internal database accessible via JDBC. The extracted data must be dumped to another internal database, also accessible via JDBC. The steps to be followed to carry out this job with Denodo Scheduler are as follows:

  1. Add a new JDBC-type data source to Scheduler to access the database that contains the Tax IDs of the required customers (see section JDBC Data Sources to find out how to add JDBC data sources).

  2. Add another new JDBC data source to Scheduler to access the database into which the extracted data will be dumped.

  3. Add a new VDP-type data source to Scheduler (or use an existing one) to access the Virtual DataPort server containing the view to be queried (see section VDP Data Sources to find out how to add VDP data sources).

  4. Create a VDP-type job in Scheduler and configure it to use the VDP data source defined in step 3 (see section Configuring New Jobs). The VDP job will query the view for each different value specified for the Tax ID field. To get the different values of the Tax ID field, a query on the JDBC data source defined in step 1 will be used.

  5. Create a JDBC-type exporter for the VDP job (see section Exporters Section). This exporter will use the JDBC data source defined in step 2.

  6. Finally, configure the frequency with which you want to execute the job in Scheduler (see section Time-based Job Scheduling Section).
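The data flow that the steps above configure can be sketched in Python as follows. This is not how Scheduler is implemented or configured; it only mirrors the logic of the job. In-memory SQLite databases stand in for the two JDBC databases, a plain function stands in for the Virtual DataPort view, and all table names, column names and sample data are assumptions.

```python
import sqlite3

# Stand-in for the JDBC database that holds the Tax IDs (step 1).
ids_db = sqlite3.connect(":memory:")
ids_db.execute("CREATE TABLE tax_ids (tax_id TEXT)")
ids_db.executemany("INSERT INTO tax_ids VALUES (?)", [("A001",), ("B002",)])

# Stand-in for the JDBC database that receives the extracted data (step 2).
target_db = sqlite3.connect(":memory:")
target_db.execute("CREATE TABLE customer_export (tax_id TEXT, name TEXT)")

# Stand-in for the Virtual DataPort view (step 3): one customer per query.
CUSTOMERS = {"A001": "Acme Corp", "B002": "Bolt Ltd"}


def query_customer_view(tax_id):
    """Return the rows for a single Tax ID, or no rows if it is unknown."""
    name = CUSTOMERS.get(tax_id)
    return [(tax_id, name)] if name is not None else []


def run_job():
    """Query the view once per Tax ID (step 4) and export the rows (step 5)."""
    exported = 0
    for (tax_id,) in ids_db.execute("SELECT tax_id FROM tax_ids ORDER BY tax_id"):
        rows = query_customer_view(tax_id)
        target_db.executemany("INSERT INTO customer_export VALUES (?, ?)", rows)
        exported += len(rows)
    target_db.commit()
    return exported


exported_count = run_job()
```

Step 6, the execution frequency, is deliberately left out: in this sketch `run_job()` is called once, whereas Scheduler would trigger the job according to its time-based configuration.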

Example: data caching

Suppose you want to periodically refresh the cache of a Virtual DataPort view. The steps to be followed to carry out this job with Denodo Scheduler are as follows:

  1. Add a new VDP-type data source to Scheduler (or use an existing one) to access the Virtual DataPort server containing the view to be queried (see section VDP Data Sources to find out how to add VDP data sources).

  2. Create a VDPCache-type job in Scheduler and configure it to use the VDP data source defined in the previous step (see section Configuring New Jobs). Configure the view whose cache should be refreshed and the different available caching options.

  3. Finally, configure the frequency with which you want to execute the job in Scheduler (see section Time-based Job Scheduling Section).