Introduction

The Denodo Technologies product suite provides advanced functionalities for the time-based scheduling of jobs integrating data from disperse and heterogeneous sources that may be poorly structured.

Denodo Scheduler allows the scheduling and execution of data extraction and integration jobs, as defined on a Denodo Virtual DataPort server.

In combination with the Denodo Virtual DataPort module, Denodo Scheduler allows scheduling any job that involves the collection of data from various dispersed and heterogeneous sources, combining such information and exporting it to different types of repositories. It can also be used to preload data regularly in the Virtual DataPort cache or to index data to be used from the Data Catalog. See the Virtual DataPort Administration Guide and Data Catalog Guide for further information.

The main features of Denodo Scheduler include:

  • Scheduling flexible batch jobs on Denodo Virtual DataPort.

  • Generation of detailed reports on the outcome of job execution, including detailed information on errors. The reports can be sent by e-mail to the addresses that have been configured.

  • The results obtained by a job can be exported to a CSV file, a database, or to an index. It also allows for the inclusion of new exporters developed for specific purposes.

  • Support for data extraction of sources with limited query capabilities. For example, consider a Web service or a Web site that allows information to be obtained on a particular company based on their Tax ID. It is possible to define a job that obtains the different Tax IDs of a database server or CSV file and queries the Web service or the Web site using each of them.

  • Persistent jobs. If you restart the system while a job is being executed, the job can continue its execution from the last query that was successfully executed. In the previous example, if the complete job obtains information from 1000 companies and the system restarts after the first 200 queries, after the start of the system you can launch the execution of the job in the state in which it ended, continuing from query 201 instead of starting again from the first one.

  • Transparent retries in the event of failure.

  • Possibility of configuring a parallel execution of the different queries involved in the same job.