
Goal

This document describes how to create a VDP-type job in the Scheduler component of the Denodo Platform. It also covers the different features and options available in Scheduler through a step-by-step walkthrough.

Content

Scheduler is a component of the Denodo Platform that simplifies the time-based automation of administrative tasks such as cache preloading, generating statistics, cleaning the cache database, etc. It can also be used as a simple, lightweight ETL tool, since it provides options to extract (collect data through VQL queries), transform (VDP views and operations), and load data into either relational sources or flat files through exporters.

The Scheduler Administration Tool is the user interface used to create, modify, and run jobs. These actions are described in this document.

Scenario

Consider a base view “bv_hr_employees” created in Denodo Virtual DataPort, where the requirement is to extract the data from this base view and export the result into an external relational database through JDBC. The detailed steps to perform this activity are described in this document.

Step 1: Configuring Virtual DataPort data sources in Denodo Scheduler

Data sources in Denodo Scheduler are used by extraction jobs to execute queries and extract data from the different sources where the data is located. They are managed under the Data Sources section of the Scheduler Administration Tool. You can configure a data source in Denodo Scheduler by following the steps below:

  • In the Scheduler Administration tool, click on Data sources > Add data source option and select VDP from the drop-down menu.

  • A new tab will be opened to create a VDP data source.

  • To create a connection to your Virtual DataPort instance, fill all required fields, for instance:

  • Data source name: ds_vdp
  • Connection URI: //localhost:9999/admin
  • Username: admin
  • Password: *****
  • Use Kerberos: disabled
  • Query timeout (ms): 0
  • Chunk size: 100
  • Chunk timeout (ms): 90000
  • Enable pool: Enable
  • Initial size of the pool: 0
  • Maximum active connections in the pool: 30
  • Maximum idle connections in the pool: 20
  • Maximum time to wait for a connection from the pool (ms): -1
  • Validation query: select 1 from dual()
  • Test connections: Enable

  • Once the details are filled in, click Save to save the data source. A pop-up notification will confirm that the data source was created successfully.

Similarly, you can also create other data sources such as JDBC, CSV, etc.

For this demo, we are configuring a JDBC data source to an Oracle database where we will export the data of our Scheduler job. Fill all required fields, for instance:

  • Project name: default
  • Data source name: ds_jdbc
  • Database adapter: Oracle 11g
  • Connection URI: jdbc:oracle:thin:@my-server:1521:xe
  • Username: myuser
  • Password: **********
  • Initial size of the pool: 0
  • Maximum active connections in the pool: 30
  • Maximum idle connections in the pool: 20
  • Maximum time to wait for a connection from the pool (ms): -1
  • Validation query: SELECT COUNT(*) FROM SYS.DUAL
  • Test connections: Enable

Ensure that you create an empty table into which you want to insert the exported data. For this demo, we have created an employees table with the same fields as the base view.
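The structure of this table is not given in the scenario; as a sketch, a minimal target table in Oracle could look like the following (the column names and types are assumptions for illustration and must be adjusted to match the actual fields of bv_hr_employees):

```sql
-- Hypothetical target table; adjust columns and types
-- to match the fields of bv_hr_employees exactly
CREATE TABLE employees (
    emp_id     NUMBER(10)   PRIMARY KEY,
    first_name VARCHAR2(50),
    last_name  VARCHAR2(50),
    hire_date  DATE
);
```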

Step 2: Creating a VDP type Scheduler Job

Once the data source has been created successfully, you can create Scheduler jobs that use it. For example, let’s create a VDP job that extracts data from Denodo Virtual DataPort. To do this, follow the steps below:

  • Click on the Jobs option in the header menu bar.
  • Click ‘Add job’ and choose the VDP option from the drop down menu.
  • In the Details section, choose the project, set the job name to “employee_details”, and add a description for the job.

  • Navigate to the Extraction Section tab and choose the VDP data source configured earlier (‘ds_vdp’) from the drop-down.

  • In the Parameterized query field, specify the query that will be run against the data source to extract the data. As the name indicates, the query can be static, with hardcoded values, or dynamic, with parameters prefixed with the ‘@’ character used for filtering or other operations.
  • For example (hardcoded values): SELECT * FROM bv_hr_employees
  • Similarly, you can specify the query with interpolation variables.
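As an illustration, the two query styles could look like the following (the department_id column and the @DEPARTMENT_ID parameter are hypothetical, shown only to demonstrate the syntax):

```sql
-- Static query with hardcoded values: extracts every row of the base view
SELECT * FROM bv_hr_employees;

-- Dynamic query: Scheduler resolves @DEPARTMENT_ID at execution time
SELECT * FROM bv_hr_employees WHERE department_id = @DEPARTMENT_ID;
```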

  • Navigate to the Exporters section tab and choose the JDBC exporter from the drop-down list (configured earlier in the Data Sources Management). Provide the required details, such as Data source, Schema name, Table name, and Error management:
  • Schema name: Specify the existing schema where the table is stored. For instance, the schema name is website_sys.
  • Table name: Specify an existing table where the tuples will be inserted. We will use the employees table created earlier.
  • Error management: Controls the rollback behavior when an export error occurs. You can choose not to roll back, or to roll back if there are errors. However, if Export as transaction is enabled, the Error management options of each individual JDBC exporter are overridden: the entire export is performed as a single transaction and rolled back if any error occurs.
  • Delete table content: If the target table already contains data and you want to delete its content before inserting the exported tuples, check this option.
  • Update tuple if entry already exists in DB: If this option is checked, the JDBC exporter will try to update the tuples first instead of inserting them; tuples that were not updated are then inserted.
  • Export job identifier, job name, project name and execution time fields: When selected, the project name, job name, job identifier, and job start time are included among the fields of the exported tuples.
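Conceptually, the update-first behavior resembles the following SQL, sketched here under the assumption of a hypothetical emp_id key column (an illustration of the logic, not the literal statements the exporter executes):

```sql
-- Try to update the existing row first
UPDATE employees
   SET first_name = :first_name, last_name = :last_name
 WHERE emp_id = :emp_id;

-- Only if no row was updated, insert the tuple instead
INSERT INTO employees (emp_id, first_name, last_name)
VALUES (:emp_id, :first_name, :last_name);
```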

  • Navigate to the Retry section tab and enable retries.
  • Under the “What to retry” option, select the specific option based on your requirement.
  • With “All queries if any error is returned” option, the whole job is executed again if the job is interrupted or if any error occurs during execution.
  • The option “Only queries which return an error” retries only when there have been retryable errors during the execution of the job, meaning it repeats only those queries that returned an error or never started executing.
  • We can also configure a job to retry when the Exported documents are less than the expected number by selecting “All queries if exported documents are less than” with a provided value. In this case, the whole job is repeated when the number of exported documents is less than the specified threshold.

  • Navigate to Handlers section and click on the “Add handler” option.
  • Select the Mail Handler and provide the required details like email addresses and select the appropriate option for “when to send” option.
  • In order to use the Mail Handler, it is necessary to configure the email server under the “Administration” > “Server” > “Mail Settings” option in the Scheduler Web Administration Tool.

  • If you have imported a custom handler, such as the “Email Exported Files Custom Handler” or the “Denodo SFTP Exported Files Custom Handler”, these options also appear in the drop-down when you click “Add handler”. You can download these custom handlers from the Denodo Support site.

  • Navigate to the Trigger section tab and click on the “Add trigger” option.
  • You can either type a cron expression directly in the “cron expression” field or set each Cron field manually under the Cron section. Refer to the section Syntax for the Cron Expression of the Scheduler Administration Guide, which explains each field in detail.
  • For example, you could configure the job “employee_details” to run every day at 12:00:00 PM.
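Following the Quartz-style syntax described in the Syntax for the Cron Expression section of the Scheduler Administration Guide (fields: seconds, minutes, hours, day of month, month, day of week), a trigger that fires every day at 12:00:00 PM could be written as:

```
0 0 12 * * ?
```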

  • Along with the cron expression, you can configure a job on which the current job depends, under the Dependencies tab of the Trigger section.

  • Under the Dependencies section, you can set a timeout in milliseconds. If you check the option "Execute this job even though any of its dependencies finish with error", the job will be executed whether or not its dependencies finished successfully.

Once you have provided all the necessary details, save the job and go to the job management section to see the created job.

Executing the Scheduler Job

You can execute the Scheduler job in different ways:

  • If you have configured the trigger section of the job by providing a cron expression, the job will be executed automatically based on the cron expression.

  • You can also execute the job manually using the start options available under the “Processed (Tuples/Errors)” column. To see these options, click the three-dot indicator in that column for the job.

  • You can also use the Scheduler Client API to run a job programmatically. A client demo is available in the "<DENODO_HOME>/samples/scheduler/scheduler-api" folder; it contains many options that illustrate access to Scheduler. For instance, the "-start" option allows you to start a job in Scheduler. You can use these scripts to start jobs from third-party scheduling tools.

  • Starting from Denodo 8.0, Scheduler also provides a REST API that can be used to manage Denodo Scheduler jobs from any REST client.

References

Data Sources

Creating and Scheduling Jobs

Scheduler REST Client API

Scheduler RMI Client API

Disclaimer
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments we recommend you to contact your Denodo Customer Success Manager.
