How to detect changes in data sources

Applies to: Denodo 8.0 , Denodo 7.0 , Denodo 6.0
Last modified on: 22 May 2020
Tags: Data Governance Stored procedures

Download document

You can translate the document:

Goal

This document describes how to detect changes in data sources combining the GET_SOURCE_CHANGES and GET_VIEWS stored procedures and how to automate the detection using Denodo Scheduler. This functionality can be used from the Denodo VDP Administration Tool clicking on the Source Refresh button of any base view.

Content

Getting the modified fields in every base view

Important note: The instructions provided below have been tested in Denodo 7.0 and 8.0 versions. Please review the section: “Compatibility with Denodo 6.0” to get alternatives for the Denodo 6.0 version.

Denodo provides a stored procedure called GET_SOURCE_CHANGES that detects the differences between the current schema of a base view and its underlying data source. The syntax is:


GET_SOURCE_CHANGES (database_name : text, table_name : text)

        
This stored procedure returns a row for each field of the view and also a row for each field that is present in the data source, but not in the base view.  If a field is an array or a register, there will also be a row for each one of its subfields.

Sometimes, if the virtual database has a lot of base views and the data source schemas are changing quite frequently, the manual execution of this stored procedure for every base view can be cumbersome.


It is possible to combine the
GET_SOURCE_CHANGES and GET_VIEWS stored procedures to detect which fields of the views have changed.

You can create a new view database_change that will show the modified fields in all the base views of your VDP server. The following VQL can be executed to create that database_change view:

CREATE VIEW database_change AS

SELECT source_change.db_name AS db_name, source_change.table_name AS table_name, source_change.field AS field, source_change.type AS type, source_change.old_type AS old_type, source_change.modification AS modification, source_change.depth AS depth

FROM GET_VIEWS()  AS base_view

NESTED INNER JOIN GET_SOURCE_CHANGES() AS source_change ON (base_view.input_view_type =0 and base_view.database_name = source_change.db_name AND base_view.name = source_change.table_name)  

WHERE source_change.modification <> '';

VQL Explanation:

The next steps explain the operations done in the database_change view that has just been created executing the VQL:

  1. The goal is to execute the procedure GET_SOURCE_CHANGES() for any base view and get the fields that have been modified from the output of that procedure.

  1. A NESTED INNER JOIN is performed between the GET_VIEWS() and GET_SOURCE_CHANGES() stored procedures in order to ensure one execution of the GET_SOURCE_CHANGES per row in the GET_VIEWS() result. The join conditions are: base_view.database_name = source_change.db_name AND base_view.name = source_change.table_name.

  1. In order to get just the base views from the GET_VIEWS() stored procedure, a base_view.input_view_type =0 condition is added.

  1. In addition to this, a filter is added to show only the modified fields. There is a field in the new view called “modification”. This field describes the change detected in the field, for example:

  1. If “modification” is empty for a field it means that the field has not been modified. To add this filter, the following condition has been added to the VQL: source_change.modification <> ''.

  1. Below an example of the output of the base view created with the VQL attached:

Automating the detection of the changes in Scheduler


Once we have a view that returns the list of changes in the data sources of Virtual DataPort database, we can create a Scheduler job to automate the detection of these changes. Go to the Scheduler Administration Tool and follow these steps:

  1. Create a new job of type “VDP”.

  1. Fill the Extraction section as follows:
  1. In the data source field, select a data source that points to the Virtual DataPort database where the ‘database_change’ view has been created.
  2. In the “Parameterized query” field, use the following query

SELECT table_name, field, type, old_type, modification, depth

FROM database_change

 

  1. Go to the Exporters section and add a CSV exporter. Configure it as follows:
  1. Use the following as Output file name:  @{field:tablename}.csv.
  2. Select UTF-8 as encoding.
  3. Select “Overwrite old file”.
  4. Use comma (,) as separator.
  5. Select the “Include header” option.
  6. Leave all the other  parameters with their default values.

  1. To receive email notifications that include the CSV file generated in the previous step a Denodo Connects tool called “Denodo Custom Handle Email Exported Files” can be used. This plug-in can be found in the Denodo Support Site under the Resources > Denodo Connects section. Follow the instructions in the Custom Handler Email Exported Files - User Manual to install it.

In order to send emails, Denodo Scheduler needs to be also properly configured with the values of an email server. You can find more details in the Mail configuration section of the Scheduler Administration Guide.

  1. Once the plugin is installed go to the Handlers section of the job and add a new “email-exported-files” handler. Configure it as follows:
  1. In the Emails test box, add the values of the emails that will receive the notification separated by commas.
  2. Add Values for the Subject, Body and Footer according to the notification emails.
  3. Check Write File in Email.
  4. Check Use HTML in Email content.
  5. Leave the other options as default .

  1. In the Triggers section specify the time expression that sets when the analysis will be executed. It uses a UNIX “cron expression”.  You can find more details on how to configure this screen in the Time-based Job Scheduling section of the Scheduler Administration Guide.

  1. Save the job. It can be executed manually by clicking on the “Play” icon next to it. If there is any change an email outlining the changes similar to this will be sent:

Compatibility with Denodo 6.0

The Denodo 6.0 version does not include the GET_SOURCE_CHANGES and GET_VIEWS stored procedures. As an alternative, you can use SOURCE_CHANGES and CATALOG_VIEWS stored procedures. Keep in mind that the syntax for using these procedures may be different from the one explained in the document.

In addition, the procedures available in Denodo 6.0 only return results from the views of the database where the query is executed. If you want to automatically detect changes for different databases, you need to create different data sources in Scheduler pointing to every database and then create a new job for each data source.

References

Virtual DataPort VQL Guide: GET_SOURCE_CHANGES

Virtual DataPort VQL Guide: GET_VIEWS

Scheduler Administration Guide: Server Configuration - Mail configuration

Scheduler Administration Guide: Time-based Job Scheduling Section

Denodo Connects: Denodo Distributed File System Custom Wrapper - User Manual

Virtual DataPort VQL Guide: CATALOG_VIEWS (Denodo 6.0)

Virtual DataPort VQL Guide: SOURCE_CHANGES (Denodo 6.0)

Questions

Ask a question
You must sign in to ask a question. If you do not have an account, you can register here

Featured content

DENODO TRAINING

Ready for more? Great! We offer a comprehensive set of training courses, taught by our technical instructors in small, private groups for getting a full, in-depth guided training in the usage of the Denodo Platform. Check out our training courses.

Training