USER MANUALS

VDPCache Extraction Section

To configure the extraction section for VDPCache-type jobs a VDP-type data source needs to be selected. Once selected, several load processes to preload the cache of VDP views may be created, by clicking on Add Load Process.

Additionally, it is possible to configure the number of load processes to run concurrently during the job execution (Concurrency Level).

When clicking on Add Load Process, you can create a new Cache Load Process, which allows you to perform incremental or full cache loads.

Note

Until the Denodo 8.0 20210209 Update it was known as Non-incremental load process, as it was only valid for full cache loads.

Cache Load Process

The configuration of each load process (Example of the extraction section of a VDPCache job) is similar to the extraction section of a VDP-type job (we could think of this job like a conglomerate of VDP-type jobs with the same data source). It consists of:

  • The VDP View to preload. All the fields from the selected view are projected. User can introduce view name. Views suggestions will appear in this combo box filtered by the input of the user. Views in this combo box are retrieved from the VDP data source configured for the job (all views from every database for which the user has permissions are listed, not only the ones from the database specified in the data source). Note that the corresponding VDP server must be running. It is also possible to select the view by clicking on the lens icon. It will show a popup, with a tree organized in a structure obtained from VDP, with databases and folders. Click desired database and folders to show views, check desired view and click OK button to select it.

Example of the view tree selector for cache view

Example of the view tree selector for VDPCache job

  • The Field Name. If you select a field from the view, then the condition fieldName > '@LAST_REFRESH_DATE' will be added to the query. The cache load will be incremental if this field is selected.

  • The Query Conditions. It is the where part of the parameterized queries already seen for VDP-type jobs. Query conditions are optional and there are two combo boxes to help the user to create them:

    • One to add fields of the selected view to the condition (mandatory fields are indicated). This combo is updated every time a new view is selected (if the VDP data source is accessible).

    • Another one to add operators to the condition. This combo contains all the possible operators that can be used in a condition.

    Query conditions can have parameters, and the way to define their sources and their level of concurrency is identical to the one seen for VDP jobs.

    You can use the variable '\@LAST_REFRESH_DATE' to reference the time of the last execution, when doing a incremental cache load.

    When saving the job configuration, the query conditions for every load process are validated against VDP. If there are errors, the names of the load processes with errors are shown in red, and the detailed information about each error is shown in the details of the corresponding load process.

Note

When any of the symbols ‘@’, ‘\’, ‘^’, ‘{‘, ‘}’ appear in the constant parts of Query Conditions, they must be escaped by the character ‘\’ (i.e. \@, \\, \^, \{, \}).

  • The query Context. Allows to specify properties for the CONTEXT clause of the query which is sent to VDP. For instance, you could specify 'simplify' = 'off' in order to not apply simplifications to the query when executing it. Note that the options 1 applicable to the CONTEXT clause that are configured through the rest of parameters cannot be specified here.

  • An option to Invalidate the cache before being loaded. This option can have one of these four values:

    1. None: do not invalidate the cache.

    2. All rows: when this option is selected the option 'cache_invalidate' = 'all_rows' is added to the CONTEXT clause of the query which is sent to VDP. In this case, all tuples from the cache are invalidated, regardless of the query.

    3. Matching rows: when this option is selected the option 'cache_invalidate' = 'matching_rows' is added to the CONTEXT clause of the query which is sent to VDP. In this case, only the tuples that match the rows returned by the query are invalidated from the cache. This is the default option for new jobs.

    4. Matching PK: when this option is selected the option 'cache_invalidate' = 'matching_pk' is added to the CONTEXT clause of the query that Scheduler sends to VDP. In this case, no rows are invalidated from the cache. VDP updates the tuples that match the primary key of the rows returned by the query and inserts the tuples that do not match. This option is available when the selected view has primary key.

  • An option to indicate if the processes of loading and invalidating the cache data (if configured, see previous point) are executed in the same transaction (Load / Invalidate data in a single transaction). If this option is checked, the option 'cache_atomic_operation = 'true' is added to the CONTEXT clause of the query which is sent to VDP. When it is not checked, the option 'cache_atomic_operation' = 'false' is added to the CONTEXT clause. By default, this option is checked.

  • An option to indicate if the cache should be populated even when a query finishes with errors (Populate the cache even when a query finishes with errors). If this option is checked, the option 'cache_load_on_error' = 'true' is added to the CONTEXT clause of the query which is sent to VDP. When it is not checked, the option 'cache_load_on_error' = 'false' is added to the CONTEXT clause. By default, this option is not checked.

Given the value of these five fields, the query sent to the VDP data source will be as follows:

SELECT *
FROM view
[ WHERE <conditions> ]
CONTEXT(
'cache_preload' = 'true',
'cache_wait_for_load' = 'true',
'cache_return_query_results' = 'false',
[ 'cache_invalidate' = { 'all_rows' | 'matching_rows' | 'matching_pk' },
'cache_atomic_operation' = { 'true' | 'false' }, ]
'cache_load_on_error' = { 'true' | 'false' } )

where

<conditions> ::= <fieldName> > '@LAST_REFRESH_DATE'
                            | <query_conditions>
                            | <fieldName> > '@LAST_REFRESH_DATE' AND <query_conditions>

As can be seen:

  • The query always indicates an explicit cache load ('cache_preload' = 'true').

  • The results of the query are not returned to Scheduler ('cache_return_query_results' = 'false') and that is why this job type has no Exporters section.

  • It waits to return the control to Scheduler until all the results of the query have been cached ('cache_wait_for_load' = 'true').

  • Optionally, the cache can be invalidated before being loaded ('cache_invalidate' = '{all_rows | matching_rows | matching_pk}') and the processes of loading and invalidating the cache can be executed in a single transaction or not ('cache_atomic_operation' = {'true' | 'false'}).

  • The cache can be populated even when a query finishes with errors or not ('cache_load_on_error' = {'true' | 'false'}).

There is also a field Load Process Name, where the user may specify a user-friendly name for each load process. This name will appear in the list of load processes to better identify them. By default, it is filled in with the name of the view to be preloaded (note that it is not possible to assign the same name to several load processes).

Example of the extraction section of a VDPCache job

Example of the extraction section of a VDPCache job

General Considerations

Since this type of job is a little bit different from the other extraction jobs (remember that the tuples returned by the queried views are not sent to Scheduler, so it has not Exporters section), the meaning of the values of the “Extracted” and “Processed” columns of the table of jobs is explained below:

  • The “Extracted” column shows the number of tuples extracted from the queried VDP views although they are not sent to Scheduler.

  • The “Processed” column shows the name of a virtual exporter called “CacheLoader”, and the number of tuples inserted in the cache of the queried VDP views.

Both columns show the aggregation of the values of each individual load process. If there are some warnings or errors, you can finally see the individual report for that load process. Besides, if the load process has been configured with a parameterized query which is executed several times, you can see the report for each parameter. Both columns, “Extracted” and “Processed”, also show the number of errors produced during each process, i.e., the number of extraction errors and the number of cache insertion errors, respectively (as in the rest of job types, if there are no errors, then the number of errors is not shown).

Note

When the connection to VDP is lost, the user can save the job as draft in order to not lose the work.

1

These properties are cache_preload, cache_wait_for_load, cache_return_query_results, cache_invalidate, cache_atomic_operation and cache_load_on_error.

Add feedback