Configuring the Cache of a View¶
Before enabling the cache in a specific view, you have to enable the cache module in the Server (see section Configuring the Cache).
To enable the cache of a view, open the view, click Options and select one of the following cache modes:
Partial. When a query is executed against the view, the Server checks if the cache contains the data required to answer the query. If not, it queries the data source directly.
Full. The Server assumes that the cache contains all the tuples of the view. Therefore, queries on this view will always use the cached data and will never “hit” the data source. The data source will be only accessed for refreshing cache data. The cache is always loaded explicitly, meaning that you need to execute query to load the cache.
Full incremental. With this mode, the queries to this view will be “incremental queries”. That is, the queries to this view merge the results obtained from the cache with the most recent data retrieved from the source. The main benefit of this mode is that the queries will always return fully up to date results without needing to retrieve the full result set from the data source, just the rows that are added/changed since the last cache refresh.
To be able to use this mode, the view has to have a field that contains when a row was last inserted/updated. For example, if the view has a field
last_modified_date
, you have to enter a condition likelast_modified_date > '@LAST_REFRESH_DATE'
.When executing the query, the variable
@LAST_REFRESH_DATE
is replaced with the timestamp of the last time a query to load the cache was executed and finished successfully.As it is a variant of the full mode, unlike with partial mode, the cache of the view still needs to be loaded explicitly (i.e. you have to execute a query with the appropriate parameters to load the cache).
Warning
The incremental mode has several requirements and limitations that you need to be aware of. The section Incremental Mode lists them.
The section Cache Modes explains in detail how these cache modes work.
Time to Live (seconds) is the Time to live in seconds of the entries stored in the view’s cache. Once this limit is reached, these entries expire and the next query will retrieve the data from the data source, instead of from the cache. If you select Never expire, the Server will never invalidate the entries stored in the cache. However, you still can invalidate the cache manually by clicking on Invalidate cache.
The Default time to live is the Time to live defined for the database in the “Cache configuration” dialog of the database (Administration > Database Management dialog). If the database does not set the “Time to live”, the cache module uses the global value of this property, which is defined in the “Cache” tab of the Server Configuration dialog.
The rows obtained from the source are inserted in cache in batches. The Batch insert size (rows) determines the number of rows per batch. The Default value is the Batch size defined for the database in the “Cache configuration” dialog of the database (Administration > Database Management dialog). If the database does not set the “Batch size”, the cache module uses the global value of this property, which is defined in the “Cache” tab of the Server Configuration dialog.
The entries of the cache of a view expire when they reach their Time to live. However, you can invalidate these entries manually. To do it, click on Invalidate cache and select the following options:
Invalidate all. Remove all the cache content for the view.
Invalidate partially. Remove only the entries that match the condition in the Edit condition dialog.
Invalidate on cascade. If selected, the cache module also invalidates the cache of the subviews of this view. This option is disabled for views with “Full” cache mode. In addition, if you invalidate on cascade a view, the cache of the subviews whose cache mode is “Full” will not be invalidated.
See the section Invalidate Cache for information about invalidating the cache content of more than one view at a time.
Cache Indexes¶
When enabling the cache for a view, the Cache Module creates a table in the cache’s database that will store the cached data of this view. You can define “Cache indexes” for this view to speed up the retrieval of data from the view’s cache.
The difference between the cache indexes and the indexes defined in the “Indexes” tab is that the latter ones represent the indexes defined in the source. The cache module actually creates the “Cache indexes” in the cache’s database, when they are defined in this tab.
When the cache is enabled for a view and its cache mode is “Full”, the indexes propagated to its derived views and to the ODBC and JDBC clients, are the “Cache indexes” and not the indexes defined in the “Indexes” tab. That is because the queries of the view will always retrieve the data from the cache and not from the source.
To create a cache index, follow these steps:
If the view has indexes (see the “Indexes” tab), you can replicate them in the table that will store the view’s cached data, by clicking on Import indexes from view.
Note that the indexes will not be created until you click Ok to close the dialog.
To define more indexes in the cache table, click Create index and follow these steps:
Enter the name of the index.
Note
Depending on the database you are using to store the cache, the name of the index may need to be unique across the entire Virtual DataPort database and not just the view you are editing. That is because some databases do not allow defining two indexes with the same name, even if they are defined over different tables.
Select Is unique if the index represents a unicity index.
In the left side of the dialog, there is a list of the fields of the view. Select the ones that belong to the index and click Add >>. To select several fields at once, hold Ctrl or Shift and click on the fields you want to select.
To change the order of the fields in the index, select the check box of the field you want to move and click on and .
To remove a field from the index, select its check box in the list of the right side and click << Remove.
Click Ok to create the index.
To edit a cache index, select its check box and click Edit selected index.
To remove a cache index, select its check box and click Remove selected index.
Cache Refresh Jobs¶
When a view has a cache configured to store data in the cache database’s table, you can load the data manually or create a VDPCache job in Scheduler to do so. This section shows the cache refresh jobs configured for the view and a summary of the most relevant information. You will also have links to the Scheduler Web Administration Tool to get more detailed information about the jobs, manage them, and create new ones.
Note
The Cache refresh jobs feature is only available in the Design Studio of the Solution Manager.
The table with the VDPCache jobs will load the cache jobs when the collapsible section is expanded the first time. Afterward, to update the table, use the button Reload jobs. The table contains the following information on the cache refresh jobs of the view:
Job. The name of the VDPCache job. This is a link that will open a new tab for the job details in the Scheduler Web Administration Tool.
State. The current state of the job. In particular, a job can be found running, not running, waiting, or having been disabled.
Last execution. This shows the last time the job was executed. It will be blank if it has never been executed.
Next execution. This shows the next time the job will be executed. It will appear blank if it has been disabled or, according to its time-based scheduling, will not run again.
Result of the last execution. Shows the completion state of the latest job execution. A job can be completed correctly, end with an error condition detected, end with a warning, stopped by the user, or misfired (that means the scheduled execution didn’t run). If the job ran and didn’t finish correctly, this field will be a link to the execution details section of the Scheduler Web Administration Tool to get more detailed information on the last execution result.
Cached tuples. Number of tuples/documents that have been extracted in the last execution of the job.
To create a new VDPCache job, click the Add refresh job in Scheduler button. It will open the Scheduler Web Administration Tool in the dialog in a new tab to create a new VDPCache job with some of the fields already filled.
How This Works¶
When a Virtual DataPort server requests a license to Solution Manager, Solution Manager returns, among other things, the environment identifier and the cluster identifier of Solution Manager to which this Virtual DataPort server belongs.
When a user logs into a Virtual DataPort server with Design Studio, Virtual DataPort returns the environment identifier and the cluster identifier.
When a user opens a view that has cache enabled and expands the section Cache refresh jobs, Design Studio does this:
It sends a request to the API of the Solution Manager to obtain the Scheduler server’s hostname registered in the same environment and cluster as this Virtual DataPort.
If there are multiple Scheduler servers in the same cluster, the enabled one will be selected first. If none of them are enabled, then a disabled one will be chosen.
It sends a request to the Scheduler API to obtain the information about the jobs that refresh the view’s cache.
The Design Studio will connect to the Scheduler Web Administration Tool deployed in the same Tomcat of the Solution Manager installation.
Pre-Requites to Use this Feature¶
This feature is not available for users of Denodo Express.
This feature is not available if your subscription of Denodo does not include the Solution Manager. That is, if you have a standalone license, this feature will not work (if you do not know what this is, it is unlikely you have a standalone license).
Use the Design Studio of the Solution Manager; this feature is not available in the Design Studio of the installations of the Denodo Platform.
Enable and configure the Denodo Security Token Authentication in your Denodo Platform installations.
In Solution Manager: - The Scheduler server that loads the cache of the views of the Virtual DataPort you connect to has to be registered on the same environment and cluster. - If the cluster has multiple Virtual DataPort servers, all must have the same metadata (either having a shared metadata database or just having the same elements). - If the cluster has multiple Scheduler servers, all must share the same metadata database.
Configuration Steps¶
Configure the Virtual DataPort servers to get the license from the License Server of the Solution Manager. This feature will not work if the Virtual DataPort server uses a standalone license.
Edit the
<SOLUTION_MANAGER_HOME>/conf/design-studio/DesignStudioBackend.properties
and change the value of the properties.scheduler.api.host
: The hostname of the Scheduler Web Administration Tool of the Solution Manager. If the Scheduler Web Administration Tool is configured to use SSL, ensure the hostname of the machine matches with the SSL certificate’s Common Name (CN) or Subject Alternative Name (SAN).127.0.0.1
by default.scheduler.api.port
: The port of the Scheduler Web Administration Tool.19090
by default.scheduler.api.ssl
: If the Scheduler Web Administration Tool is configured to use SSL or not.scheduler.api.context-path
: (Optional) The context path of the Scheduler Web Administration Tool./webadmin/denodo-scheduler-admin/
by default.
If the Solution Manager is configured to use SSL, edit the file
<SOLUTION_MANAGER_HOME>/conf/solution-manager-web-tool/SMAdminConfiguration.properties
and change the value of the properties.com.denodo.solutionmanager.security.ssl.enabled
: Set totrue
.com.denodo.solutionmanager.host
: Update with the machine’s hostname to match with the SSL certificate’s Common Name (CN) or SubjectAlternativeName (SAN).
Restart the Design Studio.