Last modified on: 29 Jun 2020
Virtual DataPort incorporates a cache module that can store, in a JDBC database, a local copy of the data retrieved from the data sources. This may reduce the impact of repeated queries hitting the data sources and speed up data retrieval, especially with certain types of sources.
The Cache Engine, available in Virtual DataPort, offers two main modes: Partial and Full. With Partial mode, the cache only stores some of the tuples of the view; at runtime, when a user queries a view with this cache mode, the Server checks whether the cache contains the data required to answer the query and, if it does not, queries the data source. With Full mode, the data of the view is always retrieved from the cache database instead of from the source.
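For reference, a Full cache is conventionally preloaded with a query of the following kind (a minimal sketch assuming a view named customer; the CONTEXT parameters shown are standard Denodo cache options):

-- Preload the full cache of the view, invalidating previously cached rows,
-- and wait until the load finishes
SELECT *
FROM customer
CONTEXT('cache_preload' = 'true',
        'cache_invalidate' = 'all_rows',
        'cache_wait_for_load' = 'true');

A load launched this way streams every row through the Virtual DataPort Server, which is the overhead the stored procedure presented below is designed to avoid.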
The Denodo Cloud Cache Load Bypass Stored Procedure supports Full cache initialization processes in scenarios in which Amazon Redshift or Snowflake is used as cache, and the data has to be loaded from another Amazon Redshift or Snowflake instance configured as a data source.
Before running the Denodo Cloud Cache Load Bypass Stored Procedure, you have to load the denodo-cloud-cache-load-bypass-{vdpversion}-{version}-jar-with-dependencies.jar file using the Jar Management option of the VDP Administration Tool, located in the File > Jar Management menu. Once you have imported the Jar file, you can add the stored procedure.
You can add a new stored procedure with the CREATE PROCEDURE statement:
CREATE [OR REPLACE] PROCEDURE <name:identifier>
    CLASSNAME = 'com.denodo.connect.cache.CloudCacheLoadBypassStoredProcedure'
    JARS 'denodo-cloud-cache-load-bypass-<vdpversion>'
    [ FOLDER = <literal> ]
    [ DESCRIPTION = <literal> ];
The classname must be com.denodo.connect.cache.CloudCacheLoadBypassStoredProcedure and the JARS value must be the name of the Jar file, denodo-cloud-cache-load-bypass-<vdpversion>, previously added to the Virtual DataPort Server (see the Importing the extension to the Virtual DataPort Server section for more information).
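For instance, a statement like the following registers the procedure (the procedure name, jar version, folder and description below are hypothetical examples, not fixed values):

CREATE OR REPLACE PROCEDURE cloud_cache_load_bypass
    CLASSNAME = 'com.denodo.connect.cache.CloudCacheLoadBypassStoredProcedure'
    JARS 'denodo-cloud-cache-load-bypass-8.0'
    FOLDER = '/stored procedures'
    DESCRIPTION = 'Loads the Full cache directly from a Redshift or Snowflake source';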
You can also add a new stored procedure by clicking Stored procedure in the File > New menu:
You must set a name in the Name field and select the Select Jars checkbox in order to use the Jar file, denodo-cloud-cache-load-bypass-<vdpversion>, previously added to the Virtual DataPort Server (see the Importing the extension to the Virtual DataPort Server section for more information).
When you enable the Cache Engine in Virtual DataPort, the default DBMS (Database Management System) is the embedded Apache Derby database, but it is highly recommended to change it and use an external one. Snowflake and Amazon Redshift can be configured as external DBMSs in order to use them as cache systems. In addition, both databases can be used as data sources in Virtual DataPort.
If you have this scenario, with Snowflake or Amazon Redshift as data source and Snowflake or Amazon Redshift as cache, and you use Full cache mode in some of your base or derived views, the Denodo Cloud Cache Load Bypass Stored Procedure will allow you to cache the data faster than when you invoke the process with the cache preload query. Nevertheless, when you are caching derived views, you should be aware of:
In order to carry out the data caching, the stored procedure requires Amazon S3 (Amazon Simple Storage Service) to store the data exported from the data source. The Denodo Cloud Cache Load Bypass Stored Procedure will unload the data to cache, in parallel, into a specified Amazon S3 bucket as GZip files, and the DBMS used as cache will load the data from Amazon S3, also in parallel. The files will be deleted from Amazon S3 at the end of a successful loading process.
During the data unloading and loading process, null values are represented as ‘\\N’ in the Amazon S3 files.
This decision is based on:
The Amazon S3 bucket where Amazon Redshift will write the unloaded files must be created in the same region as the Amazon Redshift cluster used as data source. However, if you use Amazon Redshift as cache, the Amazon S3 bucket that holds the data files may be in a different region from the one in which the Amazon Redshift cluster used as cache resides.
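For context, the unload half of this process corresponds to an Amazon Redshift UNLOAD command of roughly this shape. This is a hedged sketch with hypothetical names, using standard Redshift options (GZIP, NULL AS, REGION, MAXFILESIZE, PARALLEL); the exact statement the stored procedure generates is internal to it and may differ:

UNLOAD ('SELECT id, name FROM customer WHERE id > 3 AND id < 20')
TO 's3://my-cache-bucket/cache/customer/'
CREDENTIALS 'aws_access_key_id=my_access_key_id;aws_secret_access_key=my_secret_access_key'
GZIP
NULL AS '\\N'
REGION 'us-east-1'
MAXFILESIZE 100 MB
PARALLEL ON;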
The stored procedure needs several input parameters, which appear in this order in the invocation examples below: DATABASE_NAME, DATA_SOURCE_NAME, VIEW_NAME, QUERY_CONDITIONS, INVALIDATE_BEFORE_LOADING, BUCKET_NAME, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, OBJECT_PREFIX, REGION and MAX_FILE_SIZE.
After installing the stored procedure, there are four ways of invoking it:
CALL CLOUD_CACHE_LOAD_BYPASS('database_name', 'data_source_name', 'view_name', 'id > 3 and id < 20', true, 'bucket_name', 'aws_access_key_id', 'aws_secret_access_key', 'amazon_s3_object_prefix', 'region', max_file_size);
SELECT * FROM CLOUD_CACHE_LOAD_BYPASS('database_name', 'data_source_name', 'view_name', 'id > 3 and id < 20', true, 'bucket_name', 'aws_access_key_id', 'aws_secret_access_key', 'amazon_s3_object_prefix', 'region', max_file_size);
SELECT * FROM CLOUD_CACHE_LOAD_BYPASS()
WHERE DATABASE_NAME = 'database_name'
  AND DATA_SOURCE_NAME = 'data_source_name'
  AND VIEW_NAME = 'view_name'
  AND QUERY_CONDITIONS = 'id > 3 and id < 20'
  AND INVALIDATE_BEFORE_LOADING = true
  AND BUCKET_NAME = 'bucket_name'
  AND AWS_ACCESS_KEY_ID = 'aws_access_key_id'
  AND AWS_SECRET_ACCESS_KEY = 'aws_secret_access_key'
  AND OBJECT_PREFIX = 'amazon_s3_object_prefix'
  AND REGION = 'region'
  AND MAX_FILE_SIZE = max_file_size;
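As an illustration, a call with concrete values might look like the following; every identifier, credential and prefix below is a hypothetical example, and the exact meaning and units of each parameter should be checked against the parameter list above:

CALL CLOUD_CACHE_LOAD_BYPASS(
    'customers_db',          -- Virtual DataPort database that contains the view
    'ds_redshift',           -- Redshift/Snowflake data source name
    'customer',              -- view whose Full cache will be loaded
    'id > 3 and id < 20',    -- condition restricting the rows to cache
    true,                    -- invalidate previously cached rows before loading
    'my-cache-bucket',       -- Amazon S3 bucket used as the staging area
    'my_access_key_id',      -- AWS access key id
    'my_secret_access_key',  -- AWS secret access key
    'cache/customer/',       -- prefix for the GZip files written to S3
    'us-east-1',             -- region of the Amazon S3 bucket
    100                      -- maximum size of each unloaded file
);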
The Denodo Cloud Cache Load Bypass Stored Procedure also has an output parameter called LOADED_ROWS that indicates the number of rows loaded into the cache system.
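For example, you can project this output parameter directly, using the same placeholder values as in the invocation examples above:

SELECT LOADED_ROWS
FROM CLOUD_CACHE_LOAD_BYPASS('database_name', 'data_source_name', 'view_name', 'id > 3 and id < 20', true, 'bucket_name', 'aws_access_key_id', 'aws_secret_access_key', 'amazon_s3_object_prefix', 'region', max_file_size);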
Synchronous process
The execution of the Denodo Cloud Cache Load Bypass Stored Procedure is synchronous: the data being cached bypasses the Virtual DataPort Server (the Server does not process the data itself), but the procedure does not return control until the loading process ends.
Timeout considerations
The process of loading the cache can take quite a long time, depending on how much data you need to move. If you invoke the Denodo Cloud Cache Load Bypass Stored Procedure from the Administration Tool, you should review the Admin Tool Preferences, where you can change the query timeout. If the value 0 is specified, the tool will wait indefinitely until the process ends.
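If you launch the load from a VQL client instead, the standard queryTimeout CONTEXT parameter (expressed in milliseconds) can be set per query. The sketch below reuses the placeholder values from the invocation examples above and assumes, as with the Tool preference, that the value 0 means waiting indefinitely:

-- Disable the query timeout for this potentially long-running load only
SELECT *
FROM CLOUD_CACHE_LOAD_BYPASS('database_name', 'data_source_name', 'view_name', 'id > 3 and id < 20', true, 'bucket_name', 'aws_access_key_id', 'aws_secret_access_key', 'amazon_s3_object_prefix', 'region', max_file_size)
CONTEXT('queryTimeout' = '0');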