Cache Module¶
Virtual DataPort includes a “Cache Engine” that can store the data retrieved from the data sources, in an external relational database. You enable the cache engine on the views for which you want to cache the data (data is not cached by default).
Caching the data of a view may help reduce the impact of repeated queries hitting the data source and speed up data retrieval, especially the queries with complex calculations, queries that aggregate millions of rows or when working with slow data sources.
This is the list of databases that you can use for this:
Database |
Driver is included in the Denodo Platform |
---|---|
AlloyDB for PostgreSQL |
Included |
Amazon Athena |
Driver included |
Amazon Aurora MySQL |
Driver included |
Amazon Aurora PostgreSQL |
Driver included |
Amazon Redshift |
Driver included |
Azure SQL Database |
Driver included |
Azure Synapse SQL (previously known as Azure SQL Data Warehouse) |
Yes (Microsoft SQL Server driver) |
Google BigQuery |
Driver not included |
Databricks |
Driver included |
Exasol |
Driver included |
Hive 2.0.0 (HiveServer2) (*) |
Driver not included |
Hive 3.1.2 and higher(HiveServer2) |
Driver not included |
HP Vertica 7 |
Driver not included |
IBM DB2 8, 9, 9 for z/OS, 10, 10 for z/OS, 11 and higher and 11 for z/OS |
Driver not included |
Impala 2.3 |
Driver not included |
Impala 3.x Kudu |
Driver not included |
Microsoft SQL Server 2000, 2005, 2008, 2008 R2 (*) |
Two drivers are available and both are included:
|
Microsoft SQL Server 2012, 2014 and 2016 |
|
Microsoft SQL Server 2017, 2019 and higher |
Yes (only the Microsoft driver) |
MySQL 4, 5, 8 and higher and GCP Cloud SQL for MySQL |
Driver not included |
Netezza 6.0, 7.0 and higher |
Driver not included |
Oracle 8i, 9i, 10g, 11g, 12c, 12c In-Memory, 18c, 19c and higher |
Driver included |
Oracle TimesTen 11g |
Driver not included |
PostgreSQL 9, 10, 11, 12 and higher and GCP Cloud SQL for PostgreSQL |
Driver included |
PrestoDB 0.1x |
Driver included |
Trino 4xx |
Driver included |
SAP HANA 1.0 and 2.0 |
Driver not included |
Snowflake |
Driver included |
Spark SQL 2.x, 3.x and higher |
Driver not included |
Teradata 12, 13, 14, 15, 16 and 17 |
Driver not included |
Vertica 7 and 9 |
Driver not included |
Yellowbrick |
Driver included |
(*): the adapter is deprecated and it may be removed in future major versions of the Denodo Platform. The page Features Deprecated in Denodo Platform lists all the features that are deprecated.
Consider this:
Some database vendors forbid other software companies to distribute their drivers (in the table above, see the databases with the column Driver Included in the Denodo Platform = Driver not included):
To connect to one of these databases, download the official driver of the database, not the driver of another third-party company. Consider this:
We conduct the tests of each adapter using the official driver.
Only the official driver is really supported. If you use the driver of another company and something does not work, the Denodo Support Team will try to provide assistance but may not be able to provide a solution.
To import a driver, use the wizard of the menu File > Extension management to import the driver. The page Importing a JDBC Driver explains how to do this.
The Denodo Platform includes the drivers of the databases to make it easier for developers to create connections to data sources. Periodically, we upgrade these drivers so our customers benefit from bugfixes and enhancements. However, we cannot fix bugs in the drivers of other companies (only in the drivers developed by Denodo). If any issue arises due to bugs in these drivers, we will try to provide a workaround to the best of our ability but ultimately, it is up to the database vendors to fix the bugs in their own drivers.
In addition to the adapters listed above, the Cache Engine has a generic adapter that can be used with any other JDBC database to store cached data. The section Generic Support for Other Databases explains how to configure it. However, we strongly recommend that you use one of the databases of the list above because the performance when querying the cache database will be better. The reason is that Virtual DataPort will push down more operations to the supported databases than the generic one.
When you enable the Cache Engine for the first time, the default database to store the cache data is an Apache Derby database that is embedded with Denodo.
Important
We advise against using this embedded Apache Derby database. This database is provided to show case the Cache Engine and doing small projects. Instead, use an external database, especially on production environments.
To enable the cache engine, follow these steps:
Create a catalog or schema in the database specifically for the Cache Engine of Denodo.
This is not strictly mandatory but highly recommended. That is because the Cache Engine creates many tables (one for each view of Denodo where the cache is enabled and a few other “control tables”). Therefore, it is useful for these tables to be isolated from the rest of the elements of that database.
Consider this:
Size of the database: read the article Cache database size estimate of the Knowledge Base to know much space the database administrator should allocate for caching.
This new catalog or schema has to be created with these options:
Support for multi-byte characters (e.g. UTF-8). This will allow to store data that contains multi-byte characters (e.g. Japanese characters). Also, to be able to enable cache for a view whose name or fields has multi-byte characters.
Binary collation. In a database management system, the collation specifies how the database compares and sorts character strings. One of the main effects of the collation is the order of the results in queries that have an ORDER BY with columns that are character strings. For example, the collation determines whether or not uppercase letters and lowercase letters are treated as the same, sensitivity to accents (e.g. is “A” the same as “Á”), etc.
With binary collations, the data is ordered according to the numeric value of each byte of the character strings.
Important
Validate with the administrator of this database that the collation of this catalog/schema is binary.
The Execution Engine of Virtual DataPort expects the data to be sorted following the rules of a binary collation. If the collation is not binary, queries that execute JOIN operations of data obtained from the cache database and from another database may return incorrect results.
Grant the following privileges to the user account that Denodo will use to connect to this database:
Privileges to create and drop tables on this schema.
Privileges to execute
SELECT
,INSERT
,UPDATE
andDELETE
statements on these tables.
Enable the Cache Engine at a server level (see section Configuring the Cache). Alternatively, you can enable the cache module only for some databases (see section Configuring and Deleting Databases).
When the Server initializes the cache, it creates a set of tables in the database that will have information about the data stored in the cache. For each query whose data is cached, the Server stores a Query Pattern that later is used to know if a query can be “solved” with the data stored in the cache. A Query Pattern is formed by:
An id that uniquely identifies every Query Pattern.
The name of the view that these data belongs to.
The conditions of the WHERE clause of the query used to retrieve the data from the data source.
The expiration time of the data set that the Query Pattern represents. Once this time is reached, the data set associated with this Query Pattern is marked as invalid and no longer used. At regular intervals (Maintenance Period), the Server deletes the data sets that are marked as invalid. Note that this applies to cache partial only. For cache full, the expiration date represents the time of the latest cache refresh.
Status of the Query Pattern. This can be valid, processing (while the data is being stored in the cache) or invalid (when the expiration time of the Query Pattern has been reached).
Enable the cache in the views that requires it (see section Configuring the Cache of a View).
When enabling the cache on a view, the cache module creates a table in the cache’s database that will store the cached data of this view. The data types of the fields of this table are equivalent to the data types of the fields of the view in Virtual DataPort.
When a user executes a query that involves a view with cache enabled, the Server checks if it can retrieve the data from the cache. If not, it retrieves the data from the data source, stores it in the cache and creates a Query Pattern that represents these data.
The cache can also be by-passed for specific queries if you need to access the actual data.
When querying a view with mandatory fields (e.g. a Web service base view with mandatory input parameters) and the view is cached, you do not have to provide a value for the input parameters. However, you have to do it if the data has to be retrieved from the data source.
Warning
Users should be careful when enabling the cache for views that involve data sources with pass-through credentials enabled. The section Considerations When Configuring Data Sources with Pass-Through Credentials explains this in more detail.
Warning
The tables of the database that store cached data should be monitored for excessive fragmentation and compacted periodically if needed. In particular, with tables whose rows are invalidated very often.
You can see information about the Query Pattern of the database by
executing the Denodo stored procedure CACHE_CONTENT
. The section
Displaying the Contents of the Cache of the VQL Guide
explains how to invoke this procedure.
Denodo uses built-in SQL statements to create the cache tables. Although this should be sufficient for most scenarios, Denodo allows to customize the commands used to create the cache tables. See section Cache Table Creation Templates for more information.