Applies to:
Denodo 8.0
Last modified on: 22 Jul 2022
Tags:
Cache
Denodo Administrator
Denodo Developer
Development
Infrastructure Setup
Expert trails guide Denodo users through all the relevant materials related to a specific topic, including official doc, KB articles, training, Professional Services offering, and more. The main goal is to give users a single place with references to all the information that they need to become a Denodo expert on any specific topic.
“Why do I need caching?” I thought Data virtualization is a real time integration but caching is replicating data. These are some of the questions that might come up with respect to cache.
In this Expert Trail, you will gain a knowledge of caching with respect to how to model your cache with the necessary best practices in mind, the different purposes of cache and advanced configurations of caching.
Stage 1: Purpose of Caching Denodo incorporates a cache module to store local copies of the source data as required. It is important to understand the difference between cache and other forms of replication like ETL. Some of the cache purposes are:
Let's continue this path by checking the Best Practices and recommendations for different aspects of the Cache Module such as, how to choose the cache database, how to decide what views to cache, or what is the best cache mode and refresh strategy for each particular use case: Best Practices to Maximize Performance III: Caching For information in other performance optimization techniques, review the Expert Trail: Query Performance Optimization |
Stage 2: Cache Configuration Initial setup: To make use of the cache, the first and foremost step is to configure it. The document Configuring the Cache explains how to enable the cache engine in the Server. After this, the cache can be configured for a specific view that you want its data to be cached. The document Configuring the Cache of a View explains how to do this Loading/refreshing the cache: Cached data is liable to become out of date. Therefore, you may need to establish a data refresh plan periodically. Our recommendation is to use Denodo Scheduler to automate the cache load process, may it be an incremental or complete load of the cache.
Cache Maintenance: When the content of a cached view is invalidated or reaches its time to live, expired/invalidated rows will not appear in query results. Denodo provides a cache maintenance process that periodically removes these rows. But, in production environments, the recommended setting is to use the CLEAN_CACHE_DATABASE predefined stored procedure instead. |
Stage 3: Modeling Depending upon your scenario selecting the right cache mode plays a vital role.
Additional consideration that should be taken into consideration are described in the Fine-grained privileges and caching best practices KB article. |
Stage 4: Advanced Configuration With a better understanding of what cache can do, now it is time now to take a look at some additional configurations or considerations. Bulk Data Load: Some databases also provide bulk load API options to optimize loading big amounts of data, unlike the default ‘batch insert’ capabilities. This may also imply limitations on the atomicity of the cache loading operation and Caching Very Large Data Sets documentation provides several recommendations for it. Cache Indexes: Indexes improve the speed of data retrieval operations on database tables. Thus, defining cache indexes for a view can take advantage of the indexes benefits. Specific conditions: There are some specific conditions you have to take into account depending on the database you use to store cached data. The document Specific Information about Cache Databases explains in detail about this. Estimate size of cache database: There are several factors involved in estimating the space required for the cache, and they can be highly dependent on the characteristics of the selected database system. The document Cache database size estimate explains more in detail. |
Stage 5: Summaries Apart from the traditional cache, Denodo 8 includes smart query acceleration techniques using a new type of view called Summaries or Smart Caching. These store common intermediate results that the query optimizer can then use as a starting point to accelerate analytical queries. Summaries do have some advantages over the traditional cache:
However, when it comes to caching it does not require a preprocessing phase to decide whether to use it or not. So in cases where you know you always want to access a certain data in the same way as you usually do, it is better to continue using tradicional cache, to avoid any unnecessary overhead. Also, caching offers more refresh and management options, like implicit caching mode, incremental queries, atomic or non-atomic loads, selective invalidation. |
Stage 6: Troubleshooting Finally, the latest step is to learn to debug and identify the possible reasons when a certain problem arises. For example, what if the cache server goes down for any reason, in this case the behavior of the VQL queries executed on cached views will depend on the cache mode: What happens if the cache database goes down? Monitoring the cache can also help to debug if any issue occurs when load processes of the cache on the Virtual DataPort server is performed. The document Monitoring - Cache shows the different fields that are available in the “Cache Load Processes” table which are useful for debugging. More information about monitoring can be found in the Expert Trail: Monitoring. |
Fill up your backpack with additional gear:
Cache Best Practices |
|
Official Documentation |
|
KB Articles |
|
Additional Resources |
Cache related Predefined Stored Procedures |
|
Official Documentation |
Smart Query Acceleration (Summaries) |
|
Official Documentation |
|
KB Articles |
|
Webinars |
|
Additional Resources |
Denodo training courses provide expert data virtualization training for data professionals, including administrators, architects, and developers.
If you are interested in Cache you should enroll the following course/s:
Denodo Customers with active subscriptions have access to request Meet a Technical Advisory sessions.
These are the sessions available related to Cache:
Infrastructure implementation |
Cache: Configuration & Administration Best Practices |
- Configure the cache data source/s using a supported database. - Configure Bulk Load for cache. - Set up Cache Maintenance Task. - Monitoring the cache. - Smart Caching: Summaries |
Architecture & Standards |
Cache Modes Overview |
Review and showcase how the Denodo Cache works: - The different cache modes (Partial, Full, Incremental) - Loading the cache and refreshing. - Invalidating the cache contents. - Indexes. |
Cache Best Practices |
- Assist you in defining a policy to determine when to use the cache and what type to use, or review your current policy. - Advice on selecting the best cache strategy for a specific scenario. - Incremental caching strategies. |
|
Schedule Cache Load |
Assist you on your first steps creating cache jobs in Scheduler. |
Denodo Professional Services can help you at the start or any part of your query performance trail. You can find information about the Denodo Professional Services offering in:
Professional Services for Data Virtualization | Denodo
If you are a Denodo customer, you can reach out to your Customer Success Manager for details about any Guided Route that you need.
Let’s see if you are ready to start your big trail. Take this 3-question questionnaire to check your readiness for an enjoyable hike.
Read the questions below, think about the solution and check if you got them right by looking at the solution. Have you become an expert?
Click here to check if you got it right
The recommendation in this scenario would be at JSON base view level. From the execution trace you have identified the post processing in Denodo is not a problem but the network latency of JSON affects the query performance. |
Click here to check if you got it right
Incremental Full cache mode is a good option for this scenario. For more details, refer to Requirements a View Has to Meet to Support Incremental Queries. |
Click here to check if you got it right
This may happen when the cache maintenance job is configured to run at the same time of the cache preload job. There are 2 possibilities:
In Production environments, the recommendation is to set the “Maintenance” option to “Off” to avoid executing the Cache Maintenance task periodically and having control over when the maintenance task is executed. Thus, use the Denodo Scheduler to program the execution of the CLEAN_CACHE_DATABASE procedure at a period where the Server and the cache’s database are not expected to be under heavy load, thus avoid having cache load job and CLEAN_CACHE_DATABASE to run in parallel. |