Lookout
Expert trails guide Denodo users through all the relevant materials related to a specific topic, including official doc, KB articles, training, Professional Services offering, and more. The main goal is to give users a single place with references to all the information that they need to become a Denodo expert on any specific topic.
“Why do I need caching?” I thought Data virtualization is a real time integration but caching is replicating data. These are some of the questions that might come up with respect to cache.
In this Expert Trail, you will gain a knowledge of caching with respect to how to model your cache with the necessary best practices in mind, the different purposes of cache and advanced configurations of caching.
The Hike
Stage 1: Purpose of Caching Denodo incorporates a cache module to store local copies of the source data as required. It is important to understand the difference between cache and other forms of replication like ETL. Some of the cache purposes are:
Let's continue this path by checking the Best Practices and recommendations for different aspects of the Cache Module such as, how to choose the cache database, how to decide what views to cache, or what is the best cache mode and refresh strategy for each particular use case: Best Practices to Maximize Performance III: Caching For information in other performance optimization techniques, review the Expert Trail: Query Performance Optimization |
Stage 2: Cache Configuration Initial setup: To make use of the cache, the first and foremost step is to configure it. The document Configuring the Cache explains how to enable the cache engine in the Server. After this, the cache can be configured for a specific view that you want its data to be cached. The document Configuring the Cache of a View explains how to do this Loading/refreshing the cache: Cached data is liable to become out of date. Therefore, you may need to establish a data refresh plan periodically. Our recommendation is to use Denodo Scheduler to automate the cache load process, may it be an incremental or complete load of the cache.
Cache Maintenance: When the content of a cached view is invalidated or reaches its time to live, expired/invalidated rows will not appear in query results. Denodo provides a cache maintenance process that periodically removes these rows. But, in production environments, the recommended setting is to use the CLEAN_CACHE_DATABASE predefined stored procedure instead. |
Stage 3: Modeling Depending upon your scenario selecting the right cache mode plays a vital role.
Additional consideration that should be taken into consideration are described in the Fine-grained privileges and caching best practices KB article. |
Stage 4: Advanced Configuration With a better understanding of what cache can do, now it is time now to take a look at some additional configurations or considerations. Bulk Data Load: Some databases also provide bulk load API options to optimize loading big amounts of data, unlike the default ‘batch insert’ capabilities. This may also imply limitations on the atomicity of the cache loading operation and Caching Very Large Data Sets documentation provides several recommendations for it. Cache Indexes: Indexes improve the speed of data retrieval operations on database tables. Thus, defining cache indexes for a view can take advantage of the indexes benefits. Specific conditions: There are some specific conditions you have to take into account depending on the database you use to store cached data. The document Specific Information about Cache Databases explains in detail about this. Estimate size of cache database: There are several factors involved in estimating the space required for the cache, and they can be highly dependent on the characteristics of the selected database system. The document Cache database size estimate explains more in detail. |
Stage 5: Summaries Apart from the traditional cache, Denodo 8 includes smart query acceleration techniques using a new type of view called Summaries or Smart Caching. These store common intermediate results that the query optimizer can then use as a starting point to accelerate analytical queries. Summaries do have some advantages over the traditional cache:
However, when it comes to caching it does not require a preprocessing phase to decide whether to use it or not. So in cases where you know you always want to access a certain data in the same way as you usually do, it is better to continue using tradicional cache, to avoid any unnecessary overhead. Also, caching offers more refresh and management options, like implicit caching mode, incremental queries, atomic or non-atomic loads, selective invalidation. |
Stage 6: Troubleshooting Finally, the latest step is to learn to debug and identify the possible reasons when a certain problem arises. For example, what if the cache server goes down for any reason, in this case the behavior of the VQL queries executed on cached views will depend on the cache mode: What happens if the cache database goes down? Monitoring the cache can also help to debug if any issue occurs when load processes of the cache on the Virtual DataPort server is performed. The document Monitoring - Cache shows the different fields that are available in the “Cache Load Processes” table which are useful for debugging. More information about monitoring can be found in the Expert Trail: Monitoring. |
Exploration
Fill up your backpack with additional gear:
Cache Best Practices |
|
Official Documentation |
|
KB Articles |
|
Additional Resources |
Cache related Predefined Stored Procedures |
|
Official Documentation |
Smart Query Acceleration (Summaries) |
|
Official Documentation |
|
KB Articles |
|
Webinars |
|
Additional Resources |
Guided Routes
Denodo Training Courses
Denodo training courses provide expert data virtualization training for data professionals, including administrators, architects, and developers.
If you are interested in Cache you should enroll the following course/s:
- Denodo Data Management: This course covers when and how Denodo can store data itself (Cache / Materialized tables / Temporary tables / Summaries) and how to manage that data (see the data, remove/invalidate data,...).
- Advanced Configuration of Denodo Views: This course covers all the advanced configuration options available in Denodo views including cache configuration.
- Denodo Performance Best Practices: This course will talk about the internal details of the Denodo Optimizer to learn how to maximize the performance of the queries executed in Denodo Platform.
Technical Advisory Sessions
Denodo Customers with active subscriptions have access to request Meet a Technical Advisory sessions.
These are the sessions available related to Cache:
Infrastructure implementation |
Cache: Configuration & Administration Best Practices |
- Configure the cache data source/s using a supported database. - Configure Bulk Load for cache. - Set up Cache Maintenance Task. - Monitoring the cache. - Smart Caching: Summaries |
Architecture & Standards |
Cache Modes Overview |
Review and showcase how the Denodo Cache works: - The different cache modes (Partial, Full, Incremental) - Loading the cache and refreshing. - Invalidating the cache contents. - Indexes. |
Cache Best Practices |
- Assist you in defining a policy to determine when to use the cache and what type to use, or review your current policy. - Advice on selecting the best cache strategy for a specific scenario. - Incremental caching strategies. |
|
Schedule Cache Load |
Assist you on your first steps creating cache jobs in Scheduler. |
Professional Services
Denodo Professional Services can help you at the start or any part of your query performance trail. You can find information about the Denodo Professional Services offering in:
Professional Services for Data Virtualization | Denodo
- Operations Quick Start
- Development Quick Start
- Virtualization Architecture & Design
If you are a Denodo customer, you can reach out to your Customer Success Manager for details about any Guided Route that you need.
Big Hike Prep Check
Let’s see if you are ready to start your big trail. Take this 3-question questionnaire to check your readiness for an enjoyable hike.
Read the questions below, think about the solution and check if you got them right by looking at the solution. Have you become an expert?
- Consider you have a derived (join) view on three base views created over three data sources (1 - JDBC , 2- SAP , 3 - JSON (Rest API)). Looking at the execution trace you have identified the query performance is affected by JSON data source. Where would you apply caching? Is it at JSON base view level or at derived view level?
Click here to check if you got it right
The recommendation in this scenario would be at JSON base view level. From the execution trace you have identified the post processing in Denodo is not a problem but the network latency of JSON affects the query performance. |
- Consider you have a Full cache on a base view and have preloaded with 1 Million rows. The data in the source does not change much, probably a few hundred rows are added every week. So you decided to schedule a cache preload once in a month. Now you have a requirement that the end users also need to view the changes from the data source (i.e those few hundred records). What would be the best approach here?
Click here to check if you got it right
Incremental Full cache mode is a good option for this scenario. For more details, refer to Requirements a View Has to Meet to Support Incremental Queries. |
- In Production, you noticed there is a deadlock error in Denodo Scheduler between a cache preload job and cache maintenance job. What could have caused this deadlock?
Click here to check if you got it right
This may happen when the cache maintenance job is configured to run at the same time of the cache preload job. There are 2 possibilities:
In Production environments, the recommendation is to set the “Maintenance” option to “Off” to avoid executing the Cache Maintenance task periodically and having control over when the maintenance task is executed. Thus, use the Denodo Scheduler to program the execution of the CLEAN_CACHE_DATABASE procedure at a period where the Server and the cache’s database are not expected to be under heavy load, thus avoid having cache load job and CLEAN_CACHE_DATABASE to run in parallel. |