In this tutorial on cache, we are going to explore the efficiency of Partial and Full cache types and their impact on data retrieval. The tutorial covers the basics of scheduling cache updates to maintain data up-to-date. By the end, users will gain essential knowledge in managing caches, setting the stage for improved performance and data accuracy.
As you have learnt, Denodo accesses and extracts information from target sources at runtime and combines them in real-time to get the results. In other words, no local copy of the data will be stored in Denodo.
With this in mind, it is clear that some of the more traditional performance optimization practices used in database and data warehouse implementations, such as index construction, will fall outside the scope of a real-time Data Virtualization framework but strategies such as caching can help to improve the performance of real-time source access and combination goals.
The Denodo advanced cache system is based on an external database (traditional or in-memory database).
In the previous tutorial, Data Combination, you learned how to combine the unified customer view with its billing information, so you can have a report that lists the total amount that is due for each of your customers. It is strongly advised to complete the Data Combination tutorial before starting this one, as we are going to be using the resources (data source, base views and integrations) created on it.
If you have already completed it, please proceed to the next section. Otherwise, follow the subsequent instructions:
- Launch the resources needed (check how in the Installation & Bootstrapping tutorial).
- Log in Denodo Design Studio (user/password:
admin
/admin
) - Import this VQL to Denodo by clicking into
File > Import
. Drag ‘n' drop the file and select "tutorial" as the Database:
If all steps have been executed correctly, you should observe the following in the Design Studio's elements tree:
Great! Now it's time to start the tutorial!
Denodo is an important component of any data management infrastructure, but not the only one. When measuring performance, it is important to make sure which of the elements are bottlenecks. For example, a data source might be returning data in a slow fashion; in some cases you will be able to increase the performance by adding a new index to that source. If these actions cannot be performed, you can configure an intelligent caching system in Denodo to speed up your queries.
Let's see some use cases where you can use cache:
- Some data sources might be slow and you want to speed up the queries hitting those data sources.
- You want to avoid high workloads in the data sources.
- Pre-compute and store in cache costly transformations, so they do not need to be recomputed in real-time.
- You want to delegate (or speed up) some queries with data coming from several different data sources.
- Protect the system against potential unavailability of data sources (especially when they are external sources).
In this tutorial you will see how to:
- Activate the Cache Module in Denodo
- Enable Cache in Base and Integration Views
- View Some Examples of Different Cache Modes
- Scheduling cache refreshes
Denodo includes a module to store local copies of the data as required in an external data source.
For using the cache engine in Denodo, you have to enable it at server-level or at database-level. In this tutorial we will configure the cache at server level. Let's see how to configure the Denodo server to use cache:
- Go to the Design Studio and, in the Menu Bar, go to
Administration > Server configuration
- Unfold
Cache
and click onGeneral
. To enable the cache, set theCache status
to On.
- Click on
Connection
to select the Cache engine you are going to use. The defaultEmbedded Derby server
is theDatabase adapter
that will be used in this tutorial as it is already included with Denodo.
- Click
Save
.
In the next section, you will learn how to configure what views in Denodo have to be cached.
In the previous section, you activated the cache module in your Denodo server. Now, you need to configure your views to make use of the cache.
For example, let's activate the cache in the iv_client_with_bills
view. We have two main reasons to select this view as a cached view:
- It queries two different data sources (a relational database and a Web Service).
- Usually Web Service response times are worse than traditional database response times.
How to activate cache for a view from Design Studio
You can follow these steps:
- Open Design Studio and, in the Elements Tree, double-click on the view name
iv_client_with_bills
. The view schema will show in the Workspace. - Click on the
Options
tab at the top. - Under
Cache mode
, selectPartial
, for example. It will enable a partial cache for theiv_client_with_bills
view.
- Leave the rest of the fields with the default values.
- Finally, click
Save
Cache in Action
Now, let's test our cached view executing it. Follow these steps:
- Open Design Studio and, in the Elements Tree, double-click on the view name
iv_client_with_bills
. - Click on the
Summary
tab at the top. - Execute the query by clicking on the
Execution Panel
button. Ensure you have selected the optionRetrieve all rows
and then click onExecute
- After the execution, click on the
Execution Trace
link (above the results) to see the query execution plan. You can see that the data comes directly from the different data sources.
- Open the
Execute
tab and perform the same query again. - Click on the
Execution Trace
link again.
Now you can see that the data comes from the cache! (displayed with a box), and the execution time of the query is faster.
In the next section, you will learn more about the available cache modes in Denodo. In the above example, you configured the view to use Partial mode, in the next section we will learn the behavior of the different cache modes.
In the previous section, you activated the cache in one of your views. Now, it's time to learn more about how the Denodo Cache works. Denodo has the following cache modes:
- Partial:
The first time a query over the view is executed, the cache table will be populated with the tuples in the output from the data sources. At runtime, when a user queries the view, the Denodo server checks if the cache contains the data required to answer the query. If it does not have this data, Denodo will query the data source and populate the cache with that output.
When the Time To Live (TTL) of the data has passed, the cache system will invalidate the cached data of the view so the next query will hit the data source.
This mode supports the following options:Match exact queries only
: if this option is selected, the cache stores the result of each query. Then, if the same query is executed, and the entries of this query in cache have not expired (TTL), the data returned to the client is retrieved from the cache.With explicit loads
: if this option is selected, the cache has to be loaded explicitly. - Full:
In this mode, the data of the view is always retrieved from the cache engine instead of from the source. For loading the cache, this mode always requires explicit cache loads.
The main benefit of this mode over the partial cache is that complex operations (joins, unions, group by...) involving several views (even from different data sources) can be delegated to the cache database. Therefore, the performance of these operations is significantly improved.
Cache Examples
Let's see how the cache works using the VQL Shell. This tool allows you to execute queries to Denodo. Just go to the to menu and select the Tools > VQL Shell
option.
As an example, let's use the following query in the VQL Shell (make sure you have the tutorial
database selected from the drop-down Database menu and selected the Retrieve all rows
checkbox):
SELECT * FROM iv_client_with_bills WHERE client_id = 'C077'
Run this query and check the Execution Trace with these different cache configurations of the iv_client_with_bills
view:
- With the Cache Off
The data sources are always queried. - With a Partial Cache
The data source will be queried the first time and subsequent queries will retrieve the data from the cache, if it is available.
To demonstrate, execute these queries:SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C066'
will query the data source again.SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' AND CODE = '02'
will query the cache.SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C066'
queries the cache now. - Partial Cache with Explicit loads
Denodo server will query the data source until the cache is told to be loaded, adding thisCONTEXT('cache_preload' = 'true')
to the query.SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077'
will query the data source.SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' CONTEXT('cache_preload' = 'true')
will query the data source and load the cache.SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077'
will now query the cache. - Partial cache with Match exact queries only (unselecting Explicit loads option)
SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077'
will query the cache.SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' AND CODE = '02'
will query the data source and load the cache for this query.SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' AND CODE = '02'
will query the cache now. - Full Cache
All of the view's underlying data will be loaded into the cache, and all queries will only go to the cache.
The action of load (or refresh) de cache can be scheduled. Go to the next step to see how.
Let's suppose that in this company, billing information is updated each day, so we want to automate this process by creating a scheduled job to refresh the updated rows in the cache. We are going to work with the same view used in the previous section, iv_client_with_bills
.
- Let's start Scheduler server and the Web Administration Tool (available at http://127.0.0.1:9090/webadmin/denodo-scheduler-admin/ )
- If you are launching Denodo from a container, for starting Scheduler you can stop your current containers and execute this command (the Web Administration Tool will be available at http://127.0.0.1:39090/webadmin/denodo-scheduler-admin/ ):
~$ docker compose --profile denodo --profile ds down ~$ docker compose --profile denodo –profile denodo-sched --profile ds up
- Use
admin / admin
for the credentials
- Create a new VDPCache job by clicking on
Add job > VDPCache
- Complete
Job Name
field asbilling_daily_cache_refresh
- On the
Extraction section
tab choosetutorial.iv_client_with_bills
- Finally, we want to program an execution of the job each day. On the
Triggers section
tab, click onAdd trigger
(leave by default all the fields, as it is already a daily update) and click onSave
.
The job we just created should be added to the list. We can test it by selecting the job and clicking on Start > Start
.
Schedule from Design Studio
It is possible to create the Scheduler job directly from the Design Studio of a Denodo Solution Manager installation. You will see in the Options
of the view a table with the Cache refresh jobs
:
By clicking on the on Add refresh job in Scheduler
you will be redirected to the Scheduler to create the job.
Congratulations! You have completed the Denodo Caching Tutorial.