CACHING

In this tutorial on cache, we are going to explore the efficiency of Partial and Full cache types and their impact on data retrieval. The tutorial covers the basics of scheduling cache updates to maintain data up-to-date. By the end, users will gain essential knowledge in managing caches, setting the stage for improved performance and data accuracy.

As you have learnt, Denodo accesses and extracts information from target sources at runtime and combines them in real-time to get the results. In other words, no local copy of the data will be stored in Denodo.

With this in mind, it is clear that some of the more traditional performance optimization practices used in database and data warehouse implementations, such as index construction, will fall outside the scope of a real-time Data Virtualization framework but strategies such as caching can help to improve the performance of real-time source access and combination goals.

The Denodo advanced cache system is based on an external database (traditional or in-memory database).

In the previous tutorial, Data Combination, you learned how to combine the unified customer view with its billing information, so you can have a report that lists the total amount that is due for each of your customers. It is strongly advised to complete the Data Combination tutorial before starting this one, as we are going to be using the resources (data source, base views and integrations) created on it.

If you have already completed it, please proceed to the next section. Otherwise, follow the subsequent instructions:

  1. Launch the resources needed (check how in the Installation & Bootstrapping tutorial).
  2. Log in Denodo Design Studio (user/password: admin/admin)
  3. Import this VQL to Denodo by clicking into File > Import. Drag ‘n' drop the file and select "tutorial" as the Database:

If all steps have been executed correctly, you should observe the following in the Design Studio's elements tree:

Great! Now it's time to start the tutorial!

Denodo is an important component of any data management infrastructure, but not the only one. When measuring performance, it is important to make sure which of the elements are bottlenecks. For example, a data source might be returning data in a slow fashion; in some cases you will be able to increase the performance by adding a new index to that source. If these actions cannot be performed, you can configure an intelligent caching system in Denodo to speed up your queries.

Let's see some use cases where you can use cache:

  1. Some data sources might be slow and you want to speed up the queries hitting those data sources.
  2. You want to avoid high workloads in the data sources.
  3. Pre-compute and store in cache costly transformations, so they do not need to be recomputed in real-time.
  4. You want to delegate (or speed up) some queries with data coming from several different data sources.
  5. Protect the system against potential unavailability of data sources (especially when they are external sources).

In this tutorial you will see how to:

  • Activate the Cache Module in Denodo
  • Enable Cache in Base and Integration Views
  • View Some Examples of Different Cache Modes
  • Scheduling cache refreshes

Denodo includes a module to store local copies of the data as required in an external data source.

For using the cache engine in Denodo, you have to enable it at server-level or at database-level. In this tutorial we will configure the cache at server level. Let's see how to configure the Denodo server to use cache:

  1. Go to the Design Studio and, in the Menu Bar, go to Administration > Server configuration

  1. Unfold Cache and click on General. To enable the cache, set the Cache status to On.

  1. Click on Connection to select the Cache engine you are going to use. The default Embedded Derby server is the Database adapter that will be used in this tutorial as it is already included with Denodo.

  1. Click Save.

In the next section, you will learn how to configure what views in Denodo have to be cached.

In the previous section, you activated the cache module in your Denodo server. Now, you need to configure your views to make use of the cache.

For example, let's activate the cache in the iv_client_with_bills view. We have two main reasons to select this view as a cached view:

  • It queries two different data sources (a relational database and a Web Service).
  • Usually Web Service response times are worse than traditional database response times.

How to activate cache for a view from Design Studio

You can follow these steps:

  1. Open Design Studio and, in the Elements Tree, double-click on the view name iv_client_with_bills. The view schema will show in the Workspace.
  2. Click on the Options tab at the top.
  3. Under Cache mode, select Partial, for example. It will enable a partial cache for the iv_client_with_bills view.

  1. Leave the rest of the fields with the default values.
  2. Finally, click Save

Cache in Action

Now, let's test our cached view executing it. Follow these steps:

  1. Open Design Studio and, in the Elements Tree, double-click on the view name iv_client_with_bills.
  2. Click on the Summary tab at the top.
  3. Execute the query by clicking on the Execution Panel button. Ensure you have selected the option Retrieve all rows and then click on Execute

  1. After the execution, click on the Execution Trace link (above the results) to see the query execution plan. You can see that the data comes directly from the different data sources.

  1. Open the Execute tab and perform the same query again.
  2. Click on the Execution Trace link again.

Now you can see that the data comes from the cache! (displayed with a box), and the execution time of the query is faster.

In the next section, you will learn more about the available cache modes in Denodo. In the above example, you configured the view to use Partial mode, in the next section we will learn the behavior of the different cache modes.

In the previous section, you activated the cache in one of your views. Now, it's time to learn more about how the Denodo Cache works. Denodo has the following cache modes:

  • Partial:
    The first time a query over the view is executed, the cache table will be populated with the tuples in the output from the data sources. At runtime, when a user queries the view, the Denodo server checks if the cache contains the data required to answer the query. If it does not have this data, Denodo will query the data source and populate the cache with that output.

    When the Time To Live (TTL) of the data has passed, the cache system will invalidate the cached data of the view so the next query will hit the data source.



    This mode supports the following options:

    Match exact queries only: if this option is selected, the cache stores the result of each query. Then, if the same query is executed, and the entries of this query in cache have not expired (TTL), the data returned to the client is retrieved from the cache.

    With explicit loads: if this option is selected, the cache has to be loaded explicitly.
  • Full:
    In this mode, the data of the view is always retrieved from the cache engine instead of from the source. For loading the cache, this mode always requires explicit cache loads.

    The main benefit of this mode over the partial cache is that complex operations (joins, unions, group by...) involving several views (even from different data sources) can be delegated to the cache database. Therefore, the performance of these operations is significantly improved.

Cache Examples

Let's see how the cache works using the VQL Shell. This tool allows you to execute queries to Denodo. Just go to the to menu and select the Tools > VQL Shell option.

As an example, let's use the following query in the VQL Shell (make sure you have the tutorial database selected from the drop-down Database menu and selected the Retrieve all rows checkbox):

SELECT * FROM iv_client_with_bills WHERE client_id = 'C077'

Run this query and check the Execution Trace with these different cache configurations of the iv_client_with_bills view:

  • With the Cache Off
    The data sources are always queried.
  • With a Partial Cache
    The data source will be queried the first time and subsequent queries will retrieve the data from the cache, if it is available.

    To demonstrate, execute these queries:

    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C066' will query the data source again.
    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' AND CODE = '02' will query the cache.
    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C066' queries the cache now.
  • Partial Cache with Explicit loads
    Denodo server will query the data source until the cache is told to be loaded, adding this CONTEXT('cache_preload' = 'true') to the query.

    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' will query the data source.
    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' CONTEXT('cache_preload' = 'true') will query the data source and load the cache.
    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' will now query the cache.
  • Partial cache with Match exact queries only (unselecting Explicit loads option)

    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' will query the cache.
    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' AND CODE = '02' will query the data source and load the cache for this query.
    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' AND CODE = '02' will query the cache now.
  • Full Cache
    All of the view's underlying data will be loaded into the cache, and all queries will only go to the cache.

The action of load (or refresh) de cache can be scheduled. Go to the next step to see how.

Let's suppose that in this company, billing information is updated each day, so we want to automate this process by creating a scheduled job to refresh the updated rows in the cache. We are going to work with the same view used in the previous section, iv_client_with_bills.

  1. Let's start Scheduler server and the Web Administration Tool (available at http://127.0.0.1:9090/webadmin/denodo-scheduler-admin/ )

  1. If you are launching Denodo from a container, for starting Scheduler you can stop your current containers and execute this command (the Web Administration Tool will be available at http://127.0.0.1:39090/webadmin/denodo-scheduler-admin/ ):
~$ docker compose --profile denodo --profile ds down 
~$ docker compose --profile denodo –profile denodo-sched --profile ds up 
  1. Use admin / admin for the credentials

  1. Create a new VDPCache job by clicking on Add job > VDPCache

  1. Complete Job Name field as billing_daily_cache_refresh

  1. On the Extraction section tab choose tutorial.iv_client_with_bills

  1. Finally, we want to program an execution of the job each day. On the Triggers section tab, click on Add trigger (leave by default all the fields, as it is already a daily update) and click on Save.

The job we just created should be added to the list. We can test it by selecting the job and clicking on Start > Start.

Schedule from Design Studio

It is possible to create the Scheduler job directly from the Design Studio of a Denodo Solution Manager installation. You will see in the Options of the view a table with the Cache refresh jobs:

By clicking on the on Add refresh job in Scheduler you will be redirected to the Scheduler to create the job.

Congratulations! You have completed the Denodo Caching Tutorial.