MATERIALIZATION OF VIEWS

In this tutorial on materialization, we are going to explore the efficiency of the different options for materializing Denodo views and their impact on data retrieval. The tutorial covers the basics of scheduling jobs to maintain materialized data up-to-date. By the end, users will gain essential knowledge in managing materialization in Denodo Platform, setting the stage for improved performance and data accuracy.

As you have learnt, Denodo Platform accesses and extracts information from target sources at runtime and combines them in real-time to get the results. In other words, no local copy of the data will be stored in the Denodo Platform.

With this in mind, it is clear that some of the more traditional performance optimization practices used in database and data warehouse implementations, such as index construction, will fall outside the scope of a real-time Data Virtualization framework but strategies such as materialization can help to improve the performance of real-time source access and combination goals.

Cache architecture

The Denodo Platform advanced materialization system is based on an external data source (traditional or in-memory data source).

In the previous tutorial, Data Combination, you learned how to combine the unified customer view with its billing information, so you can have a report that lists the total amount that is due for each of your customers. It is strongly advised to complete the Data Combination tutorial before starting this one, as we are going to be using the resources (data source, base views and integrations) created on it.

If you have already completed it, please proceed to the next section. Otherwise, follow the subsequent instructions:

  1. Launch the resources needed (check how in the Installation & Bootstrapping tutorial).
  2. Log in Denodo Design Studio (user/password: admin/admin)
  3. Import this VQL to Denodo by clicking into File > Import. Drag ‘n' drop the file and select your preferred database (in our example, "tutorial"):

Design Studio - File menu

Design Studio - Import wizard

If all steps have been executed correctly, you should observe the following in the Design Studio's elements tree:

Design Studio - Elements tree

Great! Now it's time to start the tutorial!

Denodo Platform is an important component of any data management infrastructure, but not the only one. When measuring performance, it is important to make sure which of the elements are bottlenecks. For example, a data source might be returning data in a slow fashion; in some cases you will be able to increase the performance by adding a new index to that source. If these actions cannot be performed, you can configure an intelligent caching system in Denodo to speed up your queries.

Let's see some use cases where you can use materialization:

  1. Some data sources might be slow and you want to speed up the queries hitting those data sources.
  2. You want to avoid high workloads in the data sources.
  3. Pre-compute and store in the materialization system costly transformations, so they do not need to be recomputed in real-time.
  4. You want to delegate (or speed up) some queries with data coming from several different data sources.
  5. Protect the system against potential unavailability of data sources (especially when they are external sources).

In this tutorial you will see how to:

  • Activate the Cache Module in Denodo Platform
  • Enable Materialization in Base and Integration Views
  • View Some Examples of Different Materialization Types
  • Scheduling refreshes of the materialized data

Denodo Platform includes a module to store local copies of the data as required in an external data source.

For using the cache engine in Denodo, you have to enable it at server-level or at database-level. In this tutorial we will configure the cache at server level. Let's see how to configure the Denodo server to use cache:

  1. Go to the Design Studio and, in the Menu Bar, go to Administration > Server configuration

Design Studio - Administration menu

  1. Unfold Cache and click on General. To enable the cache, set the Cache status to On.

Design Studio - Cache Engineconfiguration

  1. Click on Connection to select the Cache engine you are going to use. The default Embedded Derby server is the Database adapter that will be used in this tutorial as it is already included with Denodo Platform.

Design Studio - Cache Engine configuration - Derby

  1. Click Save.

In the next section, you will learn how to configure what views in Denodo Platform have to be materialized.

In the previous section, you activated the cache module in your Denodo server. Now, you need to configure your views to make use of the materialization functionality.

For example, let's materialize the view iv_client_with_bills. We have two main reasons for selecting this view:

  • It queries two different data sources (a relational database and a Web service).
  • Usually Web service response times are worse than traditional database response times.

How to activate materialization for a view

You can follow these steps:

  1. Open Design Studio and, in the Elements Tree, double-click on the view iv_client_with_bills. The view schema will show in the Workspace.
  2. Click on the Options tab at the top.
  3. Under the Materialization tab, select Query results, for example. This option will store the results of the queries over this view in the database configured in the Cache module.

Design Studio - Materialization options

  1. Leave the rest of the fields with the default values.
  2. Finally, click on Save icon

Materialization in Action

Now, let's test our view by executing it. Follow these steps:

  1. Open Design Studio and, in the Elements Tree, double-click on the view name iv_client_with_bills.
  2. Click on the Summary tab at the top.
  3. Execute the query by clicking on the Query button. Ensure you have selected the option Retrieve all rows and then click on Execute

Design Studio - Execute view

  1. After the execution, click on the Execution Trace link (above the results) to see the query execution plan. You can see that the data comes directly from the different data sources.

Design Studio - Execution trace - Data sources plan

  1. Open the Execute tab and perform the same query again.
  2. Click on the Execution Trace link again.

Design Studio - Execution trace - Cache plan

Now you can see that the data comes from the cache database! (displayed with a box icon), and the execution time of the query is faster.

In the next section, you will learn more about the available materialization types in Denodo Platform. In the above example, you configured the view to use Query Results mode, in the next section we will learn the behavior of the different types.

In the previous section, you activated the materialization in one of the sample views. Now, it's time to learn more about how materialization works in Denodo Platform. It includes the following materialization types:

  • Query Results:
    The first time a query over the view is executed, the table in the Cache database will be populated with the tuples in the output from the data sources. At runtime, when a user queries the view, the Denodo server checks if the Cache database contains the data required to answer the query. If it does not have this data, Denodo Platform will query the data source and populate the tables in the Cache database with that output.

    When the Time To Live (TTL) of the data has passed, the system will invalidate the data of the view stored in the database so the next query will always hit the data source.

    Design Studio - Materialization types

    This mode supports the following option:

    Match exact queries only: if this option is selected, Denodo materializes the result of each query. Then, if the same exact query is executed, and the entries of this query in the cache database have not expired (TTL), the data returned to the client is retrieved from the cache database.
    Design Studio - Materialization types - Query Results - Match queries option

Explicit: Data loads with CONTEXT clause: if this option is selected, the cache has to be loaded explicitly (we will see an example in the next section).

Design Studio - Materialization types - Query Results - Data load option

  • Full:
    In this mode, the data of the view is always retrieved from the materialized data instead of from the data source. For materializing the data of the view, this mode requires explicit loads using a special context in the query (we will see an example in the next section).

    The main benefit of this type of materialization is that complex operations (joins, unions, group by...) involving several views (even from different data sources) can be completely delegated to the cache database. Therefore, the performance of these operations is significantly improved.

    Design Studio - Materialization types - Full

Examples

Let's see how the materialization works using the VQL Shell. This tool allows you to execute queries to Denodo Virtual DataPort. Just go to the to menu and select the Tools > VQL Shell option.

Design Studio - Tools menu

As an example, let's use the following query in the VQL Shell (make sure you have the tutorial database selected from the drop-down Database menu and selected the Retrieve all rows checkbox):

SELECT * FROM iv_client_with_bills WHERE client_id = 'C077'

Design Studio - VQL shell - Execute query

Run this query to check that it is working correctly (if you get an error, check the connection URLs with the data sources in case they have to be modified based on your environment).

Assuming that it works fine, execute the same query and check the Execution Trace changing the materialization type of the iv_client_with_bills view:

  • With type Virtual
    Select this option and execute the query SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' several times. The data sources are always queried.
  • With type Query Results
    Select this option with the default values. The data source will be queried the first time and subsequent queries will retrieve the data from the cache, if it is available.

    To demonstrate, execute these queries:

    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' will query the data source and materialize the results.
    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C066' will query the data source and materialize the results.
    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' AND CODE = '02' will use materialized data.
    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C066' will use materialized data.
  • With type Query Results and Explicit data loads
    Select the explicit data loads option and click on the Delete stored results. Save the changes.
    Design Studio - Materialization types - Query Results - Enable Explicit data loads

    The Denodo server will query the data sources until we tell him to materialize the results by adding the CONTEXT('cache_preload' = 'true') to the query.

    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' will query the data sources.
    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' CONTEXT('cache_preload' = 'true') will query the data sources and materialize the results.
    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' will now use materialized data.
  • With type Query Results and Match exact queries only (unselecting Explicit Data loads option)
    Select the automatic data load option and click on the Match exact queries only option. Click on the Delete stored results and save the changes.
    Design Studio - Materialization types - Query Results - Enable automatic data load

    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' will query the data sources and materialize the results.
    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' will use materialized data.
    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' AND CODE = '02' will query the data source and materialize the results of this query.
    SELECT * FROM iv_client_with_bills WHERE CLIENT_ID = 'C077' AND CODE = '02' will now use materialized data.
  • With type Full
    All of the view's underlying data will be materialized into the cache database, and all queries will be delegated to this cache database.

Finally, the action of materializing data of a view can be scheduled. Go to the next step of this tutorial to see how.

Let's suppose that in this company, billing information is updated each day, so we want to automate this process by creating a scheduled job to refresh the rows materialized in the cache database. For this example, we are going to work with the same view used in the previous section, iv_client_with_bills.

  1. Let's start the Denodo Scheduler server and its Web Administration Tool (available by default at http://127.0.0.1:9090/webadmin/denodo-scheduler-admin/). If you are using a local installation you can do it from the Denodo Control Center:

Denodo Control Center

  1. If you are launching Denodo Platform from a Docker container using the Denodo Community Lab Environment project, for starting Denodo Scheduler you can execute this command (the Web Administration Tool will be available at http://127.0.0.1:39090/webadmin/denodo-scheduler-admin/ ):
~$ docker compose --profile denodo --profile denodo-sched --profile ds up 
  1. Use admin / admin for the credentials

Denodo Scheduler - Login page

  1. Create a new Simple Cache Management job by clicking on Add job > Simple Cache Management

Denodo Scheduler - Add job

  1. Complete Job Name field as billing_daily_cache_refresh

Denodo Scheduler - Simple cache job - details

  1. On the Configuration tab choose tutorial.iv_client_with_bills

Denodo Scheduler - Simple cache job - configuration

  1. Finally, we want to program an execution of the job each day. On the Triggers tab, click on Add trigger (leave by default all the fields, as it is already a daily update) and click on Save.

Denodo Scheduler - Simple cache job - triggers

The job we just created should be added to the list. We can test it by selecting the job and clicking on Start > Start.

Denodo Scheduler - Simple cache job - start job

Wait some seconds until the job is finished:

Denodo Scheduler - Simple cache job - completed job

Scheduling from the Design Studio

It is possible to create the Scheduler job directly from the Design Studio of a Denodo Solution Manager installation. You will see in the Materialization pptions of the view a configuration option called Materialization refresh jobs:

Design Studio - Add refresh job

By clicking on the on Add refresh job in Scheduler > Simple Cache Management you will be redirected to the Scheduler with everything pre-configured to create the job.

Denodo Scheduler - Simple cache job - configuration

Congratulations! You have completed the Materialization of Views Tutorial.