You can translate the document:

Lookout

Expert trails guide Denodo users through all the relevant materials related to a specific topic, including official docs, KB articles, training, Professional Services offerings, and more. The main goal is to give users a single place with references to all the information that they need to become a Denodo expert on any specific topic.

“What about performance?” is usually the first question that is raised when introducing someone to data virtualization.

This Expert Trail shows a curated selection of the different resources available to become a master in Denodo Query Performance Optimization.

The Hike

Stage 1: Modeling

Let's start this path by checking the Best Practices and general guidelines on how to design optimal views from a performance point of view.

In this particular case, the focus is on a Big Data/Analytic use case.

This guidance is essential in analytic use cases that require integrating large volumes of data from different sources:

Best Practices to Maximize Performance I: Modeling Big Data and Analytic Use Cases

NOTE that this type of scenario was selected as it is where Performance Optimization techniques can shine and make a big difference. Other scenarios i.e. operational do not require smart optimization techniques as the volumes of data involved are small and performance will be good.

The Expert Trail: Modelling and Metadata Organization explains how to do the modeling and how to organize the metadata.

Stage 2: Leveraging the Optimizer

Let’s focus now on the modeling best practices to handle Big Data Analytic use cases, dealing with horizontal and vertical partitioned federated data and other considerations.

Denodo Query Optimizer Engine analyzes the metadata and automatically determines the best execution plan to run a query in the most performant way. For this reason, it’s essential to make sure that all the information that can be relevant to the optimizer analysis is available and it is accurate. This information includes PKs, Indexes, Referential constraints, statistics, etc.

 

Let's take a deep dive into the additional meta-information and configuration settings that are relevant to the Denodo Query Optimizer module.

Best Practices to Maximize Performance II: Configuring the Query Optimizer

Stage 3: Caching

With a better understanding of what Denodo can do to manage real-time queries even in analytical federated scenarios, it is time now to take a look at the Denodo Cache. Denodo allows configuring a Cache Engine to store local copies of the data retrieved from the data sources.

Caching in Denodo can be used for several purposes, such as enhancing performance. There might be situations where due to the client SLAs, caching is required. This often raises a question that is developed in Comparing caching in Denodo with other forms of replication like ETL 

Other uses of the cache would be to protect data sources from costly queries, and/or reusing complex data combinations and transformations.

Denodo also contains smart query acceleration techniques based on pre-stored data using an element called Summary.

Information about the Cache can be found in the Expert Trail: Cache

Let´s review the recommendations for different aspects of the Cache Module such as, how to choose the cache database, how to decide what views to cache, or what is the best cache mode and refresh strategy for each particular use case:

Best Practices to Maximize Performance III: Caching.

Stage 4: Troubleshooting

Finally, the latest step is to learn a method to identify the bottlenecks of a Query in Denodo and the different options and actions to resolve or improve the performance of an existing query.

This is considering that the origin of this issue is the query itself regardless of server congestion situations.

Best Practices to Maximize Performance IV: Detecting Bottlenecks in a Query

The Expert Trail: Monitoring explains the details of how to monitor the performance of a query or server.

Exploration

Fill up your backpack with additional gear:

Performance overview

Webinars

Additional Resources

Cache

Official Documentation

KB Articles

Additional Resources

Smart Query Acceleration (Summaries)

Official Documentation

KB Articles

Webinars

Additional Resources

Join Types

NOTE: The Cost Optimizer will automatically select the best JOIN Type to run a query when statistics are available. Nevertheless, it is useful to know how they work for query performance troubleshooting or understanding.

Official Documentation

Denodo Optimizer Techniques

Official Documentation

KB Articles

Statistics and Cost Optimizer

Official Documentation

KB Articles

Additional Resources

Massive Parallel Processing (MPP):

Official Documentation

KB Articles

Webinars

Denodo Connects

Official Documentation

Denodo Test Drives:

Denodo Test Drive provides a private sandbox environment containing a preconfigured solution that demonstrates how data virtualization brings agility and flexibility to multiple use cases. In under an hour, and using a step-by-step guide you will experience how to quickly take advantage of multiple data sources independent of location and format with zero replication.

Additional Resources

Guided Routes

Denodo Training Courses

Denodo training courses provide expert data virtualization training for data professionals, including administrators, architects, and developers.

If you are interested in Query Performance you should enroll in the following course:

Success Services

Denodo Customer Success Services can help you at the start or any part of your Self-Service Analytics trail. You can find information about the Denodo Success Services offering in:

Success Services

Advisory Sessions

Denodo Customers with active subscriptions have access to request Meet a Technical Advisory sessions.

These are the sessions available related to performance.

Platform

Administration

Performance

Optimization:

Administration

Assist in enabling optimizations capabilities:

- Enable Data Movements and Bulk Load.

- MPP (Massive Parallel Processing).

- Statistics: Gathering and refreshing policies.

Cache:

Standards &

Best Practices

Cache Modes

Overview

Review and showcase how the Denodo Cache works:

- The different cache modes (Partial, Full, Incremental).

- Loading the cache and refreshing.

- Invalidating the cache contents.

- Indexes.

Cache Best

Practices

- Assist you in defining a policy to determine when to use the cache and what type to use, or review your current policy.

- Advice on selecting the best cache strategy for a specific scenario.

- Incremental caching strategies.

Summaries

Overview and

Best Practices

Review and showcase how the Denodo Summaries works.

Performance

Optimization

Performance

Optimization:

Overview

Explanation of the different performance optimization techniques (Optimization features overview).

- static optimization

- cost-based optimization

- cache and smart cache (summaries)

- remote tables

- MPP

- etc.

Performance

Optimization:

Best Practices

In-depth best practices based on features applied to use optimizations effectively. Recommendations to decide what optimization is the most appropriate for each scenario and your use case.

Performance

Optimization:

Detecting

Bottlenecks

Guidance in the techniques to analyze and improve the performance of a query that is not reaching expected goals.

Success Accelerators

In addition to Advisory sessions, Success Services includes Success Accelerators that can help you.

If you are a Denodo customer, you can reach out to your Customer Success Manager for details about any Guided Route that you need.

Big Hike Prep Check

Let’s see if you are ready to start your big trail. Take this 2-question questionnaire to check your readiness for an enjoyable hike.

Read the questions below, think about the solution and check if you got them right by looking at the solution. Have you become an expert?

  1. If the same data is replicated physically in 2 places, is it possible for Denodo to use a unique view, so depending on the query Denodo uses the most convenient data source? How?

Click here to check if you got it right

Yes! This can be achieved through the Alternative Sources functionality. You can configure a base view to indicate that the same data is stored physically in different sources.

At runtime, when this base view is involved in a query, the optimizer will select the source of the data of this base view that maximizes the number of operations that can be pushed down to the underlying source.

  1. What views or types of views should have statistics? How often should those statistics be gathered?

Click here to check if you got it right

So the cost optimizer can be applied, it needs the statistics for the following views involved in a query:

  • All the base views
  • Derived views that have cache enabled
  • Flatten views
  • Summaries

This is explained in the section What views require statistics? of the “Best Practices to Maximize Performance II: Configuring the Query Optimizer” KB article.

Regarding how often to gather statistics, it is a good practice to keep the statistics as much updated as possible. They should be gathered from the underlying data source. Nevertheless, this is not always possible so it is convenient to gather the statistics during periods where the system is not expected to be under heavy load. Also, it is important to understand that statistics changes will affect when there is a significant change in the data they hold.

For example, 50,000 new rows might represent a relevant change or not. If a table changes from 1000 rows size to 51000 rows. That is a significant change. If it changes from 5,000,000 rows to 50,050,000 rows the change is not relevant.

Questions

Ask a question

You must sign in to ask a question. If you do not have an account, you can register here