USER MANUALS

Cost-Based Optimization

VQL, like SQL, is a non-procedural language. This means that it does not enforce the database server what is the list of steps to follow to obtain the desired result. Instead, it is the database the one in charge of deciding what is the best plan to execute a particular query.

For this purpose, the query optimizer is a key component of Denodo. This module explores the different but equivalent execution plans that can obtain the result for a certain query, and decides which one would achieve the best performance.

The query optimization process consists of two different phases: The static optimization phase and the cost-based or dynamic optimization phase.

In the static phase the optimizer analyzes the query and performs query rewritings that improve the performance regardless of the specific data we need to manage. These transformations include removing unnecessary JOINs or UNION partitions (branch pruning) or pushing conditions and group bys under Joins and Unions. Then, the cost-based optimizer analyzes the execution plan returned by the static optimization phase and estimates the costs of different possible plans. The alternatives considered include, among other variables: the different join strategies to use for each join (MERGE, HASH, NESTED) or moving partial results from some of the operations to another data source (Data Movement) or to a Massive Parallel Processing system (see section Parallel Processing).

To estimate the cost of an execution plan, it uses:

  • When the data is obtained from a database, the indexes of the queried tables in the database.

  • Statistics about the views’ data: number of rows, number of NULL values of each field, etc.

    To obtain the statistics of JDBC base views, Virtual DataPort can query the system tables of the database. Alternatively, it can also query the view with a SELECT statement to obtain these statistics

In this section, we describe the cost-based optimization support and provide some tips to use it effectively:

When Should You Enable Automatic Cost-Based Optimization?

Cost-Based Optimization (CBO) is highly recommended for all analytical scenarios. CBO is crucial for performance, as it allows the query planner to choose the most efficient execution path. However, CBO relies entirely on accurate information to function correctly. You must ensure your views include up-to-date statistics. Here are the critical consequences of statistics management:

  • Missing Statistics: If Denodo does not have statistics for the views, the Cost-Based Optimizer cannot make informed decisions, so having this option enabled will not take effect.

  • Outdated Statistics: If views have statistics but they are old or stale, the optimizer can make wrong decisions, which can lead to performance degradation.

To prevent this, you should use the Denodo Scheduler to automatically update view statistics on a periodic basis.

Finally, if your Denodo platform handles both analytical and operational workloads, you can enable CBO selectively. To maintain high performance for operational queries, consider enabling the cost-based optimizations for specific virtual databases only.

For further reading, we recommend visiting the series of articles titled “Best Practices to Maximize Performance” at the Denodo Knowledge Base.

Add feedback