Cost-Based Optimization¶
VQL, like SQL, is a non-procedural language. This means that it does not enforce the database server what is the list of steps to follow to obtain the desired result. Instead, it is the database the one in charge of deciding what is the best plan to execute a particular query.
For this purpose, the query optimizer is a key component of Denodo. This module explores the different but equivalent execution plans that can obtain the result for a certain query, and decides which one would achieve the best performance.
The query optimization process consists of two different phases: The static optimization phase and the cost-based or dynamic optimization phase.
In the static phase the optimizer analyzes the query and performs query rewritings that improve the performance regardless of the specific data we need to manage. These transformations include removing unnecessary JOINs or UNION partitions (branch pruning) or pushing conditions and group bys under Joins and Unions. Then, the cost-based optimizer analyzes the execution plan returned by the static optimization phase and estimates the costs of different possible plans. The alternatives considered include, among other variables: the different join strategies to use for each join (MERGE, HASH, NESTED) or moving partial results from some of the operations to another data source (Data Movement) or to a Massive Parallel Processing system (see section Parallel Processing).
To estimate the cost of an execution plan, it uses:
When the data is obtained from a database, the indexes of the queried tables in the database.
Statistics about the views’ data: number of rows, number of NULL values of each field, etc.
To obtain the statistics of JDBC base views, Virtual DataPort can query the system tables of the database. Alternatively, it can also query the view with a SELECT statement to obtain these statistics
In this section, we describe the cost-based optimization support and provide some tips to use it effectively:
The section Enabling the Cost-Based Optimization describes how to enable the cost-based optimization.
The section Gathering the Statistics of Views describes how to obtain the statistics required by the cost-based optimization process.
The section Tuning the Cost-Based Optimization Process describes the information used to generate cost estimations and provides some tuning tips for the process.
The section Current Limitations of the Cost-Based Optimization Process describes some limitations of this optimization.
For further reading, we recommend visiting the series of articles titled “Best Practices to Maximize Performance” at the Denodo Knowledge Base.