This document is the fourth part of a series of articles about how to achieve the best performance while working with the Denodo Platform.
These articles will provide best practices on a variety of different topics such as the way you should build your model, the different configuration settings you should use and the caching method that best suits your scenario. This guidance is especially important in analytic use cases which require integrating large volumes of data from different sources.
The first document of this series provides general guidelines on how to design views that are optimal from a performance point of view. The second document covers the meta-information and configuration settings connected to the Query Optimizer. The third article provides recommendations for different aspects of the Cache Module. This fourth document is aimed at Denodo administrators and developers interested in analyzing and improving the performance of a query that is not meeting its expected goals, assuming the origin of the issue is the query itself rather than a general server congestion situation.
Denodo enables business applications to execute real-time queries on a series of distributed and heterogeneous data sources, as if all the data were contained in a large “Virtual” Database. Virtual DataPort acts as a mediator that provides a structured and unified view of the data included in all the data sources of the system.
Slightly simplified, and considering a single data source, executing a query in Denodo involves: running a query on the data source, transmitting data through the network from the source to Denodo, performing transformations and combinations at the Denodo server, transmitting data through the network from Denodo to the client and, finally, client consumption.
Given this execution model, when a query is time-consuming the bottleneck can be one of the following:
This document is intended as a guide for identifying and solving bottlenecks. In order to achieve this goal, there are two main sections:
Each subsection will include:
Note: The complete flowcharts can be found in an Appendix at the end of this document.
An execution plan describes the process the execution engine will follow, or has followed, to answer a query. Execution plans, also called traces, are therefore the main tool for troubleshooting a poorly performing query.
In order to get the trace of a query:
Additionally, it is possible to see the query plan of a query before executing it by clicking the Query plan button. The query plan shows how the Server will execute the query, so you can check whether the appropriate optimizations are being applied and whether all possible operations are being delegated to the data source.
SELECT * FROM actor TRACE
To see the query plan that Denodo will follow without running the query execute the command ‘DESC QUERYPLAN’.
DESC QUERYPLAN SELECT * FROM actor
In the Denodo Virtual DataPort Administration Tool, clicking the 'Execution trace' button displays a graphical representation of the query plan.
In the graphical representation, you can click on the tree nodes to display their properties.
Swapping, Memory limit reached and Effective time are some of the significant properties when identifying a bottleneck.
Note that each of the nodes at the bottom, also called leaf nodes, contains information about an access to a data source. Their properties Execution time, Response time, SQL sentence and Connection time are also relevant in the study of bottlenecks.
Detailed information about these properties and how to interpret them can be found in the sections 'Identifying the bottleneck' and 'Solving the bottleneck'.
If the performance of a query is not as good as expected, the first step is to analyze whether the cause of the long execution is directly related to that query, or whether it is a consequence of a general degradation of the server that may not be related to that specific query (for instance, if the Denodo server has reached the maximum memory limit available in an environment with high concurrency). This document focuses on analyzing the performance of a query that is not affected by such external factors. The latter cause will be the subject of a future document in this series, but take into account that it is possible to review the overall state of the system using the Diagnostic and Monitoring tool (for real-time information) or the Denodo Monitor Tool (for historical information).
In order to identify where the bottleneck is, the first step is to determine the dominant factor in the total execution overhead: the source, the client or Denodo. For this, the information displayed in the graphical representation of the execution trace of the query is usually the best source of information.
Let’s start the troubleshooting process by taking a look at the client side.
If the client application is reading the results slowly, Denodo will start accumulating the result rows that are waiting to be delivered in a memory buffer. In order to avoid using too much memory, when this happens Denodo may need to pause the query execution until the client application has read more rows.
To identify this situation, review the top node of the tree diagram and check whether the Memory limit reached parameter is set to true.
If this happens, it is possible to measure the impact of the client processing by executing the same query adding the option 'cache_return_query_results'='false' in the CONTEXT clause.
select * from test CONTEXT('cache_return_query_results' = 'false')
This executes the same query, but this time Denodo does not send any data to the client, avoiding any possible interference generated by the Denodo-to-client connection or by the client application itself. Therefore, if the execution time is significantly better after this modification, the dominant factor in the overhead under study is the client processing or the Denodo-to-client network.
Under these conditions, it is necessary to identify whether the bottleneck is the client itself or the Denodo-to-client network. To do so, a third-party tool installed on the same host as the current client consumer can be used to create a JDBC connection to the Virtual DataPort server. Then run the query from this new client, making sure the entire result set is retrieved (these tools usually fetch only 1000 rows by default). If the execution time is better than when the client application runs the query fetching the entire result, the bottleneck is the client application. Otherwise, there is a network performance issue between Denodo and the client that is consuming the data.
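As a reference, a generic JDBC client (for example, DbVisualizer or SQuirreL SQL) can connect to Virtual DataPort using the Denodo JDBC driver distributed with the platform. The host, port and database below are placeholders; adjust them to your environment:

```
Driver class: com.denodo.vdp.jdbc.Driver
Driver jar:   denodo-vdp-jdbcdriver.jar (distributed with the Denodo Platform)
JDBC URL:     jdbc:vdb://denodo-host:9999/my_virtual_database
```

Remember to configure the tool to fetch the entire result set, not just the first page of rows, so the measured time is comparable to the original client's.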
Note: it is possible to check and modify the value of Maximum size in memory of intermediate results to increase the size of the buffer holding the result and avoid pausing the query. However, be mindful that excessive memory consumption by processing nodes may lead to other issues. Find more information about this option in the section 'Review the memory usage configuration' in this document.
If the client is ruled out as the bottleneck, the next step is to pay attention to the processing time at the source layer.
First of all, review the leaf nodes of the execution trace diagram since they contain information about the retrieval of data from the sources.
An important property to check on these nodes is the Response time, which shows the moment when the data source sent the first row to the Denodo server. The time to get the first row can be used as a measure to identify the source as the dominant factor. This value can be calculated by subtracting the value of the Start time attribute from the Response time attribute.
If the execution plan includes several leaf nodes over the same data source, check whether the Connection time property is significantly high for some of them. If so, check whether the number of leaf nodes accessing that data source exceeds the maximum number of connections allowed in the connection pool configuration. In this case, the dominant factor is that the number of connections needed exceeds the number configured for that data source. For more details about this issue, see the section 'Dominant factor: Too many connections to a data source'.
To check the maximum number of active connections allowed open the data source configuration and click the ‘Connection Pool configuration’ link:
Finally, note that if the response time and the connection time are low while the total execution time in the source is high, the problem is most likely not source processing. Instead, it is usually a symptom, and the real cause may be one of the following (at the Denodo layer):
Only after discarding these other possible bottlenecks should you continue exploring whether the dominant factor is the source. In this case, check the following possible causes:
If the execution time of the isolated query is significantly better than in the original query, review the model to see if there is something that is preventing the delegation of some operations to that data source. For more details about this see the section ‘Review the model’.
select * from test CONTEXT('cache_return_query_results' = 'false')
If the execution time is similar in both cases and the source is not the cause of the bottleneck, it means the bottleneck may come from the network between the source and Denodo.
Denodo Virtual DataPort uses memory management techniques to avoid memory overflows at execution time. For instance, it uses memory buffers that can trigger a pause in the retrieval of data from data sources when they are full. Data retrieval resumes automatically when more memory is available. The Memory limit reached attribute, which can be found in the Advanced parameters section of each node of the trace, shows whether this limit was reached. If so, there were rows accumulated in a memory buffer waiting to be processed, which stopped the retrieval of data and reduced the performance of the operation.
For example, suppose we want to obtain account data for customers stored in a SaaS application, plus some extra information stored in an Oracle database. Since accessing the SaaS application is slow (data needs to go through a WAN), Oracle's rows arrive first at a much faster pace and accumulate in the memory buffer, waiting to be processed by the upper nodes that also need data from the slower branch coming from the SaaS data source.
The size of the buffers is controlled by the value of Maximum size in memory of intermediate results. Check and, if necessary, increase the value of that property. However, be mindful that excessive memory consumption by processing nodes may lead to other issues; find more information about this option in the section 'Review the memory usage configuration'.
Finally, when sources and clients have been ruled out as the cause of the overhead, review the intermediate nodes of the execution trace. In these nodes, check the Effective time attribute, available since Denodo update 20190903. It shows the time Denodo has spent executing the operation represented by that node, disregarding the time spent waiting for other nodes of the execution tree to complete their work. Therefore, it serves as a sign of a bottleneck in that step. It is recommended to start by checking JOIN, GROUP BY, SEMIJOIN, MINUS, INTERSECT and FLATTEN nodes because these operations are potentially more costly. Afterwards, review the remaining nodes since, even if they correspond to an operation that appears less costly, like a projection, they can include an ORDER BY or a DISTINCT that requires more execution time.
If the Effective time property is significantly high, see the section 'Dominant factor: Slow operation in Denodo'.
Notice that when there is a costly operation in Denodo, the execution time in some of the data sources may also increase. This is just a consequence of a memory control mechanism that can pause row retrieval to avoid holding too much data in memory waiting to be processed (see the section 'High execution time on a data source due to memory management policies' for more information about this mechanism).
After identifying one of the three main points of Denodo's architecture as the dominant factor in the execution time, it is necessary to implement the proper solution for the detected scenario. The following subsections detail the different alternatives that can improve performance depending on the dominant factor.
When the execution of a query is too slow due to the client, the solution depends on whether the cause of the bottleneck is the network or the client application itself.
Slow network transfer speeds between the Denodo server and the client application can be addressed in different ways:
Also note that if network latency is a problem and the Denodo server has SSL connections enabled, the Use only SSL communications option must be checked, or the parameter ssl=true added to the JDBC connection URL. The reason is that, by default, the driver tries to establish a non-SSL connection with the Server; if SSL is enabled on the Server, that connection fails and the driver immediately retries with an SSL connection. With the mentioned configuration, the first connection attempt already uses SSL and succeeds.
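For example, assuming a server named denodo-host listening on the default port 9999 (the host and database names are placeholders), the parameter can be appended to the connection URL so the driver attempts an SSL connection directly:

```
jdbc:vdb://denodo-host:9999/my_virtual_database?ssl=true
```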
Find more information about how to select the Fetch Size (southbound) / Chunk Size (northbound) in the Optimizing Row Fetching in Denodo document of the Denodo Knowledge Base.
If the query processing bottleneck is the client application, analyze the data processing operations performed by the consumer application in order to optimize them. For example, if you are building a report using Tableau, you can see if there are other ways to obtain the same result that perform better or tune the connection to Denodo (See ‘Tableau Capability Customizations Reference’ for details on this).
When the Source is the cause of a bottleneck, it may be due to a slow source execution or a slow network between the source and Denodo.
After identifying a high response time executing a query due to the source execution, consider the following options:
If the network between the source and Denodo is a bottleneck bear in mind the following considerations:
When the origin of a bottleneck is in the Denodo layer one of these three main scenarios will be the most common:
If the situation is none of the above, we recommend using the Diagnostic and Monitoring tool (for real-time information) or the Denodo Monitor Tool (for historical information) to check the state of the system when the query is executed. These tools allow detecting situations such as peaks of memory usage that may be affecting performance.
Some things to review when Denodo is fetching too many rows from data sources or it is spending too much time reading rows:
To know more about the Fetch Size parameter in Denodo check the article ‘Optimizing Row Fetching in Denodo’ in our Knowledge Base.
When the dominant factor in the execution time is the processing of Denodo operations, the reason is usually one of the following. The operation:
In the uncommon situation that the scenario does not fit any of the previous cases see the section ‘Working on the solution’ to see if any of the alternatives described there can be beneficial for this particular scenario.
If the operation is a JOIN, check whether the join strategy used is the most appropriate, as using one strategy or another can change the execution time by orders of magnitude.
Finally, if the swapping property for the join node is set to yes, check also the considerations in the following sub-section.
When the effective time for an operation is high, another interesting property to look at is Swapping.
Denodo has a swapping mechanism that stores the intermediate results of a GROUP BY, JOIN, ORDER BY or DISTINCT operation in secondary storage. To check whether that mechanism was triggered, look for the Swapping attribute within the Advanced parameters of the trace.
If that property is true for an operation, it means that the operation needed to hold too much data in memory (more than a limit configured at server or view level), and therefore Denodo decided to offload that data to disk to avoid exceeding the system resources. This keeps the system in good health, but in return the operation will be slower, as Denodo needs to write to and read from disk.
There are several ways to avoid this swapping:
If the slow operation is a SEMIJOIN corresponding to a subquery used in a WHERE clause, our recommendation is to avoid the subquery and use the equivalent JOIN instead, because the use of subqueries in the WHERE clause in Denodo has some limitations. For more information, visit the section 'Avoid subqueries inside the WHERE clause' of the document 'Modeling Big Data and Analytic Use Cases'.
If the subquery exists as part of a row restriction, consider alternative ways to apply that security policy.
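As an illustration, consider hypothetical orders and customer views (and assume customer.id is the primary key, so the rewrite does not duplicate rows); a subquery in the WHERE clause can be replaced by an equivalent JOIN:

```sql
-- Subquery in the WHERE clause (may be executed as a SEMIJOIN):
SELECT o.order_id, o.amount
FROM orders o
WHERE o.customer_id IN (SELECT c.id FROM customer c WHERE c.region = 'EMEA');

-- Equivalent JOIN, which the optimizer can handle and delegate more freely:
SELECT o.order_id, o.amount
FROM orders o JOIN customer c ON o.customer_id = c.id
WHERE c.region = 'EMEA';
```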
When there are too many connections to a specific data source and that is causing issues (e.g. connection pool exhaustion), this is normally due to one of these situations:
Please visit the section ‘Review the model’ to see how to modify the model in order to maximize the operations delegated to a data source.
If after checking the two situations explained above, it is still necessary to increase the number of connections in the connection pool go to the data source configuration and click the ‘Connection Pool configuration’ link:
Although the optimal solution will depend on each particular scenario, solving these bottlenecks will involve one or more of the following points:
The following subsections provide more details about each of these different alternatives.
A good design of the data model can have a big impact on query performance. One important factor is making sure the model allows maximizing delegation to the sources. When a query uses two or more views that rely on the same JDBC or ODBC data source (i.e., there is no federation), Virtual DataPort will try to delegate the entire execution to the source database as a single query. This strategy may save a significant amount of execution time in complex views. When Virtual DataPort is unable to delegate the entire process of a certain query to a source database, it will indicate the reason in a parameter called No delegation cause, available in the execution trace under Advanced parameters.
If the model includes custom functions consider developing custom functions that, besides being executable by the Virtual DataPort server, can be delegated to JDBC data sources. That means that when possible, instead of executing the Java code of the custom function, the Server invokes a function of the database. The Virtual DataPort Developer Guide has more information about this Denodo feature.
If the operation is not pushed to the data source due to a custom policy, review if there are alternative ways to apply the same security restrictions without compromising the performance.
Furthermore, the data virtualization engine applies query partitioning techniques (branch pruning) to remove unnecessary query branches for better performance. With this optimization, the number of rows that need to be brought from the data source to Denodo can be significantly reduced. For a query where branch pruning could be applied but is not, review the transformations made by the Denodo query optimizer and look for ways to enable them, for example by removing redundant branches of queries (partitioned unions), which can prevent access to some data sources. Check that the meta-information and configuration settings that are relevant for the Query Optimizer module are correctly declared (see the 'Review the configuration of the query optimizer' section).
Since Denodo update 20200310, when a branch pruning optimization cannot be applied, an icon is shown beside the view that could not be delegated, and the No simplification cause parameter shows the reason why the simplification was not possible. See an example in the execution trace below:
In addition, if cost-based optimization is enabled, the execution engine selects the best execution plan based on the estimated cost of executing each candidate. If the query asks for aggregated data, make sure that Denodo is applying the aggregation push-down optimization and therefore obtaining a partially aggregated result from the source. This optimization reduces execution times as well as the amount of data moved through the network. If the query is not being rewritten to take advantage of aggregation push-down, review whether the meta-information and configuration settings that are relevant for the Query Optimizer module are correctly declared (see the 'Review the configuration of the query optimizer' section).
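As a sketch of the idea, consider hypothetical sales and product views living in different data sources (the amount column is illustrative):

```sql
-- Query issued by the client:
SELECT prod_name, SUM(amount)
FROM sales JOIN product ON (s_prod_id = pid)
GROUP BY prod_name;

-- With aggregation push-down, Denodo can send a partial aggregation
-- to the source holding 'sales', retrieving one row per product id
-- instead of one row per individual sale:
SELECT s_prod_id, SUM(amount) AS partial_sum
FROM sales
GROUP BY s_prod_id;
```

The final aggregation by prod_name is then computed in Denodo over this much smaller intermediate result.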
Finally, check the modeling recommendations in Denodo for an up to date guide on modeling and its performance implications.
The Denodo query optimizer explores the different but equivalent execution plans that can obtain the result of a certain query and decides which one should achieve the best performance. Among other options, Denodo can perform query rewritings; that is, it analyzes the query and tries to generate an equivalent statement with better performance.
Make sure that the meta-information and configuration settings that are relevant for the Query Optimizer module (e.g. primary keys, indexes, referential constraints, view statistics...) are correctly declared and they are accurate. For more detailed information, see the Best Practices to Maximize Performance II: Configuring the Query Optimizer document in the Knowledge Base.
Take into account that if there is either a "memory limit reached" flag in the query or an operation swapping data to disk, the best way to speed up the query is usually NOT to change the memory limits, but to apply one of the other approaches described in this section. Be aware that increasing these values can increase the overall memory usage of the system, so make sure the maximum query size is suitable for the number of concurrent queries that can be running at peak times.
That said, there are cases where it makes sense to increase the default value if it works better for a specific environment as the new value avoids unnecessary offloading of data to disk.
For those cases, consider modifying the default values by clicking Server configuration on the Administrator menu and then clicking Memory Usage.
In this panel configure the following parameters:
Notice that when the option ‘Limit query memory usage' is enabled, the Execution Engine can automatically decrease the values of the ‘Maximum size in memory of each node’ and the ‘Maximum size in memory of intermediate results’ for a particular query to ensure that the query execution will not surpass the limit set in Maximum query size. Find more detailed information about these parameters and their use in the Limit the Maximum Amount of Memory of a Query section of the Virtual DataPort Administration Guide.
See the section Configuring the Memory Usage and Swapping Policy in the Denodo Virtual DataPort Administration Guide for more details.
When there is a Join/Minus/Intersect/Union operation where one of the views is much larger than the others, Virtual DataPort can transfer the data of the smaller views into the data source of the larger view and execute the operation on this last data source. This may offer great performance improvements.
For example, let’s say we want to obtain the number of different prices for a product:
SELECT prod_name, COUNT(DISTINCT price)
FROM sales JOIN product ON(s_prod_id = pid)
GROUP BY prod_name
If the GROUP BY operation is slowing down the response because the data for sales and product resides in different data sources and the operation cannot be delegated, it is possible to transfer the data of 'product', the smaller view, into the data source of 'sales' and thereby execute the entire operation in that data source.
In order to allow this optimization, remember that the destination data source has to be enabled as a possible data movement target. In addition, to obtain the best performance, make sure the bulk data load settings are configured if they are available for the data source.
It is possible to force the data movement manually or let the cost-based optimizer decide whether that is the best plan to follow. In the latter case, check that cost-based optimization is enabled and that the meta-information and configuration settings that are relevant for the Query Optimizer module (e.g. primary keys, indexes, referential constraints, view statistics...) are correctly declared.
For more information about Data Movement check the section ‘Data Movement’ of the Denodo Virtual DataPort Administration Guide and the section 'Data movement configuration' in the Knowledge base article Best Practices to Maximize Performance II: Configuring the Query Optimizer.
SQL-on-Hadoop systems like Impala, Presto or Spark can be imported into Denodo like any other data source, allowing the query optimizer to create temporary tables to accelerate some queries when the Virtual DataPort server detects that it would otherwise have to perform costly operations on large amounts of data. See the 'MPP acceleration' section of the Best Practices to Maximize Performance II: Configuring the Query Optimizer document for more information about configuring the query optimizer to use MPP acceleration.
Denodo provides a Cache Module that, among other purposes, allows you to enhance performance, protect data sources from costly queries, and reuse complex data combinations and transformations. Detailed information regarding caching can be found in the Best Practices to Maximize Performance III: Caching document.
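For instance, once full cache mode is enabled on a view, its cache can be preloaded with a query along these lines (the view name is hypothetical; the CONTEXT options used are described in the caching documentation):

```sql
SELECT * FROM customer_sales
CONTEXT('cache_preload' = 'true',
        'cache_wait_for_load' = 'true',
        'cache_return_query_results' = 'false');
```

This loads the cache without returning the result set to the client, so subsequent queries over the view can be answered from the cache database.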
Since version 8, Denodo allows the creation of a new element called Summary, to store common intermediate results that the query optimizer can then use as a starting point to accelerate analytical queries.
Summaries have some advantages over the traditional cache:
The creation, insertion and refresh process is very similar to the one used for remote tables but, unlike remote tables, summaries provide full data lineage to the views used in their definition, in the same way as a standard derived view. This not only keeps the data lineage information, but also invalidates the data automatically if the definition of an underlying view changes. For more information, visit the section 'Summaries' of the VDP Administration Guide.
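As a minimal sketch (the view and column names are hypothetical, and storage options are omitted; see the 'Summaries' section of the VDP Administration Guide for the full syntax), a summary holding a pre-aggregated result can be created with VQL along these lines:

```sql
-- Store a commonly used aggregation so the optimizer can reuse it
-- as a starting point for analytical queries over 'sales':
CREATE SUMMARY sales_by_product AS
SELECT s_prod_id, SUM(amount) AS total_amount
FROM sales
GROUP BY s_prod_id;
```

After the summary data is loaded, queries that aggregate sales by product (or by any coarser grouping) can be answered from the summary instead of re-reading the base data.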
Step 1) Identify the client as the dominant factor
Step 2) Identify the source as the dominant factor
Step 3) Identify Denodo as the dominant factor