Introduction
This document describes the steps to follow to analyze the workload of a Virtual DataPort (VDP) server, in order to decide whether there is enough room for capacity growth, or whether the configuration needs to be changed or the hardware upgraded to match the workload.
Prerequisites
To be able to review the load of your Denodo system:
- Configure the Denodo Monitor and gather the Denodo Monitor log files for the Virtual DataPort servers of your deployment.
- A full week of monitor logs is recommended; a higher number of days is better.
- The days analyzed should be representative of the normal workload of the system.
- Understand the individual monitors and corresponding log files.
- Understand the use of the Diagnostic & Monitoring Tool (DMT).
- Understand the configuration files (<DENODO_HOME>/conf/vdp/VDBConfiguration.properties) from the Virtual DataPort servers considered for this analysis.
- The VDP log files located in the <DENODO_HOME>/logs/vdp directory, covering the same time frame as the Denodo Monitor logs, will also be helpful.
Diagnostics in the Diagnostic and Monitoring Tool
To start with the analysis, we will create a diagnostic in the DMT:
- Place the collected Denodo Monitor log files in a location from which they can be selected when creating a diagnostic in the Diagnostic and Monitoring Tool.
- If you have the vdp.log files as well, place them in the same folder as the Denodo Monitor log files.
- Create a diagnostic by providing the log file path in the Create Diagnostics dialog in the Diagnostic and Monitoring Tool.
The monitor log file details are displayed in different tabs: Resources, State, Sessions, Requests, Cache, Data Sources and Threads. Details from the vdp.log files are presented under the Errors tab.
Querying the monitor logs
When a diagnostic is created in the Diagnostic and Monitoring Tool, a set of views corresponding to the respective log files is created under a virtual database named ‘diagnostic’, and those views are cached for quick consumption.
You can query these views directly to analyze the workload using regular VQL queries. For example, the following VQL would give you the maximum CPU usage for a particular diagnostic.
select round(max(cpu_percentage),2) max_cpu from diagnostic.resources_log_######## context('i18n'='rfc_restful');
Understanding the workload
The available data can be interpreted in terms of the following categories:
- Performance:
- CPU and Memory.
- Concurrent active requests.
- Query distribution.
- View usage.
- Operational metrics.
Performance
CPU and Memory
The graphs show the peaks, with key metrics such as usage and garbage collection usage for CPU, and used heap, committed heap and max heap for Memory.
Apart from creating Diagnostic Intervals to select a particular time frame, the graphs can also be zoomed in to a time frame for analysis:
- Try to find a pattern for frequent spikes.
- Compare the usage or spikes between the CPU and Memory graphs.
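Beyond visual inspection of the graphs, a daily pattern can also be surfaced by aggregating the resources log by hour. The following query is a sketch: it assumes the diagnostic.resources_log_######## view exposes its datetime value in a column named timestamp, so adjust the column name to the one in your diagnostic.

select gethour(timestamp) hour_of_day, round(max(cpu_percentage),2) max_cpu, round(avg(cpu_percentage),2) avg_cpu from diagnostic.resources_log_######## group by 1 order by 1 context('i18n'='rfc_restful');

Hours that consistently show a high max_cpu but a low avg_cpu point to short scheduled spikes rather than sustained load.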
You can also query the diagnostic.resources_log_######## view to get the maximum, minimum and average usage. For instance, you could use the following queries:
- CPU
select round(max(cpu_percentage),2) max_cpu, round(avg(cpu_percentage),2) avg_cpu, round(min(cpu_percentage),2) min_cpu from diagnostic.resources_log_######## context('i18n'='rfc_restful');
- Memory
select round(max(cast(heap_memory_usage as float))/1073741824,2) max_memory, round(avg(cast(heap_memory_usage as float))/1073741824,2) avg_memory, round(min(cast(heap_memory_usage as float))/1073741824,2) min_memory from diagnostic.resources_log_######## context('i18n'='rfc_restful');
Concurrent active requests
In the ‘State’ tab, select ‘Active Requests’ to visualize the number of queries executed in a given time frame, i.e. the number of concurrent requests.
Based on the spikes, you can identify a pattern of when the number of concurrent requests peaks during the day. If such spikes occur repeatedly, you can narrow down the queries executed at those times in the ‘Requests’ tab, and then spread the schedule of those requests to avoid a higher workload at that time.
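The same peaks can be extracted from the state log view created for the diagnostic. The query below is a sketch, assuming the view is named diagnostic.state_log_######## and exposes the active request count and a datetime value in columns named active_requests and timestamp; check the actual view and column names in your diagnostic virtual database before running it.

select gethour(timestamp) hour_of_day, max(active_requests) peak_concurrent_requests from diagnostic.state_log_######## group by 1 order by 2 desc;

The hours at the top of the result are the candidates for rescheduling requests.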
Query Distribution
The query distribution metrics can be calculated using columns such as duration (ms), elements, access interface, user agent and user.
You can use the filters in the ‘Requests’ tab of the Diagnostic & Monitoring Tool to find a specific query distribution pattern. For example, to find the number of queries executed on a particular database or view, you can apply filters on the ‘VDP Database’ and ‘Elements’ columns, or execute a GROUP BY query on the diagnostic virtual database to get the number of queries for each value of those columns.
- Count of queries against each database and element.
select database_name, elements, count(*) from diagnostic.requests_log_######## group by database_name, elements order by count(*) desc;
- Count of queries by access interface and user agent.
select access_interface, user_agent, count(*) from diagnostic.requests_log_######## group by access_interface, user_agent order by count(*) desc;
- Distribution across bands of duration.
SELECT 'P01' AS node,
  CASE WHEN duration < 1000 THEN '< 1s'
       WHEN duration BETWEEN 1000 AND 10000 THEN '1s to 10s'
       WHEN duration BETWEEN 10000 AND 120000 THEN '11s to 2m'
       WHEN duration > 120000 THEN '> 2m'
       ELSE 'not executed' END AS duration,
  CASE WHEN INSTR(REPLACE(LOWER(query),' ',''),'''cache_preload''=''true''') > 0 THEN 'Cache Load'
       ELSE 'Business Query' END AS query_type,
  COUNT(*) AS query_count,
  AVG(num_rows) AS mean_rowcount,
  AVG(num_rows/(duration/1000.0)) AS mean_rps,
  MEDIAN(num_rows/(duration/1000.0)) AS median_rps,
  MAX(num_rows/(duration/1000.0)) AS max_rps
FROM requests_log_########
WHERE request_type IN ('SELECT VIEW','SELECT BASE VIEW')
GROUP BY 1,2,3;
View Usage
This metric helps identify the usage or workload per view; specifically, the throughput of rows processed per view, which helps decide which views need to be reviewed in terms of performance. The query below can be used to get this information.
- Views vs data throughput.
select elements, count(*),
  min(num_rows/(duration/1000.0)),
  avg(num_rows/(duration/1000.0)),
  max(num_rows/(duration/1000.0)),
  group_concat(distinct coalesce(user_agent,'[null]')) user_agents,
  group_concat(distinct access_interface) access_interfaces
from requests_log_########
where state in ('OK','INCOMPATIBLE_QUERY_VIEW','ROW_LIMIT_REACHED','QUERY_TIMEOUT','STOPPED')
  and request_type in ('SELECT VIEW','SELECT BASE VIEW','INSERT','UPDATE','DELETE')
  and instr(replace(lower(query),' ',''),'''cache_preload''=''true''') < 0
group by 1 order by 2 desc;
Operational Metrics
- Number of queries based on execution status.
select state, count(*) from diagnostic.requests_log_######## where request_type <> 'DESC' group by state order by count(*) desc;
- Top long running queries.
select query, duration/60000 duration_in_mins from diagnostic.requests_log_######## order by duration desc limit 20;
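Cache refresh jobs often dominate this list. To focus on business queries, the same cache_preload filter used in the queries above can be added to exclude cache loads:

select query, duration/60000 duration_in_mins from diagnostic.requests_log_######## where instr(replace(lower(query),' ',''),'''cache_preload''=''true''') < 0 order by duration desc limit 20;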
- Understand the throughput of the data processed within certain duration bands. For example, the following query provides the mean, median and maximum rows per second for duration bands such as < 1 second, between 1 and 10 seconds, and so on.
SELECT 'P01' AS node,
  CASE WHEN duration < 1000 THEN '< 1s'
       WHEN duration BETWEEN 1000 AND 10000 THEN '1s to 10s'
       WHEN duration BETWEEN 10000 AND 120000 THEN '11s to 2m'
       WHEN duration > 120000 THEN '> 2m'
       ELSE 'not executed' END AS duration,
  CASE WHEN INSTR(REPLACE(LOWER(query),' ',''),'''cache_preload''=''true''') > 0 THEN 'Cache Load'
       ELSE 'Business Query' END AS query_type,
  COUNT(*) AS query_count,
  AVG(num_rows) AS mean_rowcount,
  AVG(num_rows/(duration/1000.0)) AS mean_rps,
  MEDIAN(num_rows/(duration/1000.0)) AS median_rps,
  MAX(num_rows/(duration/1000.0)) AS max_rps
FROM requests_log_########
WHERE request_type IN ('SELECT VIEW','SELECT BASE VIEW')
GROUP BY 1,2,3;
Conclusions
With the Denodo Monitor logs and the Denodo Diagnostic & Monitoring Tool it is possible to review the workload and performance of your Denodo deployments.
The Denodo Diagnostic & Monitoring tool provides a set of graphs to review different metrics over a period of time.
At the same time, when a diagnostic is created in the DMT, a diagnostic virtual database is created with a set of views that can be queried to obtain data about those metrics like CPU and memory usage but also information about the queries in the system (volume, distribution, throughput, performance, long running queries…).
This document provides some examples of useful queries but others can be used to take advantage of the information available in the Denodo Monitor logs.
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments we recommend you to contact your Denodo Customer Success Manager.