Embedded MPP Extended Execution Trace
Queries against the Embedded MPP have a different execution trace than other JDBC queries. It is an extension of the traditional JDBC execution trace, with new nodes and properties.
MPP Route Plan
Each access to the MPP is modeled by a new node, the MPP Route Plan. This node is equivalent to the JDBC Route Plan and displays the same basic information about the query.
Information about this node can be found in Execution Trace Information.
MPP Execution Plan
The MPP Execution Plan shows information about the execution of the query planned in the Embedded MPP.
This node contains the following properties:
Elapsed time: total time the Embedded MPP took to process the query since it was received. This time is the sum of the Queued time, Planning time and Execution time.
Queued time: total time the query spent idle, waiting to be processed.
Planning time: total time the Embedded MPP took to plan the query and distribute its workload into different execution stages.
Execution time: total time the Embedded MPP took to execute the query.
Input rows: number of rows of data consumed by the Embedded MPP while processing the query.
Input data size: size of data consumed by the Embedded MPP while processing the query.
Output rows: number of rows of data returned by the Embedded MPP to Virtual DataPort after processing the query.
Output data size: size of data returned by the Embedded MPP to Virtual DataPort after processing the query.
Peak user memory registration: maximum amount of memory allocated by the Embedded MPP in user space while processing the query.
Peak total memory registration: maximum amount of memory allocated by the Embedded MPP while processing the query.
Resource group: resource group applied to the query limiting its resource usage. See the official resource groups documentation for further information.
Stages: number of stages the query was divided into. Each query is divided into multiple stages in order to parallelize operations. Stages are modeled in the trace by MPP Stage Plan nodes.
Total worker nodes: number of nodes of the Embedded MPP cluster involved in the query execution.
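The relationship between the timing properties above can be sketched in a few lines of Python. This is only an illustration: the `MppTimings` class, its field names and the sample values are hypothetical, and the times are assumed to have been copied by hand from the trace and converted to milliseconds.

```python
from dataclasses import dataclass


@dataclass
class MppTimings:
    """Timing properties of one MPP Execution Plan node, in milliseconds.

    Hypothetical container; the trace itself does not expose this class.
    """
    queued_ms: int
    planning_ms: int
    execution_ms: int

    @property
    def elapsed_ms(self) -> int:
        # Elapsed time = Queued time + Planning time + Execution time.
        return self.queued_ms + self.planning_ms + self.execution_ms

    def shares(self) -> dict:
        """Fraction of the elapsed time spent in each phase."""
        total = self.elapsed_ms
        if total == 0:
            return {"queued": 0.0, "planning": 0.0, "execution": 0.0}
        return {
            "queued": self.queued_ms / total,
            "planning": self.planning_ms / total,
            "execution": self.execution_ms / total,
        }


# Made-up sample values for illustration only.
timings = MppTimings(queued_ms=200, planning_ms=300, execution_ms=1500)
print(timings.elapsed_ms)
print(timings.shares())
```

A large queued share suggests the query waited for resources rather than being slow to execute, which is a different problem from a large execution share.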
MPP Stage Plan
Each MPP Execution Plan is divided into multiple MPP Stage Plan nodes. The N stages are numbered from 0 to N-1, from top to bottom and from left to right. The Output stage, the only one that returns data, has stage ID 0, and the stages with IDs N-1, N-2… are typically the ones that access data sources. The query is executed from bottom to top and from right to left, so each stage processes the data retrieved from its child stages. The node name summarizes the most important operations executed in the stage.
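As a rough mental model of this ordering, the stage tree can be walked so that every stage is visited after its child stages, taking children from right to left. The sketch below is a conceptual illustration only: the `children` mapping and the `execution_order` function are hypothetical, and the stage tree is an invented example, not output from the Embedded MPP.

```python
def execution_order(children, root=0):
    """Return stage IDs in a bottom-to-top, right-to-left order.

    `children` maps a stage ID to the list of its child stage IDs,
    listed from left to right as they appear in the trace.
    """
    order = []

    def visit(stage_id):
        # Child stages are processed before the stage that consumes them,
        # starting with the rightmost child.
        for child_id in reversed(children.get(stage_id, [])):
            visit(child_id)
        order.append(stage_id)

    visit(root)
    return order


# Invented example: stage 0 (Output) reads from stage 1,
# which combines the data produced by stages 2 and 3.
stage_tree = {0: [1], 1: [2, 3]}
print(execution_order(stage_tree))
```

The Output stage (stage ID 0) always comes last in this ordering, because it depends on every other stage.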
Each node contains the following properties:
Stage ID: number that identifies a stage. This number is always between 0 and N-1, where N is the total number of stages.
Status: state of the processing of the stage. This property can have the following values:
FINISHED: the stage processing completed successfully.
ABORTED: an error occurred and the stage processing was stopped.
Input rows: number of rows of data consumed by the stage. This number is the sum of the output rows of the node's child stages and the rows retrieved from data sources.
Input data size: size of data consumed by the stage. This number is the sum of the output data sizes of the node's child stages and the size of the rows retrieved from data sources.
Output rows: number of rows of data returned by the stage.
Output data size: size of data returned by the stage.
Worker nodes: number of nodes of the Embedded MPP cluster involved in the stage execution.
Operator pipelines: chain of operators applied to the data to process it. The operators are applied in order from left to right.
Operation nodes: chain of operations applied to the data to process it. Several operations can be applied.
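The rule for Input rows described above (output rows of the child stages plus rows read from data sources) can be expressed as a small calculation. This is a sketch under stated assumptions: the `Stage` class, the `source_rows` field and all the numbers are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical model of one MPP Stage Plan node."""
    stage_id: int
    output_rows: int
    source_rows: int = 0  # rows read directly from data sources
    children: list = field(default_factory=list)

    @property
    def input_rows(self) -> int:
        # Input rows = output rows of child stages + rows from data sources.
        return self.source_rows + sum(c.output_rows for c in self.children)


# Invented example: two scanning stages feeding a joining stage.
scan_a = Stage(stage_id=2, output_rows=100, source_rows=1000)
scan_b = Stage(stage_id=3, output_rows=50, source_rows=500)
join = Stage(stage_id=1, output_rows=40, children=[scan_a, scan_b])

print(join.input_rows)
print(scan_a.input_rows)
```

Comparing the Input rows and Output rows of a stage in this way shows how much each stage reduces the data before passing it upward.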
MPP Stage Plan Operation Nodes
Each MPP Stage Plan executes a chain of operations that are applied to the data. This chain is specified in the Operation nodes property.
Placing the cursor over the dots displays the operation nodes hierarchy.
The operation nodes can be classified into several categories, each one associated with an operation type:
Aggregation nodes: operation that executes an aggregation function over the data received from previous operation nodes.
Filter nodes: operation that removes the rows that do not meet the filter condition from the data received from previous operation nodes.
Scan nodes: operation that retrieves rows from a data source and sends them to another operation node.
Join nodes: operation that executes a join operation between the data received from previous operation nodes.
Generic nodes: small simple operations such as data exchange between operation nodes or stages.
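The five categories above can be pictured as a simple classification over operation names. The sketch below is purely illustrative: the keyword matching and the sample names (`TableScan`, `InnerJoin`, and so on) are assumptions, and the names actually shown in a trace may differ.

```python
from enum import Enum


class OperationCategory(Enum):
    AGGREGATION = "aggregation"
    FILTER = "filter"
    SCAN = "scan"
    JOIN = "join"
    GENERIC = "generic"


# Hypothetical keyword rules; the first matching keyword wins.
_KEYWORDS = [
    ("aggregat", OperationCategory.AGGREGATION),
    ("filter", OperationCategory.FILTER),
    ("scan", OperationCategory.SCAN),
    ("join", OperationCategory.JOIN),
]


def classify(operation_name):
    """Map an operation name to one of the five node categories."""
    name = operation_name.lower()
    for keyword, category in _KEYWORDS:
        if keyword in name:
            return category
    # Anything else (for example, data exchange) is treated as generic.
    return OperationCategory.GENERIC


for sample in ["TableScan", "InnerJoin", "Aggregate", "Filter", "RemoteExchange"]:
    print(sample, classify(sample).value)
```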
Errors in the Extended Execution Trace
If an error occurs during the execution of a query against the Embedded MPP, the trace shows the stages involved in red.
In addition, the MPP Execution Plan shows new properties that clarify the error:
Error code: numeric code associated with the error.
Error kind: origin of the error.
Error type: error cause.
Exception: Presto internal exception and stack trace.
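When reviewing a failed query, it can help to gather the aborted stages and the error properties in one place. The sketch below assumes the trace has been transcribed by hand into a plain dictionary; the dictionary layout, the `summarize_failure` function and all sample values are hypothetical placeholders, not an export format of the product.

```python
from typing import Optional

# Property names as listed for the MPP Execution Plan.
ERROR_PROPERTIES = ("Error code", "Error kind", "Error type", "Exception")


def summarize_failure(execution_plan: dict) -> Optional[dict]:
    """Collect aborted stage IDs and error properties, or None if no error."""
    aborted = [
        stage["Stage ID"]
        for stage in execution_plan.get("stages", [])
        if stage.get("Status") == "ABORTED"
    ]
    errors = {
        name: execution_plan[name]
        for name in ERROR_PROPERTIES
        if name in execution_plan
    }
    if not aborted and not errors:
        return None
    return {"aborted_stages": aborted, "error": errors}


# Placeholder values for illustration only.
failed_plan = {
    "Error code": 0,
    "Error kind": "EXAMPLE_KIND",
    "Error type": "EXAMPLE_TYPE",
    "stages": [
        {"Stage ID": 0, "Status": "ABORTED"},
        {"Stage ID": 1, "Status": "FINISHED"},
    ],
}
print(summarize_failure(failed_plan))
```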