Window Functions¶
Window functions (also known as analytic functions) are functions whose result for a given row is derived from the window frame of that row. These functions are similar to aggregation functions because they both calculate aggregate values based on a group of rows. The difference is that aggregation functions return one value for each group of rows and window functions can return several rows per group. For the list of functions and their descriptions, see Window Functions
Important
Virtual DataPort tries to push down these functions to a data source to delegate them. When that is not possible, Denodo can execute the window functions in an external system by transferring the necessary data to a parallel processing system or to the cache. If these options are not enabled, Denodo will abort the execution of those queries.
Execution of Window Functions¶
If it is not possible to push a window function to the query data source the query optimizer can leverage the execution capabilities of the Embedded MPP, the cache or an external system applying any of the following optimizations:
Embedded MPP Acceleration. Data is streamed to the Denodo Embedded MPP cluster.
Parallel Processing Using an External MPP. Data is transferred to an external MPP. This option is only recommended if the latency between the external MPP and the Virtual DataPort servers is low.
Data Movement. Data is transferred to one of the query data sources or the cache database configured for the session.
The following sections describe the two ways the data can be transferred to process the window function:
Automatically: It is possible to configure the query optimizer to automatically move the necessary data to the system selected in Queries Optimization.
Manually: In this case the user specifies the desired data movement.
Automatic Data Movement to Delegate Window Functions¶
The optimizer can automatically use one of the optimizations listed above to transfer to a data source all data required to execute a window function. In order to enable this feature:
If you have an MPP available, embedded or external, make sure it is available for the optimizer. See Queries Optimization.
For those JDBC data sources the optimizer could use to execute window functions, make sure the data movement is allowed in the Read&Write section of the data source and they have bulk load configured if available.
The criteria for choosing an optimization can vary whether the query is executed with Cost-Based Optimization or not. When Cost-Based Optimization is not used, only optimizations involving an MPP can be chosen. When Cost-Based Optimization is used, any of the optimizations listed above can be chosen and the estimated costs will help in determining which sets of data will be transferred to the data source to minimize the impact in performance.
If Data Movement is chosen, the target data source for the data is always one of the underlying data sources of the query and must support automatic data movements.
The cache of the database used by the query’s session can also be used in the same fashion as an external MPP only for executing window functions.
Warning
Window functions are resource-consuming operations. We only recommend enabling this option if the cache database has a parallel processing engine.
Manual Data Movement to Delegate Window Functions¶
If no MPP is available and conditions to use Data Movement are not met, the window function will still be not executable.
In these scenarios, a manual Data Movement optimization can be used to transfer the data to a data source that
supports window functions with the parameter DATAMOVEMENTPLAN
of the CONTEXT
clause.
SELECT region.r_name, nation.n_name, sum(order.o_totalprice), rank() OVER (PARTITION BY region.r_name ORDER BY sum(order.o_totalprice) DESC)
FROM order INNER JOIN customer ON order.o_custkey = customer.c_custkey
INNER JOIN nation ON customer.c_nationkey = nation.n_nationkey
INNER JOIN region ON nation.n_regionkey = region.r_regionkey
WHERE region.r_regionkey > 0
GROUP BY region.r_name, nation.n_name
CONTEXT(DATAMOVEMENTPLAN =
order : JDBC admin.vdpcachedatasource
customer : JDBC admin.vdpcachedatasource
nation : JDBC admin.vdpcachedatasource
region : JDBC admin.vdpcachedatasource
);
In this example, the clause DATAMOVEMENTPLAN
instructs the execution engine to move the data of the views involved
in the query to the cache database (admin.vdpcachedatasource
) and execute there the window function. The optimizer
will try to delegate the WHERE conditions and other functions to the source databases to reduce the data that has to be
transferred between the sources and the cache database. In this example, the SQL query executed in the source to obtain
the data of region
will include the condition r_regionkey > 0
if the data source supports this condition.
If you cannot add the CONTEXT
clause to the query because you are executing it from a third-party tool (e.g. a
business intelligence tool), create a view with a data movement (see Forcing the Data Movement of a View)
and then query the new view.