Summaries¶
With Virtual DataPort, a user can store the result of a query on a table of an external database with the purpose of providing smart query acceleration when executing other queries. This feature is called summaries.
When building the execution plan of a query the optimizer can identify parts that match with the query used to create the summary. If possible, these matches will be substituted with accesses to the summary. With a well defined summary, the performance of the query to execute will be greatly boosted.
CREATE SUMMARY VIEW Command¶
The syntax of the command CREATE SUMMARY VIEW
is the following:
Syntax
CREATE [ OR REPLACE ] SUMMARY VIEW <summary_view_name:identifier>
[ ID = <literal> ]
INTO <data_source_name:identifier>
[ CATALOG = <data_source_catalog:literal> ]
[ SCHEMA = <data_source_schema:literal> ]
[ RELATIONNAME = <table_name>:literal]
[ IF RELATION EXISTS { REPLACE | ERROR }]
[ DATA_LOAD_IMMEDIATE = { TRUE | FALSE }]
[ FOLDER = <literal> ]
[ DESCRIPTION = <literal> ]
[ <primary key> ]
[ QUERY REWRITE ENABLED = { TRUE | FALSE }]
[ CREATE_TABLE_TEMPLATE ( <template_definition:literal>
[ DEFAULT( <parameter definition> [, <parameter definition> ]* ) ] ) ]
[ OPTIONS ( <option information> [ , <option information> ]* ) ]
AS
[ WITH <common table expressions> ]
<select_query>
[ <order by> ]
[ OFFSET <number> [ ROW | ROWS ] ]
[ {
FETCH { FIRST | NEXT } [ <number> ] { ROW | ROWS } ONLY
| LIMIT [ <number> ]
}
]
[ CONTEXT ( <context information> [ , <context information> ]* ) ]
[ TRACE ]
<primary key> ::= (see HELP CREATE TABLE)
<option information> ::= (see HELP CREATE REMOTE TABLE)
<common table expressions> ::= (see HELP SELECT)
<select_query> ::= (see HELP SELECT)
<order by> ::= (see HELP SELECT)
<parameter definition> ::= '<parameter_name:literal>' = '<parameter_value:literal>'
<context information> ::= (see HELP SELECT)
OR REPLACE
: if present and the summary exists, it will be recreated. If not present and the summary exists, then the command will fail.data_source_name
: JDBC data source. The summary will be stored in this data source.data_source_catalog
(optional): name of the catalog of the data source where the summary will be stored.data_source_schema
(optional): name of the schema of the data source where the summary will be stored.
Note
Even though data_source_catalog
and data_source_schema
are both optional, at least one of them must be used, otherwise the command will fail.
table_name
(optional): name of the table in the data source for inserting the summary contents. If not present, the name of the summary will be used.IF RELATION EXISTS { REPLACE | ERROR }
(optional, defaults toERROR
): marks the behavior if a table namedtable_name
already exists in the data source. WithERROR
the command will fail and withREPLACE
the table will be recreated.DATA_LOAD_IMMEDIATE
(optional, defaults toTRUE
): whenTRUE
the summary contents will be stored on the data source as part of the summary creation process. WhenFALSE
, the summary will be created without storing its contents and it will not be available for the optimizer to partial substitutions in queries. To load and store its contents execute theREFRESH
command. See Refreshing Views.FOLDER
(optional): folder of the database in Virtual DataPort where this summary will be stored.DESCRIPTION
(optional): description of the summary in Virtual DataPort.primary key
(optional): fields that make the primary key of the summary. See Creating a Base View for details on how to specify them.QUERY REWRITE ENABLED
(optional, defaults toTRUE
): whenTRUE
the summary will be part of the optimizer’s list of candidates for partial substitutions in queries. Whether part of a query can be substituted by a summary is determined, among other factors, by the query specified inselect_query
.CREATE_TABLE_TEMPLATE
(optional):CREATE TABLE
command used by Denodo to create the summary table on the data source. See section Table Creation Templates for more information.template_definition
: SQL template command used by Denodo to create the summary table on the data source. It contains parameters using the syntax@{parameter_name}
.DEFAULT
(optional): properties used to set the value of the template user defined parameters. If you export the summary view using the optionexport with properties
, then Denodo will export the value of these properties in the environment properties file. More info Export to a File with Properties.Example:
CREATE SUMMARY VIEW <summary_name> INTO <data_source_name> ... CREATE_TABLE_TEMPLATE ( 'CREATE TABLE @{internal_parameter_table_name} (@{internal_parameter_columns} @{internal_parameter_restrictions}) IN ACCELERATOR @{accelerator}' DEFAULT('accelerator' = 'accel1') ) AS <select_query>
OPTIONS
: optional parameters to configure the insertion. Some data sources do not support some parameters. The available insert options are:batch_insert_size
: the number of tuples of each batch insert.location
: the location where Virtual DataPort will upload the data. It depends on the data source:Hadoop data sources (Hive, Impala, PrestoDB, Trino, Spark): the HDFS URI.
Redshift: the S3 bucket name.
Athena: the S3 location.
parquet_row_group_size
: the row group size in megabytes. The property is only valid for the data sources for which Virtual DataPort uses Parquet files to insert data: Athena, Hive, Impala, PrestoDB, Trino and Spark.
Note
Use the following command to set the default value of the parquet_row_group_size
:
SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.HdfsInsertWorker.parquet.rowGroupSize' = '<value>';
You do not need to restart the Virtual DataPort server to apply this change.
select_query
: query used to materialize the summary. This query will be used by the optimizer to determine if it is possible to replace with the summary part of the execution plan of another queries. When creating the summary or every time it is refreshed, this query will be executed and its result will be inserted in the remote table.
Important
When you execute CREATE SUMMARY VIEW
from the VQL Shell, select the check box Retrieve all rows.
Otherwise, the execution engine will only insert the number of rows set by the box Limit displayed rows of the VQL Shell.
Remarks
This command only works for JDBC data sources.
To remove a summary programmatically alongside its remote table, use the stored procedure DROP_REMOTE_TABLE. To keep the remote table on the database, but remove the summary from Virtual DataPort, use the command DROP VIEW instead.
If one of the field names is a reserved word in the data source, Virtual DataPort will rename the field name to ‘field_0’. If this happens with more fields, they will be renamed to ‘field_1’, ‘field_2’, etc.
This command relies on CREATE REMOTE TABLE Command for storing summary contents in the data source, so all its specific data source configuration remarks apply here as well.
Privileges Required
Summaries can only be created, edited and dropped by server administrators.
The user account set in the JDBC data source has to be able to create, replace and drop tables in the underlying database.
Example
CREATE OR REPLACE SUMMARY VIEW my_summary
INTO sqlserver_data_source
CATALOG = 'master'
SCHEMA = 'dbo'
AS SELECT date_dim.d_year, sum(store_sales.ss_net_profit) AS total_profit
FROM store_sales INNER JOIN date_dim ON date_dim.d_date_sk = store_sales.ss_sold_date_sk
GROUP BY date_dim.d_year;
This command will perform these steps:
Create a table called
my_summary
in the underlying database of the JDBC data sourcesqlserver_data_source
.Execute the following query in the VDP Server.
SELECT date_dim.d_year, sum(store_sales.ss_net_profit) AS total_profit FROM store_sales INNER JOIN date_dim ON date_dim.d_date_sk = store_sales.ss_sold_date_sk GROUP BY date_dim.d_year
Insert the result of the previous query into the table
my_summary
.Create in Virtual DataPort a JDBC wrapper named
my_summary
.Create in Virtual DataPort a summary view named
my_summary
which uses the wrapper from the previous step for accessing data.