USER MANUALS

Summaries

With Virtual DataPort, a user can store the result of a query on a table of an external database with the purpose of providing smart query acceleration when executing other queries. This feature is called summaries.

When building the execution plan of a query the optimizer can identify parts that match with the query used to create the summary. If possible, these matches will be substituted with accesses to the summary. With a well defined summary, the performance of the query to execute will be greatly boosted.

CREATE SUMMARY VIEW Command

The syntax of the command CREATE SUMMARY VIEW is the following:

Syntax

Syntax of the CREATE SUMMARY VIEW statement
CREATE [ OR REPLACE ] SUMMARY VIEW <summary_view_name:identifier>
    [ ID = <literal> ]
    INTO <data_source_name:identifier>
    [ CATALOG = <data_source_catalog:literal> ]
    [ SCHEMA = <data_source_schema:literal> ]
    [ RELATIONNAME = <table_name>:literal]
    [ IF RELATION EXISTS { REPLACE | ERROR }]
    [ DATA_LOAD_IMMEDIATE = { TRUE | FALSE }]
    [ FOLDER = <literal> ]
    [ DESCRIPTION = <literal> ]
    [ <primary key> ]
    [ QUERY REWRITE ENABLED = { TRUE | FALSE }]
    [ CREATE_TABLE_TEMPLATE ( <template_definition:literal>
                              [ DEFAULT( <parameter definition> [, <parameter definition> ]* ) ] ) ]
    [ OPTIONS ( <option information> [ , <option information> ]* ) ]
    AS
    [ WITH <common table expressions> ]
    <select_query>
    [ <order by> ]
    [ OFFSET <number> [ ROW | ROWS ] ]
    [ {
          FETCH { FIRST | NEXT } [ <number> ] { ROW | ROWS } ONLY
        | LIMIT [ <number> ]
      }
    ]
    [ CONTEXT ( <context information> [ , <context information> ]* ) ]
    [ TRACE ]

<primary key> ::= (see HELP CREATE TABLE)

<option information> ::= (see HELP CREATE REMOTE TABLE)

<common table expressions> ::= (see HELP SELECT)

<select_query> ::= (see HELP SELECT)

<order by> ::= (see HELP SELECT)

<parameter definition> ::= '<parameter_name:literal>' = '<parameter_value:literal>'

<context information> ::= (see HELP SELECT)
  • OR REPLACE: if present and the summary exists, it will be recreated. If not present and the summary exists, then the command will fail.

  • data_source_name: JDBC data source. The summary will be stored in this data source.

  • data_source_catalog (optional): name of the catalog of the data source where the summary will be stored.

  • data_source_schema (optional): name of the schema of the data source where the summary will be stored.

Note

Even though data_source_catalog and data_source_schema are both optional, at least one of them must be used, otherwise the command will fail.

  • table_name (optional): name of the table in the data source for inserting the summary contents. If not present, the name of the summary will be used.

  • IF RELATION EXISTS { REPLACE | ERROR } (optional, defaults to ERROR): marks the behavior if a table named table_name already exists in the data source. With ERROR the command will fail and with REPLACE the table will be recreated.

  • DATA_LOAD_IMMEDIATE (optional, defaults to TRUE): when TRUE the summary contents will be stored on the data source as part of the summary creation process. When FALSE, the summary will be created without storing its contents and it will not be available for the optimizer to partial substitutions in queries. To load and store its contents execute the REFRESH command. See Refreshing Views.

  • FOLDER (optional): folder of the database in Virtual DataPort where this summary will be stored.

  • DESCRIPTION (optional): description of the summary in Virtual DataPort.

  • primary key (optional): fields that make the primary key of the summary. See Creating a Base View for details on how to specify them.

  • QUERY REWRITE ENABLED (optional, defaults to TRUE): when TRUE the summary will be part of the optimizer’s list of candidates for partial substitutions in queries. Whether part of a query can be substituted by a summary is determined, among other factors, by the query specified in select_query.

  • CREATE_TABLE_TEMPLATE (optional): CREATE TABLE command used by Denodo to create the summary table on the data source. See section Table Creation Templates for more information.

    • template_definition: SQL template command used by Denodo to create the summary table on the data source. It contains parameters using the syntax @{parameter_name}.

    • DEFAULT (optional): properties used to set the value of the template user defined parameters. If you export the summary view using the option export with properties, then Denodo will export the value of these properties in the environment properties file. More info Export to a File with Properties.

    • Example:

      CREATE SUMMARY VIEW <summary_name> INTO <data_source_name>
      ...
      CREATE_TABLE_TEMPLATE (
          'CREATE TABLE @{internal_parameter_table_name} (@{internal_parameter_columns} @{internal_parameter_restrictions}) IN ACCELERATOR @{accelerator}'
          DEFAULT('accelerator' = 'accel1')
      )
      AS <select_query>
      
  • OPTIONS: optional parameters to configure the insertion. Some data sources do not support some parameters. The available insert options are:

    • batch_insert_size: the number of tuples of each batch insert.

    • location: the location where Virtual DataPort will upload the data. It depends on the data source:

      • Hadoop data sources (Hive, Impala, PrestoDB, Trino, Spark): the HDFS URI.

      • Redshift: the S3 bucket name.

      • Athena: the S3 location.

    • parquet_row_group_size: the row group size in megabytes. The property is only valid for the data sources for which Virtual DataPort uses Parquet files to insert data: Athena, Hive, Impala, PrestoDB, Trino and Spark.

Note

Use the following command to set the default value of the parquet_row_group_size:

SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.HdfsInsertWorker.parquet.rowGroupSize' = '<value>';

You do not need to restart the Virtual DataPort server to apply this change.

  • select_query: query used to materialize the summary. This query will be used by the optimizer to determine if it is possible to replace with the summary part of the execution plan of another queries. When creating the summary or every time it is refreshed, this query will be executed and its result will be inserted in the remote table.

Important

When you execute CREATE SUMMARY VIEW from the VQL Shell, select the check box Retrieve all rows. Otherwise, the execution engine will only insert the number of rows set by the box Limit displayed rows of the VQL Shell.

Remarks

This command only works for JDBC data sources.

To remove a summary programmatically alongside its remote table, use the stored procedure DROP_REMOTE_TABLE. To keep the remote table on the database, but remove the summary from Virtual DataPort, use the command DROP VIEW instead.

If one of the field names is a reserved word in the data source, Virtual DataPort will rename the field name to ‘field_0’. If this happens with more fields, they will be renamed to ‘field_1’, ‘field_2’, etc.

This command relies on CREATE REMOTE TABLE Command for storing summary contents in the data source, so all its specific data source configuration remarks apply here as well.

Privileges Required

Summaries can only be created, edited and dropped by server administrators.

The user account set in the JDBC data source has to be able to create, replace and drop tables in the underlying database.

Example

CREATE OR REPLACE SUMMARY VIEW my_summary
INTO sqlserver_data_source
CATALOG = 'master'
SCHEMA = 'dbo'
AS SELECT date_dim.d_year, sum(store_sales.ss_net_profit) AS total_profit
FROM store_sales INNER JOIN date_dim ON date_dim.d_date_sk = store_sales.ss_sold_date_sk
GROUP BY date_dim.d_year;

This command will perform these steps:

  1. Create a table called my_summary in the underlying database of the JDBC data source sqlserver_data_source.

  2. Execute the following query in the VDP Server.

    SELECT date_dim.d_year, sum(store_sales.ss_net_profit) AS total_profit
    FROM store_sales INNER JOIN date_dim ON date_dim.d_date_sk = store_sales.ss_sold_date_sk
    GROUP BY date_dim.d_year
    
  3. Insert the result of the previous query into the table my_summary.

  4. Create in Virtual DataPort a JDBC wrapper named my_summary.

  5. Create in Virtual DataPort a summary view named my_summary which uses the wrapper from the previous step for accessing data.

Add feedback