Summaries

With Virtual DataPort, a user can store the result of a query on a table of an external database with the purpose of providing smart query acceleration when executing other queries. This feature is called summaries.

When building the execution plan of a query the optimizer can identify parts that match with the query used to create the summary. If possible, these matches will be substituted with accesses to the summary. With a well defined summary, the performance of the query to execute will be greatly boosted.

CREATE SUMMARY VIEW Command

The syntax of the command CREATE SUMMARY VIEW is the following:

Syntax

Syntax of the CREATE SUMMARY VIEW statement
CREATE [ OR REPLACE ] SUMMARY VIEW <summary_view_name:identifier>
    [ ID = <literal> ]
    INTO <data_source_name:identifier>
    [ CATALOG = <data_source_catalog:literal> ]
    [ SCHEMA = <data_source_schema:literal> ]
    [ RELATIONNAME = <table_name>:literal]
    [ IF RELATION EXISTS { REPLACE | ERROR }]
    [ FOLDER = <literal> ]
    [ DESCRIPTION = <literal> ]
    [ <primary key> ]
    [ QUERY REWRITE ENABLED = { TRUE | FALSE }]
    [ OPTIONS ( <option information> [ , <option information> ]* ) ]
    AS
    [ WITH <common table expressions> ]
    <select_query>
    [ <order by> ]
    [ OFFSET <number> [ ROW | ROWS ] ]
    [ {
          FETCH { FIRST | NEXT } [ <number> ] { ROW | ROWS } ONLY
        | LIMIT [ <number> ]
      }
    ]
    [ CONTEXT ( <context information> [ , <context information> ]* ) ]
    [ TRACE ]

<primary key> ::= (see HELP CREATE TABLE)

<option information> ::= (see HELP CREATE REMOTE TABLE)

<common table expressions> ::= (see HELP SELECT)

<select_query> ::= (see HELP SELECT)

<order by> ::= (see HELP SELECT)

<context information> ::= (see HELP SELECT)
  • OR REPLACE: if present and the summary exists, it will be recreated. If not present and the summary exists, then the command will fail.

  • data_source_name: JDBC data source. The summary will be stored in this data source.

  • data_source_catalog (optional): name of the catalog of the data source where the summary will be stored.

  • data_source_schema (optional): name of the schema of the data source where the summary will be stored.

Note

Even though data_source_catalog and data_source_catalog are both optional, at least one of them must be used, otherwise the command will fail.

  • table_name (optional): name of the table in the data source for inserting the summary contents. If not present, the name of the summary will be used.

  • IF RELATION EXISTS { REPLACE | ERROR } (optional, defaults to ERROR): marks the behaviour if a table named table_name already exists in the data source. With ERROR the command will fail and with REPLACE the table will be recreated.

  • FOLDER (optional): folder of the database in Virtual DataPort where this summary will be stored.

  • DESCRIPTION (optional): description of the summary in Virtual DataPort.

  • primary key (optional): fields that make the primary key of the summary. See Creating a Base View for details on how to specify them.

  • QUERY REWRITE ENABLED (optional, defaults to TRUE): when TRUE the summary will be part of the optimizer’s list of canditates for partial substitutions in queries. Whether part of a query can be substituted by a summary is determined, among other factors, by the query specified in select_query.

  • OPTIONS: optional parameters to configure the insertion. Some data sources do not support some parameters. The available insert options are:

    • batch_insert_size: the number of tuples of each batch insert.

    • location: the location where Virtual DataPort will upload the data. It depends on the data source:

      • Hadoop data sources (Hive, Impala, Presto, Spark): the HDFS URI.

      • Redshift: the S3 bucket name.

      • Athena: the S3 location.

    • parquet_row_group_size: the row group size in megabytes. The property is only valid for the data sources for which Virtual DataPort uses parquet files to insert the data: Athena, Hive, Impala, Presto, and Spark.

Note

Use the following command to set the default value of the parquet_row_group_size:

SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.HdfsInsertWorker.parquet.rowGroupSize' = '<value>';

You do not need to restart the Virtual DataPort server to apply this change.

  • select_query: query used to materialize the summary. This query will be used by the optimizer to determine if it is possible to replace with the summary part of the execution plan of another queries. When creating the summary or every time it is refreshed, this query will be executed and its result will be inserted in the remote table.

Important

When you execute CREATE SUMMARY VIEW from the VQL Shell, select the check box Retrieve all rows. Otherwise, the execution engine will only insert the number of rows set by the box Display rows of the VQL Shell.

Remarks

This command only works for JDBC data sources.

To remove a summary programmatically alongside its remote table, use the stored procedure DROP_REMOTE_TABLE. To keep the remote table on the database, but remove the summary from Virtual DataPort, use the command DROP VIEW instead.

If one of the field names is a reserved word in the data source, Virtual DataPort will rename the field name to ‘field_0’. If this happens with more fields, they will be renamed to ‘field_1’, ‘field_2’, etc.

This command relies on CREATE REMOTE TABLE Command for storing summary contents in the data source, so all its specific data source configuration remarks apply here as well.

Privileges Required

Summaries can only be created, edited and dropped by server administrators.

The user account set in the JDBC data source has to be able to create, replace and drop tables in the underlying database.

Example

CREATE OR REPLACE SUMMARY VIEW my_summary
INTO sqlserver_data_source
CATALOG = 'master'
SCHEMA = 'dbo'
AS SELECT * FROM vdp_view WHERE id > 100;

This command will perform these steps:

  1. Create a table called my_summary in the underlying database of the JDBC data source sqlserver_data_source.

  2. Execute the following query in the VDP Server.

    SELECT *
    FROM vdp_view
    WHERE id > 100;
    
  3. Insert the result of the previous query into the table my_summary.

  4. Create in Virtual DataPort a JDBC wrapper named my_summary.

  5. Create in Virtual DataPort a summary view named my_summary which uses the wrapper from the previous step for accessing data.