Managing Summaries

This section describes how to create, edit, and drop summaries using the administration tool.

Creating Summaries

There are two ways for creating a summary:

  1. Using the wizard of the administration tool.

  2. Using the CREATE SUMMARY VIEW command.

This section describes how to create a summary from the administration tool. To do this, follow these steps:

  1. Right-click the JDBC data source where you want to create the remote table and then click Create Summary View.

    Note

    This feature is only supported for JDBC data sources. To use this feature with HDFS-based databases (Hive, Impala, Presto, Spark, Databricks), first you need to enable bulk data load.

  2. Enter the following information:

    Create summary view

    Create summary view

    In the definition tab:

    • Summary name: enter the name of the summary in Virtual DataPort.

    • Enabled for query rewrite: when checked the summary will be part of the optimizer’s list of candidates for partial substitutions in queries. Whether part of a query can be substituted by a summary is determined, among other factors, by the VQL query specified in this wizard.

      Note

      If the summary is created using the import option, and the table does not exist yet, the query will not use the summary for query acceleration regardless of this property until the first REFRESH (See section Import/Export Between Environments).

    • Target data source: select the data source where you want Virtual DataPort to create the table that the summary will use to insert its values.

    • Insert options: optional parameters to configure the insertion. If you leave one of these fields empty, then VDP will use the default value. Some data sources do not support some parameters. The available insert options are:

      • Batch insert size: the number of tuples of each batch insert.

      • Location: the location where Virtual DataPort will upload the data. It depends on the data source:

        • Hadoop data sources (Hive, Impala, Presto, Spark): the HDFS URI.

        • Redshift: the S3 bucket name.

        • Athena: the S3 location.

      • Parquet row group size: the row group size in megabytes. The property is only valid for the data sources for which Virtual DataPort uses parquet files to insert the data: Athena, Hive, Impala, Presto, and Spark.

    • Remote table: select the catalog and schema where you want Virtual DataPort to create the new table containing the summary data, as well as the name you want to give to this new table. Do not confuse with the element Remote Table.

      Note

      To help the maintenance of these special tables, we recommend specifying a separate schema for them and following a naming convention. For example, add prefix “s_” to the table name.

    • VQL Query: enter the query used to materialize the summary. This query will be used by the optimizer to determine if it is possible to replace with the summary part of the execution plan of another queries. When creating the summary or every time it is refreshed, this query will be executed and its result will be inserted in the remote table.

      Alternatively, you can drag a view from the Server Explorer to VQL Query for using this view as a basis for writing the query.

      Note

      As a best practice, please specify aliases for all columns in the SELECT clause of the query definition. If the query specified in the creation does not include aliases for all the columns in the SELECT clause, Denodo will internally add an extra SELECT on top to make sure all fields have aliases. In this case, after creating the summary you will experience that the query definition you see in the VQL contains an extra SELECT clause on top of the original one.

    In the Metadata tab:

    • Database: select a Virtual DataPort database to store the summary.

    • Folder: folder of the database to store the summary.

    • Description: a description for the summary.

  3. Click Save.

To create the summary, the Execution Engine does the following:

  1. It creates a table in the underlying database of the selected JDBC data source.

  2. It executes the VQL query.

  3. It inserts the result of this query into the table of the database created in step #1. It uses the bulk data load to insert the results if the data source supports it, and it is configured.

  4. It creates a summary view linked to the table created in step #1.

Privileges Required

Summaries can only be created, edited and dropped by server administrators.

The user account set in the JDBC data source has to be able to create, replace and drop tables in the underlying database.

Requirements

In the dialog Server Configuration - Queries Optimization make sure these options are enabled:

  • Data Movement to create a summary. Otherwise, the creation of summaries will fail.

  • Automatic simplification of queries for using summary rewriting in queries.

Refresh Data

As the query optimizer will use this summaries everytime it is possible in a transparent way for the user, it is important to make sure the data in those summaries is up to date the same way as we do it for the cached views. For this purpose you can schedule a periodical refresh using the Denodo Scheduler Module. You need to specify a VDP data source and in the parameterized query you can use one of the following options: - Use the command REFRESH to do a full refresh of the data. This is, Denodo will trunc the remote table containing the data and insert everything again. - Use the command INSERT INTO <summary view> ( <SELECT clause> ) to add new rows to the existing data.

Apart from this, you can also perform a full refresh graphically using the VDP Administration tool. To do that, open the summary and then, click refresh_summary

The Execution Engine will do the following:

  1. Truncate the current table from the underlying database. All the data in this table will be lost. If the remote table does not exist, then the Virtual DataPort server will create it.

  2. Execute the data load query in Virtual DataPort.

  3. Insert the result of this query into the table of the database created in step #1.

Editing Summaries

To edit, open the summary and click on the Edit tab.

Edit summary

Edit summary

After doing the necessary changes, click Save. If you changed the VQL query or any of the fields related with the remote table, the Execution Engine will do the following:

  1. Drop the current table from the underlying database. All the data in this table will be lost.

  2. Create a table in the underlying database of the JDBC data source.

  3. Execute the query in Virtual DataPort.

  4. Insert the result of this query into the table of the database created in step #2.

  5. Replace the existing summary with a new one, linked to the table created in step #2.

If you want to modify a summary programmatically, use the CREATE SUMMARY VIEW command.

In addition, the Options tab has available sections for configuring Indexes and Statistics.

Dropping Summaries

When you drop a summary from the administration tool, you are asked if you want to delete the table from the underlying database. As it happens with remote tables, the tool will show a dialog like this one:

Drop remote table and base view without dependencies

Click Yes to delete the table in the underlying database, in addition to the summary.

If you are removing several summaries and/or remote tables at once, the tool will ask for which ones do you wish to remove their underlying remote table.

If you want to remove summaries programmatically alongside their remote tables, use the stored procedure DROP_REMOTE_TABLE. If you want to keep the remote table, but remove the summary element from Virtual DataPort , use instead the DROP VIEW statement.

Import/Export Between Environments

If you export an existing summary and import it in a new environment, for example during a promotion, Denodo will verify if the remote table where this summary is stored exists. If the table does not exist yet, it will not use the summary for query acceleration until the first REFRESH (the REFRESH command creates the table before inserting the data if it does not exist). To review if your summary is valid for query rewrite, double-click on it and look for the value of the property ‘Valid for query rewrite’ on the main panel (See image).

../../../../../_images/summary_promotion.png

If denodo cannot verify if the table exists but you want to still force the optimizer to use the summary (for example to see the execution plan of a query and check if the optimizer would use that summary in that case) you can do so including the option ‘consider_all_summaries’=’on’ in the CONTEXT clause.