USER MANUALS

Managing Summaries

Note

Summaries are not available in Denodo Professional or in Denodo Standard. See also Denodo Platform - Subscription Bundles.

This section describes how to create, edit, and drop summaries using the administration tool.

Creating Summaries

There are two ways for creating a summary:

  1. Using the wizard of the administration tool.

  2. Using the CREATE SUMMARY VIEW command.

This section describes how to create a summary from the administration tool. To do this, follow these steps:

  1. Right-click the JDBC data source where you want to create the remote table and then click Create Summary View.

    Note

    This feature is only supported for JDBC data sources. To use this feature with HDFS-based databases (Hive, Impala, Presto, Spark, Databricks), first you need to enable bulk data load.

  2. Enter the following information:

    Create summary view

    Create summary view

    In the definition tab:

    • Summary name: enter the name of the summary in Virtual DataPort.

    • Enabled for query rewrite: when checked the summary will be part of the optimizer’s list of candidates for partial substitutions in queries. Whether part of a query can be substituted by a summary is determined, among other factors, by the VQL query specified in this wizard.

      Note

      If the summary is created using the import option, and the table does not exist yet, the query will not use the summary for query acceleration regardless of this property until the first REFRESH (See section Export/Import Between Environments).

    • Target data source: select the data source where you want Virtual DataPort to create the table that the summary will use to insert its values.

    • Insert options: optional parameters to configure the insertion. If you leave one of these fields empty, then VDP will use the default value. Some data sources do not support some parameters. The available insert options are:

      • Batch insert size: the number of tuples of each batch insert.

      • Location: the location where Virtual DataPort will upload the data. It depends on the data source:

        • Hadoop data sources (Hive, Impala, Presto, Spark): the HDFS URI.

        • Redshift: the S3 bucket name.

        • Athena: the S3 location.

      • Parquet row group size: the row group size in megabytes. The property is only valid for the data sources for which Virtual DataPort uses parquet files to insert the data: Athena, Hive, Impala, Presto, and Spark.

    • Remote table: select the catalog and schema where you want Virtual DataPort to create the new table containing the summary data, as well as the name you want to give to this new table. Do not confuse with the element Remote Table.

      Note

      To help the maintenance of these special tables, we recommend specifying a separate schema for them and following a naming convention. For example, add prefix “s_” to the table name.

    • Manage custom DDL template: by default, Denodo uses a built-in SQL CREATE TABLE statement to create the table. For the uncommon situations in which the default statement is not enough, Denodo allows to customize the commands used to create the tables. Section Customizing the Table Creation Template explain how to define custom templates.

    • VQL Query: enter the query used to materialize the summary. This query will be used by the optimizer to determine if it is possible to replace with the summary part of the execution plan of another queries. When creating the summary or every time it is refreshed, this query will be executed and its result will be inserted in the remote table.

      Alternatively, you can drag a view from the Server Explorer to VQL Query for using this view as a basis for writing the query.

      Note

      As a best practice, please specify aliases for all columns in the SELECT clause of the query definition. If the query specified in the creation does not include aliases for all the columns in the SELECT clause, Denodo will internally add an extra SELECT on top to make sure all fields have aliases. In this case, after creating the summary you will experience that the query definition you see in the VQL contains an extra SELECT clause on top of the original one.

    In the Metadata tab:

    • Database: select a Virtual DataPort database to store the summary.

    • Folder: folder of the database to store the summary.

    • Description: a description for the summary.

  3. Click Save or Save and load data:

    • Clicking on Save creates the Denodo catalog element only, without doing any modification in the data source that stores the summary’s contents. Take into account the query optimizer will not use the summary until the data has been loaded. To create the table in the data source and load the summary data you have to perform a full refresh as described in section Refresh Summary Data.

    • When clicking on Save and load data, the Execution Engine does the following:

      1. It creates a table in the underlying database of the selected JDBC data source.

      2. It executes the VQL query.

      3. It inserts the result of this query into the table of the database created in step #1. It uses the bulk data load to insert the results if the data source supports it, and it is configured.

      4. It creates a summary view linked to the table created in step #1.

Privileges Required

Summaries can only be created, edited and dropped by server administrators.

The user account set in the JDBC data source has to be able to create, replace and drop tables in the underlying database.

Requirements

In the dialog Server Configuration - Queries Optimization make sure these options are enabled:

  • Data Movement to create a summary. Otherwise, the creation of summaries will fail.

  • Automatic simplification of queries for using summary rewriting in queries.

Refresh Summary Data

As the query optimizer will use this summaries every time it is possible in a transparent way for the user, it is important to make sure the data in those summaries is up to date the same way as we do it for the cached views. For this purpose you can schedule a periodical refresh using the Denodo Scheduler Module. You need to create a VDPDataLoad and visit section VDPDataLoad Extraction Section for information on how to configure the load process.

Apart from this, you can also perform a full refresh graphically using the VDP Administration tool. To do that, open the summary and then, click refresh_summary

The Execution Engine will do the following:

  1. Truncate the current table from the underlying database. All the data in this table will be lost. If the remote table does not exist, then the Virtual DataPort server will create it. If the table exists but the column schema does not match with the summary definition, the table will be recreated.

  2. Execute the data load query in Virtual DataPort.

  3. Insert the result of this query into the table of the database created in step #1.

Finally, you can also insert new data using the INSERT command.

Export/Import Between Environments

If you export a summary and import it in a new environment, for example during a promotion, Denodo will verify if the remote table where this summary is stored exists. If the table does not exist yet, the execution engine will not use the summary for query acceleration until you execute the command REFRESH. This command will create the table if necessary, before inserting the data.

If you are promoting a new revision using the Solution Manager, and that revision includes new summaries, we recommend including the Scheduler jobs that load those summaries. In the Revision elements tab, select these jobs and select the option Execute job when revision is deployed, so the new summaries are loaded and ready to use after the promotion.

To review if a summary is valid for query rewrite, open the VDP Administration tool, open the summary and look for Valid for query rewrite on the main panel of the view.

../../../../../_images/summary_promotion.png

The possible status are:

  • Valid: The table exists in the database and the summary will be considered for query rewritings.

  • Needs refresh: The table does not exist. Virtual DataPort will create the table in the first REFRESH.

  • Needs recreate: The table exists, but the column schema does not match with the summary definition. Virtual DataPort will recreate the table in the first REFRESH.

  • Unknown: It was not possible to verify if the table exists. The execution engine will not consider this summary for query rewriting until the command REFRESH is executed on this table.

To force the optimizer to use the summary when this property is not Valid, add the parameter 'consider_all_summaries'='on' to the CONTEXT clause query. For example, you may want to do this to see the execution plan of a query and check if the optimizer would use that summary. For detailed information about promotions visit section Summary promotions.

Editing Summaries

To edit, open the summary and click on the Edit tab.

Edit summary

Edit summary

After doing the necessary changes, click Save or Save and load data. If you changed the VQL query or any of the fields related with the remote table, these buttons differ on their behavior:

  • Clicking on Save applies changes in the Denodo catalog element only, without doing any modification in the data source that stores the summary’s contents. To apply this changes in the data source and reload the summary data you have to perform a full refresh as described in section Refresh Summary Data. Take into account the query optimizer will not use the summary until the data has been loaded.

  • When clicking on Save and load data, the Execution Engine will do the following:

    1. Drop the current table from the underlying database. All the data in this table will be lost.

    2. Create a new table following the changes in the summary.

    3. Execute the query in Virtual DataPort.

    4. Insert the result of this query into the table of the database created in step #2.

    5. Existing indexes in the old table will be created in the new one. When the VQL Query has been changed and fields used by indexes have been removed, these affected indexes will not be created in the new table.

    6. Replace the existing summary with a new one, linked to the table created in step #2.

If you want to modify a summary programmatically, use the CREATE SUMMARY VIEW command.

In addition, the Options tab has available sections for configuring Indexes and Statistics.

Note

If a summary view is created on the Embedded MPP, it is recommended to execute the COMPUTE_SOURCE_TABLE_STATS stored procedure.

Dropping Summaries

When you drop a summary from the administration tool, you are asked if you want to delete the table from the underlying database. As it happens with remote tables, the tool will show a dialog like this one:

Drop remote table and base view without dependencies

Click Yes to delete the table in the underlying database, in addition to the summary.

If you are removing several summaries and/or remote tables at once, the tool will ask for which ones do you wish to remove their underlying remote table.

If you want to remove summaries programmatically alongside their remote tables, use the stored procedure DROP_REMOTE_TABLE. If you want to keep the remote table, but remove the summary element from Virtual DataPort , use instead the DROP VIEW statement.

Add feedback