Denodo Embedded MPP

To configure the Denodo Embedded MPP data source to perform bulk data loads, follow the steps in the section Bulk Data Load on a Distributed Object Storage like HDFS, S3 or ADLS.

Iceberg Tables

Apache Iceberg is a high-performance table format for large analytic datasets. Iceberg tables support ACID transactions, full schema evolution, partition evolution and table version rollback without the need to rewrite or migrate tables.

By default, the Denodo Embedded MPP uses the catalog named iceberg to store tables in Iceberg format. If the Iceberg connector uses a different catalog name, follow the instructions in the section Create Iceberg Tables in the Denodo MPP or Presto Data Source When the Catalog Name is not Iceberg.

Configure the Denodo Server to Use the Embedded MPP as Cache Using Iceberg Tables

The Denodo server can use JDBC data sources and the Embedded MPP to cache the data of the views. The section Configuring the Cache explains how to do it. If you want to cache the data using tables in Iceberg format, follow these steps:

  1. Install and configure the Embedded MPP (see Embedded MPP Guide).

  2. Open the Design Studio.

  3. Open the embedded_mpp data source, which is located in the admin_denodo_mpp database.

  4. Configure the bulk data load in the embedded_mpp data source.

  5. Click the Read Write tab.

  6. Select Specify custom catalog and schema.

  7. Click Reload Catalogs and Schemas.

  8. Select the iceberg catalog.

  9. Click Save.

    Configure Iceberg tables for cache

Create Iceberg Tables in the Denodo Embedded MPP

Denodo supports creating remote tables in JDBC data sources and in the Denodo Embedded MPP. To create a remote table in Iceberg format, follow the steps described in the creating remote tables section and specify as the target catalog one of the Iceberg catalogs configured in the Denodo server with the property described in the section Create Iceberg Tables in the Denodo MPP or Presto Data Source When the Catalog Name is not Iceberg. This is the catalog where Denodo creates the remote table.

  1. Click the three-dots icon next to the embedded_mpp data source and, in the New menu, click Replication (remote table).

    New remote table menu
  2. Fill in the new remote table form and select the iceberg catalog.

    New Iceberg table

Note: the embedded_mpp data source must have bulk data load enabled and configured to support this feature.

Bulk Data Loading into Iceberg Tables

To bulk load data in Iceberg format, the Denodo server performs the following steps:

  1. Create a temporary table in the Embedded MPP default catalog (the one in the connection URI). It must be a non-Iceberg catalog (hive by default). Bulk data loading does not work if the default catalog uses the Iceberg table format.

  2. Use bulk data load to upload the data into the temporary table in Parquet format.

  3. Insert the new rows from the temporary table into the Iceberg table: INSERT INTO iceberg_table SELECT * FROM temporary_table. During this insertion, the Embedded MPP will generate the necessary Iceberg metadata.

  4. Drop the temporary table.

    Bulk Data Loading into Iceberg Tables
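
The four steps above follow a general staging pattern. The following is a minimal sketch of that sequence in Python, using the standard-library sqlite3 module as a stand-in for the Embedded MPP so that it can be run locally. SQLite has no Iceberg or Parquet support, the function name staged_bulk_load is hypothetical, and the statements Denodo actually sends to the MPP may differ.

```python
import sqlite3


def staged_bulk_load(conn, target_table, rows):
    """Load rows into target_table through a temporary staging table.

    Illustration only: SQLite stands in for the Embedded MPP.
    """
    if not rows:
        return 0
    cur = conn.cursor()
    # Step 1: create an empty temporary table with the target's columns.
    cur.execute(
        f"CREATE TABLE temporary_table AS SELECT * FROM {target_table} WHERE 0"
    )
    try:
        # Step 2: bulk upload the new rows into the temporary table.
        placeholders = ", ".join("?" for _ in rows[0])
        cur.executemany(
            f"INSERT INTO temporary_table VALUES ({placeholders})", rows
        )
        # Step 3: copy the staged rows into the target table.
        cur.execute(f"INSERT INTO {target_table} SELECT * FROM temporary_table")
    finally:
        # Step 4: drop the temporary table, even if an earlier step failed.
        cur.execute("DROP TABLE temporary_table")
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE iceberg_table (id INTEGER, name TEXT)")
    staged_bulk_load(connection, "iceberg_table", [(1, "a"), (2, "b")])
    print(connection.execute("SELECT * FROM iceberg_table").fetchall())
```

Dropping the temporary table in a finally block is a design choice of this sketch, so that a failed load does not leave the staging table behind.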

Note

The default catalog configured in the Embedded MPP connection URI must be a non-Iceberg catalog. The cluster.sh script used to deploy the Embedded MPP cluster automatically configures a valid catalog.
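
As a quick way to check this requirement, the sketch below extracts the default catalog from a connection URI of the form jdbc:presto://host:port/catalog/schema and compares it with the known Iceberg catalog names. The function names and the host mpp.example.com are hypothetical, and treating a URI without a catalog as unsupported is an assumption of this sketch, not documented Denodo behavior.

```python
from urllib.parse import urlsplit


def default_catalog(jdbc_uri):
    """Return the catalog segment of a jdbc:presto://host:port/catalog/schema URI."""
    prefix = "jdbc:"
    if not jdbc_uri.startswith(prefix):
        raise ValueError("expected a URI starting with 'jdbc:'")
    # Strip the 'jdbc:' prefix so the rest parses as an ordinary URL.
    path = urlsplit(jdbc_uri[len(prefix):]).path
    segments = [segment for segment in path.split("/") if segment]
    return segments[0] if segments else None


def bulk_load_supported(jdbc_uri, iceberg_catalogs=("iceberg",)):
    """True when the URI names a default catalog that is not an Iceberg catalog."""
    catalog = default_catalog(jdbc_uri)
    # Assumption: a URI without a default catalog is treated as unsupported.
    return catalog is not None and catalog not in iceberg_catalogs


if __name__ == "__main__":
    uri = "jdbc:presto://mpp.example.com:8443/hive/default?SSL=true&protocols=http11"
    print(default_catalog(uri), bulk_load_supported(uri))
```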

Create Iceberg Tables in the Denodo MPP or Presto Data Source When the Catalog Name is not Iceberg

The Denodo Embedded MPP supports creating, querying and loading data in Iceberg tables. By default, the Denodo Embedded MPP uses the catalog named iceberg to store tables in Iceberg format. It is possible to create new catalogs with different names that also use the Iceberg format. If you create new Iceberg catalogs, you need to configure the Denodo server to recognize them as Iceberg catalogs. You can do this by setting the Denodo server property com.denodo.util.jdbc.inspector.impl.PrestoJDBCInspector.iceberg.catalogNames.

The value of this property is a comma-separated list of catalog names. For example, to indicate that the catalogs named iceberg, iceberg-onpremise and iceberg-cloud contain tables using the Iceberg format, execute the following command in a VQL Shell:

SET 'com.denodo.util.jdbc.inspector.impl.PrestoJDBCInspector.iceberg.catalogNames' = 'iceberg, iceberg-onpremise, iceberg-cloud';

Set an empty value to indicate that no catalog contains tables using the Iceberg format:

SET 'com.denodo.util.jdbc.inspector.impl.PrestoJDBCInspector.iceberg.catalogNames' = '';

Setting a null value disables the property. This is the default value. When the property is disabled, Denodo assumes that tables use the Iceberg format only when they are located in the catalog named iceberg.

SET 'com.denodo.util.jdbc.inspector.impl.PrestoJDBCInspector.iceberg.catalogNames' = null;

You do not need to restart the Denodo server for this command to take effect.

Note

This is a global property that affects all the Denodo Embedded MPP data sources and Presto data sources in the Denodo server.
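
The three cases described above (a list of names, an empty value and null) can be summarized with the following sketch. It only models the documented behavior and is not Denodo code: the function name iceberg_catalog_names is hypothetical, and trimming the whitespace around each name is an assumption based on the spacing used in the example value.

```python
def iceberg_catalog_names(property_value):
    """Return the set of catalogs treated as Iceberg for a given property value."""
    if property_value is None:
        # Property disabled (default): only the catalog named iceberg counts.
        return {"iceberg"}
    # Empty value: no catalog contains Iceberg tables.
    # Otherwise: a comma-separated list; whitespace trimming is an assumption.
    return {name.strip() for name in property_value.split(",") if name.strip()}


if __name__ == "__main__":
    print(sorted(iceberg_catalog_names("iceberg, iceberg-onpremise, iceberg-cloud")))
    print(sorted(iceberg_catalog_names("")))
    print(sorted(iceberg_catalog_names(None)))
```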

Troubleshooting

This section provides information about how to resolve the most common problems while using the Denodo Embedded MPP.

The following error may happen when enabling the cache of a Denodo database or view using Iceberg tables:

com.denodo.vdb.cache.VDBCacheException: Error creating table 'vdb_cache_querypattern' into the
data source using the table creation template 'DATA_SOURCE_DEFAULT':
CREATE TABLE iceberg.default.vdb_cache_querypattern (
    queryPatternId BIGINT,
    databaseName VARCHAR,
    viewName VARCHAR,
    numConditions BIGINT,
    VDPCondition VARCHAR,
    VDPConditionList VARCHAR,
    projectedFields VARCHAR,
    expirationDate BIGINT,
    qpStatus VARCHAR,
    lastUpdated INTEGER,
    valid INTEGER
)
WITH (FORMAT = 'PARQUET', location = 'hdfs://<server>/user/presto/iceberg/vdb_cache_querypattern')

Query failed (#20240418_201136_13374_ppk3s): Table metadata is missing.

This problem occurs because the metadata of the iceberg.default.vdb_cache_querypattern table is missing: the location of the data or metadata no longer exists in the object storage, or it is corrupted. This can happen if someone accidentally deletes the hdfs://<server>/user/presto/iceberg/vdb_cache_querypattern path. A similar error can happen with any of the cache management tables when their metadata is missing: vdb_cache_names, vdb_cache_querypattern, vdb_cache_sequences and vdb_cache_viewname. To solve the problem, follow these steps:

Warning

These steps recreate the cache management tables. After that, these tables will be empty. This means the Denodo views that use the Embedded MPP as a cache have to load the cache again.

  1. Open the Design Studio.

  2. Open your Denodo Embedded MPP data source. It is located in the admin_denodo_mpp database.

  3. Click the VQL pill. The Embedded MPP VQL contains the connection URI of your Embedded MPP cluster.

    CREATE OR REPLACE DATASOURCE JDBC embedded_mpp EMBEDDED_MPP
        DRIVERCLASSNAME = 'com.facebook.presto.jdbc.PrestoDriver'
        DATABASEURI = 'jdbc:presto://<PRESTO_HOST>:<PRESTO_PORT>/hive/default?SSL=true&protocols=http11'
        USERNAME = 'presto'
        ...
    
  4. Use a JDBC client, such as DBeaver, to create a new PrestoDB connection using the connection URI you obtained in the previous step.

    Denodo Cache Management tables
  5. Once you are connected to the Denodo Embedded MPP, execute the following SQL command to drop the iceberg.default.vdb_cache_querypattern table.

    DROP TABLE iceberg.default.vdb_cache_querypattern
    

    Note that this SQL command only works with Presto version 0.285 or later. With version 0.284 or earlier, it fails with the error Table metadata is missing.

  6. Use the Design Studio to disable and enable the Denodo cache. It will force Denodo to recreate the cache tables.

    1. Open the Denodo server configuration menu.

    Server Configuration Menu
    2. Click on the Cache status toggle to disable the cache and then click on the Save button.

    Disable Denodo cache
    3. Repeat the previous step to enable the Denodo cache and force the cache table recreation.
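
Because the SQL command in step 5 depends on the Presto version, it can help to check the version before running it. The following sketch compares a version string against the 0.285 threshold stated above. The function names are hypothetical, and the assumption that a version string starts with dot-separated numbers (for example 0.285 or 0.286.1, optionally followed by a suffix) is not taken from the Presto or Denodo documentation.

```python
import re

# Minimum Presto version stated in step 5.
MIN_VERSION = (0, 285)


def parse_presto_version(version):
    """Parse the leading dot-separated numbers of a version string into a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum_version(version):
    """True when the version is 0.285 or later."""
    return parse_presto_version(version) >= MIN_VERSION


if __name__ == "__main__":
    for candidate in ("0.284", "0.285", "0.286.1"):
        print(candidate, meets_minimum_version(candidate))
```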
