Bulk Data Load with the Denodo MPP

The Presto data source can be used to perform bulk data loads from Denodo.

In this case, Denodo:

  1. Generates temporary files containing the data to insert, in Parquet file format (the Delta Lake and Iceberg table formats are not supported).

  2. Uploads those files to the path configured in this section.

  3. Performs the operations needed for the database table to read the data from that path.

For more information, see the Bulk Data Load section on databases using Hadoop-compatible storage, such as the Denodo MPP.

Before setting up Bulk Data Load in Denodo, create a new schema in Presto that sets the location where the Parquet files created by Denodo will be stored.

To do this, you can use the Denodo stored procedure CREATE_SCHEMA_ON_SOURCE:

CALL CREATE_SCHEMA_ON_SOURCE(
   'admin_denodo_mpp',
   'embedded_mpp',
   'hive',
   'test',
   '<filesystem_schema>://<host>/<folders>');
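
As an illustration, the location argument might take one of the following forms, depending on the storage system. This is a sketch under assumptions, not a definitive reference: hdfs, s3a and abfss are the standard Hadoop file system URI schemes, the schema names test_hdfs, test_s3 and test_adls are hypothetical, and every value in angle brackets is a placeholder to replace with your own. The reading of the arguments (database of the data source, data source name, catalog, new schema name, location) is inferred from the call above.

```sql
# Assumption: schema names below are hypothetical; replace all <...> placeholders.

# HDFS location
CALL CREATE_SCHEMA_ON_SOURCE(
   'admin_denodo_mpp',
   'embedded_mpp',
   'hive',
   'test_hdfs',
   'hdfs://<namenode_host>:<port>/<folders>');

# Amazon S3 location (Hadoop s3a scheme)
CALL CREATE_SCHEMA_ON_SOURCE(
   'admin_denodo_mpp',
   'embedded_mpp',
   'hive',
   'test_s3',
   's3a://<bucket>/<folders>');

# Azure Data Lake Storage Gen2 location (Hadoop abfss scheme)
CALL CREATE_SCHEMA_ON_SOURCE(
   'admin_denodo_mpp',
   'embedded_mpp',
   'hive',
   'test_adls',
   'abfss://<container>@<storage_account>.dfs.core.windows.net/<folders>');
```

Only one of these calls is needed. The schema name you choose here is the one to enter later in the Schema parameter of the Bulk Data Load configuration.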

To configure Bulk Data Load, open the Read & Write tab of the embedded_mpp data source, select Use Bulk Data Load APIs, and fill in at least the following parameters:

  • HDFS URI

  • Server time zone

  • Catalog

  • Schema

Configure Bulk Data Load

Then, depending on the chosen file system, you may need to add Hadoop properties to configure authentication. See the Support for Hadoop-compatible routes section.

In this example, the properties that configure access to Azure Data Lake Storage are:

Configure Azure Data Lake Storage properties
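
The exact values shown in the screenshot are not reproduced here. As a hedged sketch, assuming shared key authentication against Azure Data Lake Storage Gen2, the Hadoop ABFS connector is typically configured with properties such as the following. The property names come from the Apache Hadoop Azure (ABFS) connector; <storage_account> and <access_key> are placeholders, and your deployment may instead require OAuth or other properties.

```properties
# Assumption: shared key authentication; replace the <...> placeholders.
fs.azure.account.auth.type.<storage_account>.dfs.core.windows.net = SharedKey
fs.azure.account.key.<storage_account>.dfs.core.windows.net = <access_key>
```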

Finally, click the Test bulk load button to verify that the configuration works.
