Bulk Data Load with the Denodo MPP

The Denodo Embedded MPP data source can be used to perform bulk data loads from Denodo. In this case, the Denodo server:

  1. First generates temporary files containing the data to insert, in Parquet file format or Iceberg table format (the Delta Lake table format is not supported).

  2. Then uploads those files to the path configured in this section (the HDFS URI).

  3. Finally, performs the operations needed to ensure the database table reads its data from the provided path.
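
The steps above are triggered transparently by regular write operations. As an illustration only (table and view names are hypothetical, and the exact CREATE REMOTE TABLE syntax may vary across Denodo versions), creating a remote table in the Embedded MPP would cause Denodo to generate the Parquet files, upload them, and register the table:

```sql
-- Hypothetical example: this statement triggers the bulk load path.
-- Denodo generates Parquet files from the view's data, uploads them
-- to the configured HDFS URI, and creates the table over that location.
CREATE REMOTE TABLE customer_summary
  ON DATA SOURCE embedded_mpp
  CATALOG 'hive' SCHEMA 'test'
  AS SELECT * FROM customer_base_view;
```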

For more information, see Bulk Data Load on a Distributed Object Storage like HDFS or S3 in the Denodo documentation.

For details about creating Iceberg tables using the Bulk Data Load feature, see Iceberg tables and subsequent sections.

Configuration

Before setting up Bulk Data Load in the Denodo Embedded MPP data source, you must create a new schema in the Denodo Embedded MPP that sets the location where the Parquet files generated by Denodo will be placed.

To do this, you can use the Denodo stored procedure CREATE_SCHEMA_ON_SOURCE:

CALL CREATE_SCHEMA_ON_SOURCE(
   'admin_denodo_mpp',
   'embedded_mpp',
   'hive',
   'test',
   '<filesystem_schema>://<host>/<folders>');
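
For instance, to place the files in an S3 bucket, the last parameter would be an S3 URI (the bucket and folder names below are hypothetical placeholders):

```sql
-- Hypothetical example: create the 'test' schema over an S3 location.
-- Replace the bucket and path with your own.
CALL CREATE_SCHEMA_ON_SOURCE(
   'admin_denodo_mpp',
   'embedded_mpp',
   'hive',
   'test',
   's3a://my-bucket/denodo/bulk-load');
```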

To configure Bulk Data Load, check Use Bulk Data Load APIs in the Read & Write tab of the embedded_mpp data source and fill in at least the following parameters:

  • HDFS URI

  • Server time zone

  • Catalog

  • Schema
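
As an illustration, a configuration matching the schema created above might use values like the following (all values here are hypothetical; the HDFS URI must point to the same location passed to CREATE_SCHEMA_ON_SOURCE):

```
HDFS URI:         s3a://my-bucket/denodo/bulk-load
Server time zone: UTC
Catalog:          hive
Schema:           test
```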

embedded_mpp Bulk Data Load configuration

Then, depending on the chosen file system, you may need to add some Hadoop properties to configure authentication; see the Support for Hadoop-compatible routes section.

In the following example, the properties configure access to Azure Data Lake Storage:

Configure Azure Data Lake Storage properties
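
As a minimal sketch, assuming authentication with a storage account access key via the ABFS connector (the account name and key below are placeholders), the Hadoop properties would look like:

```
# Hypothetical ABFS configuration; replace the account name and key.
fs.azure.account.auth.type.mystorageaccount.dfs.core.windows.net=SharedKey
fs.azure.account.key.mystorageaccount.dfs.core.windows.net=<access-key>
```

Other authentication mechanisms (for example, OAuth with a service principal) require a different set of fs.azure.* properties; refer to the Hadoop Azure documentation for the variant that matches your setup.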

Finally, click the Test bulk load button to verify that everything works correctly.
