Bulk Data Load with the Denodo MPP
The Denodo Embedded MPP data source can be used to perform bulk data loads from Denodo. In this case, the Denodo server:

1. First generates temporary files containing the data to insert, in Parquet file format or Iceberg table format (the Delta Lake table format is not supported).
2. Then uploads those files to the path configured in the HDFS URI parameter of this section.
3. Finally, performs the operations needed to ensure the database table takes its data from the provided path.
For more information, see Bulk Data Load on a Distributed Object Storage like HDFS or S3 in the Denodo documentation.
For details about creating Iceberg tables using the Bulk Data Load feature, see Iceberg tables and subsequent sections.
Configuration
Before setting up the Bulk Data Load in the Denodo Embedded MPP data source, you have to create a new schema in the Denodo Embedded MPP that sets the location where the Parquet files created by Denodo will be placed.
To do this you can use the Denodo stored procedure CREATE_SCHEMA_ON_SOURCE:
CALL CREATE_SCHEMA_ON_SOURCE(
    'admin_denodo_mpp',
    'embedded_mpp',
    'hive',
    'test',
    '<filesystem_schema>://<host>/<folders>');
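In this example, 'admin_denodo_mpp' is the Denodo database where the data source resides, 'embedded_mpp' is the Embedded MPP data source, 'hive' is the catalog, 'test' is the new schema, and the last argument is the location where the generated files will be placed. As an illustrative sketch, assuming an S3 bucket named my-bucket (a hypothetical name), the call could look like this:

CALL CREATE_SCHEMA_ON_SOURCE(
    'admin_denodo_mpp',
    'embedded_mpp',
    'hive',
    'test',
    's3a://my-bucket/denodo-bulk-load');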
To configure the Bulk Data Load, check Use Bulk Data Load APIs of the embedded_mpp data source in its Read & Write tab and fill in at least the following parameters:
HDFS URI
Server time zone
Catalog
Schema
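For reference, the HDFS URI uses the Hadoop-compatible URI syntax of the chosen storage. A few illustrative patterns, depending on the file system (all values are placeholders):

hdfs://<name_node>:<port>/<path>
s3a://<bucket>/<path>
abfss://<container>@<account>.dfs.core.windows.net/<path>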
Then, depending on the chosen file system, you may need to add some Hadoop properties to configure authentication; see the Support for Hadoop-compatible routes section.
For example, access to Azure Data Lake Storage can be configured with Hadoop properties like the ones below.
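A minimal sketch, assuming OAuth 2.0 client-credentials authentication against an ADLS Gen2 storage account (the <account>, <application-id>, <client-secret> and <tenant-id> values are placeholders):

fs.azure.account.auth.type.<account>.dfs.core.windows.net = OAuth
fs.azure.account.oauth.provider.type.<account>.dfs.core.windows.net = org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.id.<account>.dfs.core.windows.net = <application-id>
fs.azure.account.oauth2.client.secret.<account>.dfs.core.windows.net = <client-secret>
fs.azure.account.oauth2.client.endpoint.<account>.dfs.core.windows.net = https://login.microsoftonline.com/<tenant-id>/oauth2/token

These are standard Hadoop ABFS connector properties; the exact set to add depends on the authentication mechanism configured for the storage account.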
Finally, click the Test bulk load button to check that everything is working correctly.