Virtual DataPort uses the Impala 3.x Kudu database adapter to bulk load data into Kudu, as recommended by the Kudu documentation. To configure a Kudu data source to perform bulk data loads, follow the same process described for Impala.

Steps Virtual DataPort performs to bulk load data into Kudu

This section describes how Virtual DataPort loads data in bulk into Kudu data sources using the Impala 3.x Kudu database adapter.

Virtual DataPort performs the following steps when a query involves a bulk load to Kudu:

  1. Creates an auxiliary table in Impala, for example, impala_table.

  2. Bulk loads the data into the table created in the previous step. Virtual DataPort uses Parquet files to perform the bulk data load into Impala.

  3. Inserts or merges the data into the Kudu table using the following command: [INSERT | UPSERT] INTO TABLE <kudu_table> SELECT * FROM <impala_table>.

  4. Calculates the Kudu table statistics by executing the following command: COMPUTE STATS <kudu_table>.

  5. Deletes the auxiliary table created in Impala and the Parquet files used for bulk data loading.
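
As a rough illustration, the five steps above could correspond to an Impala statement sequence like the one below. This is a sketch under stated assumptions, not the exact statements Virtual DataPort issues: the column definitions, the staging path, and the use of LOAD DATA to attach the Parquet files are hypothetical, and kudu_table and impala_table stand in for the real table names.

```sql
-- Step 1: create the auxiliary table in Impala.
-- The columns are hypothetical; the real table mirrors the data being loaded.
CREATE TABLE impala_table (id BIGINT, name STRING) STORED AS PARQUET;

-- Step 2: make the uploaded Parquet files available in the auxiliary table.
-- Assumption: LOAD DATA is shown as one possible mechanism; the path is hypothetical.
LOAD DATA INPATH '/tmp/vdp_bulk/impala_table' INTO TABLE impala_table;

-- Step 3: insert or merge the staged rows into the Kudu table.
-- Use INSERT instead of UPSERT when existing rows must not be updated.
UPSERT INTO TABLE kudu_table SELECT * FROM impala_table;

-- Step 4: compute the statistics of the Kudu table.
COMPUTE STATS kudu_table;

-- Step 5: drop the auxiliary table once the load has finished.
DROP TABLE impala_table;
```

Staging the rows in a Parquet-backed table first lets the final INSERT or UPSERT run entirely inside Impala as a single set-based statement, instead of sending rows to Kudu one at a time.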

Virtual DataPort executes the COMPUTE STATS command only when it creates remote tables, creates summaries, or refreshes views (see REFRESH) in Kudu. It does not execute this command after loading the cache of a view, performing a data movement, or running an INSERT command. To run it after an INSERT, add the property 'compute_stats_on_target' = 'true' to the CONTEXT clause of the insertion query.

By default, Virtual DataPort executes the COMPUTE STATS command only with the Impala 3.x Kudu adapter; it does not execute it with Impala 2.x or earlier versions. To change this default behavior, execute the following command, setting the value to either 'true' or 'false':

SET 'com.denodo.vdb.util.tablemanagement.sql.ImpalaKuduTableManager.computeStatsOnTarget' = 'true' | 'false';
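
For illustration, the two options described above might be used as follows. This is a minimal sketch: the view names kudu_customer and staging_customer are hypothetical, and the exact position of the CONTEXT clause should be checked against the VQL reference for the INSERT statement.

```sql
-- Request COMPUTE STATS for a single insertion only.
-- kudu_customer and staging_customer are hypothetical view names.
INSERT INTO kudu_customer
SELECT * FROM staging_customer
CONTEXT ('compute_stats_on_target' = 'true');

-- Change the default behavior, here disabling COMPUTE STATS on the target.
SET 'com.denodo.vdb.util.tablemanagement.sql.ImpalaKuduTableManager.computeStatsOnTarget' = 'false';
```

The CONTEXT property affects only the query that carries it, whereas the SET command changes the default behavior.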