Settings for the Generation of Temporary Files¶
When Virtual DataPort uses the proprietary interface of a database to load data in bulk, it first writes the data to a delimited file, which is deleted once it has been transferred to the target database.
The following properties define several aspects of these temporary delimited files:
| Meaning | Property name |
|---|---|
| Encoding of the file. Default encoding: UTF-8 | com.denodo.vdb.util.tablemanagement.sql.insertion.DFInsertWorker.encoding |
| Separator of values. Default value: 0x1E (Information separator two - U+001E) | com.denodo.vdb.util.tablemanagement.sql.insertion.DFInsertWorker.fieldSeparator |
| Quote character. Default value: 0x1F (Information separator one - U+001F) | com.denodo.vdb.util.tablemanagement.sql.insertion.DFInsertWorker.quoteCharacter |
| Row separator. Default value: 0x1E followed by the end-of-line sequence of the platform where the Virtual DataPort server is running | com.denodo.vdb.util.tablemanagement.sql.insertion.DFInsertWorker.rowSeparator |
To change the value of one of these properties, log in with an administrator account from the administration tool and, from the VQL Shell, execute this:
SET '<name of the property>' = '<new value>';
For example:
SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.DFInsertWorker.encoding'
= 'UTF-16';
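Similarly, a hedged example that sets the separator of values to a comma (this assumes the property accepts a literal character as its value; adjust the value to the representation your installation expects):

SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.DFInsertWorker.fieldSeparator'
= ',';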
Additionally, you can change these settings for a specific database adapter instead of for all of them. For example, you can change the encoding of the files generated for Oracle while keeping the default encoding for the other adapters.
To do this, add the name of the adapter, followed by a dot, after DFInsertWorker. For example:
SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.DFInsertWorker.oracle.encoding'
= 'UTF-16';
The name of the adapter is the value of the parameter DATABASENAME of the CREATE DATASOURCE JDBC statement that created the data source. You can check this name in the “VQL” dialog of the data source.
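For reference, a simplified sketch of where this parameter appears in a data source definition (the data source name, URI, driver and version below are illustrative; a real statement includes more parameters, such as credentials):

CREATE DATASOURCE JDBC my_oracle_ds
    DRIVERCLASSNAME 'oracle.jdbc.OracleDriver'
    DATABASEURI 'jdbc:oracle:thin:@example-host:1521:ORCL'
    DATABASENAME 'oracle'
    DATABASEVERSION '19c';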
Generation of Temporary Files for Hadoop-compatible Distributed Object Storage¶
Databases that use Hadoop-compatible storage (Hive, Impala, Presto, Spark and Databricks) require the option “Use Bulk Data Load APIs” to be configured in order to insert data into their tables. In these cases, Denodo first generates temporary files in Parquet format containing the data to insert, and then uploads those files to the path configured in this section. Finally, Denodo performs the necessary operations to ensure that the database table takes the data from the provided path.
The object storages supported for uploading these data are S3, ADLS and HDFS. It is also possible to configure paths that are compatible with S3 or with the Hadoop API, like Google Cloud Storage. See the section Support for Hadoop-compatible storage for more details on this topic.
Several aspects of the generated files can be tuned using the following options.
Compression¶
The temporary files are generated with Snappy compression. To use another compression algorithm or to disable compression, execute this command from the VQL Shell:
SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.HdfsInsertWorker.parquet.compression'
= '<compression mechanism>';
The value of “<compression mechanism>” can be:

- off (no compression)
- snappy (default value)
- gzip
- lzo
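For example, to generate the temporary files with gzip compression:

SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.HdfsInsertWorker.parquet.compression'
= 'gzip';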
You do not need to restart the Denodo server after changing this property. Note that the target database needs to support the selected compression algorithm.
Generating these files compressed may speed up loading the cache of a view or performing a data movement. However, if the database is on the same local area network as the Denodo server, the transfer speed is already high, so compression may not reduce the overall time and may increase the CPU usage of the Denodo server, which has to compress the files.
Decimal Precision and Scale¶
In Parquet files, the precision and scale of values of type “decimal” have to be specified.
The Denodo server uses the “source type properties” of the fields, if available, to set these values. However, the source type properties are not always available. For example, fields obtained from non-JDBC data sources do not have these properties unless the user manually sets them in the base view, and by default the result of an expression does not have them either. When these properties are not available, the Server assumes a precision of 38 and a scale of 20. To change these values, execute the following commands from the VQL Shell (only administrators are allowed to execute them):
SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.HdfsInsertWorker.decimal.parquet.decimal.precision'
= '<precision>';
SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.HdfsInsertWorker.decimal.parquet.decimal.scale'
= '<scale>';
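For example, to write decimal values with a precision of 18 and a scale of 6 (illustrative values; choose the ones that fit your data):

SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.HdfsInsertWorker.decimal.parquet.decimal.precision'
= '18';
SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.HdfsInsertWorker.decimal.parquet.decimal.scale'
= '6';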
When a value cannot be represented using the current precision and scale, an error like the following is shown:

Error processing value "xxx" as precision y and scale z

For example, with a precision of 10 and a scale of 5, a value like 123456789.1 cannot be represented, because it requires nine integer digits but only five (10 - 5) are available.