Specific Information about Cache Databases

This section contains information you have to take into account depending on the database you use to store cached data.

Amazon Athena

When using Athena to store cached data, take the following into account:

  • To use Amazon Athena as cache, it is necessary to configure the Bulk Data Load to upload the data to S3.

  • In the tab, Read & Write, select Use bulk data load APIs and fill the fields AWS acces key id, AWS secret access key and the S3 Location. Otherwise, Virtual DataPort will not be able to cache data.

  • When using Athena to store cached data, you just can use the FULL cache mode.

Azure SQL Data Warehouse

The Azure SQL Data Warehouse data source as cache have some limitations:

  • Azure SQL Data Warehouse does not support unique indexes in tables.

  • Data movements and creating materialized tables cannot be performed within a transaction.

  • Binaries and complex type columns (arrays and registers) are limited to 8,000 bytes. If this size is exceeded during a cache load process, it will end with the error “String or binary data would be truncated”.

  • The only available isolation level is READ UNCOMMITTED.

Because of an issue in the Microsoft SQL Server Driver, you need a driver version 7.1.3 or newer to use Azure SQL Data Warehouse as cache. See Bug for more details.

Presto

When using Presto to store cached data, take the following into account:

  • You can only use it when the data is stored in the Hadoop Distributed File System (HDFS).

  • In the tab, Read & Write, select Use bulk data load APIs and fill the fields HDFS URI and Hadoop executable location. Otherwise, Virtual DataPort will not be able to cache data.

  • When using Presto to store cached data, you must not select the cache mode Partial on a view. The data will not be cached.

Spark

During the process of invalidating the cache, some queries may fail if they are executed when the Parquet file changes.

Yellowbrick

Yellowbrick supports two types of encoding for its databases: LATIN9 (default) and UTF-8. Keep in mind that Virtual DataPort can provide full support only to databases using encoding UTF-8. You can select the database encoding when creating the database using encoding = UTF-8.

If a database with encoding LATIN9 is used as a cache data source, data from invalid characters may be lost when caching from other data sources.

Databases with HDFS Storage

Databases that use HDFS storage (Hive, Impala, Presto, Spark and Databricks) only support full cache mode. Any view with partial cache will be ignored and behave as if it has no cache.

The temporary files are generated using snappy compression. If you want to change the Parquet files compression, then execute this command from the VQL Shell:

SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.HdfsInsertWorker.parquet.compression'
    = '<compression mechanism>';

The value of “<compression mechanism>” can be off (no compression), snappy, gzip or lzo.

You do not need to restart after changing this property.

The database needs to support the selected compression algorithm.

Generating these files compressed may speed up the process loading the cache of a view or a data movement. However, it may not be the case if the database is in the same local area network as the Denodo server because the transfer speed is high. It is possible that there is not any reduction in time and an increase of CPU usage by the Denodo server because it has to generate the files compressed.

Besides, these databases do only support Full Cache invalidation.