With Virtual DataPort, a user can store the result of a query on a table of an external database. This feature is called remote tables. In this process, Denodo creates the table with the appropriate schema and inserts the result of the query in this table. With this feature, the user does not need another tool to execute a query in Denodo and store the result in a database.
The main components of this feature are:
CREATE REMOTE TABLE(see below). This command does this:
Creates a table in the underlying database of a JDBC data source.
Executes a query in the Virtual DataPort Server.
Inserts the result of this query into the table of the database.
This command does not return any row. If you execute it from the administration tool, it does return the trace.
This command uses the bulk data load API of the target database to insert the data, if the database has such API and the target data source has the option Use bulk data load APIs enabled.
The stored procedure CREATE_REMOTE_TABLE, in addition to the previous three steps, it creates a base view that queries the table created in the database.
The stored procedure DROP_REMOTE_TABLE. It deletes a base view and the remote table accessed by the base view of a JDBC data source.
CREATE REMOTE TABLE Command¶
The syntax of the command
CREATE REMOTE TABLE is the following:
CREATE [ OR REPLACE ] REMOTE TABLE <remote_table_name:identifier> INTO <data_source_name:identifier> [ CATALOG = <data_source_catalog:literal> ] [ SCHEMA = <data_source_schema:literal> ] [ <create index clause> ]* [ OPTIONS ( <option information> [ , <option information> ]* ) ] AS <select query> <create index clause> ::= CREATE INDEX <index_name:identifier> [ UNIQUE ] ON ( <table index field> [, <table index field> ]* ) <table index field> :: = <index_field_name:identifier> [ ASC | DESC ] <option information> ::= 'batch_insert_size' = <literal> | 'location' = <literal> | 'parquet_row_group_size' = <literal> <literal> ::= '.*' (a single quote in a literal has to be escaped with another single quote. E.g. 'literal''with a quote')
<select query> ::= (see Syntax of the SELECT statement)
OR REPLACE: if present and the table
remote_table_namealready exists in the database, Virtual DataPort drops the table and creates it again. If this clause is not present and the table exists, the command fails.
query: query executed by Virtual DataPort as any other query.
data_source_name: JDBC data source. The new table will be created in the database of this data source.
data_source_catalog(optional): name of the catalog of the underlying database in which you want to create the table. If you do not specify it, the table will be created in the default catalog of the source.
data_source_schema(optional): name of the schema of the underlying database in which you want to create the table. If you do not specify it, the table will be created in the default schema of the source.
index_name: name of the index that will be created.
index_field_name: name of the field used in the index.
OPTIONS: optional parameters to configure the insertion. Some data sources do not support some parameters. The available insert options are:
batch_insert_size: the number of tuples of each batch insert.
location: the location where Virtual DataPort will upload the data. It depends on the data source:
Hadoop data sources (Hive, Impala, Presto, Spark): the HDFS URI.
Redshift: the S3 bucket name.
Athena: the S3 location.
parquet_row_group_size: the row group size in megabytes. The property is only valid for the data sources for which Virtual DataPort uses parquet files to insert the data: Athena, Hive, Impala, Presto, and Spark.
Use the following command to set the default value of the
SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.HdfsInsertWorker.parquet.rowGroupSize' = '<value>';
You do not need to restart the Virtual DataPort server to apply this change.
If you do not indicate the catalog nor the schema, the table will be created in the default catalog/schema.
When you execute
CREATE REMOTE TABLE from the VQL Shell, select the check box Retrieve all rows.
Otherwise, the execution engine will only insert the number of rows set by the box Display rows of the VQL Shell.
If the database is based on HDFS (Hadoop, Impala, Presto, etc.), you have to enable bulk data load on the data source before executing this command.
This command only works for JDBC data sources whose adapter is supported by the Cache Engine (list of databases supported by the Cache Engine).
A table created with this command has to be deleted from another application. However, if the table is created with the procedure CREATE_REMOTE_TABLE, it can be deleted from Denodo using the procedure DROP_REMOTE_TABLE.
If one of the field names is a reserved word in the data source, VDP will rename the field name to ‘field_0’. If this happens with more fields, they will be renamed to ‘field_1’, ‘field_2’, etc.
CREATE REMOTE TABLE command can be executed in Presto only if it uses the Hive connector.
CREATE REMOTE TABLE creates
INTERNAL TABLES in Hive and Impala and it creates
EXTERNAL TABLES in Presto, Spark and Databricks.
If you want to execute IDU queries in a table created with the
CREATE REMOTE TABLE command in Presto, you have to enable the property
hive.non-managed-table-writes-enabled=true in the Presto server configuration.
Follow these steps to enable the property in the Presto server:
Edit the file
Add the following line:
Restart the Presto server.
After creating the remote table and inserting the data, some data sources require or recommend to execute post-insertion commands. The execution engine will execute them automatically if necessary. For example, Impala requires executing a
REFRESH table_name command, and it recommends executing a
COMPUTE STATS table_name command.
Denodo executes the
COMPUTE STATS command only to create remote tables and summaries and to refresh views (see REFRESH) in Impala. It does not execute this command after a data movement or an
INSERT command. To run this command after the insertion, add the property
'compute_stats_on_target' = 'true' in the
CONTEXT of the insertion query. By default, Virtual DataPort executes the
COMPUTE STATS command only in Impala 3.x Kudu and does not execute it for Impala 2.x or previous versions. You can change the default behavior by executing the following commands:
SET 'com.denodo.vdb.util.tablemanagement.sql.ImpalaTableManager.computeStatsOnTarget' = 'true'; SET 'com.denodo.vdb.util.tablemanagement.sql.ImpalaTableManager.computeStatsOnTarget' = 'false';
SET 'com.denodo.vdb.util.tablemanagement.sql.ImpalaKuduTableManager.computeStatsOnTarget' = 'true'; SET 'com.denodo.vdb.util.tablemanagement.sql.ImpalaKuduTableManager.computeStatsOnTarget' = 'false';
To execute this command, the user needs these privileges:
Connectover the database of the JDBC data source.
Executeover the JDBC data source in which the remote table will be created.
Executeover the views used in the query.
The user account of the JDBC data source has to be able to create tables in the underlying database.
CREATE REMOTE TABLE reporting_customer360_state_ca INTO common_sources.ds_jdbc_oracle SCHEMA = 'REPORTING' AS SELECT * FROM customer360.customer WHERE state = 'CA';
This command will perform these steps:
Create a table called
reporting_customer360_state_cain the underlying database of the JDBC data source
Execute the following query in the VDP Server.
SELECT * FROM customer360.customer WHERE state = 'CA';
Insert the result of the previous query into the table