Remote Tables¶
With Virtual DataPort, a user can store the result of a query on a table of an external database. This feature is called remote tables. In this process, Denodo creates the table with the appropriate schema and inserts the result of the query in this table. With this feature, the user does not need another tool to execute a query in Denodo and store the result in a database.
The main components of this feature are:
The command
CREATE REMOTE TABLE
(see below). This command does this:Creates a table in the underlying database of a JDBC data source.
Executes a query in the Virtual DataPort Server.
Inserts the result of this query into the table of the database.
This command does not return any row. If you execute it from the administration tool, it does return the trace.
This command uses the bulk data load API of the target database to insert the data, if the database has such API and the target data source has the option Use bulk data load APIs enabled.
The stored procedure CREATE_REMOTE_TABLE, in addition to the previous three steps, it creates a base view that queries the table created in the database.
The stored procedure DROP_REMOTE_TABLE. It deletes a base view and the remote table accessed by the base view of a JDBC data source.
CREATE REMOTE TABLE Command¶
The syntax of the command CREATE REMOTE TABLE
is the following:
Syntax
CREATE [ OR REPLACE ] REMOTE TABLE <remote_table_name:identifier>
INTO <data_source_name:identifier>
[ CATALOG = <data_source_catalog:literal> ]
[ SCHEMA = <data_source_schema:literal> ]
[ <create index clause> ]*
[ CREATE_TABLE_TEMPLATE ( <template_definition:literal>
[ DEFAULT( <parameter definition> [, <parameter definition> ]* ) ] ) ]
[ OPTIONS ( <option information> [ , <option information> ]* ) ]
AS <select query>
<create index clause> ::=
CREATE INDEX <index_name:identifier> [ UNIQUE ]
ON ( <table index field> [, <table index field> ]* )
<table index field> :: = <index_field_name:identifier> [ ASC | DESC ]
<parameter definition> ::= '<parameter_name:literal>' = '<parameter_value:literal>'
<option information> ::=
'batch_insert_size' = <literal>
| 'location' = <literal>
| 'parquet_row_group_size' = <literal>
<literal> ::= '.*' (a single quote in a literal has to be escaped with another single quote. E.g. 'literal''with a quote')
<select query> ::= (see Syntax of the SELECT statement)
OR REPLACE
: if present and the tableremote_table_name
already exists in the database, Virtual DataPort drops the table and creates it again. If this clause is not present and the table exists, the command fails.query
: query executed by Virtual DataPort as any other query.data_source_name
: JDBC data source. The new table will be created in the database of this data source.data_source_catalog
(optional): name of the catalog of the underlying database in which you want to create the table. If you do not specify it, the table will be created in the default catalog of the source.data_source_schema
(optional): name of the schema of the underlying database in which you want to create the table. If you do not specify it, the table will be created in the default schema of the source.index_name
: name of the index that will be created.index_field_name
: name of the field used in the index.CREATE_TABLE_TEMPLATE
(optional):CREATE TABLE
command used by Denodo to create the remote table on the data source. See section Table Creation Templates for more information.template_definition
: SQL template command used by Denodo to create the remote table on the data source. It contains parameters using the syntax@{parameter_name}
.DEFAULT
(optional): properties used to set the value of the template user-defined parameters. If you export the view using the optionexport with properties
, then Denodo will export the value of these properties in the environment properties file. More info Export to a File with Properties.Example:
CREATE REMOTE TABLE <remote_table_name> INTO <data_source_name> ... CREATE_TABLE_TEMPLATE ( 'CREATE TABLE @{internal_parameter_table_name} (@{internal_parameter_columns} @{internal_parameter_restrictions}) IN ACCELERATOR @{accelerator}' DEFAULT('accelerator' = 'accel1') ) AS <select_query>
OPTIONS
: optional parameters to configure the insertion. Some data sources do not support some parameters. The available insert options are:batch_insert_size
: the number of tuples of each batch insert.location
: the location where Virtual DataPort will upload the data. It depends on the data source:Hadoop data sources (Hive, Impala, Presto, Spark, Denodo Embedded MPP): the HDFS URI.
Redshift: the S3 bucket name.
Athena: the S3 location.
parquet_row_group_size
: the row group size in megabytes. The property is only valid for the data sources for which Virtual DataPort uses parquet files to insert the data: Athena, Hive, Impala, Presto, and Spark.
Note
Use the following command to set the default value of the parquet_row_group_size
:
SET 'com.denodo.vdb.util.tablemanagement.sql.insertion.HdfsInsertWorker.parquet.rowGroupSize' = '<value>';
You do not need to restart the Virtual DataPort server to apply this change.
Note
If you do not indicate the catalog nor the schema, the table will be created in the default catalog/schema.
Note
The first character of the remote table name must be one of A to Z or a to z. The next characters have to be one of a to z, A to Z, numbers or underscores. The remote table name cannot be a reserved word in the target data source.
Important
When you execute CREATE REMOTE TABLE
from the VQL Shell, select the check box Retrieve all rows.
Otherwise, the execution engine will only insert the number of rows set by the box Limit displayed rows of the VQL Shell.
Remarks
If the database is based on HDFS (Hadoop, Impala, Presto, etc.), you have to enable bulk data load on the data source before executing this command.
This command only works for JDBC data sources whose adapter is supported by the Cache Engine (list of databases supported by the Cache Engine).
A table created with this command has to be deleted from another application. However, if the table is created with the procedure CREATE_REMOTE_TABLE, it can be deleted from Denodo using the procedure DROP_REMOTE_TABLE.
If one of the field names is a reserved word in the data source, VDP will rename the field name to ‘field_0’. If this happens with more fields, they will be renamed to ‘field_1’, ‘field_2’, etc.
The CREATE REMOTE TABLE
command can be executed in Presto only if it uses the Hive connector.
The CREATE REMOTE TABLE
creates INTERNAL TABLES
in Hive and Impala and it creates EXTERNAL TABLES
in Presto, Spark and Databricks.
If you want to execute IDU queries in a table created with the CREATE REMOTE TABLE
command in Presto, you have to enable the property hive.non-managed-table-writes-enabled=true
in the Presto server configuration.
Follow these steps to enable the property in the Presto server:
Edit the file
/opt/presto-server-<version>/etc/catalog/hive.properties
Add the following line:
hive.non-managed-table-writes-enabled=true
Restart the Presto server.
After creating the remote table and inserting the data, some data sources require or recommend to execute post-insertion commands. The execution engine will execute them automatically if necessary. For example, Impala requires executing a REFRESH table_name
command, and it recommends executing a COMPUTE STATS table_name
command.
Denodo executes the COMPUTE STATS
command only to create remote tables, create summaries and refresh views (see REFRESH) in Impala and Kudu. It does not execute this command after loading the cache of a view, a data movement, or an INSERT
command. To run this command after the insertion, add the property 'compute_stats_on_target' = 'true'
in the CONTEXT
of the insertion query. By default, Virtual DataPort executes the COMPUTE STATS
command only in Impala 3.x Kudu
and does not execute it for Impala 2.x
or previous versions. You can change the default behavior by executing the following commands:
SET 'com.denodo.vdb.util.tablemanagement.sql.ImpalaTableManager.computeStatsOnTarget' = 'true' | 'false';
SET 'com.denodo.vdb.util.tablemanagement.sql.ImpalaKuduTableManager.computeStatsOnTarget' = 'true' | 'false';
Privileges Required
To execute this command, the user needs these privileges:
Connect
over the database of the JDBC data source.Execute
over the JDBC data source in which the remote table will be created.Execute
over the views used in the query.
The user also needs Write
privilege over the JDBC data source in case he wants to create the remote table using a custom table creation template.
The user account of the JDBC data source has to be able to create tables in the underlying database.
Example
CREATE REMOTE TABLE reporting_customer360_state_ca
INTO common_sources.ds_jdbc_oracle
SCHEMA = 'REPORTING'
AS
SELECT * FROM customer360.customer WHERE state = 'CA';
This command will perform these steps:
Create a table called
reporting_customer360_state_ca
in the underlying database of the JDBC data sourceds_jdbc_oracle
.Execute the following query in the VDP Server.
SELECT * FROM customer360.customer WHERE state = 'CA';
Insert the result of the previous query into the table
reporting_customer360_state_ca
.