Managing Remote Tables¶
This section describes how to create, edit, and drop remote tables using Design Studio.
Note
This feature is supported for JDBC data sources; not for other types of sources.
To use this feature with HDFS-based databases (Hive, Impala, PrestoDB, Trino, Spark, Databricks), first you need to enable bulk data load.
Creating Remote Tables¶
There are three ways of creating a remote table:
Using the wizard of Design Studio.
Using the CREATE_REMOTE_TABLE stored procedure.
Using the CREATE REMOTE TABLE command. This creates the table in the underlying database and inserts the results of the query in this table. However, it does not create the associated base view in Virtual DataPort.
This section describes how to create a remote table using Design Studio. To do this, follow these steps:
Click the context menu of the JDBC data source where you want to create the remote table and then, click Create Remote Table.
Enter the following information:
Target data source: data source where you want to create the remote table.
Insert options: parameters to configure the process of inserting data into the database. All of them are optional and if empty, the execution engine will use the default values.
The available options depend on the type of database:
Batch insert size: number of tuples of each batch insert.
Location: location where Virtual DataPort will upload the data. It depends on the data source:
Hadoop data sources (Hive, Impala, PrestoDB, Trino, Spark): the HDFS URI.
Redshift: the S3 bucket name.
Athena: the S3 location.
Parquet row group size: the row group size in megabytes. The property is only valid for the data sources for which Virtual DataPort uses parquet files to insert the data: Athena, Hive, Impala, PrestoDB, Trino and Spark.
Remote table: enter the name, catalog and schema of the new remote table. By default, the first character of the remote table name must be one of A to Z or a to z. The next characters have to be one of a to z, A to Z, numbers or underscores. The remote table name will be created using the default case of the target data source and cannot be a reserved word. See section Create remote table using case-sensitive identifiers and Unicode characters.
Manage custom DDL template: by default, Denodo uses a built-in SQL CREATE TABLE statement to create the table. For the uncommon situations in which the default statement is not enough, Denodo allows to customize the commands used to create the tables. Section Customizing the Table Creation Template explain how to define custom templates.
Base view: enter the information of the base view that will be created.
VQL Query: query that will be executed and whose result will be inserted in the new remote table.
Alternatively, you can drag a view from the Server Explorer to VQL Query to create the remote table using the definition of this view.
Click Create.
To create the remote table, the Execution Engine does the following:
It creates a table in the underlying database of a JDBC data source.
It executes the VQL query.
It inserts the result of this query into the table of the database created in step #1. If in the data source, the use of the bulk load API is enabled, the execution engine uses the bulk data load API of the database to insert the results.
It creates a base view over the table created in step #1.
Privileges Required
To do this, you need these privileges:
“Connect” over the database of the JDBC data source.
“Execute” over the JDBC data source.
“Connect” and “Create view” over the Virtual DataPort database in which the base view will be created.
“Create folder” over the database where the base view will be created, if the procedure has to create the base view on a folder and the folder does not exist.
“Execute” over the views referenced in the query.
The user account set in the JDBC data source has to be able to create tables in the underlying database.
Requirements
To use this feature, make sure Data Movement is enabled. To check this, open the dialog Server Configuration - Queries Optimization. Otherwise, the creation of remote tables will fail.
Create remote table using case-sensitive identifiers and Unicode characters
By default, the remote table names and columns can only contain the characters a
to z
, numbers and underscore _
, and will be created using the default case of the data source.
To create case-sensitive identifiers or table and columns names that contain characters different than the ones listed before, execute this command:
SET 'com.denodo.vdb.util.tablemanagement.escapeDDLIdentifiersEnabled' = 'true';
The change to this property is applied immediately; you do not need to restart.
To restore the default behavior, execute this:
SET 'com.denodo.vdb.util.tablemanagement.escapeDDLIdentifiersEnabled' = 'false';
To obtain the current value of this property, execute this:
SELECT property_value FROM get_parameter() WHERE input_property_name = 'com.denodo.vdb.util.tablemanagement.escapeDDLIdentifiersEnabled';
To create base views or column names using Unicode characters, configure the Virtual DataPort server in Unicode mode. See Identifiers Charset.
Limitations
If the table name includes special Unicode characters such as those with accents, diacritical marks, or characters from other alphabets like Cyrillic, Chinese, Japanese, etc., bulk data load may not function properly in data sources that rely on external executables for data loading. In this case, Denodo will create the remote table, but the bulk data load will fail, and no data will be inserted. The errors that occur when this problem happens are displayed below:
Oracle:
Error executing sqlldr command.
SQL Server:
Error executing bcp command.
Many data sources that rely on Hadoop, such as Hive, Impala, PrestoDB, PrestoSQL, Spark, and Databricks, have a limited range of characters that can be used as identifiers. Only lowercase letters from a
to z
, numbers, and underscores _
are supported. Enabling the property 'com.denodo.vdb.util.tablemanagement.escapeDDLIdentifiersEnabled'
will not apply to these data sources. If you try to create a remote table with a name that contains other characters, the creation of the remote table will fail in theses data sources. The errors that occur when this problem happens is displayed below:
Invalid table name. The identifier '<identifier>' contains unsupported characters for Spark SQL 2.x
Note
If a remote table is created on the Embedded MPP, it is recommended to execute the COMPUTE_SOURCE_TABLE_STATS stored procedure against it.
Editing Remote Tables¶
To edit a remote table, open the base view associated with the remote table and then, click Edit Remote.
After doing the necessary changes, click Create. The Execution Engine will do the following:
Drop the current table from the underlying database. All the data in this table will be lost.
Create a table in the underlying database of the JDBC data source.
Execute the query in Virtual DataPort.
Insert the result of this query into the table of the database created in step #2.
Replaces the existing base view with a new one, with the schema of the table created in step #2. If the base view had dependent elements, the changes in the schema will not be propagated.
If you want to modify a remote table programmatically, use the stored procedure CREATE_REMOTE_TABLE.
Refresh Data¶
There are two ways of refreshing the data of a remote table:
Executing the command REFRESH.
Graphically, using Design Studio: open the base view associated with the remote table and click Refresh Data.
In both cases the Execution Engine does the following:
Truncate the current table from the underlying database. That is, remove all the data from the table. If the remote table does not exist, then the Virtual DataPort server will create it.
Execute the query associated with this remote table.
Insert the result of this query into the table of the database created in step #1.
You can also insert new data either using the INSERT command or scheduling a periodical refresh using the Denodo Scheduler Module. In this case, and depending on your scenario, you can perform incremental inserts and update the data that changed during a time interval. Visit sections INSERT Statement and VDPDataLoad Extraction Section for information on these operations.
Dropping Remote Tables¶
When you drop a remote table using Design Studio, you delete the base view associated to it and optionally, you can choose to delete the table from the underlying database. This is different from removing a regular base view because with regular base views, the underlying database is not modified.
To remove remote tables programmatically, use the stored procedure DROP_REMOTE_TABLE.
Customizing the Table Creation Template¶
To create the table containing the data from the query defined, Denodo uses a built-in SQL CREATE TABLE statement that should be sufficient for most scenarios. For these uncommon situations in which the default statement is not enough, Denodo allows to customize the commands used to create the tables. The overall process is described in section Table Creation Templates Management. In contrast to the cache configuration, to define a custom template for remote tables or summaries it is necessary to specify which of the variables entered in the template are columns and which ones are environment-specific variables. Make sure that the ones that are columns match the variables entered in the VQL query.