Managing Remote Tables

This section describes how to create, edit, and drop remote tables using the administration tool.

Creating Remote Tables

There are three ways of creating a remote table:

  1. Using the wizard of the administration tool.

  2. Using the CREATE_REMOTE_TABLE stored procedure.

  3. Using the CREATE REMOTE TABLE command. This creates the table in the underlying database and inserts the results of the query in this table. However, it does not create the associated base view in Virtual DataPort.

This section describes how to create a remote table from the administration tool. To do this, follow these steps:

  1. Right-click the JDBC data source where you want to create the remote table and then click Create Remote Table.

    Note

    This feature is only supported for JDBC data sources. To use this feature with HDFS-based databases (Hive, Impala, Presto, Spark, Databricks), first you need to enable bulk data load.

  2. Enter the following information:

Create remote table

Create remote table

  • Target data source: data source where you want to create the remote table.

  • Insert options: parameters to configure the process of inserting data into the database. All of them are optional and if empty, the execution engine will use the default values.

    The available options depend on the type of database:

    • Batch insert size: number of tuples of each batch insert.

    • Location: location where Virtual DataPort will upload the data. It depends on the data source:

      • Hadoop data sources (Hive, Impala, Presto, Spark): the HDFS URI.

      • Redshift: the S3 bucket name.

      • Athena: the S3 location.

    • Parquet row group size: the row group size in megabytes. The property is only valid for the data sources for which Virtual DataPort uses parquet files to insert the data: Athena, Hive, Impala, Presto, and Spark.

  • Remote table: enter the name, catalog and schema of the new remote table.

  • Base view: enter the information of the base view that will be created.

  • VQL Query: query that will be executed and whose result will be inserted in the new remote table.

    Alternatively, you can drag a view from the Server Explorer to VQL Query to create the remote table using the definition of this view.

  1. Click Create.

To create the remote table, the Execution Engine does the following:

  1. It creates a table in the underlying database of a JDBC data source.

  2. It executes the VQL query.

  3. It inserts the result of this query into the table of the database created in step #1. If in the data source, the use of the bulk load API is enabled, the execution engine uses the bulk data load API of the database to insert the results.

  4. It creates a base view over the table created in step #1.

Privileges Required

To do this, you need the following privileges:

  • “Connect” over the database of the JDBC data source.

  • “Execute” over the JDBC data source.

  • “Connect” and “Create view” over the Virtual DataPort database in which the base view will be created.

  • “Create folder” over the database where the base view will be created, if the procedure has to create the base view on a folder and the folder does not exist.

  • “Execute” over the views referenced in the query.

The user account set in the JDBC data source has to be able to create tables in the underlying database.

Requirements

To use this feature, make sure Data Movement is enabled. To check this, open the dialog Server Configuration - Queries Optimization. Otherwise, the creation of remote tables will fail.

Editing Remote Tables

To edit a remote table, open the base view associated with the remote table and then, click Edit Remote.

Edit remote table and base view

Edit remote table

After doing the necessary changes, click Create. The Execution Engine will do the following:

  1. Drop the current table from the underlying database. All the data in this table will be lost.

  2. Create a table in the underlying database of the JDBC data source.

  3. Execute the query in Virtual DataPort.

  4. Insert the result of this query into the table of the database created in step #2.

  5. Replaces the existing base view with a new one, with the schema of the table created in step #2. If the base view had dependent elements, the changes in the schema will not be propagated.

If you want to modify a remote table programmatically, use the stored procedure CREATE_REMOTE_TABLE.

Refresh Data

To refresh the remote table data, open the base view associated with the remote table and then, click Refresh Data.

Refresh the remote table data

Refresh the remote table data

The Execution Engine will do the following:

  1. Truncate the current table from the underlying database. All the data in this table will be lost. If the remote table does not exist, then the Virtual DataPort server will create it.

  2. Execute the data load query in Virtual DataPort.

  3. Insert the result of this query into the table of the database created in step #1.

If you want to refresh the data of a remote table programmatically, then use the command REFRESH.

Dropping Remote Tables

When you drop a remote table from the administration tool, you delete the base view associated to it and optionally, you can also delete the table from the underlying database. This is different from removing a regular base view because with regular base views, the underlying database is not modified.

When you delete a base view associated with a remote table, the tool will show a dialog like this one:

Drop remote table and base view without dependencies

Click Yes to delete the table in the underlying database, in addition to the base view.

If you are removing several remote tables at once, the tool will show a dialog like this one:

Drop multiple remote tables and base views with dependencies

If you want to remove remote tables programmatically, use the stored procedure DROP_REMOTE_TABLE.