Remote Tables

With Virtual DataPort, you can store the result of a query on a table of an external database. This feature is called remote tables. When creating a remote table, Denodo automatically does the following:

  • Creates the table with the appropriate schema in an external database.

  • Inserts the result of the query in this table.

  • Optionally, it creates a base view in Virtual DataPort associated to this new table.

With this feature, the user does not need another tool to execute a query in Denodo and store its result on a database. The following section (Managing Remote Tables) describes how to create, edit, and drop remote tables.

With this feature, you can use Denodo to create ETL flows (Extract, Transform, Load) easily, quickly and efficiently. You create this flow in Denodo with a declarative approach, using a SQL query,

This has multiple benefits over traditional ETL flows:

  1. In Denodo, you create the ETL flows from a SQL query. This declarative approach is easier to maintain than traditional ETL flows, which are procedural.

  2. Very often, the execution time of an ETL task in Denodo will be faster than in a conventional ETL tool. That is because:

    1. The execution engine of Denodo optimizes the process of retrieving the data.

    2. To insert the data, the execution engine uses the bulk data load API of the target database (if the database has such API). By using these APIs, Virtual DataPort inserts the data much faster than with regular INSERT statements. In addition, for some databases, the process of inserting the data is done in parallel.

  3. In Denodo, you can launch the ETL tasks manually from the administration tool, or scheduling them to run periodically using Denodo Scheduler.

This feature is very useful to move data to the same environment where the applications that consume the data run. Having the data in the same location has the following benefits:

  • Minimizes the data access time.

  • Parallel data set reads.

  • Reusable data set. The data consumption application can execute several times over the same data set without the app have to request it again to Virtual DataPort. Additionally, more than one application can access to the same data set at the same time.

A common use case of this feature is to move data from an on-premise database to HDFS or Amazon S3 and then, in the same environment, use Apache Spark to run data processing jobs.

Remote Tables vs Other Mechanisms for Persisting Data

  • Temporary Tables: can help to speed up complex queries that reuse partial calculations. They are useful for data preparation. They are meant to be short-lived; the data of a temporary table are stored in the cache database only and they are only valid during that session. Once the user or the client application that created the temporary table closes the current session, the temporary table is deleted.

  • Summaries and Caching: both are useful to speed up processing and reduce the load on the data sources. Unlike remote tables, summaries and cached views provide full lineage for the data. In addition:

    1. In the case of cached views, Denodo has total control over the tables created in the cache database and they are not meant to be used as a regular table.

    2. In the case of the summary views, the query optimizer can use them to rewrite queries dynamically.

    Read the section Caching vs Summary Rewriting for a more detailed comparison between these two.

  • Materialized Tables: convenient to store random data quickly on a table. However, unlike when caching or with summaries, you do not have lineage. That is, you do not know where the data came from, the last time it was updated, etc.

Add feedback