USER MANUALS

Manage Views Created From Object Storage

Views created from files in an Object Storage are different from other views in some aspects:

  • Inserts: It is possible to insert individual rows or insert using a SELECT query that can be fully delegated to the target data source (direct load). However, it is not currently possible to insert data into these kind of views using Denodo’s bulk load mechanism.

  • Export: The VQL of the view includes an id to identify the route defined on the data source for the external object storage. It also includes the relative path from that base route.

  • Lineage: If the base route of the data changes or the schema of the files (columns and types) differs from the schema in the view, the view will turn into an invalid state. In order for the view to be valid again you can recreate it from the data source with the new schema changes or fix the route to point to the right dataset.

  • Schema evolution: Schema evolution can be considered during the base view creation. As any other base view, it is also possible to periodically update the tables metadata executing a source refresh.

  • Gather statistics: In order to achieve the best performance in queries accessing these data, it is very important to compute the statistics on these views and keep them up to date. You can do that by executing the stored procedure COMPUTE_SOURCE_TABLE_STATS or following the steps in section Gathering the Statistics of Views

  • Metadata maintenance: There exists some maintenance tasks that is recommended to execute on a regular basis to make sure the data is complete and the performance is optimal:

Note

Consider creating jobs using the Denodo Scheduler to compute the statistics and execute the metadata maintenance tasks periodically.

Source Refresh on Views From Object Storage

The Denodo Lakehouse Accelerator allows accessing data in an Object Storage using its embedded Massive Parallel Processing engine. For these base views created using the Lakehouse Accelerator data source, the Source Refresh has additional options to deal with the Schema Evolution. A new screen will be prompted before executing the source refresh, where you can tune up the introspection process:

Source refresh with schema evolution

This panel is analog to the one presented in section Create base views from data in an object storage considering schema evolution. It allows to configure the data sample Virtual DataPort will use to infer the view schema:

  • Maximum number of files to analyze: establishes the maximum number of files (file limit) to analyze when refreshing the table.

  • Consider files modified after this date (optional): in this field you can specify the time instant (in UTC time format) from which the files will be analyzed. The files will be analyzed from oldest to newest, starting on the specified instant until the file limit is reached. If this field is empty, Virtual DataPort will analyze the files from newest to oldest until the file limit is reached.

Add feedback