

Create base views from data in an object storage considering schema evolution

For Apache Iceberg or Delta Lake tables, Denodo relies on the support these formats provide for schema evolution. See Apache Iceberg and Delta Lake documentation for more information.

For data in Parquet format, Virtual DataPort can analyze multiple files and create a view that contains every column found in the analysis, assigning each column the widest type found. To enable this behavior, select the Consider schema evolution check box in the left corner of the introspection panel before creating the views. Because introspecting large Parquet directories can be time-consuming, you can configure the size of the sample and limit the sample to files modified after a certain date.
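The column-collection step described above can be pictured with a minimal Python sketch. It is an illustration only, not Denodo's implementation: the function name collect_columns and the representation of each file schema as a dictionary of column name to type name are assumptions made for this example.

```python
def collect_columns(file_schemas):
    """Return every column seen in the sampled files with the types observed for it.

    file_schemas: list of dicts (column name -> type name), one per sampled
    file, ordered from oldest to newest. This input shape is hypothetical.
    """
    observed = {}
    for schema in file_schemas:
        for column, type_name in schema.items():
            types = observed.setdefault(column, [])
            if type_name not in types:
                types.append(type_name)
    return observed


# Two sampled files: the newer one widens "id" and adds a "price" column.
schemas = [
    {"id": "integer", "name": "text"},
    {"id": "bigint", "name": "text", "price": "double"},
]
columns = collect_columns(schemas)
```

In this sketch the resulting view would expose the columns id, name and price, and a later step would pick one type for each column from the types observed.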

Enable schema evolution

When you create views with schema evolution enabled and at least one Parquet dataset is selected, Design Studio shows an additional screen where you configure some parameters before the introspection starts. The screen contains a table with one row for each selected dataset and the following columns:

  • Table: shows the selected table name.

  • Last modification date: shows the timestamp when the table was last modified (in UTC).

  • Number of files to read: sets the maximum number of files (file limit) to analyze when introspecting the table. Increase this value to cover more columns that changed over time, or lower it to obtain a narrower schema.

  • Read after this date (optional): specifies the instant (in UTC) from which the files are analyzed. The files are analyzed from oldest to newest, starting at the specified instant, until the file limit is reached. If this field is empty, Virtual DataPort analyzes the files from newest to oldest, starting at the Last modification date of the table, until the file limit is reached.
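The interaction between the file limit and the optional date can be sketched as follows. This is a simplified model, not Denodo's code: the function select_files, the (path, modified) tuples, and the choice to include a file modified exactly at the given instant are assumptions made for this example.

```python
from datetime import datetime, timezone


def select_files(files, file_limit, read_after=None):
    """Choose which files to sample for introspection.

    files: list of (path, modified) tuples with UTC datetimes (hypothetical shape).
    file_limit: value of "Number of files to read".
    read_after: value of "Read after this date", or None when the field is empty.
    """
    if read_after is not None:
        # Date given: oldest to newest, starting at the specified instant.
        candidates = sorted(
            (f for f in files if f[1] >= read_after), key=lambda f: f[1]
        )
    else:
        # Field empty: newest to oldest, starting at the last modification.
        candidates = sorted(files, key=lambda f: f[1], reverse=True)
    return candidates[:file_limit]


def day(d):
    return datetime(2024, 1, d, tzinfo=timezone.utc)


files = [
    ("a.parquet", day(1)),
    ("b.parquet", day(2)),
    ("c.parquet", day(3)),
    ("d.parquet", day(4)),
]
newest_two = select_files(files, 2)
from_day_two = select_files(files, 2, read_after=day(2))
```

With a limit of 2, the first call samples d.parquet and c.parquet, while the second samples b.parquet and c.parquet, so a date that is too old combined with a low limit can leave the newest files out of the sample.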

Schema evolution panel

Note

For Delta Lake and Iceberg tables, the schema information is automatically obtained from the table metadata, so the schema evolution configuration will not apply to them.

In those scenarios where the type of a column changed over time, Denodo applies type widening to use the most generic type. The following table shows the current type widening support:

Source type    Supported wider types
integer        bigint, real, double
bigint         real, double
real           double

If the types are not compatible, Denodo selects the most recent one so that the most recent data can be queried. In these scenarios, you can still see all the data by manually creating a partitioned union whose partitions are split at the date of the type change.
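The widening rules and the fallback to the most recent type can be expressed as a short Python sketch. It only mirrors the rules stated in this section. The names WIDER, merge_types and resolve_column_type are hypothetical, and folding the observed types pairwise from oldest to newest is an assumption about how several changes combine, not a description of Denodo's internals.

```python
# Supported widenings, copied from the table in this section.
WIDER = {
    "integer": {"bigint", "real", "double"},
    "bigint": {"real", "double"},
    "real": {"double"},
}


def merge_types(older, newer):
    """Combine two observed types of the same column."""
    if older == newer:
        return newer
    if newer in WIDER.get(older, ()):
        return newer  # the newer type is a supported widening of the older one
    if older in WIDER.get(newer, ()):
        return older  # the older type is already the wider of the two
    return newer  # incompatible types: keep the most recent one


def resolve_column_type(types_oldest_first):
    """Fold the types observed for one column, ordered from oldest to newest."""
    result = types_oldest_first[0]
    for type_name in types_oldest_first[1:]:
        result = merge_types(result, type_name)
    return result


widened = resolve_column_type(["integer", "bigint", "double"])
fallback = resolve_column_type(["integer", "text"])
```

Here widened is "double" because every step is a supported widening, while fallback is "text" because integer and text are not compatible and the most recent type wins. In that second case the older integer data would need a manually created partitioned union to remain visible.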
