Data Profiling of Views

The execution engine of Virtual DataPort implements several optimizations that, when enabled, allow you to run queries faster. One of them is called cost-based optimization. In short, Virtual DataPort has multiple options for executing the same query. The execution engine evaluates the estimated cost of executing each one of them and selects the best plan. To perform this cost evaluation, it requires that an administrator has previously configured Virtual DataPort to gather some statistics on views: number of rows, average size of each field, number of NULL values in a field, etc.

See also

For further information on the cost-based optimization and how to configure Virtual DataPort to gather the statistics it requires, check the Virtual DataPort Administration Guide.

In addition to feed the execution engine of Virtual DataPort, these statistics are useful for the Data Catalog, since they describe which kind of data you should expect for each field of a view. When they are available for the current view, they will appear in the Data profiling tab.

Statistics for each field of a view in the Data profiling tab

Statistics for each field of a view in the Data profiling tab

At the top, the Data profiling tab shows how many rows and columns the view has and when the statistics were collected. Then, for each field in the view, you get the following information about its data:

  • Field name. The name and type of the field.

  • Average size. If the field has type text, the average length in characters; the average size in bytes, otherwise.

  • Data range. If the field has a numeric or date type, the minimum and maximum values.

  • Distinct values. The number of distinct values in the field and the average number of rows per value.

  • Null values. The number of NULL values in the field and the percentage of rows with NULL values.

Note

A n/a value indicates that the statistic is not available for the field.