Data Profiling of Views¶
The execution engine of Virtual DataPort implements several optimizations that,
when enabled, allow you to run queries faster. One of them is called cost-based
optimization. In short, Virtual DataPort has multiple options for executing the
same query. The execution engine evaluates the estimated cost of executing each
one of them and selects the best plan. To perform this cost evaluation, it
requires that an administrator has previously configured Virtual DataPort to
gather some statistics on views: number of rows, average size of each field,
NULL values in a field, etc.
For further information on the cost-based optimization and how to configure Virtual DataPort to gather the statistics it requires, check the Virtual DataPort Administration Guide.
In addition to feed the execution engine of Virtual DataPort, these statistics are useful for the Data Catalog, since they describe which kind of data you should expect for each field of a view. When they are available for the current view, they will appear in the Data profiling tab.
At the top, the Data profiling tab shows how many rows and columns the view has and when the statistics were collected. Then, for each field in the view, you get the following information about its data:
Field name. The name and type of the field.
Average size. If the field has type
text, the average length in characters; the average size in bytes, otherwise.
Data range. If the field has a numeric or date type, the minimum and maximum values.
Distinct values. The number of distinct values in the field and the average number of rows per value.
Null values. The number of
NULLvalues in the field and the percentage of rows with
n/a value indicates that the statistic is not available for the