COLUMN_DEPENDENCIES

Description

The stored procedure COLUMN_DEPENDENCIES returns the “lineage” of the columns of a view. That is, the list of data sources and views from which the value of a field is obtained. The difference with the lineage of a column and the “Tree View” is that the “Tree view” or only provides information about the views. The lineage of a column goes to a deeper level, a column.

This procedure allows you to obtain programmatically the same information that the Administration Tool displays in the Data Lineage dialog of the derived views.

If you want to obtain the dependencies between views and you do not need the dependencies between fields, we recommend using the procedure VIEW_DEPENDENCIES instead of this one. Its output is simpler than the output of this procedure.

Syntax

COLUMN_DEPENDENCIES (
      input_view_database_name : text
    , input_view_name : text
    , input_column_name : text
)
  • input_view_database_name: name of the database of the view.

  • input_view_name: name of the view.

  • input_column_name (optional): name of the field you want to obtain its dependencies from. If null, the procedure returns the dependencies of all the fields of the view.

Each row of the result of this procedure represents a dependency between a field of the derived view “view_name” and the field of another element (“dependency name”), which can be a derived view, a base view or a data source.

The output schema has the following fields:

  • view_database_name: name of the database that the view on the field “view_name” belongs to.

  • view_name: name of the derived view.

  • column_name: name of the column.

  • view_identifier: identifier of the view. It is autogenerated by the stored procedure and may change between executions of the procedure. See the example below to understand how to use this value.

  • private_view: true if the view on the field “view_name” is private; false otherwise.

    A view is private when “view_name” represents a join, minus, intersect view of another join/minus/intersect view and the view was internally created by Virtual DataPort and not by the user.

    For example, if you create a derived view called J_V that joins three views: (V1 JOIN V2) JOIN V3, internally Virtual DataPort creates a join view of V1 and V2. J_V will be the result of joining this internal view with V3. In this example, the join of V1 and V2 is a private view and J_V is public.

    The information about private views allows you to know in which order a join/minus/intersect view of more than two views is performed.

  • view_type: type of view. I.e. Base view, Select, Join, Interface, etc. When the type of the view is join, the value of this field includes the following information, in this order:

    • Type of join: it can be “Inner”, “Left outer”, “Right outer” or “Full outer”.

    • Join method: it can be “Any”, “Hash”, “Nested”, “Nested parallel” or “Merge”. “Any” means that the user did not select a method to execute the join when creating the view so the Execution Engine will select the method at runtime.

    • Order of the join: it can be “Any”, “Ordered” or “Reverseorder”. “Any” means that the user did not select an order to execute the join when creating the view so the Execution Engine will select an order at runtime.

  • dependency_database_name: name of the database that the dependency belongs to.

  • dependency_name: name of the element (view or data source) on which the view on the “view_name” field depends.

  • dependency_column_name: name of the field on which the field on “column_name” depends. The value of this field will be a comma-separated list of fields, if “column_name” is the result of evaluating an expression that involves more than one field. The “expression” field of the same row will contain this expression.

  • dependency_identifier: internal identifier of the element (view or data source) on which the view on the “view_name” field depends. This number identifies the dependency between a derived view and another element. Therefore, all the rows that represent a dependency between a field of the view V1 and the view V2, will have the same “dependency_identifier”.

    This identifier is autogenerated by the stored procedure and may change between executions of the procedure.

  • dependency_type: type of the dependency element.

  • expression: if not null, it is the expression assigned to the field of the view (column_name).

  • depth: distance between the view of the field “view_name” and the dependency.

    When a row represents a direct dependency, the value of this field is 1. Direct dependencies are dependencies between views that directly depend on each other. E.g. a base view and a projection view created over this base view have a direct dependency.

    When a row represents a transitive dependency, the value of this field is greater than 1. Transitive dependencies are dependencies between views that depend on each other, but there are other views in between. Let us say that we have created a selection view called P_V created over a base view BV1. In this example, there is a transitive dependency between P_V and the data source of the view BV1.

Examples

The following examples assume that the user executing these queries is an administrator or an administrator of the database of the views mentioned in the example. Otherwise, this procedure does not return information about the data sources.

Example 1

Let us say you have a view “orders_of_european_customers” (database “tpc_h”), execute this to obtain the lineage of the fields of this view:

SELECT *
FROM COLUMN_DEPENDENCIES()
WHERE input_view_database_name = 'tpc_h'
    AND input_view_name = 'orders_of_european_customers';

Example 2

Let us say you have a view “orders_of_european_customers” (database “tpc_h”), execute this to obtain the data source(s) from which the data of the column “total_amount” is obtained. Note that a field can be traced back to more than one source. For example, if a field is the result of concatenating two fields from two sources.

SELECT *
FROM COLUMN_DEPENDENCIES()
WHERE input_view_database_name = 'tpc_h'
    AND input_view_name = 'orders_of_european_customers'
    AND input_column_name = 'total_amount'
    AND position('Datasource' IN dependency_type) > 0
    AND depth = 1;

Example 3

Let us say that we have created a view called customer_invoicing like this: customer_invoicing like this:

(client_df LEFT OUTER JOIN invoice) INNER JOIN product

and that you have added a field to customer_invoicing that is an expression: the concatenation of the fields first_name and last_name of the client_df table.

The following image is a screenshot of the result of executing

SELECT *
FROM COLUMN_DEPENDENCIES ()
WHERE input_view_database_name = 'customer360'
    AND input_view_name = 'customer_invoicing'
    AND input_column_name = 'date';

For formatting purposes, some columns are not displayed.

Sample result of executing the procedure COLUMN_DEPENDENCIES

Sample result of executing the procedure COLUMN_DEPENDENCIES

To understand the result, take into account the following:

  • The row #2 and #3 represent the dependencies of a private view (the field “private” is true in these rows). This private view is the left outer join between employees and orders.

  • The rows #4 till the last one describe the dependencies of the view customer_invoicing. Note that all of them have the same “view_identifier”. All the dependencies of a view have the same “view_identifier”.

  • The rows whose value of “dependency_identifier” is 5 represent the direct or transitive dependencies with the DF data source client_df:

    • Row #1 represents the direct dependency between the base view client_df and the DF data source. The dependency is direct because the value of the “depth” field is 1.

    • Row #3, the transitive dependency between the private view and the DF data source. The dependency is transitive because “depth” is greater than 1.

    • Row #6, the transitive dependency between the view customer_invoicing and the DF data source. This dependency is also transitive because “depth” is 2.

  • In row #4, the value of the field “expression” is the expression assigned to the field “full_name” of the view “customer_invoicing”.

Privileges Required

The information returned by the procedure changes depending on the type of user that executes the procedure:

  • Administrators: the procedure returns information about all the fields’ dependencies.

  • Administrators of a database: if the user is an administrator of the database of the input parameter “database name”, the procedure returns information about the fields of all the views of that database.

  • Users that have METADATA privileges over a database: the procedure returns information about the dependencies of the fields of all the views of that database. However, it does not return the dependencies between views and data sources.

  • Users that have METADATA privileges over some views of a database: the procedure returns information about the dependencies of the fields of the views on which the user has the METADATA privilege granted. If the user does not have METADATA privilege granted over the dependency, the value of the field “dependency_type” will be “No Privileges”. It does not return the dependencies between views and data sources.