CREATE_TAGS_FROM_COLLIBRA

Description

The stored procedure CREATE_TAGS_FROM_COLLIBRA connects with a Collibra instance for extracting the defined information about attributes and tags assignments to views and columns.

This procedure can be used for reading the Collibra data, and also it can be used to create and assign the attributes and tags at Virtual DataPort.

Note

This feature is only available with the subscription bundle Enterprise Plus. To find out the bundle you have, open the About dialog of the Design Studio or the Administration Tool. See more about this in the section Denodo Platform - Subscription Bundles.

Syntax

CREATE_TAGS_FROM_COLLIBRA (
      input_scope : text
      , input_action : text
      , input_database : text
      , input_include_tags : boolean
      , input_tag_names : array
      , input_include_attributes : boolean
      , input_attribute_names : array
      , input_search_since_last_execution_time : boolean
      , input_search_since : timestamp
      , input_unassign_tags_mode : text
      , input_unused_collibra_tags_mode : text
      , input_attribute_name_as_prefix : boolean
)
  • input_scope: name of the Collibra scope where Virtual DataPort will search for metadata. Communities and domains accessible with the scope will be accessed (mandatory).

  • input_action: indicates what the procedure have to do. There are two possible values:

    • READ: the procedure accesses to Collibra and returns the retrieved information (default).

    • CREATE: the procedure accesses to Collibra, returns the retrieved information and stores that information at Virtual DataPort.

  • input_database: name of the Virtual DataPort database where the tagging information will be imported. This parameter must be used only when you want to override the database retrieved from Collibra.

  • input_include_tags: if Collibra tags are retrieved. (default value: false)

  • input_tag_names: names of the Collibra tags which we want to retrieve. If not defined, then any tag found on the accessible elements will be retrieved.

  • input_include_attributes: if Collibra attributes are retrieved. (default value: false)

  • input_attribute_names: names of the Collibra attributes which we want to retrieve, such as Security Classification or Personally Identifiable Information. Mandatory when input_include_attributes=true.

  • input_search_since_last_execution_time: indicates if Virtual DataPort must execute an incremental search since latest execution or since the instant indicated by input_search_since. This option is ignored in case of input_include_tags=true.

  • input_search_since: instant since the search of Collibra metadata must be done. In case of input_search_since_last_execution_time=true and no value for this parameter, the search will use the latest time the procedure was executed (recommended method).

  • input_unassign_tags_mode: indicates what Virtual DataPort will do with the pre-existent tags on the views and columns of the databases found at Collibra metadata. There are two possible values:

    • ALL: all pre-existent tags will be unassigned from the views or columns. This option is intended to be used when tags assignment is only done at Collibra.

    • ONLY_COLLIBRA: pre-existent tags imported from an external system, such the imported with this procedure, will be unassigned from the views or columns. This option is useful for allow to define tags manually on the views or columns, but also import the information from Collibra. (default)

  • input_unused_collibra_tags_mode: indicates what Virtual DataPort will do with the imported tags from Collibra which are no longer used by views or columns. There are next possible values:

    • RETAIN: tags are not removed from Virtual DataPort. (default)

    • DROP_UNASSIGNED: tags no longer used by views or columns are removed from Virtual DataPort. Note that it could exist tags used by other elements, such as Global Security Policies. In that case, the tags will not be removed with this option.

    • DROP_UNASSIGNED_CASCADE: tags no longer used by views or columns are removed from Virtual DataPort, and other elements, such Global Security Policies, will be also removed in case they are using some removed tag.

  • input_attribute_name_as_prefix: indicates if the tags created at Virtual DataPort from Collibra attributes will have the attribute name as prefix. It is useful for identifying the origin of the tags created at Virtual DataPort.

    For example, a column has Confidential value for attribute Security Classification` at Collibra. This procedure will create a tag on Virtual DataPort named as Security Classification > Confidential if this parameter is true, and Confidential in case of false. (default value true)

The output schema has the following fields:

  • execution_time: the instant when the procedure accesses to the Collibra instance. It could be used for incremental search using also the input_search_since parameter.

  • element_type: the type of the element returned at current tuple. For example database, tag or view.

  • database_name: the name of the database, in case the current element is a database, or the database which contains the current element in other case.

  • view_name: the view name, in case the current element is a view or a column.

  • column_name: the column name, in case the current element is a column.

  • tag: the tag name, in case the current element is a tag.

  • tag_description: the tag description, in case the current element is a tag.

  • assignment_status: indicates if the tag was correctly assigned or if not, in case the current tuple represents a tag assignment.

Privileges Required

Only administrators can invoke this procedure.

Configuration

Currently, it does not exist a dialog for configuring the access to Collibra. It has to be configured using properties.

Next configuration properties are available:

  • com.denodo.vdb.contrib.ext.datagov.collibra.connection.url: the connection URL.

  • com.denodo.vdb.contrib.ext.datagov.collibra.connection.user: the user name to authenticate.

  • com.denodo.vdb.contrib.ext.datagov.collibra.connection.password.secret: the password from the user to authenticate. It receives a clear value or an encrypted one. In this case, the value must be prefixed with encrypted:.

    For example:

    -- Clear
    SET 'com.denodo.vdb.contrib.ext.datagov.collibra.connection.password.secret' = 'my_secret_password';
    -- Encrypted
    SET 'com.denodo.vdb.contrib.ext.datagov.collibra.connection.password.secret' = 'encrypted:<encrypted_value>';
    

    To encrypt the password, execute the statement ENCRYPT_PASSWORD. For example:

    ENCRYPT_PASSWORD 'my_secret_password';
    
  • com.denodo.vdb.contrib.ext.datagov.collibra.connection.proxyEnabled: use a proxy to access to Collibra. Possible values are ON, OFF, AUTOMATIC or DEFAULT (use the Default Configuration of HTTP Proxy)

  • com.denodo.vdb.contrib.ext.datagov.collibra.connection.proxyHost: the host name to use as proxy.

  • com.denodo.vdb.contrib.ext.datagov.collibra.connection.proxyPort: the port number to connect to the host used as proxy.

  • com.denodo.vdb.contrib.ext.datagov.collibra.connection.proxyUser: the user name to connect to the proxy.

  • com.denodo.vdb.contrib.ext.datagov.collibra.connection.proxyPassword.secret: the password to connect to the proxy. The value follows the same format than the user password. A clear value or an encrypted one prefixed with encrypted:.

  • com.denodo.vdb.contrib.ext.datagov.collibra.connection.proxyPacUri: the PAC URI to use when AUTOMATIC status is set.

  • com.denodo.vdb.contrib.ext.datagov.collibra.connection.attributes: a list of Collibra attribute names which are going to be accessed in case you do not want to specify them when invoking the procedure. The value must be a valid JSON string list. Note that this value is only used when parameter input_attribute_names is not specified.

    For example:

    SET 'com.denodo.vdb.contrib.ext.datagov.collibra.connection.attributes'='["Security Classification", "Personally Identifiable Information"]';
    

Examples

First of all, we need to configure the access to Collibra. Let us say that the credentials are:

  • URL: https://acme.collibra.com

  • User: cl-user

  • Password: cl-password

We have to set the configuration properties. For securing the password value, we encrypt it.

ENCRYPT_PASSWORD 'cl-password';

Copy returned encrypted value, let us say it was something like: uq2Ldx0oXTIscUU/KmiVtZfFaGs.

Now, execute these commands:

SET 'com.denodo.vdb.contrib.ext.datagov.collibra.connection.url'='https://acme.collibra.com';
SET 'com.denodo.vdb.contrib.ext.datagov.collibra.connection.user'='cl-user';
SET 'com.denodo.vdb.contrib.ext.datagov.collibra.connection.password.secret'='encrypted:uq2Ldx0oXTIscUU/KmiVtZfFaGs';

The following examples assume that you have loaded your Virtual DataPort database to your Collibra instance. This can be done, for example, using the Denodo to Collibra Integration

For the examples, we have uploaded some metadata to Collibra that contains two views named customers and employees. At Collibra, we have added the following tags and attributes:

  • customers view:

    • Added tags customers_data and sensitive.

    • Columns:

      • phone: tag sensitive_value and Restricted as value for Security Classification.

      • first_name: marked as Personally Identifiable Information.

      • last_name: marked as Personally Identifiable Information.

  • employees view:

    • Added tags sensitive and employees_data.

    • Columns:

      • first_name: marked as Personally Identifiable Information.

      • last_name: marked as Personally Identifiable Information.

      • password: tag password.

      • picture: marked as Personally Identifiable Information and Highly Confidential as value for Security Classification.

      • salary: value Restricted for Security Classification.

Also, let us say that Virtual DataPort metadata was uploaded to a Collibra domain accesible by scope Denodo to Collibra Integration`.

Example 1

Let us say we want to use this procedure to see what information about tags and attributes are at Collibra. Also, we are interested on tags and values for attributes Security Classification and Personally Identifiable Information.

Execute this for obtain the data.

SELECT element_type, database_name, view_name, column_name, tag, tag_description, assignment_status
   FROM CREATE_TAGS_FROM_COLLIBRA()
   WHERE input_scope = 'Denodo to Collibra Integration'
      AND input_attribute_names = {row('Security Classification'),
            row('Personally Identifiable Information')}
      AND input_include_tags = true
      AND input_include_attributes = true
      AND input_action = 'READ'
      AND input_attribute_name_as_prefix = true;

The following image is a screenshot of the result of executing previous query.

Sample result of executing the procedure CREATE_TAGS_FROM_COLLIBRA

Sample result of executing the procedure CREATE_TAGS_FROM_COLLIBRA

Result meaning

Depending on the element_type column:

  • database: databases present at Collibra instance. In a CREATE execution they will be affected by the changes.

  • tag: when a description is retrieved for a tag is retrieved at Collibra, this tuple contains that value.

  • view: assignment to a view. It will contain a tag name on column tag. If no tag name is found, then this means that the view does no longer have tags or attributes on Collibra. This second scenario can happen when an incremental search is executed.

  • column: assignment to a column. It will contain a tag name on column tag.

Example 2

At this example, we want to apply the Collibra tags and attributes on Virtual DataPort metadata. For that, we have to execute previous query with CREATE as input_action.

SELECT element_type, database_name, view_name, column_name, tag, tag_description, assignment_status
   FROM CREATE_TAGS_FROM_COLLIBRA()
   WHERE input_scope = 'Denodo to Collibra Integration'
      AND input_attribute_names = {row('Security Classification'),
            row('Personally Identifiable Information')}
      AND input_include_tags = true
      AND input_include_attributes = true
      AND input_action = 'CREATE'
      AND input_attribute_name_as_prefix = true;

Now, execute a Refresh Design Studio or the Administration Tool and if we can see the tags created and assigned on Virtual DataPort.

Tags imported using the procedure CREATE_TAGS_FROM_COLLIBRA

Tags imported using the procedure CREATE_TAGS_FROM_COLLIBRA

And one of the views.

Tags imported to employees using the procedure CREATE_TAGS_FROM_COLLIBRA (1)

employees view with imported tags

Example 3

At this example, we only want the information about Collibra attributes, but not the Collibra tags. We have to use the parameter input_include_tags with false value.

SELECT element_type, database_name, view_name, column_name, tag, tag_description, assignment_status
   FROM CREATE_TAGS_FROM_COLLIBRA()
   WHERE input_scope = 'Denodo to Collibra Integration'
      AND input_attribute_names = {row('Security Classification'),
            row('Personally Identifiable Information')}
      AND input_include_tags = false
      AND input_include_attributes = true
      AND input_action = 'CREATE'
      AND input_attribute_name_as_prefix = true

If we Refresh the client tool, we can see opening a view that only the Collibra attributes are assigned. For example, employees view does no longer have the tags employees_data, sensitive and password.

Tags imported to employees view using the procedure CREATE_TAGS_FROM_COLLIBRA (2)

employees view with imported tags

We are not accessing to Collibra tags, so we can test the input_search_since_last_execution_time parameter. If we execute now with specifying true to that parameter, we can see that no results are returned, as there are not changes on Collibra since last import process.

At Collibra, we remove the Personally Identifiable Information from column first_name of the view employees. If we execute now the procedure incrementally, only the tagging information for view employees is retrieved, as it is the unique modified element since last import.

SELECT element_type, database_name, view_name, column_name, tag, tag_description, assignment_status
   FROM CREATE_TAGS_FROM_COLLIBRA()
   WHERE input_scope = 'Denodo to Collibra Integration'
      AND input_attribute_names = {row('Security Classification'),
            row('Personally Identifiable Information')}
      AND input_include_tags = false
      AND input_include_attributes = true
      AND input_action = 'CREATE'
      AND input_attribute_name_as_prefix = true
      AND input_search_since_last_execution_time = true;

The following image is a screenshot of the result of executing previous query.

Sample result of executing the procedure CREATE_TAGS_FROM_COLLIBRA incrementally

Sample result of executing the procedure CREATE_TAGS_FROM_COLLIBRA incrementally

Note that first_name does no longer have tags and no changes are applied to other views.