USING THE DATA CATALOG

The Data Catalog is a web based self service tool included in Denodo Platform that lets both technical and business users query, search and browse information and metadata stored in a Virtual DataPort server. With this tool, users can generate new knowledge and pave the way to make better decisions.

Scenario

In this tutorial, we are going to show this use case:

The IT / Data department of our company has frequent requests for access to data. These requests are usually not informed as to the types and locations of company data, and usually the requests take much longer to process than is necessary due to the lack of understanding of the underlying systems by the business user.

For solving that use case, following this tutorial you will learn how to:

  • Use Denodo Data Catalog for exploring the Denodo server metadata
  • How to edit Data Catalog metadata: Tags & Categories
  • Learn How to Build Data Products Graphically to Get Data
  • Using Generative AI
  • How users can Collaborate in Data Catalog

Before starting this tutorial, please ensure to complete the configuration steps performed in the previous tutorial for Data Catalog Administration, which will allow you to fully explore the features of the Data Catalog that we will go through in this tutorial.

If you have followed the previous tutorial, in your Virtual DataPort you will have something similar to this:

Using the Metadata Search

To start, open the Denodo Platform Control Center, and start the Virtual DataPort server and the Data Catalog. Once it changes the status to "Running", click the Data Catalog link to open the Web tool (by default: https://127.0.0.1:9090/denodo-data-catalog or http://127.0.0.1:29090/denodo-data-catalog if you are using the container of the Denodo Community Lab Environment).

Our first example is from the Data Catalog home screen. Let's use the scenario of the Business Analyst to explore a simple use case, by searching for clients, by typing in client and hitting enter.

Here we have the results of our search. From Data Catalog 8.0, this search will seek views or web services that contain the query terms in the element's metadata, such as:

  • The name and description.
  • The categories and tags assigned to the view.


For example, let's click on the view bv_crm_client to be taken to the summary of the selected view, showing more information like the values of any custom properties it has assigned:

Using the Content Search

We can also use the Index feature to explore data using the Content Search function.

  1. In the Data Catalog, navigate to the Search page, and select the following options in the Filters:

    Select Content (this option appears only after configuring the index following the steps of the previous tutorial)
    Database: tutorial
  2. Click on Apply filters button

  1. Now, type James into the search field and hit Enter to run the search:

  1. The search will return all the views that include the string James in their data sets. Click the Plus Icon (+) next to the Preview results in order to expand the results to show the field that matches the search.

  1. You can also click on the bv_crm_client view name and see the filtered data. Using the Search tab, you can search the index directly.

  1. For example, we can now search Jack, and the results from the Index are returned.

Completed! We have done a search in the Virtual DataPort metadata and content. In the next section, we will explore other advanced functions of the Data Catalog!

In this section, we will explore the features of the Data Catalog metadata.

A useful feature of the Data Catalog is the ability to display view metadata, such as the View Description, as well as the Field Descriptions. By default, this metadata is obtained from Virtual DataPort server but the Data Catalog allows users to update the view and field descriptions.

In our example we are going to add descriptions to the fields of the bv_crm_client view, to allow more specific discovery of this view. Let's see how to modify that information.

Editing View and Field Descriptions

  1. Navigate to the Summary page of the bv_crm_client view and click Edit option beside Description.

  1. Add the appropriate description to the View (for example, Base view of the Client table on the CRM data source) and click Ok.

  1. Similarly, add a description to fields by navigating to the Schema tab and click on the Edit option under each field.

    client_id: Unique client identifier
    name: First name of the customer as captured on the CRM
    surname: surname of the customer as captured in the CRM
    client_type: Indicates the client type, either 01 or 02

  1. The view now displays the added descriptions. These descriptions are saved in the Data Catalog metadata.

You can use the Denodo Data Catalog to add tags and categories to views.

Tags & Categories are useful to allow users to search with more accuracy through the Data Catalog. While the amount of Data Sources and Views is small in our tutorial, it will pay off over the long term to maintain good Categorization and Tagging habits to allow users to navigate the Data Catalog more easily.

Creating Categories

  1. Navigate to Administration > Set-up and Management
  2. In the Administration window under Catalog Management, click on Categories option

  1. Click the + Add Category icon.

  1. Create a category with the following details:

    Name: Customer
    Description: Data products related to customers

  1. Create another category with the following details:
  • Name: CRM
  • Description: CRM System
  • Parent: Customer

  1. Create a final category with the following details:
  • Name: Billing
  • Description: Customer Billing
  • Parent: Customer

We now have a useful set of categories to link to our Views.

Creating Tags

  1. Navigate to Administration > Set-up and Management.
  2. In the Administration window under Catalog Management, click on the Tags option.
  3. Click the + Add Tag icon and create a new Tag with the following details:
  • Name: #public
  • Description: Data that is readily available for public use

  1. Create another tag with the following details:
  • Name: #confidential
  • Description: Sensitive data that could negatively impact operations or execution processes

We now have a useful set of tags to link to our Views.

  1. We can now navigate to the bv_crm_client view and click on the Add Tag / Add Category buttons in the Summary tab.

  1. Click on Add Category, select CRM, and then click on Ok.

  1. Now click on the Add Tag button, select #confidential, and click Ok.

  1. We have now added this view to the Customer > CRM category and have tagged it with the #confidential tag.

Browse using Tags & Categories

Finally, let's do a quick review on how to browse your views and web services by tags, go to Browse > Tags.

  1. In the sidebar, you will see the list of tags available in the Data Catalog.

  1. Click the tag #confidential to see the elements that have been assigned with this selected tag (select the Views tab).

  1. Similarly you can browse by categories, going to Browse > Categories!

We have now seen how the effective use of the Categories and Tags can enable powerful data exploration.

In the next sections we are now going to explore other features that offer more in-depth interrogation of a view in the Data Catalog. This includes:

  • Querying a view, Saving the query, filtering results and exporting results to a file
  • Creating new fields with the Data Preparation wizard
  • Exploring view relationships and data lineage

From the previous section, we have selected our bv_crm_client view. We can now explore the contents of this view.

From the Query tab, ad-hoc queries can be run against the view (the query is created graphically). For example, select the following fields and drag them into the Output columns area.

  • client_id, name, surname, client_type

Now click Execute, to get the results:


Of course, the Data Catalog allows exporting the results! You can select CSV, HTML, Excel or Tableau as output format by clicking the button.

More Options Available When Querying a View

If we want to filter the results of the view, and, for example, order the results by the surname, we can easily do so. Click the Definition link to bring back the query options.

Begin by dragging the field by which we want to filter, for example dragging field client_type to the Filters section. We will now need to add an expression, we can add = and '02'.

We also add the surname field to the Order By section for which we want to order the results by, and click the arrow to change the Order By to descending order.

Now click Execute. The results now are filtered to only include results for customer_type = '02', and the results are ordered by the surname field.

You can click on the button to see the query that is being executed in Denodo!

Adding New Output Fields

Let us consider the scenario where we want to combine the name and surname fields into a new full_name field. We can do this by concatenating the name with the surname following these steps:

  1. Go back to the Definition of the query. In the Output columns section, click on three dots and then click on Add option.

  1. In the "New output field" dialog, click on the Edit button beside the Field name column and provide field name as full_name and Expression as concat(name,' ',surname)

  1. Our results include the newly created full_name field.

  1. If we would like to save this query for later use, we can click . This will save the query under the Query > My Queries section of the top menu.

The next tab is the Data Preparation tab, which is a WYSIWYG (What You See Is What You Get) editor, which allows you to do additional transformations to the data in a simple way. For example, let's say we want to flag all clients without proper name defined, which is those with name fields with less than 2 characters in length.

  1. Navigate to the Data preparation tab and select the following fields to the Output columns section

    client_id, name, surname, client_type

  1. Click the Load data button to import some rows of the data set to the wizard.
  2. The next Definition tab will open. Click the Add column > Add column from wizard button

  1. Specify the following details:
  • Column name: invalid_name_flag
  • Column type: boolean
  • Fill the column with: an expression value
  • CASE WHEN LEN(name) < 2 THEN true ELSE false END

  1. Click Ok
  2. Filter the data by clicking the Add filter > Add filter from wizard button

  1. Use the newly created invalid_name_flag column we added in the previous step and click Ok

  1. In the Data preparation section of the page, you can see all the transformations that has been done to the data set

  1. Click the Execute button to retrieve all the results of the transformation.

Relationships

The Denodo Data Catalog is able to show the associations/relationships between the views. This is useful for the business user to understand how certain views are related.

This information is shown under the Relationships tab.

Data Lineage

The lineage tab displays a tree graph with all the data sources and views used to build the current view.

If we click on one of the fields under View fields, we will be able to see the lineage of a specific field. This is especially useful when dealing with complicated derived views, as we will explore later.

By clicking on a node, you can see the details of the corresponding data source or view (e.g. Name, Type, Description, Projected fields, Join conditions, etc).

Lineage of Complex Views

Let us now view the lineage of a more complex view.

Return to the Search page and search for iv_client_with_bills. Open this view and navigate to the Data lineage tab and select the primary_phone field.

We can now see the value of the Data lineage tab, where we can identify the lineage of the primary_phone field including all of the operations involved with the field.

In the next section we will explore the Assisted Query feature which lets you explain your needs in natural language via the Natural language query input.


This section explores the Assisted Query feature which lets users explain their needs in natural language via the Natural language query input and invoking external LLM services. From the previous tutorial, we have already configured this feature to use OpenAI's public API, so let's dive in.

  1. Open again the bv_crm_client view and navigate to the Assisted Query tab
  2. In the Natural language query input, specify the following instruction:
  • Give me the name of clients with client type 01

  1. Click the Generate button
  2. In the result, you can see that a VQL SELECT query is generated based on the natural language input, with the corresponding query explanation

  1. Click the Execute button to see the results of the query, this is great! We have used natural language for querying our views!

Finally, let's explore how users can collaborate in the Data Catalog to allow Data Stewards to better communicate with their business users.

In this section, we will review:

  • Endorsements
  • Warnings
  • Deprecation notes to views and web services
  • How to request access to a data set

Endorsements

The endorsements are the comments by users on a view or a webservice to show their support. A user can only endorse a view or web service once, meaning, when a new comment is written, the previous endorsement will be replaced.

  1. To create endorsement, navigate to the Summary tab of the bv_crm_client view and click on Collaboration > Endorse option.

  1. In the Endorse dialog, provide the details which you would like other users to see. For example, add the details as follows:
    "This Client view is a key component of our model. It is associated with Address and Client Type views to give expanded information about each client."
  2. Click Ok to save the endorsement.

  1. In the Summary tab, the Endorsed by label displays the number of endorsements on this view and their authors. Mouse over on an author say, 'admin' to see the endorsements comment.

Warnings

Warnings are used to write and display the "advise against" messages on views and web services by users. A user can write only one warning against a view or web service.

  1. To create a warning message, go to the Summary tab of the Client view, click on Collaboration > Warn option.

  1. In the Warn dialog, add the following warning information:
    "This view will be updated with delta records once in a week"
  2. Click Ok to save the warning message.

  1. In the Summary tab, the Warning by label displays the number of warnings on this view and their authors. Mouse over on an author say 'admin' to see their warnings.

Deprecation

Deprecations are used for informing users that it is obsolete and should not be used anymore. A user can write only one deprecation about a view or web service.

  1. To deprecate a view, go to the Summary tab of the Client view, and Click on Collaborate > Deprecate option.

  1. In the Deprecate dialog, we will add the following deprecation notes:
    "This view will be deprecated from next cycle. Users will be notified about the latest view by the end of this month."
  2. Click Ok to save the deprecation note.

  1. In the Summary tab of the view, you will see the ⚠ icon in the toolbar and a notification will pop up every time you click on the icon or access the view.

Request Access Workflow

To provide workflow capabilities in Data Catalog, it has introduced some types of requests that will allow users to initiate cases on elements that will be attended by an administrator from a data stewardship perspective. Most frequent types of requests: access requests, data quality issues, metadata changes and general questions have been taken into account.

Setup a user with restricted access to the tutorial views

  1. To start exploring this feature, log in to the Design Studio and create a new role restricted with connect and metadata privileges to the tutorial database.

  1. Create a new user myuser and assign it the newly created restricted role

  1. Log in to the Data Catalog using as an admin user and navigate to the Administration > Setu-up and management > Server > Permissions > Request page.
  2. Search for the restricted role, and assign permission to create access, change, data quality and question requests.

Create access request

  1. Log in to the Data Catalog as the newly created user myuser.
  2. Open the bv_crm_client view, and notice that there are only limited tabs available to this user. Right now, the user can only see the metadata of the view. The user does not have permission to execute and retrieve the data.

  1. To request for an access, create a request by clicking on the Create request > Access button

  1. In the pop up window, specify the details of the request, and click Ok

Granting access request

  1. Log in to the Data Catalog as an admin user. This user has permission to manage access requests and grant execute permissions to the view
  2. Navigate to the Administration > Request management page
  3. In this page, you can see all the requests that have been created, including the one we have created in the previous step.

  1. Click the request for bv_crm_client view, and change the status to In Progress.

  1. In the Messages tab, you can send a message to inform the requester that the request is being worked on.

  1. In the Design Studio, the admin user can now grant access to the bv_crm_client view for the restricted role

  1. Still as an admin user, log back into the Data Catalog, and update the request in the Administration > Request management page

  1. When you log back into the Data Catalog as myuser account, and navigate to the bv_crm_client view, you can now see the Query tab and be able retrieve the data for this view.

GREAT! We have now seen how the collaborative features help users in Data Catalog.

In this tutorial, we have only had a limited number of Views, Data Sources, Tags and Categories, but it is clear that through the use of the Data Catalog, business users will be able to explore the companies data, easily and quickly, with minimal overhead on the IT team. We have also learnt about how the feature packs included in the data catalog can be used and how it helps users in a collaborative environment.

Thanks!