The Data Catalog is a web based self service tool included in Denodo Platform that lets both technical and business users query, search and browse information and metadata stored in a Virtual DataPort server. With this tool, users can generate new knowledge and pave the way to make better decisions.
Scenario
In this tutorial, we are going to show this use case:
The IT / Data department of our company has frequent requests for access to data. These requests are usually not informed as to the types and locations of company data, and usually the requests take much longer to process than is necessary due to the lack of understanding of the underlying systems by the business user.
For solving that use case, following this tutorial you will learn how to:
- Use Denodo Data Catalog for exploring the Denodo server metadata
- How to edit Data Catalog metadata: Tags & Categories
- Learn How to Build Data Products Graphically to Get Data
- Using Generative AI
- How users can Collaborate in Data Catalog
Before starting this tutorial, please ensure to complete the configuration steps performed in the previous tutorial for Data Catalog Administration, which will allow you to fully explore the features of the Data Catalog that we will go through in this tutorial.
If you have followed the previous tutorial, in your Virtual DataPort you will have something similar to this:
Using the Metadata Search
To start, open the Denodo Platform Control Center, and start the Virtual DataPort server and the Data Catalog. Once it changes the status to "Running", click the Data Catalog link to open the Web tool (by default: https://127.0.0.1:9090/denodo-data-catalog or http://127.0.0.1:29090/denodo-data-catalog if you are using the container of the Denodo Community Lab Environment).
Our first example is from the Data Catalog home screen. Let's use the scenario of the Business Analyst to explore a simple use case, by searching for clients, by typing in client
and hitting enter.
Here we have the results of our search. From Data Catalog 8.0, this search will seek views or web services that contain the query terms in the element's metadata, such as:
- The name and description.
- The categories and tags assigned to the view.
For example, let's click on the view bv_crm_client
to be taken to the summary of the selected view, showing more information like the values of any custom properties it has assigned:
Using the Content Search
We can also use the Index feature to explore data using the Content Search function.
- In the Data Catalog, navigate to the
Search
page, and select the following options in theFilters
:
Select Content (this option appears only after configuring the index following the steps of the previous tutorial)
Database: tutorial - Click on
Apply filters
button
- Now, type
James
into the search field and hitEnter
to run the search:
- The search will return all the views that include the string
James
in their data sets. Click the Plus Icon (+) next to thePreview results
in order to expand the results to show the field that matches the search.
- You can also click on the
bv_crm_client
view name and see the filtered data. Using theSearch
tab, you can search the index directly.
- For example, we can now search
Jack
, and the results from the Index are returned.
Completed! We have done a search in the Virtual DataPort metadata and content. In the next section, we will explore other advanced functions of the Data Catalog!
In this section, we will explore the features of the Data Catalog metadata.
A useful feature of the Data Catalog is the ability to display view metadata, such as the View Description, as well as the Field Descriptions. By default, this metadata is obtained from Virtual DataPort server but the Data Catalog allows users to update the view and field descriptions.
In our example we are going to add descriptions to the fields of the bv_crm_client
view, to allow more specific discovery of this view. Let's see how to modify that information.
Editing View and Field Descriptions
- Navigate to the
Summary
page of thebv_crm_client
view and clickEdit
option beside Description.
- Add the appropriate description to the View (for example, Base view of the Client table on the CRM data source) and click
Ok
.
- Similarly, add a description to fields by navigating to the
Schema
tab and click on theEdit
option under each field.
client_id: Unique client identifier
name: First name of the customer as captured on the CRM
surname: surname of the customer as captured in the CRM
client_type: Indicates the client type, either 01 or 02
- The view now displays the added descriptions. These descriptions are saved in the Data Catalog metadata.
You can use the Denodo Data Catalog to add tags
and categories
to views.
Tags & Categories are useful to allow users to search with more accuracy through the Data Catalog. While the amount of Data Sources and Views is small in our tutorial, it will pay off over the long term to maintain good Categorization and Tagging habits to allow users to navigate the Data Catalog more easily.
Creating Categories
- Navigate to
Administration > Set-up and Management
- In the Administration window under
Catalog Management
, click onCategories
option
- Click the
+ Add Category
icon.
- Create a category with the following details:
Name: Customer
Description: Data products related to customers
- Create another category with the following details:
- Name: CRM
- Description: CRM System
- Parent: Customer
- Create a final category with the following details:
- Name: Billing
- Description: Customer Billing
- Parent: Customer
We now have a useful set of categories to link to our Views.
Creating Tags
- Navigate to
Administration > Set-up and Management
. - In the Administration window under
Catalog Management
, click on theTags
option. - Click the
+ Add Tag
icon and create a new Tag with the following details:
- Name: #public
- Description: Data that is readily available for public use
- Create another tag with the following details:
- Name: #confidential
- Description: Sensitive data that could negatively impact operations or execution processes
We now have a useful set of tags to link to our Views.
- We can now navigate to the
bv_crm_client
view and click on theAdd Tag
/Add Category
buttons in theSummary
tab.
- Click on
Add Category
, selectCRM
, and then click onOk
.
- Now click on the
Add Tag
button, select#confidential
, and clickOk
.
- We have now added this view to the
Customer > CRM
category and have tagged it with the#confidential
tag.
Browse using Tags & Categories
Finally, let's do a quick review on how to browse your views and web services by tags, go to Browse > Tags
.
- In the sidebar, you will see the list of tags available in the Data Catalog.
- Click the tag
#confidential
to see the elements that have been assigned with this selected tag (select theViews
tab).
- Similarly you can browse by categories, going to
Browse > Categories
!
We have now seen how the effective use of the Categories and Tags can enable powerful data exploration.
In the next sections we are now going to explore other features that offer more in-depth interrogation of a view in the Data Catalog. This includes:
- Querying a view, Saving the query, filtering results and exporting results to a file
- Creating new fields with the Data Preparation wizard
- Exploring view relationships and data lineage
From the previous section, we have selected our bv_crm_client
view. We can now explore the contents of this view.
From the Query
tab, ad-hoc queries can be run against the view (the query is created graphically). For example, select the following fields and drag them into the Output columns area.
client_id
,name
,surname
,client_type
Now click Execute
, to get the results:
Of course, the Data Catalog allows exporting the results! You can select CSV, HTML, Excel or Tableau as output format by clicking the button.
More Options Available When Querying a View
If we want to filter the results of the view, and, for example, order the results by the surname, we can easily do so. Click the Definition
link to bring back the query options.
Begin by dragging the field by which we want to filter, for example dragging field client_type to the Filters section. We will now need to add an expression, we can add =
and '02'
.
We also add the surname field to the Order By section for which we want to order the results by, and click the arrow to change the Order By to descending order.
Now click Execute
. The results now are filtered to only include results for customer_type = '02'
, and the results are ordered by the surname
field.
You can click on the button to see the query that is being executed in Denodo!
Adding New Output Fields
Let us consider the scenario where we want to combine the name
and surname
fields into a new full_name
field. We can do this by concatenating the name
with the surname
following these steps:
- Go back to the
Definition
of the query. In the Output columns section, click on three dots and then click onAdd
option.
- In the "New output field" dialog, click on the
Edit
button beside the Field name column and provide field name asfull_name
and Expression asconcat(name,'
',surname)
- Our results include the newly created
full_name
field.
- If we would like to save this query for later use, we can click
. This will save the query under the
Query > My Queries
section of the top menu.
The next tab is the Data Preparation tab, which is a WYSIWYG (What You See Is What You Get) editor, which allows you to do additional transformations to the data in a simple way. For example, let's say we want to flag all clients without proper name
defined, which is those with name fields with less than 2 characters in length.
- Navigate to the
Data preparation
tab and select the following fields to theOutput columns
sectionclient_id
,name
,surname
,client_type
- Click the
Load data
button to import some rows of the data set to the wizard. - The next
Definition
tab will open. Click theAdd column > Add column from wizard
button
- Specify the following details:
- Column name: invalid_name_flag
- Column type: boolean
- Fill the column with: an expression value
CASE WHEN LEN(name) < 2 THEN true ELSE false END
- Click
Ok
- Filter the data by clicking the
Add filter > Add filter from wizard
button
- Use the newly created
invalid_name_flag
column we added in the previous step and clickOk
- In the
Data preparation
section of the page, you can see all the transformations that has been done to the data set
- Click the
Execute
button to retrieve all the results of the transformation.
Relationships
The Denodo Data Catalog is able to show the associations/relationships between the views. This is useful for the business user to understand how certain views are related.
This information is shown under the Relationships
tab.
Data Lineage
The lineage tab displays a tree graph with all the data sources and views used to build the current view.
If we click on one of the fields under View fields, we will be able to see the lineage of a specific field. This is especially useful when dealing with complicated derived views, as we will explore later.
By clicking on a node, you can see the details of the corresponding data source or view (e.g. Name, Type, Description, Projected fields, Join conditions, etc).
Lineage of Complex Views
Let us now view the lineage of a more complex view.
Return to the Search page and search for iv_client_with_bills
. Open this view and navigate to the Data lineage tab and select the primary_phone
field.
We can now see the value of the Data lineage
tab, where we can identify the lineage of the primary_phone
field including all of the operations involved with the field.
In the next section we will explore the Assisted Query feature which lets you explain your needs in natural language via the Natural language query input.
This section explores the Assisted Query feature which lets users explain their needs in natural language via the Natural language query input and invoking external LLM services. From the previous tutorial, we have already configured this feature to use OpenAI's public API, so let's dive in.
- Open again the
bv_crm_client
view and navigate to theAssisted Query
tab - In the Natural language query input, specify the following instruction:
- Give me the name of clients with client type 01
- Click the
Generate
button - In the result, you can see that a VQL SELECT query is generated based on the natural language input, with the corresponding query explanation
- Click the
Execute
button to see the results of the query, this is great! We have used natural language for querying our views!
Finally, let's explore how users can collaborate in the Data Catalog to allow Data Stewards to better communicate with their business users.
In this section, we will review:
- Endorsements
- Warnings
- Deprecation notes to views and web services
- How to request access to a data set
Endorsements
The endorsements are the comments by users on a view or a webservice to show their support. A user can only endorse a view or web service once, meaning, when a new comment is written, the previous endorsement will be replaced.
- To create endorsement, navigate to the
Summary
tab of thebv_crm_client
view and click onCollaboration > Endorse option
.
- In the Endorse dialog, provide the details which you would like other users to see. For example, add the details as follows:
"This Client view is a key component of our model. It is associated with Address and Client Type views to give expanded information about each client." - Click
Ok
to save the endorsement.
- In the
Summary
tab, theEndorsed by
label displays the number of endorsements on this view and their authors. Mouse over on an author say, 'admin' to see the endorsements comment.
Warnings
Warnings are used to write and display the "advise against" messages on views and web services by users. A user can write only one warning against a view or web service.
- To create a warning message, go to the
Summary
tab of the Client view, click onCollaboration > Warn option.
- In the Warn dialog, add the following warning information:
"This view will be updated with delta records once in a week" - Click
Ok
to save the warning message.
- In the
Summary
tab, theWarning by
label displays the number of warnings on this view and their authors. Mouse over on an author say 'admin' to see their warnings.
Deprecation
Deprecations are used for informing users that it is obsolete and should not be used anymore. A user can write only one deprecation about a view or web service.
- To deprecate a view, go to the
Summary
tab of the Client view, and Click onCollaborate > Deprecate option
.
- In the Deprecate dialog, we will add the following deprecation notes:
"This view will be deprecated from next cycle. Users will be notified about the latest view by the end of this month." - Click
Ok
to save the deprecation note.
- In the
Summary
tab of the view, you will see the ⚠ icon in the toolbar and a notification will pop up every time you click on the icon or access the view.
Request Access Workflow
To provide workflow capabilities in Data Catalog, it has introduced some types of requests that will allow users to initiate cases on elements that will be attended by an administrator from a data stewardship perspective. Most frequent types of requests: access requests, data quality issues, metadata changes and general questions have been taken into account.
Setup a user with restricted access to the tutorial views
- To start exploring this feature, log in to the Design Studio and create a new role
restricted
withconnect
andmetadata
privileges to thetutorial
database.
- Create a new user
myuser
and assign it the newly createdrestricted
role
- Log in to the Data Catalog using as an
admin
user and navigate to theAdministration > Setu-up and management > Server > Permissions > Request
page. - Search for the
restricted
role, and assign permission to create access, change, data quality and question requests.
Create access request
- Log in to the Data Catalog as the newly created user
myuser
. - Open the
bv_crm_client
view, and notice that there are only limited tabs available to this user. Right now, the user can only see the metadata of the view. The user does not have permission to execute and retrieve the data.
- To request for an access, create a request by clicking on the
Create request > Access
button
- In the pop up window, specify the details of the request, and click
Ok
Granting access request
- Log in to the Data Catalog as an
admin
user. This user has permission to manage access requests and grant execute permissions to the view - Navigate to the
Administration > Request
management
page - In this page, you can see all the requests that have been created, including the one we have created in the previous step.
- Click the request for
bv_crm_client
view, and change the status to In Progress.
- In the
Messages
tab, you can send a message to inform the requester that the request is being worked on.
- In the Design Studio, the admin user can now grant access to the
bv_crm_client
view for therestricted
role
- Still as an
admin
user, log back into the Data Catalog, and update the request in theAdministration > Request
management
page
- When you log back into the Data Catalog as
myuser
account, and navigate to thebv_crm_client
view, you can now see theQuery
tab and be able retrieve the data for this view.
GREAT! We have now seen how the collaborative features help users in Data Catalog.
In this tutorial, we have only had a limited number of Views, Data Sources, Tags and Categories, but it is clear that through the use of the Data Catalog, business users will be able to explore the companies data, easily and quickly, with minimal overhead on the IT team. We have also learnt about how the feature packs included in the data catalog can be used and how it helps users in a collaborative environment.
Thanks!