The Data Catalog is a web-based self-service tool included in the Denodo Platform that lets both technical and business users query, search and browse the information and metadata stored in a Virtual DataPort server. With this tool, users can generate new knowledge and make better decisions.
In this tutorial, we are going to show this use case:
The IT / Data department of our company receives frequent requests for access to data. The business users making these requests usually do not know what types of data the company holds or where the data is located, so the requests take much longer to process than necessary.
To address that use case, this tutorial shows you how to:
- Use the Denodo Data Catalog to explore the Denodo server metadata
- Build queries graphically to get data
- Index data to enable Google-like user search queries (for administrators only)
- Search for data using Google-like queries
- Edit Data Catalog metadata: Tags & Categories
- Use the Recommendations and Collaboration features of the Data Catalog
If you have followed previous tutorials, in your Virtual DataPort you will have something similar to this:
Launching the Data Catalog
The Data Catalog is distributed as a web application included in Denodo 8.0. It offers data analysts, business users and application developers the ability to search and browse data and metadata in a business-friendly manner for self-service exploration and analytics.
To start this web tool, open the Denodo Platform Control Center and start the Data Catalog. Once its status changes to "Running", click the Data Catalog link to open the web tool (by default, https://127.0.0.1:9090/denodo-data-catalog).
Now log in to the Data Catalog using the standard login details.
The first time you log in to the Data Catalog, you will notice the Synchronize Metadata popup window. The synchronization needs to be run when you open the Data Catalog for the first time, to ensure that the Data Catalog reflects the latest state of the Denodo 8.0 server you are connected to.
Run the VDP Synchronization as follows:
- Click the Synchronize the metadata now link.
- Click Continue on each synchronization step.
- The views are now synchronized so you can start exploring!
Using the Metadata Search
Our first example is from the Data Catalog home screen.
Let's use the scenario of the Business Analyst to explore a simple use case: search for clients by typing client and pressing Enter.
Here we have the results of our search. As of Data Catalog 8.0, this search looks for views or web services whose metadata contains the query terms, such as:
- Its name.
- Its description.
- The names of its fields.
- The descriptions of its fields.
- The values of any custom properties it has assigned.
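The list above can be pictured as a search over a handful of text attributes per element. The following is a minimal sketch of that idea, not the Data Catalog's actual implementation: the `Element` class, the matching rule (case-insensitive substring) and the sample entries are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for a view or web service and its metadata."""
    name: str
    description: str = ""
    fields: dict = field(default_factory=dict)      # field name -> field description
    properties: dict = field(default_factory=dict)  # custom property -> value


def searchable_text(element):
    """Collect every piece of metadata the search is described as covering."""
    return [
        element.name,
        element.description,
        *element.fields.keys(),
        *element.fields.values(),
        *element.properties.values(),
    ]


def metadata_search(elements, term):
    """Return names of elements whose metadata contains the term (case-insensitive)."""
    needle = term.lower()
    return [
        e.name
        for e in elements
        if any(needle in text.lower() for text in searchable_text(e))
    ]


# Invented sample catalog; field names and descriptions are illustrative only.
catalog = [
    Element("client", "Customers of the company", {"client_id": "Identifier", "surname": ""}),
    Element("address", "Postal addresses", {"client_fid": "Owning client", "state": ""}),
    Element("product", "Products on sale", {"price": "Unit price"}),
]

print(metadata_search(catalog, "client"))
```

In this sketch the address element is returned as well, because one of its field names contains the search term, which mirrors how a metadata search can surface related views and not only the one named after the term.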
For example, let's click on the view client to be taken to the summary of the selected view:
So far we have only searched the Virtual DataPort metadata. In the next section, we will investigate more advanced functions of the Data Catalog!
We are now going to explore the features that offer a more in-depth exploration of a view in the Data Catalog. This includes:
- Querying a view and filtering results
- Exporting results to a file
- Creating new fields
- Saving queries
- Exploring view relationships
- Exploring data lineage
- Querying views with relationship fields
Data Catalog View Exploration
From the previous section, we have selected our client view. We can now explore the contents of this view.
Under the Summary tab, we can see a summary of the selected view. It shows the metadata of the view, such as the database name, the list of categories, the list of tags, and collaboration information like the Endorsements and Warnings provided by users. Click the Edit button beside the Description option to edit the description of the view. If the view is deprecated, an indication appears at the top of the Summary tab.
Additionally, the Summary tab includes buttons such as Add Tags/Categories (covered later in this tutorial), Collaboration options to customize the view further, and buttons like Connection URLs and Tableau that show different ways to connect to the view or data source.
Under the Schema tab, we can see the schema of the view, with the view description and all the fields and their types. Click the Edit button beside a column to add a field description. We can also search for fields, data types and descriptions using the search option at the top of each section.
The next tab is the Query tab. Here, ad hoc queries can be run against the view (the query is created graphically).
For our view, select all the fields and drag them into the Output columns area.
Click Execute to get the results:
Of course, the Data Catalog also lets you export the results! You can choose CSV, HTML, Excel or Tableau as the output format.
More options are available when querying a view
If we want to filter the results of the view and sort them, we can easily do so. Click the Definition bar to bring back the query options.
Begin by dragging the field by which we want to filter, for example client_type, to the Filters section.
We now need to add an expression; we can add '02'. We also add the surname field to the Order By section to sort the results by it, and click the arrow to change the order to descending.
Click Execute. The results are now filtered to only include rows where client_type = '02', and they are ordered by the surname field in descending order.
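For readers who think in SQL, the graphical filter and sort correspond roughly to a WHERE clause and an ORDER BY clause. The sketch below illustrates those semantics with Python's built-in sqlite3 module and invented sample rows. It is not the query the Data Catalog sends to Virtual DataPort, and the table layout is an assumption based only on the field names used in this tutorial.

```python
import sqlite3

# In-memory stand-in for the client view; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (name TEXT, surname TEXT, client_type TEXT)")
conn.executemany(
    "INSERT INTO client VALUES (?, ?, ?)",
    [
        ("James", "Smith", "02"),
        ("Mary", "Jones", "01"),
        ("Jack", "Brown", "02"),
        ("Anna", "Walker", "02"),
    ],
)

# Filter on client_type and sort by surname in descending order.
rows = conn.execute(
    "SELECT name, surname, client_type FROM client "
    "WHERE client_type = ? ORDER BY surname DESC",
    ("02",),
).fetchall()

for row in rows:
    print(row)
```

Only the three rows with client_type '02' are returned, starting with the surname that sorts last alphabetically.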
We can further manipulate the result set by adding new output fields.
Let us consider the scenario where we want to combine the name and surname fields into a new full_name field. We can do this by concatenating the name with the surname, following these steps:
- In the Output columns section, click the three dots and open the "New output field" dialog.
- In the "New output field" dialog, click the Edit button beside the Field name column, enter full_name as the field name and, as the Expression, the concatenation of name and surname.
- Our results include the newly created full_name field.
- If we would like to save this query for later use, we can click Save. This saves the query under the My Queries section.
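Conceptually, the new output field is a value computed per row from existing fields. The following minimal sketch shows that idea in plain Python. The helper `add_full_name`, the space separator and the sample rows are assumptions for illustration, and the code does not reproduce the expression syntax used in the Data Catalog dialog.

```python
def add_full_name(rows, separator=" "):
    """Return copies of the rows with a derived full_name field appended."""
    return [
        {**row, "full_name": f"{row['name']}{separator}{row['surname']}"}
        for row in rows
    ]


# Invented sample rows with the two fields used in the tutorial.
clients = [
    {"name": "James", "surname": "Smith"},
    {"name": "Jack", "surname": "Brown"},
]

for row in add_full_name(clients):
    print(row["full_name"])
```

The original rows are left untouched, much as adding an output field to a query does not change the underlying view.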
The next tab is the Relationships tab, which shows the associations created between views.
This is useful for business users to understand how certain views are related. You can click the 'i' icon to see the related view information.
Queries involving views with relationships
It is possible to join views and execute simple queries in the Data Catalog by using the Relationship Fields option. These relationships are the same ones shown in the Relationships tab, and they are defined in the Virtual DataPort server.
Let's return to the Query tab of the client view. In the Relationship Fields section, we see address. This is due to the relationship defined in the Linked Data tutorial. Now you can add the field address / state.
If we execute this query, we will see that the result set contains the newly added address / state field.
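Adding a relationship field behaves much like joining the related view and projecting one of its columns. The sketch below illustrates that with sqlite3 and invented data. The key columns `client_id` and `client_fid`, the one-address-per-client assumption and the use of a LEFT JOIN are hypothetical choices made for the example, not details taken from the Virtual DataPort association.

```python
import sqlite3

# In-memory stand-ins for the client and address views; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE client (client_id TEXT, name TEXT, surname TEXT);
    CREATE TABLE address (client_fid TEXT, state TEXT);
    INSERT INTO client VALUES ('C001', 'James', 'Smith'), ('C002', 'Jack', 'Brown');
    INSERT INTO address VALUES ('C001', 'CA'), ('C002', 'NY');
    """
)

# Join through the (hypothetical) key columns and expose the related field
# under the same label the Data Catalog shows: "address / state".
cursor = conn.execute(
    'SELECT c.name, c.surname, a.state AS "address / state" '
    "FROM client AS c LEFT JOIN address AS a ON a.client_fid = c.client_id "
    "ORDER BY c.client_id"
)
columns = [d[0] for d in cursor.description]
result = cursor.fetchall()

print(columns)
for row in result:
    print(row)
```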
Data Lineage Tab:
The lineage tab displays a tree graph with all the data sources and views used to build the current view.
If we click on one of the fields under View fields, we will be able to see the lineage of a specific field. This is especially useful when dealing with complicated derived views, as we will explore later.
By clicking on a node, you can see the details of the corresponding data source or view (e.g. Name, Type, Description, Projected fields, Join conditions, etc).
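A lineage tree of this kind can be thought of as a mapping from each view to the views or data sources it is built from. The sketch below walks such a mapping to find the underlying data sources of a derived view. The view and source names are hypothetical and are not taken from the tutorial database.

```python
# Hypothetical lineage: each derived view maps to the views or sources it is built from.
LINEAGE = {
    "sales_report": ["customers", "orders"],
    "customers": ["crm_db"],
    "orders": ["erp_db"],
}


def base_sources(view, lineage):
    """Walk the lineage tree depth-first and return the leaf data sources in visit order."""
    parents = lineage.get(view)
    if not parents:
        return [view]  # nothing upstream: this node is itself a data source
    found = []
    for parent in parents:
        for source in base_sources(parent, lineage):
            if source not in found:
                found.append(source)
    return found


print(base_sources("sales_report", LINEAGE))
```

The same traversal, restricted to the nodes a single field passes through, is the idea behind field-level lineage.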
Lineage of Complex Views
Let us now view the lineage of a more complex view.
Return to the Search page and search for client_with_bills. Open this view, navigate to the Data lineage tab and select the primary_phone field.
We can now see the value of the Data lineage tab: it lets us identify the lineage of the primary_phone field, including all of the operations that involve the field.
In the next section we will explore indexing data to enable the Content search functionality (note that until now, the Search form was only searching the metadata, not the data returned by those views!).
Note: the next section is aimed at technical users who want to know how to enable that functionality. If you only need to learn how to use it, you can skip that section.
In this section we will explore the features of the Data Catalog Content Search. With this feature you can use Denodo Scheduler to index the content of your views using either Elasticsearch or the Denodo Scheduler Index Server. You can then allow your users to perform Google-like searches on them, and to customize how they see the search results.
In our example we are going to index the fields of the client view, to allow more rapid discovery of client details.
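To give an intuition for what indexing the content of a view means, here is a minimal sketch of an inverted index over rows: each word points back to the rows and fields that contain it. This is a generic illustration with invented rows. It does not describe how the Scheduler Index Server or Elasticsearch store data internally.

```python
from collections import defaultdict


def build_index(rows):
    """Map each lowercase token to the set of (row number, field name) pairs containing it."""
    index = defaultdict(set)
    for row_number, row in enumerate(rows):
        for field_name, value in row.items():
            for token in str(value).lower().split():
                index[token].add((row_number, field_name))
    return index


def search(index, term):
    """Return matching (row number, field name) pairs, sorted for stable output."""
    return sorted(index.get(term.lower(), set()))


# Invented sample rows standing in for the content of a view.
rows = [
    {"name": "James", "surname": "Smith"},
    {"name": "Jack", "surname": "James"},
    {"name": "Mary", "surname": "Jones"},
]
index = build_index(rows)

print(search(index, "James"))
```

Because the index records the field as well as the row, a search result can show which field matched, which is what the expanded search results in a content search rely on.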
Index Creation & Configuration
Our first step is to configure an index. Let's see how to do that using Denodo Scheduler.
Creating an Index in the Denodo Scheduler
Start the Scheduler Server, the Scheduler Index Server and the Scheduler Administration tool from the Denodo Control Center. Once these are all running, open the Scheduler Administration Tool by clicking on the link (by default: http://127.0.0.1:9090/webadmin/denodo-scheduler-admin).
Create the index following these steps:
- In the login screen of the Scheduler Administration Tool, provide the login details admin / admin and the URI of the Scheduler Server. The URI of the server has the format //<host>:<port>.
- In the Denodo Scheduler, we need to create a new job to create and maintain the index. Click Add Job > VDPIndexer.
- Give the job a suitable name.
- Under the Extraction section, choose the following settings, leaving the rest as default/blank:
  - Data Source: vdp
  - Database: tutorial
  - View: tutorial.client
  - Indexing process name: tutorial.client
- Under the Exporters section, click Add Exporter > Scheduler-Index and choose the following settings, leaving the rest as default/blank:
  - Data Source: Scheduler-Index
  - Index name: ix_client
- Save the Scheduler job. Once the job is saved, you can execute it from the menu behind the three dots under the Processed (Tuples/Errors) column.
- The job will execute and, once it completes successfully, the Result status will change to COMPLETE, indicating that the index has been populated.
Configuring the Index in the Data Catalog
We now need to configure the newly created Index in the Data Catalog, in order to ensure that the Data Catalog includes the Index as part of the searchable content.
- Open the Data Catalog and navigate to Administration > Set-Up > Content Search.
- In the Administration window, open the Content Search settings.
- Click the + Add server option under the Index Servers tab.
- Add the following details in the Add New Index Server screen:
  - Name: TutorialIndex
  - Type: Scheduler Index
  - Description: Tutorial Index
  - Host: localhost
  - Port: 9000
  - Login: admin
  - Password: admin
- Go to the Configuration tab and click the pencil icon.
- In the Search Index Path screen, add the following details:
  - Index Type: Scheduler Index
  - Index Server: TutorialIndex
  - Index Name: ix_client
- The index will display a green checkmark under the Configured column to indicate that it was added successfully.
DONE! In the next section we will see our new Index in action!
We can now use the Index feature to explore data using the Content Search function.
Indexed View Exploration
- In the Data Catalog, navigate to the Search page and select the following options:
  - Data type: Content (this option appears only after configuring the index following the steps of the previous section)
  - Database: tutorial
  - View: client
- Type James into the search field and press Enter to run the search:
- The search returns all content that includes the string James. Click the plus icon (+) next to the preview results to expand them and show the field that matches the search.
- You can also click the Client view name to see the filtered data. Using the Search tab, you can search the index directly.
- For example, we can now search for Jack, and the results from the index are returned.
Completed! In the next section we will explore the features of the Data Catalog view metadata.
In this section we will explore the features of the Data Catalog metadata. With this feature you can use the Denodo Data Catalog to add tags and categories to views, as well as update the view and field descriptions.
In our example we are going to: (1) add descriptions to the client fields, to allow more specific discovery of this view, (2) add tags and categories and (3) apply them to our client view.
Data Catalog Metadata
A useful feature of the Data Catalog is the ability to display view metadata, such as the View Description, as well as the Field Descriptions. Let's see how to modify that information.
Editing View and Field Descriptions
- Navigate to the Summary page of the Client view and click the Edit option beside Description.
- Add an appropriate description to the view and save it.
- Similarly, add descriptions to the fields by navigating to the Schema tab and clicking the Edit option under each field.
- The view now displays the added descriptions. These descriptions are saved in the Data Catalog metadata. (Note: it is recommended to synchronize the metadata with the Virtual DataPort server in order to keep the Data Catalog up to date. To do so, use the option Administration > Sync with VDP.)
Please note that metadata changes made in the Virtual DataPort server can be synchronized into the Data Catalog, but the Tags & Categories created in the Data Catalog cannot be synchronized back to the Virtual DataPort server.
Adding Categories and Tags to the Data Catalog metadata
Tags & Categories help users search the Data Catalog with more accuracy. While the number of data sources and views in our tutorial is small, maintaining good categorization and tagging habits pays off over the long term by letting users navigate the Data Catalog more easily.
- Navigate to Administration > Set-up and Management.
- In the Administration window, under Catalog Management, open the categories option.
- Click the + Add Category icon.
- Create a category with the following details:
  - Name: Customer
  - Description: Data sources relating to customer
- Create another category with the following details:
  - Name: CRM
  - Description: Acme_crm System
  - Parent: Customer
- Create a final category with the following details:
  - Name: Billing
  - Description: Customer Billing
  - Parent: Customer
We now have a useful set of categories to link to our Views.
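The three categories form a small tree in which each category points to its parent. A minimal sketch of that structure, using the category names from the steps above, shows how a display path such as Customer > CRM can be derived. The dictionary representation and the helper names are illustrative assumptions, not the Data Catalog's storage model.

```python
# Parent of each category; None marks a top-level category.
PARENT = {"Customer": None, "CRM": "Customer", "Billing": "Customer"}


def category_path(name, parent=PARENT):
    """Build the display path of a category by following parent links to the root."""
    parts = []
    while name is not None:
        parts.append(name)
        name = parent[name]
    return " > ".join(reversed(parts))


def subcategories(name, parent=PARENT):
    """Return the direct children of a category, sorted by name."""
    return sorted(child for child, p in parent.items() if p == name)


print(category_path("CRM"))
print(subcategories("Customer"))
```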
- Navigate to Administration > Set-up and Management.
- In the Administration window, under Catalog Management, open the tags option.
- Click the + Add Tag icon and create a new tag with the following details:
  - Name: JDBC
  - Description: JDBC data sources
- Create another tag with the following details:
  - Name: SOAP
  - Description: SOAP Data Sources
We now have a useful set of tags to link to our Views.
Modify views for adding Categories and Tags
- We can now navigate to the Client view and click the Add Category button in the Summary tab. Select CRM and confirm the selection.
- Now click the Add Tag button in the Summary tab and select the tag to apply.
- We have now added this view to the Customer > CRM category and tagged it.
Browse using Tags & Categories
- To start browsing your views and web services by tags, go to Browse > Tag.
- In the sidebar, you will see the list of tags available in the Data Catalog.
- Click the tag JDBC to see the elements that have been assigned this tag.
- Similarly, to browse by categories, go to Browse > Category.
- From the list of categories in the sidebar, expand the category Customer > CRM to explore its subcategories.
- Click CRM to explore its views and web services.
We have now seen how the effective use of descriptions, categories and tags can enable powerful data exploration.
In the next section, we will explore the Recommendations and Collaboration options in the Data Catalog.
In this section, we are going to explore the new features of Data Catalog 8.0, offered as two feature packs:
- The "AI Feature Pack" provides AI-driven recommendations of datasets to users.
- The "Semantics Feature Pack" allows for collaboration among users by adding endorsements, warnings and deprecation notes to views and web services.
The Feature Packs are licensed separately from the Denodo Platform. To begin using a Feature Pack you do not need to install a new component, only install a new license file.
In our example, we will look at each of these feature packs in turn.
AI Feature Pack
The AI Feature Pack includes the Automatic recommendation of datasets in the Data Catalog to help you discover new elements among the data resources of your company.
Automatic recommendation of datasets in the Data Catalog
With this feature, the Data Catalog displays personalized recommendations to users, based on their past activity in the Data Catalog, such as the datasets that are most used, recently used, recommended, etc.
- To see the recommendations, go to the homepage of the Data Catalog.
- The homepage presents you with a selection of items organized by topic, including a topic named Recommended to you. This recommendation of datasets is only available with the AI Feature Pack.
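The "most used" and "recently used" groupings mentioned above can be derived from a simple activity log. The sketch below shows one way to compute them. It is a toy illustration with an invented event list, and it makes no claim about how the AI Feature Pack actually produces its recommendations.

```python
from collections import Counter

# Hypothetical activity log for one user: (timestamp, view name) pairs.
events = [
    (1, "client"),
    (2, "client"),
    (3, "address"),
    (4, "client"),
    (5, "address"),
    (6, "client_with_bills"),
]


def most_used(events, limit=3):
    """Rank views by how many times they appear in the log."""
    counts = Counter(view for _, view in events)
    return [view for view, _ in counts.most_common(limit)]


def recently_used(events, limit=3):
    """List distinct views from newest to oldest access."""
    seen = []
    for _, view in sorted(events, reverse=True):
        if view not in seen:
            seen.append(view)
    return seen[:limit]


print(most_used(events))
print(recently_used(events))
```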
Semantics Feature Pack
The Semantics Feature Pack includes Collaboration in Data Catalog to allow Data Stewards to better communicate with their business users.
Collaboration in Data Catalog
In this section, we will see how to add the following collaboration items to views and web services, using the Client view as an example:
- Endorsements
- Warnings
- Deprecation notes
Endorsements are comments that users add to a view or a web service to show their support. A user can only endorse a view or web service once; when the same user writes a new comment, it replaces the previous endorsement.
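The "one endorsement per user" rule behaves like a dictionary keyed by element and user, where writing again overwrites the earlier entry. The class below is a hypothetical sketch of that rule only. It is not part of any Denodo API, and the second comment is invented for the example.

```python
class Endorsements:
    """Stores at most one endorsement per (element, user) pair."""

    def __init__(self):
        self._comments = {}  # (element, user) -> comment

    def endorse(self, element, user, comment):
        # Writing again overwrites the earlier endorsement by the same user.
        self._comments[(element, user)] = comment

    def endorsed_by(self, element):
        """Return {user: comment} for one element."""
        return {
            user: comment
            for (el, user), comment in self._comments.items()
            if el == element
        }


board = Endorsements()
board.endorse("client", "admin", "This Client view is a key component of our model.")
board.endorse("client", "admin", "Updated: still the reference view for client data.")

print(board.endorsed_by("client"))
```

After the second call, only the updated comment remains for the user admin, and the number of endorsements on the view is still one.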
- To create an endorsement, navigate to the Summary tab of the Client view and click the Collaboration > Endorse option.
- In the Endorse dialog, provide the details which you would like other users to see. For example:
"This Client view is a key component of our model. It is associated with Address view to give expanded information about each client."
- Click Ok to save the endorsement.
- In the Summary tab, the Endorsed by label displays the number of endorsements on this view and their authors. Hover over an author, say 'admin', to see the endorsement comment.
Warnings are used to write and display "advise against" messages on views and web services. A user can write only one warning per view or web service.
- To create a warning message, go to the Summary tab of the Client view and click the Collaboration > Warn option.
- In the Warn dialog, add the following warning information:
"This view will be updated with delta records once in a week"
- Click Ok to save the warning message.
- In the Summary tab, the Warning by label displays the number of warnings on this view and their authors. Hover over an author, say 'admin', to see their warnings.
Deprecations inform users that a view or web service is obsolete and should not be used anymore. A user can write only one deprecation per view or web service.
- To deprecate a view, go to the Summary tab of the Client view and click the Collaborate > Deprecate option.
- In the Deprecate dialog, add the following deprecation note:
"This view will be deprecated from next cycle. Users will be notified about the latest view by the end of this month."
- Click Ok to save the deprecation note.
- In the Summary tab of the view, you will see the ⚠ icon in the toolbar, and a notification will pop up every time you click the icon or access the view.
GREAT! We have now seen how the recommendation and collaboration features help users in the Data Catalog.
In this tutorial, we have only had a limited number of views, data sources, tags and categories, but it is clear that, through the use of the Data Catalog, business users will be able to explore the company's data easily and quickly, with minimal overhead for the IT team. We have also learnt how the feature packs included in the Data Catalog can be used and how they help users in a collaborative environment.