The Denodo Data Catalog is a web based self service tool included in Denodo Platform that lets both technical and business users query, search and browse information and metadata stored in a Virtual DataPort server. With this tool, users can generate new knowledge and pave the way to make better decisions.
It is strongly advised to complete the Data Services tutorial before starting this one, as we are going to be using the resources (data source, base views and integrations) created on it.
If you have already completed it, please proceed to the next section. Otherwise, follow the subsequent instructions:
- Launch the resources needed (check how in the Installation & Bootstrapping tutorial).
- Log in Denodo Design Studio (user/password:
admin
/admin
) - Import this VQL to Denodo by clicking into
File > Import
. Drag ‘n' drop the file, selectUse custom password for sensitive data decryption
, and enterdenodo
in thePassword
field.
If all steps have been executed correctly, you should observe the following in the Design Studio's elements tree:
Great! Now it's time to start the tutorial!
- For starting this web tool in a local installation, you have to open the Denodo Platform Control Center, and start the Data Catalog. Once it changes the status to "Running", click the Data Catalog link to open the Web tool (by default: https://127.0.0.1:9090/denodo-data-catalog).
Now login to the Data Catalog using the standard login details (admin
/admin
):
The first time you login to the Data Catalog, you will notice the Synchronize Metadata
popup window. This needs to be run when you open the Data Catalog for the first time, in order to ensure that the Data Catalog reflects the latest state of the Denodo Virtual DataPort server you are connected to.
Run the VDP Synchronization as follows:
- Click the
Synchronize the metadata now
link. - Click
Continue
on each Synchronization step. - Once the synchronization is complete, you should now see the views and web services available in the tree view when navigating to the
Browse > Database / Folders
page
Great! Now let's continue with the next setup.
In this section We will explore the features of the Data Catalog Content Search. With this feature you can use Denodo Scheduler to index the content of your views using either ElasticSearch or the Denodo Scheduler Index Server. You can then allow your users to perform Google-like searches on them, and to customize how they see the search results.
We are going to index the fields of the bv_crm_client view to allow more rapid discovery of client details.
Our first Step is to configure an Index. Let's see how to do that using Denodo Scheduler
Creating an Index in the Denodo Scheduler
Start the Scheduler Server and the Scheduler Index Server (you can follow the latest section of the Caching tutorial to know how to start Denodo Scheduler). Once these are all running, open the Scheduler Administration Tool for creating an Index.
Create the Index following these steps:
- In the login screen of the Scheduler Administration Tool, provide the login details
admin / admin
and URI of the Scheduler Server.
- In the Denodo Scheduler we need to create a new job to create and maintain the Index. Click
Add Job > VDPIndexer
- Give the Job a suitable name, in this case
index_clients
.
- Choose the following settings Under the
Extraction section
, while leaving the rest to default/blank:
- Data Source: VDP
- View: tutorial.bv_crm_client
- Indexing process name: tutorial.bv_crm_client
- Under the
Exporters section
, clickAdd Exporter > Scheduler-Index
and choose the following settings while leaving the rest to default/blank:
- Data Source: Scheduler-Index
- Index name: ix_client
- Save the Scheduler Job.
- Once the job is saved, you can execute the job by clicking three dots and then the
Start
option.
- The job will execute and once successfully complete, the
Result
status will change toCOMPLETE
, indicating that the Index has been populated.
Configuring the Index in the Data Catalog
We now need to configure the newly created Index in the Data Catalog, in order to ensure that the Data Catalog includes the Index as part of the searchable content.
- Open the Data Catalog and navigate to
Administration > Set-up and management
. - In the Administration window, click on
Search Engines
option
- Click on + Add server option under
Index Servers tab
.
- Add the details as follows to the
Add New Index Server
screen.
- Name: TutorialIndex
- Type: Scheduler Index
- Description: Tutorial Index
- Host: sched (or localhost if you are using a local installation of Denodo)
- Port: 9000
- Login: admin
- Password: admin
- Click
Ok
.
- Go to the
Content Search tab
, Click the Pencil Icon.
- In the
Search index path
screen, add the following details:
- Index Type: Scheduler Index
- Index Server: TutorialIndex
- Index Name: ix_client
- Click
Ok
.
- The Index will display a green checkmark under the Configured column to indicate that the Index was added successfully.
Done! In the Using Denodo Data Catalog tutorial you will see the new Index in action. For now, let's continue setting up other configurations of the Data Catalog.
In this section, we will explore the feature of creating Property Groups. With this feature, we can use the Denodo Data Catalog to define custom properties; add them to databases, views and web services; and give them a value on each element in order to improve the available information on them.
In our example, we are going to create a new property group
containing custom properties
to identify the data owner of a view, and, as an example, assign it to the bv_crm_client
view.
Creating a Property Group
Our first step is to create a property group to specify information for Data Ownership
- Open the Data Catalog and navigate to the
Administration > Set-up and management
. - In the Administration window, click on the
Property Groups
option.
- In the
Property group management
page, click on the+ Add Property Group
button
- In the pop-up page that appears, specify the following details:
Name: Data Ownership
Description: This property group is to specify the data owner information of the data asset
Show in: Summary tab
- Click
Ok
Adding a Custom Property to the Property Group
Now that we have created the Property Group, it's time to add some Custom Properties to it to specify what kind of information we want to define to enhance the information of the views.
- In the
Property group management
page, click on the three dots button located at the rightmost section of theData Ownership
property group, and selectManage Properties
option.
- In the next page, click the
+ Add Property
button.
- In the pop-up page that appears, specify the following details:
Name: Data Owner Name
Description: The name of the data owner
Property type: Text
- Click Ok
- Once we are back to the
Properties Management
page, we can add more custom properties to the property group, but for now, we will just add one property for this tutorial
Assigning the Property Group to a View
We can now assign our newly created Property Group with the relevant Custom Property to our bv_crm_client
view to enhance it with the Data Ownership information.
- Navigate back to the
Administration > Set-up and management > Property Group
page. - Click on the three dots button located at the rightmost section of the
Data Ownership
property group, and selectAssign Property Group
option.
- In the
Assign property group
page, select thebv_crm_client
view, and drag it towards theViews
section.
- In the pop-up page that appears, we can now specify the value of the
Data Owner Name
property, which we configured earlier.
SpecifyJane Smith
in the text box and clickOk
.
- Now, navigate to the
Browse > Databases /Folders
page, and click on thebv_crm_client
view to open it.
- In the Summary tab of the view, we can now see the Data Ownership information of this view
Great! Let's continue setting up other configurations of the Data Catalog.
In this section, we will explore how to configure one of the best features of Data Catalog: the Assisted Queries. This feature lets you explain your data needs in natural language via the Natural language query input. Providing more details not only refines the search results but also elevates the quality of the displayed information. This extends also to the metadata of the view.
In this tutorial we will set up the required configuration to start using the Assisted Query feature.
Create an OpenAI API Secret Key
The Data Catalog provides 3 options on the API to use for the LLM services so you need the credentials of the service in order to configure it in the Data catalog.
In this tutorial, we will use the official public OpenAI API, as they provide $5 credit for new users (as of June 2024). So you can create a new account if needed to test this feature.
- Once you have your OpenAI account, in a web browser, access the https://platform.openai.com/api-keys page
- Navigate to
API keys
menu and click the+ Create new secret key
button
- In the pop up window, specify the name as
DenodoAppKey
and click theCreate secret key
- Copy and keep the
secret key
somewhere for use in the next section
Configure the Assisted Query feature in the Data Catalog
- Login to the Data Catalog and navigate to the
Administration > Set-up and management
, click onServer
, and go to theAI Integration
tab.
- Specify the following information:
Enable query generation: Enabled (please read the Disclaimer shown in the Data Catalog)
Enable data usage: disabled
Language options: User locale
Execution mode: Manual
API: OpenAI
API key: thesecret key
obtained from previous section
Organization ID: theservice account id
added in the previous section (e.g. my-denodo-data-catalog)
Model: gpt-3.5-turbo-16k
Done! We are going to see the Assisted Query feature in action in the Using Denodo Data Catalog tutorial.
In this tutorial, we have configured various Data Catalog features which we are going to explore in the next. Head on to the next tutorial to see how these features work.
Thanks!