Data Preparation¶
Data Catalog offers you several methods to retrieve data from views in Virtual DataPort:
In the VQL Shell you express your information need using a query in VQL, the language required by Virtual DataPort.
In the Query tab of a view you define the different building blocks of a query in a wizard: its output columns, its filters and its sorting. When all parts are ready, you execute the query to obtain the results.
In the Assisted query tab of a view you define your information need in natural language and an engine based on large language models translates it into a VQL query.
Each method is intended to a different audience according to their knowledge on the view schema or their expertise in VQL or SQL.
The Data preparation tab provides you a new way for querying a view in Virtual DataPort, intended for business users with little knowledge on the view schema or VQL, based on the what you see is what you get principle. Unlike the other methods, you start retrieving data from the view. Not the entire data set, but a subset of data that works as a representative example of the view. To build the query you have several actions that modify the data. Apply one action and the data will be updated in real-time, so you perceive the effect of the action at the moment and decide if it is what you need or not. Then apply another action and another one until you are satisfied with the result. Then your query is done and you are ready to execute it on the entire view data set.
In this section, we will see:
How to load data to start a data preparation session.
How to manipulate live data to build your query.
How to execute your query when it is ready.
Note
The Data preparation tab is only available if you have privileges to execute the view and you have been
granted with the data_catalog_data_preparation
role in Virtual DataPort.
Loading Data¶
Every data preparation session starts in the Columns section. There you have two panels, Fields and Relationship fields, with all the available fields from the view and its related views. To start preparing the data you need to choose which fields will participate in the query. To do that just drag-and-drop fields from these two panels to the Output columns panel and click the Load data button when you are ready. You will be redirected to the Definition section, where there is an example of the query results with the fields you have selected in Output columns.
Take into account that the Columns section is always available and you can go back there whenever you want. In case you want to discard your data preparation session and start from the beginning click the Start over button.
Let us see in more detail the available panels in the Columns section.
The Fields panel contains the list of all the fields in the schema of the view in Virtual DataPort. In addition, it
has an extra field called NUMBER_OF_RECORDS
, which represents the number of rows in the query results. It is
equivalent to applying the Count
transformation to a column in the Definition section.
Note
If there are some fields you are not allowed to use in a query according to your column privileges, they will not be listed here.
The icon next to the field name describes the data type of the field. It can take one of the values below:
for the numeric types
int
,long
,float
,double
anddecimal
.for the
text
type.for the date types
date
,localdate
,time
,timestamp
,timestamptz
,intervalyearmonth
andintervaldaysecond
.for the
blob
type.for the
boolean
type.for the
array
type.for the
xml
type.
In addition, a field may be decorated with the icon, which means that the field is mandatory. In the Definition section, you must create a filter on this field to provide a value for it.
The Relationship fields panel contains the list of related views, that is, the views with which the current view has
a to-one association. Notice that related views are labeled as <view name> (<role name>)
.
Note
A related view may appear more than once in the list, since the current view can have several to-one associations with the same view. You can use the role name to differentiate each association.
Each related view has a icon next to its name. Click it to unfold the fields of the related view in Virtual DataPort. Henceforth, we will call them the expanded fields.
Note
If there are some expanded fields you are not allowed to use in a query according to your column privileges in the related view, they will not be listed here.
Expanded fields may be mandatory as well. In case the query depends on a related view with a mandatory field, you must create a filter on the expanded field to provide a value for it.
The Output columns panel defines the schema of the query results, that is, which fields and expressions will be part of it and in which order they will appear. Initially, it only consists of fields from the view and its related views. As new columns are created in the Definition section they are automatically added to this panel to reflect the current schema of the query results.
Notice that in the Output columns panel the expanded fields are labeled as <role name>/<field name>
.
Now it is time to explore all the options that are available in the Columns section.
Add Fields to Output Columns¶
There are several ways of adding a field or expanded field to the Output columns panel:
Drag-and-drop a field or expanded field in the list to Output columns.
Drag-and-drop a related view from Relationships fields to Output columns. All its expanded fields will be added.
Select some fields or expanded fields in the list and drag-and-drop them to Output columns.
Select some fields or expanded fields in the list, click the icon in the panel title and select the Add to output columns option in the menu.
Sort Fields¶
If the schema of the view or its related views contains a great number of fields, it could be difficult to find one specific field. You can use the search bar of each panel to filter fields by name. As an alternative, the field list can be sorted as follows:
Click the icon in the panel title.
Select the option Sort ascending or Sort descending to sort the fields in alphabetical order.
Select the option Default sort to restore the original order of the fields in the view schema.
Reorder Fields in Output Columns¶
The order of the fields in the Output columns panel determines how they will appear in the query results. To reorder a field in the output schema, you have several options:
Click the icon next to the field you want to reorder and select the Move up or Move down option. Repeat this process until the field is in the correct place.
Select those fields in Output columns you want to reorder, click the icon in the panel title and select the Move up or Move down option. Repeat this process until the fields are in the correct place.
Drag-and-drop a field to its final place in the panel.
Select those fields you want to reorder and drag-and-drop them all to their final place.
Remove Fields from Output Columns¶
There are two ways of removing a field or expanded field from the Output columns panel:
Click the icon next to the field you want to remove and select the Remove option.
Select those fields in Output columns you want to remove, click the icon in the panel title and select the Remove option.
Preparing Data¶
Once your initial data has been loaded, you can start building your query from the Definition section. There you have a representative subset of the view data that works as a preview of the query results. Apply actions on these data, one at a time, to manipulate them until you are satisfied with the result. Every action is executed in real-time and the query results are updated accordingly, so you can immediately perceive the effect of the action on the data.
In this section there are three different areas: a table with live query results in the center, the Data preparation panel on the left and the Filters and Transformations panels on the right. Let us see these elements in more detail.
The Data preparation panel contains the actions history, i.e., the list of actions that were applied to build the current query results in chronological order. Each time an action is applied, a new entry is added on this panel with a little description of it. If you want to detach this panel, click the button in the panel title and a floating window will appear with the actions history.
The table in the center shows a preview of the current query results after applying all the actions in the actions history to the view. Depending on your configuration, the data for the preview is obtained directly from the data source or from a sample of it that is locally stored. Either way, the data in the table is updated in real-time with each action.
You can select a column in the table by one of the following methods:
Click the column title.
Click the icon in the column title and select the Open filters & transformations option.
Selecting a column in the table has two consequences. First, it opens its Filters and Transformations panels, which show the applicable actions for the column. In addition, the actions related to the column in the actions history are highlighted, so you can get an idea of the data lineage of the column. For a restricted data lineage of the column that only considers transformations you can put the mouse over the icon in the column title.
To deselect a column you have several options too:
Click the column title again.
Click the icon in the column title and select the Close filters & transformations option.
Click the Deselect column button in the toolbar.
The Filters and Transformations panels show a list of actions in alphabetical order to apply to the selected column. The list of available actions may be different for each column as it depends on the column’s data type. If you are not sure what an action is for, put the mouse over the icon next to the action name to get an explanation about how it works. Panels can be expanded or collapsed with the or icons. In addition, you can use the search bar of each panel to filter actions by name.
A filter is a condition on the column values that a row must meet for it to be returned in the query results. Some filters need some parameters to be defined. If that is the case, a dialog will appear asking for them when you select the filter.
A transformation is a function that converts the column values. This conversion may modify the data type of the column value too. For instance, the length transformation converts a text in its number of characters. The icon next to the transformation name describes the data type of the value returned by the function. It can take one of the values below:
for the numeric types
int
,long
,float
,double
anddecimal
.for the
text
type.for the date types
date
,localdate
,time
,timestamp
,timestamptz
,intervalyearmonth
andintervaldaysecond
.for the
xml
type.for the
aggregation
type.
You can filter the list of transformations by return type. Click the icon in the panel title and select one of the available return types.
A transformation usually works by converting the values in the column, row by row. It takes the value in the column for
the first row, converts it to a new value, and repeats this process for the rest of rows. However, there is a special
case of transformations that do not follow this pattern: aggregation transformations. An aggregation transformation takes
values in the column from several rows and produces a single value by combining them all. Examples of aggregation
transformations are Average
, Count
or Maximum
. A column that uses an aggregation transformation is called an
aggregation column.
Note
Aggregation columns are decorated with the icon in the Output columns panel of the Columns section, so you can differentiate them from non-aggregation columns.
Once an aggregation transformation is applied to the data the semantic of the query changes as follows:
The rows of the query results are partitioned in groups. For each distinct value of the non-aggregation columns, a group of rows is generated with all the rows that contain that value. In case all the columns in the query are aggregation columns, then one group is generated with all the rows of the query results.
For each group, aggregation columns are calculated by applying the corresponding aggregation transformation on the rows of the group.
There will be one row in the query results for each group. The row will consist of:
The values of the non-aggregation columns that generated the group.
The values of the aggregation columns calculated for the group.
After selecting a transformation from the panel, you have to configure how it is applied in a new dialog. Transformations can have parameters that must be defined. In addition, you can indicate where to store the result of the transformation. By default, it is stored in the same column you are transforming. However, you can optionally create a new column with the new value.
Now it is time to explore all the actions that are available in the Definition section.
Apply a Filter¶
To apply a filter on the values of a column you must follow the steps below:
Select the column to filter by in the query results.
Select a filter in the Filters panel.
Fill in all the information requested in the dialog, in case the filter needs more configuration.
As an alternative, you have this other method to apply a filter on the values of a column:
Click the Add filter button in the toolbar.
Select the Add filter from wizard option.
Search for a column to filter by.
Select a filter from the options in the drop-down menu.
Fill in all the information requested, in case the selected filter needs more configuration.
Apply a Filter from Expression¶
In case you need an advanced filter that is not considered among the options in the Filters panel, you can create a filter from a VQL expression following the steps below:
Click the Add filter button in the toolbar.
Select the Add expression filter option.
Enter the VQL expression that represents the condition that the rows must meet to appear in the query results.
Take into account that the VQL editor in the dialog provides the following features to help you write an expression:
Highlighting. Each component of an expression is highlighted with a different color: functions, operators, literals, columns, etc.
Autocomplete. Press
Ctrl
+Space
and a menu with suggestions will appear.Validation. Every time you type in the VQL editor, the expression is validated. If the expression is correct, you will see a icon. If it is incorrect, you will see a icon. Put the mouse over the icon and a tooltip will explain why it is invalid.
See also
Further information on the capabilities of VQL to write conditions and the syntax a condition must follow can be found in the Virtual DataPort VQL Guide.
Apply a Filter from Recommendations¶
Data Catalog can suggest popular filters for you based on other users’ activity. To access all recommended filters proceed as follows:
Click the Add filter button in the toolbar.
Select the Recommendations option.
If you are only interested in popular filters with a condition on a specific column, then you have to:
Select the column for which you want recommended filters in the query results.
Click the icon in the title of the Filters panel.
Either way, a new dialog will appear with a list of recommended filters.
This list considers two kind of recommended filters:
Simple filter. A filter that depends on a column currently in the query results and that can be expressed with a condition in the Filters panel. If you click a filter like this, its corresponding dialog will open.
Filter from expression. A filter represented with a VQL expression, since it does not meet the requirements of a simple filter. If you click this kind of filter, the dialog to create a filter from expression will open. Take into account that the VQL expression can depend on parameters represented as
?
. These are placeholders that must be replaced with actual values.
Note
Recommendations about filters and derived fields feature is only available with the AI & Recommendations FeaturePack. To find out the subscription bundle you have, open the About dialog of the Data Catalog.
Apply a Transformation¶
To apply a transformation on the values of a column you must follow the steps below:
Select the column to transform in the query results.
Select a transformation in the Transformations panel.
Fill in all the information requested in the dialog.
Confirm if you want to store the new value in the same column you are transforming or to create a new column with it.
Apply a Transformation from Recommendations¶
Data Catalog can suggest popular transformations for you based on other users’ activity. To access all recommended transformations proceed as follows:
Click the Add column button in the toolbar.
Select the Recommendations option.
If you are only interested in popular transformations on a specific column, then you have to:
Select the column for which you want recommended transformations in the query results.
Click the icon in the title of the Transformations panel.
Either way, a new dialog will appear with a list of recommended transformations.
This list considers two kind of recommended transformations:
Simple transformation. A transformation that depends on a column currently in the query results and that can be expressed with a transformation in the Transformations panel. If you click a transformation like this, its corresponding dialog will open.
Transformation from expression. A transformation represented with a VQL expression, since it does not meet the requirements of a simple transformation. If you click this kind of transformation, the dialog to create a new column from an expression will open. Take into account that the VQL expression can depend on parameters represented as
?
. These are placeholders that must be replaced with actual values.
Sort by Column¶
The order of the rows in the query results is undefined by default. To establish an order you can sort the rows by the values in a column as follows:
Click the icon of the column you want to sort by.
Select the Sort A to Z or Sort Z to A option.
For those rows that share the same value in the column you have selected, the order is still undefined. You can repeat this process on another column, adding a second level of sorting, to establish how those rows should be ordered.
Create a New Column¶
In the previous sections we have seen that you can apply a transformation or recommended transformation and store the result in a new column. Let us explore what other options you have to create a new column in the query result.
To create a new column from scratch and give it a value proceed as follows:
Click the Add column button in the toolbar.
Select the Add column from wizard option.
Enter a name for the new column.
Enter a value for the new column, which can be one of the following options:
A constant value. All rows will have the same value. You need to specify the column type.
A column value. The new column will have the same value as the column you choose.
A transformation. First select the column and the transformation. And then fill in all the information requested in the dialog, in case the transformation needs more configuration.
Nothing, it will be empty. All rows will have
NULL
as a value. You need to specify the column type.An expression value. Define the value as a VQL expression as complex as you want.
Note
Because the syntax for the expanded fields is
<role_name>/<field>
, you cannot use the/
as an operator to create a VQL expression. Use the functionDIV
instead.See also
A comprehensive list with all the functions you can use to build a VQL expression can be found in the Virtual DataPort VQL Guide.
To create a new column with the same value of other column you have the following alternative:
Click the icon of the column you want to duplicate.
Select the Duplicate column option.
Enter a name.
To create a new column from a VQL expression that depends on other column you have the following alternative:
Click the icon of the column you want to use in the VQL expression of the new column.
Select the Create new column from expression option.
Enter a name.
Complete the VQL expression for the new column.
To add a field from the view or a related view as a new column follow the steps below:
Go to the Columns section.
Add a field or expanded field to the Output columns panel.
Click the Load data button.
Remove a Column¶
There are several methods to remove a column from the query results:
In the query results, click the icon of the column you want to remove and select the Remove option.
In the actions history, click the icon of an action that created a column and select the Remove column option.
In the actions history, select several actions that created a column, click the icon in the panel title and select the Remove option.
Go to the Columns section, click the icon of the column you want to remove from the Output columns panel and select the Remove option.
Go to the Columns section, select several columns in the Output columns panel, click the icon in the panel title and select the Remove option.
Removing a column from the query results has the following effects:
The column is hidden from the query results.
A new action is added to the actions history indicating that the column was removed.
All actions in the actions history that depend on the removed column are disabled to indicate that they are not considered to build the query results.
If you are satisfied with the new query results, you have to confirm the removal of the column for its effects to be definitive. To do that, follow the steps below:
Remove all depending actions on the removed column.
Click the icon of the removal action.
Select the Remove column option.
In this way the column is completely removed. Take into account that the action that created the column and the action that removed the column will not be visible anymore in the actions history.
Instead, if you changed your mind and want to revert the removal action, then proceed as follows:
Click the icon of the removal action.
Select the Undo option.
Reorder a Column¶
The order in which the columns appear in the query results is not fixed. You can change it following one of the methods below:
Drag-and-drop the column title until it is placed in the appropriate location.
Click the icon of the column you want to reorder and select the Move to the left or Move to the right option. Repeat this process until the column is in the correct place.
Go to the Columns section and reorder the columns in the Output columns panel.
Rename a Column¶
To change the name of a column follow the steps below:
Click the icon of the column you want to rename.
Select the Rename column option.
Enter a new name for the column.
Edit an Action¶
Some actions in the actions history can be edited. For instance, applying a filter or a transformation that depends on some parameters or a VQL expression. To edit an action like this, follow the steps below:
Click the icon of the action you want to edit.
Select the Edit option.
Enter a new value for the parameter in the dialog that just opened.
Remove an Action¶
We have already seen how to remove an action that creates a column. Any other action in the actions history can be removed too. You have the following alternatives:
Click the icon of the action you want to remove and select the Remove option.
Select several actions in the actions history, click the icon in the panel title and select the Remove option.
As a consequence, the actions are removed from the actions history and the query results updated accordingly.
Check the VQL of the Query¶
Even if you do not realize it, Data Catalog is building a VQL query on the view as a consequence of your data preparation session. At any moment you can check it. Just click the VQL button in the toolbar. If you want to manually edit the VQL query, click the Open in VQL Shell button.
Save the Query¶
At any moment during your data preparation session you can save the current query for later, which would be useful in the following scenarios:
The query is not finished yet and you want to continue later.
The query is too complex and you will need to execute it often.
You want to share the query with other person in your company.
You want to deploy the query as a view in a Virtual DataPort server.
To save your data preparation session, proceed as follows:
Click the Save button in the toolbar.
Enter a name for the query and, optionally, a description.
You will find your saved query in the My queries section.
Note
Take into account that:
No other user will have access to your saved query, unless you share it.
Your saved queries are associated to the Virtual DataPort server you are connected to.
See also
For further information on what actions you can perform with saved queries, check the My Queries section.
Executing the Query¶
During your data preparation session you are manipulating a representative subset of the view data. Once you are satisfied with the query results in the Definition section you can perform the same manipulations on the entire view data set. Click the Execute button in the toolbar and you will be redirected to the Results section. There you can check the query results executed on the view, export them to a file using the Export button in the toolbar or navigate though the data of its related view clicking the icons in the query results.