USER MANUALS


Data Preparation

Data Catalog offers you several methods to retrieve data from views in Virtual DataPort:

  • In the VQL Shell you express your information need using a query in VQL, the language required by Virtual DataPort.

  • In the Query tab of a view you define the different building blocks of a query in a wizard: its output columns, its filters and its sorting. When all parts are ready, you execute the query to obtain the results.

  • In the Assisted query tab of a view you define your information need in natural language and an engine based on large language models translates it into a VQL query.

Each method is intended to a different audience according to their knowledge on the view schema or their expertise in VQL or SQL.

The Data preparation tab provides you a new way for querying a view in Virtual DataPort, intended for business users with little knowledge on the view schema or VQL, based on the what you see is what you get principle. Unlike the other methods, you start retrieving data from the view. Not the entire data set, but a subset of data that works as a representative example of the view. To build the query you have several actions that modify the data. Apply one action and the data will be updated in real-time, so you perceive the effect of the action at the moment and decide if it is what you need or not. Then apply another action and another one until you are satisfied with the result. Then your query is done and you are ready to execute it on the entire view data set.

In this section, we will see:

Note

The Data preparation tab is only available if you have privileges to execute the view and you have been granted with the data_catalog_data_preparation role in Virtual DataPort.

Loading Data

Every data preparation session starts in the Columns section. There you have two panels, Fields and Relationship fields, with all the available fields from the view and its related views. To start preparing the data you need to choose which fields will participate in the query. To do that just drag-and-drop fields from these two panels to the Output columns panel and click the Load data button when you are ready. You will be redirected to the Definition section, where there is an example of the query results with the fields you have selected in Output columns.

Take into account that the Columns section is always available and you can go back there whenever you want. In case you want to discard your data preparation session and start from the beginning click the Start over button.

Wizard to select the fields that participate in the query

Wizard to select the fields that participate in the query


Let us see in more detail the available panels in the Columns section.

The Fields panel contains the list of all the fields in the schema of the view in Virtual DataPort. In addition, it has an extra field called NUMBER_OF_RECORDS, which represents the number of rows in the query results. It is equivalent to applying the Count transformation to a column in the Definition section.

Note

If there are some fields you are not allowed to use in a query according to your column privileges, they will not be listed here.

Fields of the view available to use for data preparation

Fields of the view available to use for data preparation

The icon next to the field name describes the data type of the field. It can take one of the values below:

  • numeric-type for the numeric types int, long, float, double and decimal.

  • text-type for the text type.

  • date-type for the date types date, localdate, time, timestamp, timestamptz, intervalyearmonth and intervaldaysecond.

  • blob-type for the blob type.

  • boolean-type for the boolean type.

  • array-type for the array type.

  • xml-type for the xml type.

In addition, a field may be decorated with the mandatory icon, which means that the field is mandatory. In the Definition section, you must create a filter on this field to provide a value for it.

The Relationship fields panel contains the list of related views, that is, the views with which the current view has a to-one association. Notice that related views are labeled as <view name> (<role name>).

Note

A related view may appear more than once in the list, since the current view can have several to-one associations with the same view. You can use the role name to differentiate each association.

Each related view has a expand-view icon next to its name. Click it to unfold the fields of the related view in Virtual DataPort. Henceforth, we will call them the expanded fields.

Note

If there are some expanded fields you are not allowed to use in a query according to your column privileges in the related view, they will not be listed here.

Expanded fields of the view available to use for data preparation

Expanded fields of the view available to use for data preparation

Expanded fields may be mandatory as well. In case the query depends on a related view with a mandatory field, you must create a filter on the expanded field to provide a value for it.

The Output columns panel defines the schema of the query results, that is, which fields and expressions will be part of it and in which order they will appear. Initially, it only consists of fields from the view and its related views. As new columns are created in the Definition section they are automatically added to this panel to reflect the current schema of the query results.

The Output columns panel defines the schema of the query results

The Output columns panel defines the schema of the query results

Notice that in the Output columns panel the expanded fields are labeled as <role name>/<field name>.


Now it is time to explore all the options that are available in the Columns section.

Add Fields to Output Columns

There are several ways of adding a field or expanded field to the Output columns panel:

  • Drag-and-drop a field or expanded field in the list to Output columns.

  • Drag-and-drop a related view from Relationships fields to Output columns. All its expanded fields will be added.

  • Select some fields or expanded fields in the list and drag-and-drop them to Output columns.

  • Select some fields or expanded fields in the list, click the menu-title icon in the panel title and select the Add to output columns option in the menu.

Sort Fields

If the schema of the view or its related views contains a great number of fields, it could be difficult to find one specific field. You can use the search bar of each panel to filter fields by name. As an alternative, the field list can be sorted as follows:

  • Click the menu-title icon in the panel title.

  • Select the option Sort ascending or Sort descending to sort the fields in alphabetical order.

  • Select the option Default sort to restore the original order of the fields in the view schema.

Reorder Fields in Output Columns

The order of the fields in the Output columns panel determines how they will appear in the query results. To reorder a field in the output schema, you have several options:

  • Click the menu-item icon next to the field you want to reorder and select the Move up or Move down option. Repeat this process until the field is in the correct place.

  • Select those fields in Output columns you want to reorder, click the menu-title icon in the panel title and select the Move up or Move down option. Repeat this process until the fields are in the correct place.

  • Drag-and-drop a field to its final place in the panel.

  • Select those fields you want to reorder and drag-and-drop them all to their final place.

Remove Fields from Output Columns

There are two ways of removing a field or expanded field from the Output columns panel:

  • Click the menu-item icon next to the field you want to remove and select the Remove option.

  • Select those fields in Output columns you want to remove, click the menu-title icon in the panel title and select the Remove option.

Preparing Data

Once your initial data has been loaded, you can start building your query from the Definition section. There you have a representative subset of the view data that works as a preview of the query results. Apply actions on these data, one at a time, to manipulate them until you are satisfied with the result. Every action is executed in real-time and the query results are updated accordingly, so you can immediately perceive the effect of the action on the data.

Data preparation screen to build queries by modifying real-time data

Data preparation screen to build queries by modifying real-time data


In this section there are three different areas: a table with live query results in the center, the Data preparation panel on the left and the Filters and Transformations panels on the right. Let us see these elements in more detail.

The Data preparation panel contains the actions history, i.e., the list of actions that were applied to build the current query results in chronological order. Each time an action is applied, a new entry is added on this panel with a little description of it. If you want to detach this panel, click the detach-actions-history button in the panel title and a floating window will appear with the actions history.

The table in the center shows a preview of the current query results after applying all the actions in the actions history to the view. Depending on your configuration, the data for the preview is obtained directly from the data source or from a sample of it that is locally stored. Either way, the data in the table is updated in real-time with each action.

You can select a column in the table by one of the following methods:

  • Click the column title.

  • Click the menu-title icon in the column title and select the Open filters & transformations option.

Selecting a column in the table has two consequences. First, it opens its Filters and Transformations panels, which show the applicable actions for the column. In addition, the actions related to the column in the actions history are highlighted, so you can get an idea of the data lineage of the column. For a restricted data lineage of the column that only considers transformations you can put the mouse over the info icon in the column title.

To deselect a column you have several options too:

  • Click the column title again.

  • Click the menu-title icon in the column title and select the Close filters & transformations option.

  • Click the Deselect column button in the toolbar.

The Filters and Transformations panels show a list of actions in alphabetical order to apply to the selected column. The list of available actions may be different for each column as it depends on the column’s data type. If you are not sure what an action is for, put the mouse over the info icon next to the action name to get an explanation about how it works. Panels can be expanded or collapsed with the expand or collapse icons. In addition, you can use the search bar of each panel to filter actions by name.

A filter is a condition on the column values that a row must meet for it to be returned in the query results. Some filters need some parameters to be defined. If that is the case, a dialog will appear asking for them when you select the filter.

Dialog to define the parameters of a filter

Dialog to define the parameters of a filter

A transformation is a function that converts the column values. This conversion may modify the data type of the column value too. For instance, the length transformation converts a text in its number of characters. The icon next to the transformation name describes the data type of the value returned by the function. It can take one of the values below:

  • numeric-type for the numeric types int, long, float, double and decimal.

  • text-type for the text type.

  • date-type for the date types date, localdate, time, timestamp, timestamptz, intervalyearmonth and intervaldaysecond.

  • xml-type for the xml type.

  • aggregation-type for the aggregation type.

You can filter the list of transformations by return type. Click the menu-title icon in the panel title and select one of the available return types.

A transformation usually works by converting the values in the column, row by row. It takes the value in the column for the first row, converts it to a new value, and repeats this process for the rest of rows. However, there is a special case of transformations that do not follow this pattern: aggregation transformations. An aggregation transformation takes values in the column from several rows and produces a single value by combining them all. Examples of aggregation transformations are Average, Count or Maximum. A column that uses an aggregation transformation is called an aggregation column.

Note

Aggregation columns are decorated with the info icon in the Output columns panel of the Columns section, so you can differentiate them from non-aggregation columns.

Once an aggregation transformation is applied to the data the semantic of the query changes as follows:

  1. The rows of the query results are partitioned in groups. For each distinct value of the non-aggregation columns, a group of rows is generated with all the rows that contain that value. In case all the columns in the query are aggregation columns, then one group is generated with all the rows of the query results.

  2. For each group, aggregation columns are calculated by applying the corresponding aggregation transformation on the rows of the group.

  3. There will be one row in the query results for each group. The row will consist of:

    • The values of the non-aggregation columns that generated the group.

    • The values of the aggregation columns calculated for the group.

Results for a query with aggregation columns

For each distinct value in the non-aggregation columns, the results contain a row with all the aggregation columns

After selecting a transformation from the panel, you have to configure how it is applied in a new dialog. Transformations can have parameters that must be defined. In addition, you can indicate where to store the result of the transformation. By default, it is stored in the same column you are transforming. However, you can optionally create a new column with the new value.

Dialog to configure how a transformation is applied

Dialog to configure how a transformation is applied


Now it is time to explore all the actions that are available in the Definition section.

Apply a Filter

To apply a filter on the values of a column you must follow the steps below:

  1. Select the column to filter by in the query results.

  2. Select a filter in the Filters panel.

  3. Fill in all the information requested in the dialog, in case the filter needs more configuration.

As an alternative, you have this other method to apply a filter on the values of a column:

  1. Click the Add filter button in the toolbar.

  2. Select the Add filter from wizard option.

  3. Search for a column to filter by.

  4. Select a filter from the options in the drop-down menu.

  5. Fill in all the information requested, in case the selected filter needs more configuration.

Dialog to create a new filter

Dialog to create a new filter

Apply a Filter from Expression

In case you need an advanced filter that is not considered among the options in the Filters panel, you can create a filter from a VQL expression following the steps below:

  1. Click the Add filter button in the toolbar.

  2. Select the Add expression filter option.

  3. Enter the VQL expression that represents the condition that the rows must meet to appear in the query results.

Dialog to create a new filter from a VQL expression

Dialog to create a new filter from a VQL expression

Take into account that the VQL editor in the dialog provides the following features to help you write an expression:

  • Highlighting. Each component of an expression is highlighted with a different color: functions, operators, literals, columns, etc.

  • Autocomplete. Press Ctrl + Space and a menu with suggestions will appear.

  • Validation. Every time you type in the VQL editor, the expression is validated. If the expression is correct, you will see a valid icon. If it is incorrect, you will see a invalid icon. Put the mouse over the icon and a tooltip will explain why it is invalid.

See also

Further information on the capabilities of VQL to write conditions and the syntax a condition must follow can be found in the Virtual DataPort VQL Guide.

Apply a Filter from Recommendations

Data Catalog can suggest popular filters for you based on other users’ activity. To access all recommended filters proceed as follows:

  1. Click the Add filter button in the toolbar.

  2. Select the Recommendations option.

If you are only interested in popular filters with a condition on a specific column, then you have to:

  1. Select the column for which you want recommended filters in the query results.

  2. Click the recommendations icon in the title of the Filters panel.

Either way, a new dialog will appear with a list of recommended filters.

Dialog with recommended filters based on other users' activity

Dialog with recommended filters based on other users’ activity

This list considers two kind of recommended filters:

  • Simple filter. A filter that depends on a column currently in the query results and that can be expressed with a condition in the Filters panel. If you click a filter like this, its corresponding dialog will open.

  • Filter from expression. A filter represented with a VQL expression, since it does not meet the requirements of a simple filter. If you click this kind of filter, the dialog to create a filter from expression will open. Take into account that the VQL expression can depend on parameters represented as ?. These are placeholders that must be replaced with actual values.

Note

Recommendations about filters and derived fields feature is only available with the AI & Recommendations FeaturePack. To find out the subscription bundle you have, open the About dialog of the Data Catalog.

Apply a Transformation

To apply a transformation on the values of a column you must follow the steps below:

  1. Select the column to transform in the query results.

  2. Select a transformation in the Transformations panel.

  3. Fill in all the information requested in the dialog.

  4. Confirm if you want to store the new value in the same column you are transforming or to create a new column with it.

Apply a Transformation from Recommendations

Data Catalog can suggest popular transformations for you based on other users’ activity. To access all recommended transformations proceed as follows:

  1. Click the Add column button in the toolbar.

  2. Select the Recommendations option.

If you are only interested in popular transformations on a specific column, then you have to:

  1. Select the column for which you want recommended transformations in the query results.

  2. Click the recommendations icon in the title of the Transformations panel.

Either way, a new dialog will appear with a list of recommended transformations.

Dialog with recommended transformations based on other users' activity

Dialog with recommended transformations based on other users’ activity

This list considers two kind of recommended transformations:

  • Simple transformation. A transformation that depends on a column currently in the query results and that can be expressed with a transformation in the Transformations panel. If you click a transformation like this, its corresponding dialog will open.

  • Transformation from expression. A transformation represented with a VQL expression, since it does not meet the requirements of a simple transformation. If you click this kind of transformation, the dialog to create a new column from an expression will open. Take into account that the VQL expression can depend on parameters represented as ?. These are placeholders that must be replaced with actual values.

Sort by Column

The order of the rows in the query results is undefined by default. To establish an order you can sort the rows by the values in a column as follows:

  1. Click the menu-title icon of the column you want to sort by.

  2. Select the Sort A to Z or Sort Z to A option.

For those rows that share the same value in the column you have selected, the order is still undefined. You can repeat this process on another column, adding a second level of sorting, to establish how those rows should be ordered.

Create a New Column

In the previous sections we have seen that you can apply a transformation or recommended transformation and store the result in a new column. Let us explore what other options you have to create a new column in the query result.

To create a new column from scratch and give it a value proceed as follows:

  1. Click the Add column button in the toolbar.

  2. Select the Add column from wizard option.

  3. Enter a name for the new column.

  4. Enter a value for the new column, which can be one of the following options:

    • A constant value. All rows will have the same value. You need to specify the column type.

    • A column value. The new column will have the same value as the column you choose.

    • A transformation. First select the column and the transformation. And then fill in all the information requested in the dialog, in case the transformation needs more configuration.

    • Nothing, it will be empty. All rows will have NULL as a value. You need to specify the column type.

    • An expression value. Define the value as a VQL expression as complex as you want.

      Note

      Because the syntax for the expanded fields is <role_name>/<field>, you cannot use the / as an operator to create a VQL expression. Use the function DIV instead.

      See also

      A comprehensive list with all the functions you can use to build a VQL expression can be found in the Virtual DataPort VQL Guide.

Dialog to create a new column from scratch

Dialog to create a new column from scratch

To create a new column with the same value of other column you have the following alternative:

  1. Click the menu-title icon of the column you want to duplicate.

  2. Select the Duplicate column option.

  3. Enter a name.

To create a new column from a VQL expression that depends on other column you have the following alternative:

  1. Click the menu-title icon of the column you want to use in the VQL expression of the new column.

  2. Select the Create new column from expression option.

  3. Enter a name.

  4. Complete the VQL expression for the new column.

To add a field from the view or a related view as a new column follow the steps below:

  1. Go to the Columns section.

  2. Add a field or expanded field to the Output columns panel.

  3. Click the Load data button.

Remove a Column

There are several methods to remove a column from the query results:

  • In the query results, click the menu-title icon of the column you want to remove and select the Remove option.

  • In the actions history, click the menu-item icon of an action that created a column and select the Remove column option.

  • In the actions history, select several actions that created a column, click the menu-title icon in the panel title and select the Remove option.

  • Go to the Columns section, click the menu-item icon of the column you want to remove from the Output columns panel and select the Remove option.

  • Go to the Columns section, select several columns in the Output columns panel, click the menu-title icon in the panel title and select the Remove option.

Removing a column from the query results has the following effects:

  • The column is hidden from the query results.

  • A new action is added to the actions history indicating that the column was removed.

  • All actions in the actions history that depend on the removed column are disabled to indicate that they are not considered to build the query results.

If you are satisfied with the new query results, you have to confirm the removal of the column for its effects to be definitive. To do that, follow the steps below:

  • Remove all depending actions on the removed column.

  • Click the menu-item icon of the removal action.

  • Select the Remove column option.

In this way the column is completely removed. Take into account that the action that created the column and the action that removed the column will not be visible anymore in the actions history.

Instead, if you changed your mind and want to revert the removal action, then proceed as follows:

  • Click the menu-item icon of the removal action.

  • Select the Undo option.

Reorder a Column

The order in which the columns appear in the query results is not fixed. You can change it following one of the methods below:

  • Drag-and-drop the column title until it is placed in the appropriate location.

  • Click the menu-title icon of the column you want to reorder and select the Move to the left or Move to the right option. Repeat this process until the column is in the correct place.

  • Go to the Columns section and reorder the columns in the Output columns panel.

Rename a Column

To change the name of a column follow the steps below:

  1. Click the menu-title icon of the column you want to rename.

  2. Select the Rename column option.

  3. Enter a new name for the column.

Edit an Action

Some actions in the actions history can be edited. For instance, applying a filter or a transformation that depends on some parameters or a VQL expression. To edit an action like this, follow the steps below:

  1. Click the menu-item icon of the action you want to edit.

  2. Select the Edit option.

  3. Enter a new value for the parameter in the dialog that just opened.

Remove an Action

We have already seen how to remove an action that creates a column. Any other action in the actions history can be removed too. You have the following alternatives:

  • Click the menu-item icon of the action you want to remove and select the Remove option.

  • Select several actions in the actions history, click the menu-title icon in the panel title and select the Remove option.

As a consequence, the actions are removed from the actions history and the query results updated accordingly.

Check the VQL of the Query

Even if you do not realize it, Data Catalog is building a VQL query on the view as a consequence of your data preparation session. At any moment you can check it. Just click the VQL button in the toolbar. If you want to manually edit the VQL query, click the Open in VQL Shell button.

VQL query generated from your data preparation session

VQL query generated from your data preparation session

Save the Query

At any moment during your data preparation session you can save the current query for later, which would be useful in the following scenarios:

  • The query is not finished yet and you want to continue later.

  • The query is too complex and you will need to execute it often.

  • You want to share the query with other person in your company.

  • You want to deploy the query as a view in a Virtual DataPort server.

To save your data preparation session, proceed as follows:

  • Click the Save button in the toolbar.

  • Enter a name for the query and, optionally, a description.

Dialog to save a query for later

Dialog to save a query for later

You will find your saved query in the My queries section.

Note

Take into account that:

  • No other user will have access to your saved query, unless you share it.

  • Your saved queries are associated to the Virtual DataPort server you are connected to.

See also

For further information on what actions you can perform with saved queries, check the My Queries section.

Executing the Query

During your data preparation session you are manipulating a representative subset of the view data. Once you are satisfied with the query results in the Definition section you can perform the same manipulations on the entire view data set. Click the Execute button in the toolbar and you will be redirected to the Results section. There you can check the query results executed on the view, export them to a file using the Export button in the toolbar or navigate though the data of its related view clicking the link icons in the query results.

Query results executed on the entire view data set

Query results executed on the entire view data set

Add feedback