DATA COMBINATION

Welcome to the Denodo Data Combination Tutorial!

In this tutorial we'll explore fundamental data combination operations like Joins, Group By, and Flatten, essential for merging and organizing diverse datasets. We'll also cover practical skills, including importing data from web services and using VQL Stored Procedures, enhancing your ability to integrate external data sources and maximize your analytical capabilities.

Let us suppose that the billing department of our sample company exposes the billing information using an internal web service that exposes all the open bills for each customer: amount due, due date, etc. In this section you will combine the unified customer view that you have built with this billing information so you can have a report that lists the total amount that is due for each of your customers.

List of topics:

  • Create a Web Service data source.
  • Flatten: a hierarchical structure.
  • See how derived views are constructed using Tree View.
  • Create a Join view between heterogeneous sources.
  • Create an Aggregation view.

In the previous tutorial, Denodo Basics, you learned how to access a customer database to get personal and contact information about the company's customers. It is strongly advised to complete the Denodo Basics tutorial before starting this one, as we are going to be using the same elements (data source, base views and integrations) created on it.

If you have already completed it, please proceed to the next section. Otherwise, follow the subsequent instructions:

  1. Launch the resources needed (check how in the Installation & Bootstrapping tutorial).
  2. Log in Denodo Design Studio (user/password: admin/admin):

  1. Import this VQL to Denodo by clicking into File > Import. Drag ‘n' drop the file and select "tutorial" as the Database:

If all steps have been executed correctly, you should observe the following in the Design Studio's elements tree:

Great! Now it's time to start the tutorial!

As part of the installation steps, you have available a billing web application that exposes several REST APIs web services. The different services are available at http://localhost:8080/billing/services.

This Web service has been created by the billing department and expose the information about the customers' bills using three different operations:

  • getBills: returns a list with all the bills in the system.
  • getBillByCustomerID: returns a list with all the bills for the specified input customer id.
  • getBillByPhoneCenter: returns a list with all the bills for the specified input phone center.

All the different operations will return the billing information using a hierarchical structure: the bills will be returned as part of a list. For instance, if you invoke the getBillByCustomerID operation using a customer id as input parameter, you will get a list with all the bills for that customer. For each item (bill) in the list, you will see: the customer id and ssn, the amount due for that bill, the billing start date, end date and due date, the phone center that provided the service for that bill, and the bill id.

The Denodo Platform can integrate both SOAP and REST web services. For SOAP web services you will use the Web service type of data source, for REST web services you could use either JSON or XML data sources depending on the output format of the REST web service. Follow these steps to create a SOAP data source:

  1. Click on the '1 - Data Sources' folder and select + New > Data source. Then choose SOAP web service:

  1. In the New Web service Data Source creation wizard enter:

  1. Click on the Save button (you can leave the default values for the remaining options).

Very easy! Now let's see how to create the base views.

Now you have to click on the Create base view button. You will see on the screen a list with the different operations available in the Web service and the option of creating a new base view for each one of them.

In this case, you are interested in the getBillByCustomerID operation, so click on the Create Base View link associated with this operation.

In the Creating Web Service View for operation <getBillByCustomerID> screen just choose Do not stream output and click Ok.

After the creation of the view, rename the input parameter for the web service operation from in0 to customer_id by simply clicking on the next to the text . Also, change the view name to bv_billing_bill_by_customer_array (following the VDP Naming Conventions).

Change the location to be on the /1 - connectivity/2 - base views folder, by accessing the Metadata tab. This process can be done after too, by dragging the base view to its folder.

Click on Save after these changes.

As we have seen in the previous section, the Web service operation that you just imported has a mandatory input parameter: customer_id.

To reflect this restriction in the data source, the new base view will also have limited query capabilities and when querying the view (clicking the 'Execute' button) we will have to provide a value for this input parameter ("customer_id") as a WHERE condition.

Actually, if you try to execute a query from the VQL shell without providing the WHERE condition you will get an error.

If we execute a query and use 'C003' as the value for the customer id we will get the complete list of bills for that customer as the result of the query.

A list of items is represented with an Array data type. You can see in the results of the query that the return column is displayed as Array... If you click on the icon you can see the contents of the array. In this example, you can see the 2 bills that belong to this customer.

In the previous section, you have created a base view on top of a SOAP web service data source. This web service returns a hierarchical structure that includes a return element of type array.

To represent these hierarchical structures, Denodo supports two different complex data types:

  • Registers or Records: a register or record represents an element that is formed by several subelements or subfields. Each one of the subelements in a record will have their own name, data type and value.
  • Arrays: an array represents multi-valued data that is presented as a list of items. An element of the type array can be thought of as a sub-view included in a main view. An array type always has an associated record type that acts like the schema of the sub-view and defines the columns that make up each item in the list.

To simplify array columns, Denodo has the special Flatten operation.

Let's see how to create a new view that flattens the iv_billing_bill_by_customer base view:

  1. Click on the bv_billing_bill_by_customer_array view and select New > Flatten.
  2. In the Flatten View wizard, we will select the element that needs to be flattened. In this case, the only array that we have in the base view is: return.

  1. Go to the Output tab, rename the view to iv_billing_information.

  1. On the Metadata tab, select /2 - integration as folder and then click on Save.

The new derived view has the same query capabilities as bv_billing_bill_by_customer_array but if you query the new view using the same customer_id = 'C003' condition, you will get the fields that were part of the array in the base view schema. This customer has two pending bills so there are two rows in the result set of the query, one for each bill.

That's cool!

In the Denodo Basics tutorial, you have seen how to create new derived or integration views by combining other views. Once you start combining views the complexity of the new derived views will grow and it will be useful to visualize how those views are built. For this, you have the Tree View functionality.

In that Denodo Basics tutorial, you have created the iv_crm_personal_data view using two join operations over three different base views. To see the Tree View you have to right-click on the view name and select Tree View (see image below).

In the tree view you can see how the view has been created. In this example, you can see that the view iv_crm_personal_data is created as a join Join icon between three views: bv_crm_client, bv_crm_address and bv_crm_client_type.

At the same time that the join is performed, a projection Projection icon operation takes place because only some fields are projected.

In addition to the general overview, you can see the details of the different operations involved in the creation of the view as well as the data sources and wrappers information by clicking on the different elements.

For instance, clicking on a join node, it will display information about the join type, method used and the join condition.

If you want to save a snapshot for documentation purposes just click on the icon.

At the Denodo Basics tutorial, you have seen how you can create new views using the join operation. In that example, the views involved in the joins were all coming from the same data source. In this tutorial, you will see how you can create a new join view using the exact same procedure but using views coming from two different and heterogeneous data sources.

To create the derived view you can follow these steps:

  1. Click on the iv_crm_personal_data view and select New > Join.
  2. Drag & drop the iv_billing_information view to the Join View wizard Model tab.
  3. We will use client_id = customer_id as the join condition so drag & drop a line from the client_id field in the iv_crm_personal_data view into the customer_id field in the iv_billing_information view.

In the Output tab:

  1. Name the view: iv_client_with_bills
  2. Remove the field return_customer_id from the output schema of the view (you already have the customer_id field).

  • Click Save.

Now, if you execute the new view we will get the information about the bills from the different clients.

In the previous section, you have created a view that obtained the billing information for all the customers in your database. In this view we have one record for each bill, but now let's suppose we want to calculate the total amount due by the customers instead of having the separate bills.

To do so, you can create a new view that aggregates (group by) the different customers using the customer_id to compute the total amount.

To create the new aggregation view you can follow these steps:

  1. Click on the iv_client_with_bills view and select New > Selection.
  2. In the Projection / Selection View wizard go to the Group by tab and enable the Use group by Checkbox.
  3. From the list of fields on the left add the following to the Selected group by fields and expressions box: client_id, name, and surname.

  1. Go to the Output tab and rename the view to iv_amount_due_by_client.
  2. Now, you have to create a new field that computes the addition of the amount due for each client. Click on + New > New aggregate field button.

  1. Define a new field called total_amount that will be calculated using the SUM aggregation function and the amount_due field.
SUM(CAST('float', iv_client_with_bills.amount_due))

  1. Click Ok to add the new field to the view.
  2. Click Save to confirm the creation of the view.

Now, if you execute the new view, you will get as result one record per customer with the total billing balance:

Great! Well done!

Denodo allows users to create complex logic in VQL code, so it can be reused over and over again. For example, if you have a VQL query that you write over and over again, you can save it as a VQL Stored Procedure, and then just call it to execute it.

VQL Stored Procedure Structure

In the structure of a VQL Stored Procedure you can find the following:

  1. First, we can find the header of the Stored Procedure. Here you must define the parameters of the Stored Procedure. Those parameters can be input parameters defined by IN clause, output parameters defined by the OUT clause, or parameters that can be used as input or output, defined with the IN OUT clause.
  2. Then, in the AS section, you must define the different variables needed by the Stored Procedure. If you need to execute a query, you can create a cursor that processes a SELECT statement and stores in the cursor the results of that query.
  3. Finally, the body of the Stored Procedure is located between the BEGIN and END clauses where you must define the logic of your Stored Procedure (conditions and control statements, calculations for variables, calling on CURSORs, returning the results...). If needed you can also declare your own exceptions by using the EXCEPTION clause.

Prior to illustrating an example, it is essential to comprehend some fundamental tools of the body:

  1. Conditions and Control Statements:
  1. Variable Declarations (Eg: variable_name := value )
  2. IF/THEN/ELSE
  3. CASE WHEN
  4. LOOP Statements (LOOP...END, LOOP with EXIT WHEN, WHILE and FOR LOOP)
  1. Cursors: A cursor processes a SELECT statement and stores in the cursor the results of that query.

Creating a New VQL Stored Procedure

Before creating VQL Stored Procedures, in order to keep resources organized, it is advised to make a new folder. It will be named 8 - stored procedures, inside the tutorial database. Click on the three dots of tutorial and then select + New > Folder.

Now let's see how we can create a new Stored Procedure from the Denodo Design Studio. Click on the three dots of 8 - stored procedures and then select + New > Stored procedure > VQL stored procedure

Let's create an example! The objective is to classify the clients in 4 different types ("Important", "Medium", "Small", "Unbilled") depending on the total billing information. We are going to take "upper_limit" and "lower_limit" as inputs, which indicates the limits where we are going to divide the type of clients. We want the clients to be ordered by the total bill of each one.

  1. Copy the following code

example.vql

(upper_limit IN INTEGER, lower_limit IN INTEGER, client_id OUT VARCHAR, total_amount OUT INTEGER, type_client OUT VARCHAR)
AS (
    CURSOR cdata IS 'SELECT name,
       client_id,
       surname,
       total_amount
           FROM tutorial.iv_amount_due_by_client
             ORDER BY COALESCE(total_amount, 0) DESC';
    c cdata%ROWTYPE;
    type_c VARCHAR;
)
BEGIN
    OPEN cdata;
    LOOP 
        FETCH cdata INTO c;
        CASE 
            WHEN c.total_amount >= upper_limit THEN type_c:='Important';
            WHEN c.total_amount >= lower_limit THEN type_c:='Medium';
            WHEN c.total_amount >= 1 THEN type_c:='Small';
            ELSE type_c:='Unbilled';
        END CASE;
        RETURN ROW ( client_id, total_amount, type_client) VALUES (c.client_id, c.total_amount, type_c);
        EXIT WHEN cdata%NOTFOUND;
    END LOOP;
    CLOSE cdata;

END
  1. Paste it on the new VQL Stored Procedure we created, name it sp_client_type.

  1. click on Save

  1. Execute the Stored Procedure with "upper_limit" = 500 and "lower_limit" = 100 as inputs.

Some common use cases for using VQL Stored procedures are:

  • Business Logic Implementation: as shown in this tutorial.
  • Dynamic dataset generation: for example, return a count from 1 to the defined value.
  • Data Validation: for example, update rows of a table that contain incorrect values.
  • Complex data replication: If a value from a select statement is "XXXX", then insert another value in another data source.
  • Error handling: checking access to underlying data sources, and then executing a select statement on the healthy connection.
  • Denodo Metadata modification: for example, add a common prefix to a set of Denodo base views.

Congratulations!

That's all for this tutorial, now you have learnt the basics of combining data in Denodo.