You can translate the document:

Goal

Talend is a data integration platform that provides various software and services for data integration, data management, enterprise application integration, data quality, cloud storage and Big Data.

This document aims to describe how to access Denodo Virtual DataPort (VDP) from Talend Open Studio and how data from VDP views can be used.

Connection

To connect to Denodo from Talend Open Studio we will use Denodo’s JDBC driver. This driver is located in the folder <DENODO_HOME>\tools\client-drivers\jdbc.

In order to make the connection work, you can follow the steps below.

Configure the connection to Denodo in Talend

  1. Open Talend Open Studio and create a new project.

  1. Click on the arrow of the selection drop down menu Metadata, select Db Connections, right-click and select “Create connection”.

  1. Enter a Database Connection name, for instance “denodo” and click on "Next".

  1. DB Type is JDBC, and the JDBC connection URL is: :

jdbc:vdb://<hostname>:9999/<dbname>?userAgent=myApplication

For more information see Access Through JDBC — Virtual DataPort Developer Guide.

  1. Click on  to add the Denodo driver.

  1. Select “Install a new module” and browse to the driver location (folder <DENODO_HOME>\tools\client-drivers\jdbc in a Denodo installation).

  1. Click on “Test connection” to check that the configuration is working.

For the userAgent parameter, you can modify the name, for example: Talend.

  1. To retrieve table schemas from the database connection we have just set up, right-click the connection item from the Repository tree view and select Retrieve schema from the contextual menu.

A new wizard opens up where you can filter and show different objects (tables, views and synonyms) in your database connection, select tables of interest, and define table schemas.

Define a filter to filter database objects according to your needs. Click on Next to open a view that lists your filtered database objects. The list will show all the databases with all their tables present in the database that meet the filter conditions.

  1. Select one or more tables from the list to load them into your repository file system. Your repository schemas will be based on these tables. Click Next. On the next window, four setting panels help define the schemas to create. Modify the schemas if needed.

        

  1. In the drop-down, the Denodo views will show up under “Table schemas” and “View schemas”. By default, the schema displayed on the Schema panel is based on the first table selected in the list of schemas loaded (left panel). You can change the name of the schema and, according to your needs, you can also customize the schema structure in the schema panel.

The tool bar allows you to add, remove or move columns in your schema. In addition, you can load an XML schema from a file or export the current schema as XML.        


Performance analysis

In this section, we will compare the execution times when we run queries from Talend using Denodo as the data source and when we run the same queries connecting directly from Talend to the underlying data sources. We will also see how performance can be further improved when accessing Denodo from Talend.

In this example, we have an Oracle table called “Store sales”, and a CSV file called “Item”. In our use case, we want to get the average sales price by brand, sorted in ascending order by the brand field. In the table below, we show the total rows that each of the tables has, and the number of results returned by the query that gets the data that we need:

STORE SALES

(Oracle)

ITEM

(CSV)

QUERY

Total rows

2.880.404

18.000

713

After integrating both data sources in Denodo and creating the base views over the Oracle table and the CSV file we are going to create a derived view in Denodo as follows:

  1. We create a “JOIN” between "Store sales" and "Item”:

Note that for the base view “Item” we have defined a primary key that is used as part of the join condition.

  1. Once we have made the JOIN, we make a GROUP BY using the field that represents the brand:

  1. Last, in the output, we show the field for the brand and create a calculated field (“New aggr. field”) that we are calling: “avg_sales_price” where the expression is as follows:

  1. So, the output would be as follows:

Sorted by the brand field:

We can also create an equivalent query in Talend, accessing directly the CSV file and the Oracle database:

We are going to compare this query with the same query in Talend but that queries the derived view that we have just created in Denodo.

Performing 10 executions from Talend and capturing the execution time, the comparison with or without Denodo in Talend can be seen in the following graph:

We can see that the execution time without Denodo is 90 times slower than using Denodo, with an average of 2.60 seconds using Denodo compared to 2.27 seconds without Denodo.

If we review the execution trace of the query that is run in the Denodo side, we see that Denodo is applying an "Aggregation Push-Down" optimization and not all the rows from the sources are being brought to memory:

   

Denodo delegates the query to the Oracle data source and it only brings 18.000 rows because we have a group by a field (item), which is primary key:

Denodo also helps improve some queries when using a single data source. In the following example, we have tested a similar query after creating a table in Oracle with the data from the CSV file for the items. The query joins the two tables, but both of them belong to the same data source (Oracle). The execution times are shown below:

Even using only a single source, on average, the execution time directly to Oracle is 161 times slower than using Denodo: 1.45 seconds in Denodo, and 234 seconds in Talend on average.

In conclusion, when using Denodo as a source in Talend:

  • The query performance is improved.
  • The performance is also better using just one underlying data source.
  • Using Denodo, less data is loaded in memory at execution time.
  • Denodo delegates the GROUP BY operations to the source when possible to avoid retrieving all the information for the data source.

References

Access Through JDBC — Virtual DataPort Developer Guide

Creating a JDBC connection and importing a database driver

Retrieving table schemas

Disclaimer
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments we recommend you to contact your Denodo Customer Success Manager.

Questions

Ask a question

You must sign in to ask a question. If you do not have an account, you can register here