Creating Join Views¶
A join view executes the relational algebra operation JOIN on a series of input views. These views can belong to different databases.
To create a join view, click Join on the File > New menu or, right-click the Server Explorer and click Join on the menu New.
The Tool will open the “Join view” dialog and it will add the view that is currently selected in the Server Explorer. To add more views to the join view, click the Model tab and drag the views from the Server Explorer to this tab.
You can join more than two views in the same join view.
In our example, we will join the view
incidents created in the
previous section and the Web service view
created in the section Creating Base Views from a SOAP Web Service.
The “Join view” dialog has six tabs:
Model: drag the views that will form the join.
You can define simple join conditions graphically, by linking a field of one view with another field of the other view. To configure the operator used in the condition, double-click the arrow that joins both fields and select the appropriate operator. To add more complex join conditions, go to the “Join Conditions” tab.
Using the Design Studio define the join conditions in two steps. First, connect 2 views by clicking on link button (on the top-right side of the source view) and drag to the target view. The source view and the target view define the left and right views of the join. Once the views are connected, click on the link and a dialog should show up at the bottom of the model tab with the fields of both views. Use that dialog to add simple join conditions graphically by linking a field of the left view with other field of the view on the right side.
To define a cross join, just drag the views into the dialog and do not link any field.
In this tab, you can add “View parameters”. See more about this in the section Parameters of Derived Views.
Join Conditions: tab where you can define the join strategy and the join conditions:
Configure the join strategy. To do this, click on the icon located between the names involved in the join. The options are:
Join type: Inner, Left outer, Right outer and Full.
Execution method: Hash, Merge, Nested and Nested parallel. If you select nested parallel, you have to provide the maximum number of parallel subqueries (“Nested parallel number”). If you select the option “Any” method, the Server will try to select the best suitable join method.
Order in which the input views are considered. If it is “Ordered”, the view on the left of the join will be considered the first view.
If you want to define a join condition between a field of a simple type (int, long, etc.) and a subfield of a compound type, the subfield of the compound type has to be placed in the left side of the join condition.
For example, let us say that you want to create a join view with these two views:
view_a, which has a field
view_b, which has a field
If you want the join condition to be
f1 = f2_b, you have to define the join condition as
... JOIN ... ON (f2).f2_b = f1That is, the subfield of the array must be placed on the left side of the join condition.
These preferences may have a significant impact on the performance of queries. Read the section Optimizing Join Operations for more information about this.
Edit the join conditions added in the “Model” tab, by clicking on them.
Click Add new condition to define a new join condition and Add cross join to define a cross join between two views.
In simple scenarios, you can define join conditions and cross joins graphically from the “Model” tab. However, in complex situations you need to define these from the “Join Conditions tab”:
By default, the conditions added with the “New Join Condition” dialog are added to the most internal join that contains all the fields of the condition.
In complex joins, you may need to specify to which join the new condition belongs. To do that, from the “New Join Condition” dialog, select the check box at the bottom and select the views involved in the target join.
From the “Model” tab, you cannot specify a join between a view and the result of a cross join of two views. E.g.:
(view_a CROSS JOIN view_b) INNER JOIN view_c ON view_a.field1 = view_c.fieldA
In the Design Studio, to define a join between a view and the result of a cross join of two views, follow these steps:
Connect the views
view_b(do not add join conditions)
Add the condition by connecting the field
view_a.field1with the field
The Add cross join button is not available in the Design Studio.
To create this view, follow these steps:
Drag the three views into the “Model” tab.
In the “Join Conditions” tab, click Add cross join, select
view_band click Ok.
Click Add join condition and enter the condition
view_a.field1 = view_c.fieldA. Then, select the check box at the bottom of the dialog. In the Left subview list, select
(view_a CROSS JOIN view_b)and in Right subview,
The section Join Conditions with Similarity Operators explains the meaning of the operator
To create Cross Join views, you just have to drag the views into the “Model” tab and do not define any join condition in the “Model” tab, the “Join” tab or the “Where condition” tab.
Do not use fields that are “View parameters” in the join conditions.
Where Conditions: tab that allows you to add WHERE conditions to the definition of the view.
Group By: tab that allows you to add GROUP BY attributes to the view.
Output: tab that allows you to configure the output of the view. That is, renaming the view and its fields, add derived attributes, define the primary key of the view, etc.
Metadata: tab that allows you to define the folder where the new view will be stored and provide a description for the new view.
The tabs Where Conditions, Group By, Output and Metadata work in the same way as in the Union view dialog. The section Creating Union Views explains in more detail how to use them.
In our example:
Click on Join on the menu File > New.
Drag the views
average_monthly_salesto the “Model” tab.
Add the join condition
incidents.taxId=average_monthly_sales.taxIdby dragging the name of the field
taxIdof one of the tables, to the field
taxIdof the other table.
In the “Output” tab:
Rename the view to
Remove the field
incidents.taxId. To do this, select this field and click on Remove selected.
After this, click Save () to create the view. Then, the Tool will display the schema of the new view.
Join Conditions with Similarity Operators¶
Virtual DataPort supports join conditions that use the similarity
operator. This operator returns a value between
estimates the similarity between the operands using a certain algorithm.
As well as the operands to compare, this operator receives the
similarity algorithm to use and a minimum similarity threshold as
parameters. Where the similarity between operands reaches the threshold,
the condition is true. Otherwise, it is false.
The operator ~ (Alt-126) evaluates the similarity between operands of
text. The algorithms available are based on the edition
distance between two texts:
It also includes algorithms based on the appearance of words in both texts:
Besides, it incorporates combinations of some of these algorithms. I.e.
If no algorithm is specified, Virtual DataPort chooses the one to apply. In that case, only the similarity threshold has to be specified.
Example: consider a variation of the example in which customers are
identified by their name instead of their
taxId. Unfortunately, the
name of the customers in the incidents databases does not exactly match
their name in the sales Web Service and, therefore, a join operation
with the equality operator will not give good results. In this case, a
join can be used with a similarity operator to solve this problem.
The incidents example included with the Denodo Platform can be used to
reproduce this situation. Two additional relational tables are included
phone_inc_cname. These tables
are similar to the
phone_inc tables used in the
example, although they include a
customer_name attribute to indicate
the name of the customer instead of the
taxId attribute. The sales
Web Service also includes an operation known as
GetAverageMonthlyRevenueCName that returns a compound data element
containing the name and the monthly revenue of all the customers. To
solve the problem posed by this example, follow the steps below:
Create the JDBC base views for the tables
phone_inc_cnametables (see section Creating Base Views from a JDBC Data Source).
Open the Web service “sales”, click on Create base view beside the operation
GetAverageMonthlyRevenueCNameoperation and select Do not stream output. Then, click Ok to see the schema of the new view and click Save to create the view.
Join the base views created for
phone_inc_cname(see section Creating Join Views).
“Flatten” the compound data element returned by the view created from the
GetAverageMonthlyRevenueCNameoperation. See section Creating Flatten Views for a detailed explanation of the “flattening” process.
Define a join view using the
customer_namefield between the view obtained as a result of step 3 and the view obtained as a result of step 4. Specify operator ~ as the join operator and set the similarity threshold to 0.7.
The simple editor is not currently unavailable in the Design Studio. You need to manually set the similarity condition of the join with the following syntax:
join_incidents_cname.customer_name ~ (flatten_revenue_cname.cname,0.7). To specify the similarity algorithm for this operator, add the name of the algorithm as the third parameter surrounded with single quotes (e.g.
join_incidents_cname.customer_name ~ (flatten_revenue_cname.cname,0.7,'Jaro'))
Run the view obtained in step 5. The join is made correctly, despite the fact that the customer names in the input views are not exactly the same.