Creating Join Views¶
A join view executes the relational algebra operation JOIN on a series of input views. These views can belong to different databases.
To create a join view, click Join on the File > New menu or, right-click the Server Explorer and click Join on the menu New.
The Tool will open the “Join view” dialog and it will add the view that is currently selected in the Server Explorer. To add more views to the join view, click the Model tab and drag the views from the Server Explorer to this tab.
You can join more than two views in the same join view.
In our example, we will join the view incidents
created in the
previous section and the Web service view average_monthly_sales
,
created in the section Creating Base Views from a SOAP Web Service.
The “Join view” dialog has six tabs:
Model: drag the views that will form the join. Consider this:
When you add two views to this tab, Design Studio automatically creates a join condition if there is an association between these views and this association is a referential constraint (i.e. it is a primary key/foreign key relationship). The join condition will be the same as the condition mappings of the association.
When adding a view to a join view, Design Studio suggests the views you may want to add to the join view, based on the associations of that view. The suggestions are next to the label Associated views; just click the view to add it automatically.
To define a simple join condition do this:
If you use Design Studio, first, click on the link button (on the top-right side of the source view) and drag it to the target view. After connecting the views, click the link and a dialog will show up at the bottom of the Model tab with the fields of both views. In this dialog, link a field of the left view with another of the right view. The source view and the target view are the left view of the join and the right view of the join respectively.
If you use the Administration Tool, drag the field of one view to the field of the other view. To configure the operator used in this JOIN condition, double-click the arrow that joins both fields and select the operator. To add more complex join conditions, go to the “Join Conditions” tab.
To define a cross join, just drag the views into the dialog and do not link any field.
In the “Model” tab of the view, you can add “View parameters”. The section Parameters of Derived Views explains what are view parameters and the rules you need to follow when using them.
Join Conditions: tab where you can define the join strategy and the join conditions:
Configure the join strategy. To do this, click on the icon located between the names involved in the join. The options are:
Join type: Inner, Left outer, Right outer and Full.
Execution method: Hash, Merge, Nested and Nested parallel. If you select nested parallel, you have to provide the maximum number of parallel subqueries (“Nested parallel number”). If you select the option “Any” method, the Server will try to select the best suitable join method.
Order in which the input views are considered. If it is “Ordered”, the view on the left of the join will be considered the first view.
If you want to define a join condition between a field of a simple type (int, long, etc.) and a subfield of a compound type, the subfield of the compound type has to be placed in the left side of the join condition.
For example, let us say that you want to create a join view with these two views:
view_a
, which has a fieldf1
of typeint
view_b
, which has a fieldf2
of typearray
. Thisarray
has twoint
subfields:f2_a
andf2_b
.
If you want the join condition to be
f1 = f2_b
, you have to define the join condition as... JOIN ... ON (f2).f2_b = f1
That is, the subfield of the array must be placed on the left side of the join condition.
Note
These preferences may have a significant impact on the performance of queries. Read the section Optimizing Join Operations for more information about this.
Edit the join conditions added in the “Model” tab, by clicking on them.
Click Add new condition to define a new join condition and Add cross join to define a cross join between two views.
In simple scenarios, you can define join conditions and cross joins graphically from the “Model” tab. However, in complex situations you need to define these from the “Join Conditions tab”:
By default, the conditions added with the “New Join Condition” dialog are added to the most internal join that contains all the fields of the condition.
In complex joins, you may need to specify to which join the new condition belongs. To do that, from the “New Join Condition” dialog, select the check box at the bottom and select the views involved in the target join.
From the “Model” tab, you cannot specify a join between a view and the result of a cross join of two views. E.g.:
(view_a CROSS JOIN view_b) INNER JOIN view_c ON view_a.field1 = view_c.fieldA
Note
In Design Studio, to define a join between a view and the result of a cross join of two views, follow these steps:
Connect the views
view_a
andview_b
(do not add join conditions)Connect
view_a
andview_c
.Add the condition by connecting the field
view_a.field1
with the fieldiew_c.fieldA
.
The Add cross join button is not available in Design Studio.
To create this view, follow these steps:
Drag the three views into the “Model” tab.
In the “Join Conditions” tab, click Add cross join, select
view_a
andview_b
and click Ok.Click Add join condition and enter the condition
view_a.field1 = view_c.fieldA
. Then, select the check box at the bottom of the dialog. In the Left subview list, select(view_a CROSS JOIN view_b)
and in Right subview,view_c
.
The section Join Conditions with Similarity Operators explains the meaning of the operator
~
.Note
To create Cross Join views, you just have to drag the views into the “Model” tab and do not define any join condition in the “Model” tab, the “Join” tab or the “Where condition” tab.
Note
Do not use fields that are “View parameters” in the join conditions.
Where Conditions: tab that allows you to add WHERE conditions to the definition of the view.
Group By: tab that allows you to add GROUP BY attributes to the view.
Output: tab that allows you to configure the output of the view. That is, renaming the view and its fields, add derived attributes, define the primary key of the view, etc.
Metadata: tab that allows you to define the folder where the new view will be stored and provide a description for the new view.
The tabs Where Conditions, Group By, Output and Metadata work in the same way as in the Union view dialog. The section Creating Union Views explains in more detail how to use them.
In our example:
Click on Join on the menu File > New.
Drag the views
incidents
andaverage_monthly_sales
to the “Model” tab.Add the join condition
incidents.taxId=average_monthly_sales.taxId
by dragging the name of the fieldtaxId
of one of the tables, to the fieldtaxId
of the other table.In the “Output” tab:
Rename the view to
incidents_sales
.Remove the field
incidents.taxId
. To do this, select this field and click on Remove selected.
After this, click Save () to create the view. Then, the Tool will display the schema of the new view.
Join Conditions with Similarity Operators¶
Virtual DataPort supports join conditions that use the similarity
operator. This operator returns a value between 0
and 1
that
estimates the similarity between the operands using a certain algorithm.
As well as the operands to compare, this operator receives the
similarity algorithm to use and a minimum similarity threshold as
parameters. Where the similarity between operands reaches the threshold,
the condition is true. Otherwise, it is false.
The operator ~ (Alt-126) evaluates the similarity between operands of
type text
. The algorithms available are based on the edition
distance between two texts:
ScaledLevenshtein
JaroWinkler
Jaro
Level2Jaro
MongeElkan
Level2MongeElkan
It also includes algorithms based on the appearance of words in both texts:
TFIDF
Jaccard
UnsmoothedJS
Besides, it incorporates combinations of some of these algorithms. I.e.
JaroWinklerTFIDF
.
If no algorithm is specified, Virtual DataPort chooses the one to apply. In that case, only the similarity threshold has to be specified.
Example: consider a variation of the example in which customers are
identified by their name instead of their taxId
. Unfortunately, the
name of the customers in the incidents databases does not exactly match
their name in the sales Web Service and, therefore, a join operation
with the equality operator will not give good results. In this case, a
join can be used with a similarity operator to solve this problem.
The incidents example included with the Denodo Platform can be used to
reproduce this situation. Two additional relational tables are included
known as internet_inc_cname
and phone_inc_cname
. These tables
are similar to the internet_inc
and phone_inc
tables used in the
example, although they include a customer_name
attribute to indicate
the name of the customer instead of the taxId
attribute. The sales
Web Service also includes an operation known as
GetAverageMonthlyRevenueCName
that returns a compound data element
containing the name and the monthly revenue of all the customers. To
solve the problem posed by this example, follow the steps below:
Create the JDBC base views for the tables
internet_inc_cname
andphone_inc_cname
tables (see section Creating Base Views from a JDBC Data Source).Open the Web service “sales”, click on Create base view beside the operation
GetAverageMonthlyRevenueCName
operation and select Do not stream output. Then, click Ok to see the schema of the new view and click Save to create the view.Join the base views created for
internet_inc_cname
andphone_inc_cname
(see section Creating Join Views).“Flatten” the compound data element returned by the view created from the
GetAverageMonthlyRevenueCName
operation. See section Creating Flatten Views for a detailed explanation of the “flattening” process.Define a join view using the
customer_name
field between the view obtained as a result of step 3 and the view obtained as a result of step 4. Specify operator ~ as the join operator and set the similarity threshold to 0.7.Note
The simple editor is not currently unavailable in Design Studio. You need to manually set the similarity condition of the join with the following syntax:
join_incidents_cname.customer_name ~ (flatten_revenue_cname.cname,0.7)
. To specify the similarity algorithm for this operator, add the name of the algorithm as the third parameter surrounded with single quotes (e.g.join_incidents_cname.customer_name ~ (flatten_revenue_cname.cname,0.7,'Jaro')
)Run the view obtained in step 5. The join is made correctly, despite the fact that the customer names in the input views are not exactly the same.