The concept of association in Denodo is similar to the same term in data modelling. Associations represent relationships between the elements of two views. This relationship is structural, because it specifies that elements (rows) of one view are connected to elements of another and does not represent any behavior.
Suppose a scenario consisting of first-class domain entities, such as:
Figure 1 Star Schema generated with Embarcadero ER/Studio.
The Figure 1 shows a typical star schema, where:
The primary key in the three tables is formed by the column id.
Sales has a foreign key for each dimension (cust_id, prod_id), in order to associate them.
The sales_amount column of Sales represents a measure that can be used in calculations and analysis.
The non-primary key columns of Customers and Products represent additional attributes of the dimensions (such as the email of the Customers dimension).
An association is defined by two endpoints and a list of mappings. Each endpoint is associated with a view and has a “role name”, a description and a multiplicity (how many rows there are in one view for each row of the other). In this example, the elements of the Customers view can be related with the elements of the Sales view with cardinality *. This means that every customer is related with zero or more sales.
There are several reasons why you should define associations between views when appropriate based on the following perspectives:
We will see each point with more detail in the last section “What are Associations Useful For?”.
Once you have created the base views in Denodo for your model, you can add their associations. To create a new association click Association on the File > New menu or right-click the Elements Tree and click Association on the New menu. The Tool will open the “View association” dialog, which has two main tabs:
For instance, to create an association between Sales and Customers:
Figure 2 Creating a new association (“Model” tab).
Figure 3 Creating a new association (“Output” tab).
Figure 2 and Figure 3 show an association named "sales_customers", where Sales contains the foreign key (note “Referential constraint” is selected) to Customers and it cannot be null (because of the multiplicity + -one or more-), so Sales is the Dependent view in the association, as every row from Sales will have a match in Customers.
Another association between Sales and Products has to be created in the same way.
The usefulness of associations can be seen from three different points of view:
Associations are useful for integration with third-party modelling tools used to design logical data models (sometimes known as “canonical models”), composed of high level entities and their relationships (as the ones from our example). These tools help to enforce modelling and business data policies, such as what fields the view must have, in what format, etc. This way, when this model is implemented by Data Virtualization (DV) developers, everyone will have to follow those conventions.
Integration between a DV solution and Modelling tools can go in both directions:
Regarding 1), Denodo exposes metadata through standard interfaces such as JDBC/ODBC that can be used by most Modelling tools to import Denodo data models. Metadata exposed by Denodo includes views and relationships between those views (associations between views appear as primary key – foreign key relationships).
Regarding 2), Denodo allows creating interfaces and associations either manually or via VQL, so automatic integration with Modelling tools is possible via API. Denodo has a graphical tool called Denodo Model Bridge to automate this process from the models generated by the most commonly used modelling tools, as shown in Figure 4
Canonical models are an essential component of well-designed and reusable data virtualization solutions. When designing data services in Denodo following a “contract-first” style, being able to define and publish canonical models (via interface views) before implementing the access and integration logic to populate them, decouples the work of DV developers in charge of implementing this logic from that of client application developers who build (informational/operational) solutions on top of the data services.
Figure 4 Process to generate views from external models.
Associations give Denodo information enough to let users traverse relationships between data entities without the need of knowing exact join conditions. Denodo offers two ways of consuming this self-service information:
The RESTful Web service of the Denodo Platform is an out-of-the-box HTTP service that publishes the contents of deployed data web services as well as the entire Virtual DataPort server. The output of this service can be an XML document, a JSON document or a more user-friendly HTML document.
We can traverse the associations just by clicking the link named with the role name we defined when creating the association. This way we can, for instance, invoke the web service to retrieve the sales and automatically go to the product and customer involved in each sale, as shown in next figures.
Figure 5 Query RESTful web service to retrieve all Sales.
Figure 5 shows the result of querying the web service to retrieve all the Sales. For each tuple we have the column “customer” with a link “customer of this sale”, that we created in Figure 3. By clicking this link, we can see who did which order. For instance, if we click this link from the first row, we receive the result shown in Figure 6.
Figure 6 Traverse association from Sales to Customers (cust_id = 1).
Now, from the view of customer John Doe, we could see all his sales, again thanks to the association we defined (“sales of this customer”).
Figure 7 Sales of Customer John Doe (cust_id = 1).
This way, just by defining associations between data entities, Denodo lets business users to browse data relationships by themselves.
Associations may have a big impact on performance. Thanks to the information they provide, Denodo can perform several optimizations in the queries, which reduces execution times and avoids the movement of big amounts of data through the network. Let us see the main optimization: Remove redundant inner joins
In the scope of star schemas and continuing with the model from Figure 1, suppose you now have joined the fact table with its dimensions in an only virtual view, as shown in Figure 8. Note this is a very common strategy in order to make it easy for third-party reporting tools to work.
Figure 8 Joining fact table with its dimensions.
Suppose now the following query:
SELECT <fields from sales and products>
This query only projects fields from Products and Sales, so the join with Customers is not needed. Denodo is aware of this situation and eliminates the last join with Customers from the execution plan (see Figure 9).
This optimization is only possible if the join involves PK_FK ⇒ each tuple in Sales matches with exactly one tuple in Products and with another one in Customers. The join with Customers would only add more information (the customer name, etc.) to the tuples obtained from the first join, but would not affect to their cardinality. This way, as no fields from Customers are projected, this join can be removed from the plan.
Recall that FK-PK restrictions between views in Denodo can be represented through associations (using the Referential Constraint option and adequate cardinalities).
Figure 9 Initial query plan vs. after removing unnecessary join.
VDP Administration Guide: Why Associations are useful in Denodo?