You can translate the question and the replies:

Association in Denodo

Hi, i have 3 base views (A,B and C). association has been created between A and B,B and C as per documentation it says Association helps with performance so when i run my VQL query which joins A to B and B to C .how do i know if association actually improved performance for my query ? i checked execution trace which says this static optimized =no cost optimized =no optimizer message=Missing statistics or not activated in the following nodes: .. tool i am using is denodo virtual dataport admin. we need to make decision if we should be creating association for other views but if it doesnt do anything in terms of performance then we can skip it. please quide me with this. thanks
user
30-03-2023 17:39:17 -0400
code

3 Answers

Hi,  Associations are useful to define for multiple reasons: performance, usability, third-party modeling tool, etc. Its usefulness can be seen in improving the performance, browsing the views and modeling as outlined in the section [Why You Should Define Associations Between Views?](https://community.denodo.com/docs/html/browse/7.0/vdp/administration/restful_architecture/associations/associations#why-you-should-define-associations-between-views)  For the performance aspect, there are some optimizations that can only be applied if the Virtual DataPort Server has enough information on the cardinality of the views. An example of such an optimization would be the (Join) Branch Pruning. You can find the example on the KB Article [Denodo Query Optimizations for the Logical Data Warehouse](https://community.denodo.com/kb/en/view/document/Denodo%20Query%20Optimizations%20for%20the%20Logical%20Data%20Warehouse#:~:text=the%20third%20screenshot.-,Single%2DView%20Approach,-In%20this%20section) under "Single-View Approach" which shows that with the given association the Virtual DataPort Server is able to remove a redundant join from the execution. It should be noted that Associations improve the performance in some scenarios, but not all the time as it is dependent of the nature of the query, the projected fields etc. From the Execution trace you can verify if an optimization has been applied to your specific query including more details. I would also like to note that Associations can be automatically discovered for JDBC sources. More details can be found in the documentation section [Creating an Association](https://community.denodo.com/docs/html/browse/6.0/vdp/administration/restful_architecture/associations/creating_an_association), so creating them is convenient for these sources with that feature. I hope this helps!
Denodo Team
05-04-2023 05:26:58 -0400
code
thanks for the answer but i am still not getting desired results based off of the links and documentation shared above. here is more details and questions question#1 : does association work for many to many cardinality?in our case relationship between table A and Table B is many to many because of SCD(valid_from_dttm,Valid_to_dttm). queston#2: when i join my 3 tables and don't select anything from 3rd table i don't see it removed 3rd table from the plan.as per documentation it says it removes redundant join but i don't see it does. do we know why? here is my sample scenario that i am trying Table A ID TableB_ID valid_from_dttm valid_to_dttm 1 2 01/01/2020 01/31/2020 1 2 02/01/2020 02/28/2020 1 2 03/01/2020 01/01/9999 Table B ID TableC_ID valid_from_dttm valid_to_dttm 2 3 01/01/2019 02/28/2020 2 3 03/01/2020 01/01/9999 table C ID valid_from_dttm valid_to_dttm 3 01/01/2019 01/01/2020 3 02/01/2020 02/28/2020 3 03/01/2020 01/01/9999 i manually created association between A and B . table A is set to primary with cardinality set for now is 1..* between A and B i manually created association between B and C . table B is set to primary with cardinality set for now is 1..* between B and C VQL query SELECT a.id,b.id from tableA a inner join tableB b on b.id=a.tableB_ID and a.valid_from_dttm between b.valid_from_dttm and b.valid_to_dttm inner join tableC c on c.id=b.tableC_ID and b.valid_from_dttm between c.valid_from_dttm and c.valid_to_dttm Expection: i expect table C should be removed from the query plan i should be able to see something in the execution trace that tells that optimization is applied our decision to create association depends upon if we can prove that it helps with performance.so far i am not able to prove that it helps with the performance for our dataset.
user
05-04-2023 13:24:22 -0400
Hi,   Association works for many to many cardinality by unchecking the referential constraints option. In a referential constraint the principal endpoint must have multiplicity 1 or 0..1. If you uncheck the box for referential constraints in the Virtual DataPort Administration Tool or in Design Studios you can create many to many associations.   Regarding your second question, to better explain how the Join Branch Pruning works and how the associations come into play, let's review this below example. **books**  | book_id | title | author_id | | -------- | -------- | -------- | | 1     | Title 1     | 1     | | 2     | Title 2     | 1     | | 3     | Title 3     | 2     |   **author**   | author_id | name |  | -------- | -------- |  | 1     | Author 1     |  | 2     | Author  2     |   Notice that the author_id **must be defined as a primary key**, you will see below why.   We have the books view that contains all the books information where book_id is the primary key. There is a foreign key, author_id, which can be used to join it to the author view to get the name of the author.   In order to apply the Join Branch Pruning on **books join author**, we must guarantee three things, otherwise the optimization cannot be applied (it would potentially yield incorrect results):   1.  None of the fields from the author view are needed for the query. This point is quite obvious, since we cannot prune the author view, if we need information from that view for answering the query.  2.  The join **does not** filter rows from books. For example, if we use an inner join, and there would be an **author_id** in books that **does not exist** in the **author** view. Due to there would be no match for the inner join for that author_id, it would filter out rows of books and we could not prune that join. 3.  The join **does not duplicate** rows from books.  For example, if we had two rows in the author view with the same author_id, we would have multiple matches for the author_id=1 after the join resulting in a duplication of that particular row. Thus, we couldn't prune the join.   We can guarantee that the join will not filter rows, if: a) It is a LEFT JOIN or b) There is a FK-PK JOIN where all the rows in books matches with (at least) one row in author_id. (This is where the association comes into play)   We can guarantee that the JOIN will not duplicate rows, if the join condition is done on the primary key(s) of the author table. As the JOIN field is a unique key in the author view, we know that it matches with maximum one row from the books view. (This is where defining the primary key on the author_id in the author view comes into play and is a requirement for this to work).   This is the VQL for that example using [Materialized Tables](https://community.denodo.com/docs/html/browse/latest/en/vdp/vql/materialized_tables/creating_materialized_tables/creating_materialized_tables).  ``` CREATE OR REPLACE MATERIALIZED TABLE books(          book_id int,          title  text,           author_id  int ) CONSTRAINT 'bookid' PRIMARY KEY ('book_id');   insert into books values (1,'Title 1', 1),(2,'Title 2',1),(3,'Title 3',2);   CREATE OR REPLACE MATERIALIZED TABLE author(            author_id int,           name text ) CONSTRAINT 'authorid' PRIMARY KEY ('author_id');   insert into author values (1,'Author 1'),(2,'Author 2'); ```   This is the VQL code to create the association.   ``` CREATE OR REPLACE ASSOCIATION author_books REFERENTIAL CONSTRAINT      ENDPOINT books author PRINCIPAL (1)     ENDPOINT author books (0,1)     ADD MAPPING author_id=author_id; ```   Query: ``` select book_id,title from books inner join author on books.author_id=author.author_id ```   If you drop the association and execute that query again, the optimization cannot be applied as the Virtual DataPort Server would not know if there are author_id's in the books view that do not exist in the author view (the condition for filtering rows would be violated).   Hope this helps!
Denodo Team
06-04-2023 11:48:49 -0400
code
You must sign in to add an answer. If you do not have an account, you can register here