Applies to:
Denodo 8.0
,
Denodo 7.0
,
Denodo 6.0
Last modified on: 10 Aug 2020
Tags:
Best practices
JDBC data sources
ODBC data sources
View creation
Denodo allows to create base views on top of JDBC and ODBC data sources that, when queried, will execute the SQL query provided in the definition of the base view. This document explains when and how to use this option. The limitations of this functionality will also be explained.
Developers have two options to create base views in Denodo from JDBC and ODBC sources:
SELECT ABS( ROUND( @value ) ) AS value_abs_round…
If an alias is not defined, Denodo will automatically generate a name for the corresponding column in the created base view, which can be counter-intuitive and change from one data source to another.
SELECT function(@INPUTVAR) AS value FROM sales
This document is focused on the ‘SQL Sentence’ option. The following sections explain pros and cons of using this option and some advice on when and how to use it.
Interpolation variables allow the SQL query sent to the database to be parameterized according to the conditions of the query executed on the base view. By using these variables, you do not have to provide the value of a field in an expression at design time (when creating the base view). Instead, you will provide the values of these variables at runtime, when you query the view.
For instance, suppose you have a SALES table in a database and you want to create a base view with a SQL Sentence. You could create a base view SALES_VDB by writing a query like this one:
SELECT Function(amount) as AVERAGE FROM SALES WHERE product_id = @product_id |
and then execute it with different values for the product_id field. Note that the query uses a function in the SELECT clause, this is a typical example of the use of the ‘Create Base View From Query’ option when using a database-proprietary function.
This way, if we execute the following query on the base view in Denodo:
SELECT * FROM SALES_VDB WHERE product_id = 2 |
The Server will substitute the variable @product_id with the value from the WHERE clause of the query and execute:
SELECT Function(amount) AS average FROM SALES WHERE product_id = 2 |
Each interpolation variable must be related with an attribute belonging to the generated base view, so that the variable can obtain its value at run time. If the SQL query used to access the data source returns any field with the same name as the variable, then the variable will be associated to the corresponding base view attribute. If the SQL query does not return any field with the same name as the variable, Denodo will add a new attribute to the base view, with the same name.
Denodo provides a predefined interpolation variable called WHEREEXPRESSION that simplifies the process of creating a base view from an SQL query. At runtime, the Server will replace WHEREEXPRESSION with the condition sent to the base view.
If we create a base view named SALES_VDB with the following SQL query:
SELECT sales_id, product_id, customer_id, profit_margin as margin, StoredProcedure(amount), sales_date FROM sales WHERE @WHEREEXPRESSION |
we will be able to execute queries like the following on the base view in Denodo:
SELECT * FROM SALES_VDB WHERE margin > 0 AND getyear(sales_date) > 2010 |
Note that the user has to specify the actual name of the fields for the aliases used in the SELECT clause of the query. For instance, the SQL query defines the margin alias for profit_margin. Therefore, the user must specify the profit_margin value for the attribute named margin in the VQL query.
The Server will substitute the variable WHEREEXPRESSION with the condition of the WHERE clause of the query:
SELECT sales_id, product_id, customer_id, profit_margin as margin, StoredProcedure(amount), sales_date FROM sales WHERE profit_margin > 0 AND extract (year from sales_date) > 2010 |
Note that the alias name is translated to its original name when the query is finally executed.
Sometimes, you may need to filter the results of the SQL query using fields that are not projected and, therefore, will not appear in the base view schema. Denodo provides a way to do this. A typical example happens when the SQL query used to create the base view uses GROUP BY. For instance, suppose the SALES table has a field store_country that is not included in the schema of a new virtual view SALES_VDB2 defined as:
SELECT product_id, customer_id, sum(profit_margin) as total_margin FROM sales WHERE @WHEREEXPRESSION GROUP BY product_id, customer_id |
but you want to filter the results of the SALES table by store_country, you can execute queries like this over the Denodo base view:
SELECT * FROM SALES_VDB2 WHERE store_country = ‘US’ |
and they will be translated to queries like this executed on the data source:
SELECT product_id, customer_id, sum(profit_margin) as total_margin FROM sales WHERE store_country = ‘US’ GROUP BY product_id, customer_id |
In order to make this work, you have to specify STORE_COUNTRY in the option ‘List of field names not included in the above list that can appear in the delegated conditions’ when you create a base view with WHEREEXPRESSION.
To avoid forcing the queries to always have the WHERE clause, Denodo provides the interpolation function ExecuteIfIsNotNull. For more information about this interpolation function and the WHEREEXPRESSION variable, you can check the section Using the WHEREEXPRESSION Variable of the Virtual DataPort Administration Guide.
Please note that the interpolation variable WHEREEXPRESSION cannot be used with ODBC data sources. Only JDBC data sources are allowed.
This section will explain the main pros and cons of using the ‘Create from query’ option and some recommendations about when and how to use it.
You should use this option when:
Keep also in mind that, when possible you should enable the option ‘Delegate SQL sentence as subquery’(enabled by default starting from Denodo 6.0), located in Options > Search methods > wrapper source configuration in VDP Admin Tool, for the base views created using this option in order to let Denodo delegate more queries to the data source (as shown in Figure 4 and Figure 5). One limitation on this option is explained in the following section.
Figure 1. Lack of encapsulation vs. hybrid approach
In this example we want to retrieve the sales grouped by customer, and for each group, list the results in a ranking ordered by each sale’s profit. This way, the first tuples of each group will show the best customer’s sales. Our data source is an Oracle database and we want to make use of the rank analytical function. Figure 2 shows how to create a base view in Denodo for this:
Figure 2. Example of SQL Sentence with rank function
Now, we can execute queries like:
SELECT * FROM rank_of_sales_by_customer WHERE ranking <= 5 |
in order to get the top #5 sales grouped by customer.
Figure 3. Result of query: top #5 sales by customer
Figure 4 and Figure 5 show the difference in delegation depending on whether the option ‘Delegate SQL Sentence as Sub Query’ is enabled or not:
Figure 4. Execution trace with ‘Delegate SQL Sentence as Sub Query’ set to false: filtering executed by Denodo
Figure 5. Tree view with ‘Delegate SQL Sentence as Sub Query’ set to true: filtering executed by Oracle
When the ‘Delegate SQL Sentence as Sub Query’ option is enabled, the WHERE clause is delegated to Oracle, so the filter is done directly in the source, with the corresponding positive impact on performance, because less tuples are moved through the network from the sources to Denodo.
In this example we use the function ntile to divide the results of the previous example in a number of tiles (this number is specified by the user). So the query will be:
Figure 6. NTILE function with an interpolation variable
Figure 7. Results of query with 4 tiles
Results from queries may differ if the query conditions are directly defined in the base view created by means of a SQL Sentence or if they are applied over that query.
For instance, consider the base view ntile_of_sales_by_customer_parameterized (where the filtering by ss_net_profit is directly defined in the SQL query) defined as:
SELECT ss_item_sk, ss_ticket_number, c_customer_sk customer_id, c_first_name customer_first_name, c_last_name customer_last_name, ss_net_profit, rank() OVER(partition by c_customer_sk ORDER BY ss_net_profit DESC) AS ranking, ntile(@num_of_tiles) OVER(partition by c_customer_sk ORDER BY ss_net_profit DESC) AS ntile FROM EDW_TPCDS.CUSTOMER INNER JOIN ADMIN.STORE_SALES ON (c_customer_sk = ss_customer_sk) WHERE ss_net_profit > @net_profit GROUP BY c_customer_sk, c_first_name, c_last_name, ss_item_sk, ss_ticket_number, ss_net_profit |
If we execute the query:
SELECT * FROM ntile_of_sales_by_customer_parameterized WHERE num_of_tiles = 4 AND ss_net_profit > 1000 |
We get the following results:
Figure 8. Filtering before creating groups
The result is different if we execute the same query over the previously created view ntile_of_sales_by_customer:
SELECT * FROM ntile_of_sales_by_customer WHERE num_of_tiles = 4 AND ss_net_profit > 1000 |
Figure 9. Filtering after creating groups
This is because in the first case, the tiles are calculated without the tuples with ss_net_profit <= 1000 and in the second case with them (although they are filtered later from the final results).