New Features of Virtual DataPort

This section lists the new features of Denodo Virtual DataPort 7.0 GA. For the list of features included in the updates of Denodo 7.0, check their RELEASE NOTES.

Data Sources

Salesforce.com Data Source

Denodo now has a native data source to Salesforce.com. It can perform queries and insert, update and delete records of your Salesforce account.

For previous versions of Denodo, there is a Denodo Connect component to connect to Salesforce. Starting with Denodo 7.0, users should use the out-of-the-box data source because the Denodo Connect component will not be actively maintained.

The main benefits of the new connector are:

  • You no longer have to enter the connection details to Salesforce for every base view you create.
  • You have a single data source. With the Denodo Connect component you may have to create more than one depending on the type of requests you need to send to Salesforce.
  • More operations are pushed down to Salesforce.

JDBC Data Sources

Oracle

The type “XMLTYPE” of Oracle is now mapped to the type “xml”. In previous version it is mapped to “text”.

SAP HANA

It is now possible to connect to SAP HANA using Kerberos authentication with constrained delegation.

SQL Server

Denodo now includes the Microsoft driver for Microsoft SQL Server. In previous versions, the administrator has to download it from the Microsoft’s website.

The jTDS driver included with Denodo is now capable of establishing SSL connections. The jTDS driver of previous versions of Denodo can only establish non-SSL connections.

When creating a base view over a data source that uses one of the jTDS adapters, the datetime fields are now created with the appropriate data type. In previous versions, they are created with the type “text”.

Folder for JDBC Drivers Not Included Out of the Box

Due to licensing restrictions, Denodo does not include the JDBC driver of all the supported databases. There is now a specific directory where you have to copy the JDBC drivers not included with Denodo: <DENODO_HOME>/lib-external/jdbc-drivers/name of the adapter.

New Databases Supported

The database Presto is now officially supported.

JDBC Data Sources with Kerberos Authentication

When creating a JDBC data source with the adapter “Hive for Cloudera” or “Hive for Hortonworks”, the administration tool automatically adds the “driver properties” required by the driver to use Kerberos authentication.

Generic Adapter

JDBC data sources that use the adapter Generic can now be configured to use statements instead of prepared statements, when sending queries to the database. By default, they use prepared statements.

Subtypes of DECIMAL Fields

When creating a new base view, Denodo now selects the most appropriate type for the numeric fields of the view. For example, if the database reports that the type of a field is DECIMAL with a length of 5 digits and no decimals, the type in the base view will be int (integer).

Connection Pool

Reduced thread contention when the execution engine requests a connection to the pool of connections of a data source.

DF Data Sources

Added support for using form feed (\f) as end of line delimiter.

Added support for using the vertical tab (\v) as column delimiter.

Custom Data Sources

Now the API of custom wrappers provides support to indicate the properties of the fields returned by the wrapper: subtype, length, decimals if it applies, etc.

Denodo 7.0 provides binary compatibility with custom wrappers developed for previous versions of Denodo. That is, they will continue without having to modify them nor recompiling them.

HTTP Requests of Data Sources

Paginated Sources

DF, JSON, XML data sources can now be configured to retrieve the data in pages. The goal is to simplify retrieving data from services that impose a limit on the number of records returned by a single HTTP request.

User Agent

You can now change the value of the HTTP header User-Agent that is sent in the HTTP requests sent by DF, JSON, XML and Custom data sources. To change it, execute this statement from the VQL Shell:

SET 'com.denodo.vdb.http.userAgent'='<new user agent>';

Charset

It is now possible to set the charset (UTF-8, ISO-8859-15…) with which DF, JSON, XML data sources and Custom data sources process data.

In previous versions, it is only possible to set the charset in DF data sources.

In previous versions, XML, JSON and Custom data sources detect the charset of the data on a best effort basis.

OAuth Authentication: More Authentication Grants

There is now support for the following OAuth 2.0 authorization grants:

  1. Resource owner password credentials
  2. Client credentials grant
  3. Authorization code grant (already existed in Denodo 6.0)

These authorization grants are defined by the OAuth 2.0 standard (RFC 6749).

This will provide more compatibility with services that require OAuth 2.0 authentication.

More HTTP Methods

DF, JSON, XML data sources and Custom data sources can now be configured to send an HTTP request of any of these types: GET, POST, PUT, PATCH and DELETE. Previous versions only support GET and POST.

Execution Engine

New Data Types

New data types for date and timestamp values: “localdate”, “time”, “timestamp” and “timestamptz”. These types provide several advantages over the existing “date” type. “date” is still supported to keep backward compatibility.

The section Data Types for Dates, Timestamps and Intervals of the VQL Guide explains how queries that use these types have to be built, functions available to manage them, etc.

Massive Parallel Processing

Denodo provides native integration with several Massive Parallel Processing (MPP) systems to speed up the execution of queries that involve processing billions of rows and cannot be done in “streaming mode”.

The section Parallel Processing of the Virtual DataPort Administration Guide provides more information about this.

Query Optimizer

This section describes the improvements in the optimizer, when doing automatic simplifications of the execution tree of a query.

Push GROUP BY under JOIN

The optimizer now pushes a GROUP BY under a JOIN if all these conditions are met:

  1. It is an equijoin (i.e. the join conditions are like field_1 = field_2)
  2. The fields of the primary key of one of the views involved in the JOIN are part of the join condition.

Push Down GROUP BY that Uses Statistical Functions

The query optimizer now pushes down a GROUP BY that uses the functions stdev, stdevp, var or varp. This does not occur in previous versions.

Push Down Joins over Views with Different i18n

In previous versions, a join of two views whose data is obtained from the same database is not pushed down to that database if the underlying base views have different i18n.

In Denodo 7.0, a join of two views whose data is obtained from the same database is pushed down to that database, unless the underlying base views have different i18n and at least one of the views has one or more fields of type date (deprecated).

Optimizer Reorders JOIN Operations and then GROUP BY Operations

In Denodo 6.0, the optimizer pushes down the GROUP BY operations first and then, reorders the joins in order to maximize the delegation of operations (JOINs, UNIONs…) to the database.

In Denodo 7.0, the optimizer does this the opposite way: reorders the joins in order to maximize the delegation of operations to the database, and then tries to push down the GROUP BY operations first.

This change leads to more efficient execution plans because more operations are pushed down to the database.

For example, let us say we have the views SALES and DATE (data obtained from Redshift) and PRODUCT (data obtained from Vertica).

SELECT count(*)
    ,SUM(sales.UNIT_PRICE)
FROM sales
JOIN product ON (sales.PRODUCT_ID = product."PROD_ID")
JOIN date_dim ON (sales.date_id = date_dim.date_id)
WHERE product."PROD_NAME" LIKE '%a%'
GROUP BY sales.PRODUCT_ID

In Denodo 6.0, the GROUP BY is partially pushed down under the JOIN operations. Because of this change, the JOIN cannot be reordered.

In Denodo 7.0, the JOIN operations are reordered first. This allows for the JOIN of SALES and DATE to be fully pushed down to the database.

Remove Restrictions for Pushing down a JOIN under a UNION

The optimizer is now less restrictive about when it pushes down a JOIN under a UNION:

  • In Denodo 6.0, all the fields of both branches of the UNION have to be associated.
  • In Denodo 7.0, only the fields that are used in the levels above have to be associated.

Decrease Time to Calculate the Best Execution Plan

The optimizer calculates the best execution plan faster. This enhancement particularly benefits queries with a lot of operations that are pushed down to a database.

Cost-Based Optimizer

This section describes the improvements in the Cost-Based Optimizer.

New Alternative for GROUP BY over JOIN over Partitioned UNION

In a query with a GROUP BY over a JOIN, and one of the branches of the JOIN is a partitioned UNION, the cost-based optimizer considers three execution plans:

  1. Leave the execution plan as is.
  2. Push down the GROUP BY over the JOIN, under the UNION.
  3. New in Denodo 7.0: push down the GROUP BY under the UNION without moving the JOIN.

Account for Data Movements Executed in Parallel

The cost-based optimizer evaluates differently the cost of an execution plan with several data movements that can be executed in parallel:

  • In Denodo 6.0, the cost-based optimizer considers that all data movements of a query are executed in series, one after the other.
  • Denodo 7.0 takes into account that the data movements of the query can be executed in parallel. The effect is that the cost calculated by the optimizer is more accurate, which leads to the optimizer taking a better decision.

Evaluate More Options Involving Data Movements

When the cost-based optimizer evaluates the cost of moving the data from one branch of a JOIN to the database of the other branch, it considers three execution plans:

  1. Leave the execution plan as is.
  2. Moving the entire branch to the other database.
  3. New in Denodo 7.0: if the branch has a JOIN that be fully pushed down to its source, the optimizer will consider moving the result of the JOIN, instead of moving its branches.

This third option has a high impact if there are WHERE conditions over any of the branches of the JOIN. The main reason is that it reduces a lot the data that has to be moved from one database to the other.

Other Improvements

Performance improvements in:

  • The execution of conditions and functions.
  • The execution of MERGE JOIN and MINUS.

These changes reduce the amount of memory used to execute these operations and make them faster.

Performance Improvements Inserting Data into Hive, Impala, Presto and Spark

There are several performance improvements in the process of inserting data into Hive, Impala, Presto and Spark.

  • The data files sent to these database use now the Parquet format.
  • When creating tables in the database, they are created specifying the location of the files that will be uploaded. This prevents from having to execute the statement “LOAD DATA INPATH” in the database, after uploading the data to the data store.

regexp Function: Better Error Message

The regexp function returns a clearer error message when the input regular expression is invalid.

Access to Metadata.

The Server now access the metadata less frequently and it does so faster.

Interpolation Variables are Considered View Parameters

In Denodo 7.0, interpolation variables are evaluated as view parameters so conditions like “<interpolation variable> = <value>”, are considered an assignation of value and not a filtering condition.

For example, let us say that:

  • There is a JDBC base view “customer” created over a SQL query with an interpolation variable “zip_code”.
  • There is a base view “sales”
  • You want to obtain all the sales of customers in the ZIP code 94301 and the sales for which you do not know the customer that did them.
SELECT ...
sales LEFT OUTER JOIN customer
WHERE customer.zip_code = 94301

In Denodo 6.0 this query only returns the sales of the customers in the ZIP code 94301. The reason is that customer.zip_code = 94301 assigns a value to the variable “zip_code” but also filters the results of the LEFT OUTER JOIN.

In Denodo 7.0, this query returns the results you want because customer.zip_code = 94301 only assigns a value for the variable “zip_code”, it is not used to filter the results of the LEFT OUTER JOIN.

In Denodo 6.0 it is possible to achieve the same behavior as in 7.0 by defining a view parameter in “customer” and in its WHERE condition, add a condition “zip_code = <view parameter>”.

Cache Engine

Cache Engine Supports More Databases

The cache engine now supports these databases:

  • Hive 2.0
  • Impala 2.3
  • Presto 0.1
  • Spark 1.5, Spark 1.6 and Spark 2.x

“Use Bulk Data Load APIs” Enabled by Default in Some Databases

In the cache settings, the option Use bulk data load APIs is now enabled by default for these data sources: Hive, Impala, Presto, Snowflake and Spark.

Performance Improvements

During a data movement, the cache engine creates a table in the target database. In previous versions, it creates a table with all the fields of the table in the source database, although later, it only inserts the data of the fields the query needs. In Denodo 7.0, the table of the target database is created just with the necessary fields.

Oracle: Length of Fields Defined in Characters

When the cache database is Oracle, the maximum length of the fields of type CHAR and VARCHAR2 is defined in characters. For example, if the type size of a field of a view in Denodo is 100, the type of that field in the table of Oracle for this field will be CHAR(100 CHAR). This feature only applies to fields that have its length defined in their Source type properties.

In previous versions, the length is defined in bytes (e.g. CHAR(100)). This can cause the cache of a view to not be loaded if one of the text values has several multi-byte characters. When this occurs, the query that loads the cache returns an error.

Return Error with ‘cache_wait_for_load’=’true’, ‘cache_load_on_error’=’true’

The queries that load the cache of a view, with the parameters 'cache_wait_for_load'='true', 'cache_load_on_error'='true', now finish with state ERROR. In previous versions, the query appears to finish successfully even if there is an error.

Configure Cache Engine to Use a JDBC Data Source

It is now possible to configure the cache engine to use an existing JDBC data source.

Denodo Stored Procedures

API

The API of stored procedures now provides methods to execute DDL statements like CREATE VIEW, ALTER DATASOURCE, DROP WEBSERVICE, etc. from a stored procedure, with the method DatabaseEnvironment.executeVqlCommand(…).

Timeout

When a procedure is invoked with the syntax CALL <stored procedure> ( <parameters> ), it is now possible to indicate the parameter queryTimeout in the clause CONTEXT. In previous versions, to set a timeout for a specific invocation to a procedure, you need to invoke the procedure with the syntax SELECT ... FROM <stored procedure> ( <parameters> ).

Web Services

This section lists the new features in the web services published from Virtual DataPort.

UserAgent Configurable

The value of the HTTP header “User-Agent” of requests sent to SOAP and REST web services is passed to the monitoring layer of the Virtual DataPort server. The main benefit is that the applications that monitor the Virtual DataPort server (Denodo Monitor, Solution Manager, any other JMX application) will see that the property “UserAgent” of the query is set to the value of this header.

OAuth 2.0 Authentication

Added support for OAuth 2.0 authentication in published web services (SOAP and REST).

Parameter $expand

New parameter in REST web services: $expand. This is equivalent to the clause EXPAND of the SELECT_NAVIGATIONAL statement.

Replace/Drop Individual Elements

Added support to replace or drop individual views from a REST web service, using VQL statements. In previous versions, you either do this from the administration tool or replacing the entire web service (CREATE OR REPLACE WEBSERVICE).

The same applies to operations of SOAP web services.

Swagger

Denodo REST web services now publish an OpenAPI 2 / Swagger specification that describes the available operations of the web service, and their input and output schemas.

Administration Tool

The tabs can now be closed with the wheel button.

The administration tool now executes less statements when establishing a connection with the Denodo server. The main benefit is that it connects slightly faster.

The panel to display the VQL of views now has an option to show the statistics of the view.

JDBC Driver

New Property describeNationalCharTypesAsBasicTypes

The JDBC driver has a new property (describeNationalCharTypesAsBasicTypes) that allows a client application to report the fields of type NCHAR, NVARCHAR, NCLOB and LONGNVARCHAR as CHAR, VARCHAR, CLOB and LONGVARCHAR respectively.

JDBC API: XML Data Types

The following methods are now supported by the JDBC driver:

ResultSet.getSQLXML(int columnIndex);
ResultSet.getSQLXML(String columnLabel);
PreparedStatement.setSQLXML(int parameterIndex, java.sql.SQLXML xmlObject);

External Dependencies

The JDBC driver no longer causes dependency conflicts with other open source libraries (log4j, commons-collections…)

In previous versions, there can be a conflict when the driver is loaded in an application that already includes these libraries, but an older version of them.

Administration

This section lists the new features that are useful for administrators of Virtual DataPort.

Obtain Information about Web Services

There are three new stored procedures that return information about web services:

  1. GET_CATALOG_METADATA_WS: returns information about the REST and SOAP web services, its operations and parameters.
  2. WEBCONTAINER_ELEMENT_STATUS: returns information about the REST and SOAP web services and widgets that have been deployed in the web container of Denodo.
  3. WEBCONTAINER_META_INF: returns the current configuration of the web container of Denodo.

Impose Quota Limits

Administrators can now create a restriction in a plan of the Resource Manager to limit to a user or role the number of queries per minute, hour, day or month.

Active Loggers

There is a new stored procedure GET_ACTIVE_LOGGERS(). It returns information about the log categories that have been set in the Virtual DataPort server. It returns the categories set in the file <DENODO_HOME>/conf/vdp/log4j2.xml and the ones set by the procedure LOGCONTROLLER.

Enhancements when Exporting Metadata to VQL Files

  • When an administrator exports the metadata of the whole Server (not only a few databases), the VQL file includes the values of all the properties of the Denodo server. In previous versions, only some properties and the properties added with the command SET are included in the VQL file.
  • All the VQL files now include a comment at the top saying which version and update of Virtual DataPort server generates the VQL.
  • You can now obtain the VQL of several elements with a single statement: DESC VQL LIST.
  • The command ENCRYPT_PASSWORD encrypts a password so you can use it as the password of a data source. This command has now the parameter FOR_PROPERTIES_FILE, which generates the password with certain characters escaped, as needed in a properties files.

Changes in Underlying Sources

  • There is a new stored procedure: GET_SOURCE_CHANGES. This procedure can search for changes in the underlying source of base views of any database. SOURCE_CHANGES (already available in previous versions) can only search in the current database.
  • When doing a source refresh of a base view from the administration tool, the types, properties and descriptions of the fields are no longer replaced with the information from the source. This allows for users to keep the information they manually set on the base view.

VIEW_DEPENDENCIES Procedure

The stored procedure VIEW_DEPENDENCIES now allows to distinguish between stored procedures developed by the user and stored procedure included in Denodo “out of the box”. The column dependency_type returns a different value depending on the type of the dependency.

Privileges

New Privileges

The privileges system has been upgraded to provide a more fine-grained control over what actions the users are allowed to do. The privileges you can grant to users or roles over databases are:

  • Administrator of the database
  • Connect
  • Create (includes “Create data source”, “Create view”, “Create data service”, “Create folder”)
  • Create data source (new)
  • Create view (new)
  • Create data service (new)
  • Create folder (new)
  • Metadata (new)
  • Execute
  • Write

In previous versions you can only grant the privilege “Create”. Now you can control what types of elements users are allowed to create.

You can grant the following privileges over data sources, views, REST and SOAP web services, and widgets:

  • Metadata
  • Execute
  • Write
  • Insert
  • Update
  • Delete

In previous versions, you cannot grant privileges over data sources, nor web services, nor widgets, nor stored procedures. Only administrators, administrators of the database and the owner of these elements can modify them.

Now, the ownership of an element is not taken into account when deciding if a user is allowed to modify/delete an element.

The output parameters of the procedure CATALOG_PERMISSIONS have changed accordingly to these new privileges.

CONNECT Privilege

In Denodo 7.0 you have to grant the CONNECT privilege to a user/role over a database in order to allow this user to access any of the elements of that database. The other privileges over the database or its elements are disregarded if the user/role does not have the privilege CONNECT.

In previous versions, the CONNECT privilege was only checked when establishing the connection.

FILE Privilege

The FILE privilege is now enabled by default.

Default Roles

There are new default roles that are created by default and cannot be deleted:

  • data_catalog_admin *-1: Grants the privilege of modifying the settings of the Denodo Data Catalog.
  • data_catalog_exporter *-1: Grants the privilege of exporting the results of a query, from the Denodo Data Catalog.
  • solution_manager_promotion_admin: Grants the role “Promotion Administrator” over the Solution Manager administration tool.
  • solution_manager_admin: Grants administration privileges over the Solution Manager administration tool.
  • web_panel_admin”, “Grants administration privileges over the web panel.

*-1: The roles “data_catalog_admin” and “data_catalog_exporter” are equivalent to “selfserviceadmin” and “selfserviceexporter” respectively. The latter ones already exist in previous versions, where the “Data Catalog” is called “Information Self-Service Tool”. The new roles have been created for consistency with the new name of this tool.

Creating of Jars and Maps

Only administrators can import and delete jars, and create, modify or delete internationalization maps. The reason for imposing this new restriction is that modifying these elements can potentially affect elements of all the databases.

Monitoring

Users with the jmxadmin Role

From now on, only administrators or users with the role jmxadmin can connect to the JMX interface of Denodo. In previous versions, all users could do it but the actions they could do were limited.

Query Monitor

The Query Monitor no longer lists all the queries running, to users with the role jmxadmin. In previous versions, administrators and users with this role can see all the queries that are running. Now, only administrators and users with the role “serveradmin” can do this.

Now this role only affects who can connect to the JMX interface of the Denodo server.

Denodo Monitor

  • Added a script for shutting down the Denodo Monitor. In previous versions it has to be stopped by killing the process.
  • Added support for pinging data sources to make sure they are still available. This mechanism is not enabled by default. To enable it, modify the properties vdp.datasources.ping and vdp.datasources.ping.timeout of the file ConfigurationParameter.properties in the Denodo Monitor.

Denodo4E Eclipse Plugin

The Denodo plugin for Eclipse now supports Eclipse Oxygen.

Others

Improvement in the Authentication of Users

Virtual DataPort now caches in memory the roles of the users that connect to Denodo using LDAP authentication or Kerberos authentication. This will reduce the connection time for clients.

Importing a VQL file with a properties file now consumes less memory.

The CONNECT statement now supports passing an OAuth 2.0 token, in addition to allow user/password and a Kerberos token.