Considerations When Configuring Data Sources with Pass-Through Credentials

In Virtual DataPort, some data sources support “pass-through credentials”. That is, when a query involves a data source with this feature enabled, Virtual DataPort will connect to this source using the credentials of the user that executes the query.

This allows leveraging the existing authentication and authorization system of the organization.

When “pass-through credentials” is enabled in a data source, the “Login” and “Password” fields are only used at design-time to inspect the sources and test that the connection to the source can be established.

The main reasons for enabling this feature are:

  • To take advantage of the authorization restrictions, usually per row and/or column, in the underlying sources that may vary depending on the user or role. Therefore, Virtual DataPort must use the user’s credentials to connect to the source in order to prevent obtaining unauthorized information.

    Note

    Denodo role-based permissions also allow specifying access restrictions at schema, row and column level, and it applies to any kind of data source (no matter if the data source allows such authorization restrictions or not).

  • Auditing purposes, if the underlying database needs to keep track of which users accessed what information.

The pass-through credentials feature solves these issues. However, there are some considerations to take into account when this option is used along with the cache.

JDBC data source with “Pass-through session credentials” enabled

JDBC data source with “Pass-through session credentials” enabled

The credentials of the data source are used only in design-time to introspect the schema of the source (the schemas of the database in JDBC data sources, the operations of a Web service, etc.).

At run-time, when a query involves a data source with pass-through session credentials enabled, Virtual DataPort will use the credentials of the user who is executing the query, to connect to the database.

The Cache Module, to determine whether the data requested is available or not in the cache, does not check which user issued the query that populated the cache. Thus, depending on the authorization restrictions in place within the underlying sources, you have to consider some extra steps to use the cache and get appropriate results.

For example, let us say that we have a view called CLIENT_DATA whose data is obtained from two data sources. One of the sources is called ACME_CRM, has pass-through credentials enabled and it returns different data depending on the user. For performance reasons, we would like to cache the results of CLIENT_DATA.

Tree View of the view “CLIENT\_DATA”

Tree View of the view “CLIENT_DATA”

Let us say that the query that populates the cache is executed by a user whose privileges in ACME_CRM define a column restriction that masks a column with NULL values and a row restriction that prevents her from obtaining all the rows.

If after this, a user with full privileges executes the same query, the data will be obtained from the cache. The result will be different from a result returned by the source because the two users have different privileges.

There are two options to solve these issues:

  • Do not use cache in these cases and always get the data from the data source.
  • If caching the data is a requirement, there are some extra considerations:
    • The cache must be populated using a user with no restrictions. Thus, the standard approach where the first query fills the cache and it expires using the TTL is not valid. Schedule the queries that load the cache and execute them with a user with no restrictions.
    • The authorization rules defined in the database have to be duplicated in the derived view of interest, using the row and column restrictions and the masking support of Virtual DataPort (see more about these in the section Types of Access Rights)
    • If the queries will have aggregation operations (GROUP BY, HAVING …), the aggregations have to be performed over the cached view. Otherwise, aggregated fields such as COUNT(*) may contain erroneous results. In the following figure, you can see that the aggregation (image2) is performed “before” the data is cached, which may lead to wrong results.
Example of pass-through credentials that may lead to incorrect results

Example of pass-through credentials that may lead to incorrect results

In the following figure, the aggregation is performed over the view with cache enabled. This way, the aggregation can be done on top of the correct results if the appropriate row restrictions are set in the CLIENT_DATA view.

Example of pass-through credentials used correctly

Example of pass-through credentials used correctly

The cache of a view should not be enabled in the following scenario:

  • The data source has pass-through credentials enabled
  • The query delegated to the database has an aggregation operation
  • And, the values obtained for the projected fields in the aggregation, depend on the user who launches the query (like in the COUNT(*) example, mentioned above)

The following figure displays a scenario where pass-through credentials should not be enabled.

Scenario where pass-through credentials should not be enabled

Scenario where pass-through credentials should not be enabled