Tableau and Denodo Best Practices

Applies to: Denodo 8.0
Last modified on: 07 May 2021
Tags: Best practices External clients Optimization Performance Tableau



Introduction

This document explains the best practices for integrating Tableau and Denodo. It has been created using Denodo 8.0 20210209 and Tableau 2020.3, so certain capabilities may differ in other versions of the products.

How to migrate a Tableau Dashboard to use Denodo as data source

If you already have a set of Tableau Dashboards built and pointing to data sources other than Denodo, the first step should be to replace those Data Sources with a Denodo Data Source. This is an overview of the steps needed to go through that process:

  1. Create the data sources available in Tableau as Data Sources in Denodo.
  2. Import the tables used in the Dashboards as Base Views in Denodo to prepare for the migration.
  • This information can be obtained from Tableau Server using the table “datasource” from the workgroup repository, based on PostgreSQL, at Tableau Server. This VQL file builds a view (denodo_workbook_usage) in Denodo that extracts information from the PostgreSQL database in Tableau Server and brings you the information needed to locate the sources of the data used in Tableau. Note that this requires a user with privileges to access the Tableau PostgreSQL database.
  • You can also check the sources of the data manually, directly in Tableau. All you need to do is download the workbook that you are interested in and edit its data source; this will give you information about the tables used and where they connect.

  3. Change the Data Source in Tableau as explained in the next section, “Changing the Data Source”.
  4. If there are changes in the metadata (e.g. different field names between the data source and the physical table), use the “Replace References” option as shown below after the view has been imported. This option allows you to replace a reference, pointing the field with the error to the correct corresponding field.

  5. Once everything is ready, execute your Tableau testing using your current testing methodology.
  6. Once all the reports are working, Denodo recommends iteratively reviewing the reports that require modifications or improvements for reasons such as performance, new fields needed, data from other data sources, etc.
  • By having Denodo as the only source of data for Tableau, this process can now be done more efficiently without affecting the end users during the development or migration of certain dashboards.
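To locate the tables behind existing workbooks, a query can be run against Tableau Server’s PostgreSQL workgroup repository. The sketch below is only illustrative: the table and column names (`datasources`, `workbooks`, `db_class`, `db_name`) are assumptions that vary between Tableau versions, so verify them against your repository’s data dictionary.

```sql
-- Hypothetical query against Tableau's workgroup repository (PostgreSQL).
-- Requires the readonly repository user; names are assumptions to verify.
SELECT w.name     AS workbook_name,
       d.name     AS datasource_name,
       d.db_class AS connection_type,
       d.db_name  AS database_name
FROM datasources d
JOIN workbooks w ON w.id = d.workbook_id
ORDER BY w.name;
```

The result gives, per workbook, the connection type and database behind each data source, which is the information needed to recreate them as Denodo base views.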

Changing the Data Source

As mentioned above, after you have your Denodo data sources and views created, you will need to change the data source of your Tableau Dashboard. The best way to do this replacement is to follow the steps described below:

  1. Connect to the new data source in Denodo.

  2. Go to the original dashboard, right-click on the old data source and select “Replace Data Source”.

  3. Choose the new Denodo data source created in Step 1 and click OK. That should be it!

Location of the data

When using Denodo as a data source for Tableau, the data can be obtained from different places due to the different features provided by each product, Tableau and Denodo.

In the case of Tableau, the data can be obtained from Denodo in real time or it can be materialized in a Tableau Data Extract. In the case of Denodo, the data can also be obtained from the sources in real time or it can be materialized using different techniques such as caching or summaries.

The functionalities can be mixed and matched, for example:

  • Tableau can use/refresh a Data Extract from a view in Denodo that is already cached, so Tableau’s Data Extract is created faster.
  • Tableau can access Denodo in real time while Denodo has a view or summary already pre-cached.
  • Tableau can use a Live Connection to Denodo while Denodo combines a real-time connection to a source with a pre-created summary.

Although it will depend on the use case, the recommended architecture is a “Live connection” from Tableau to Denodo in combination with the different Denodo optimizations provided, such as the Denodo cost-based optimizer, summaries and cache. This recommendation enables the maximum level of reusability and minimum maintenance when there are client applications other than Tableau in the architecture.

Tableau options

Real Time - Live Connection

When accessing a data source in real time from Tableau, this option is called a “Live connection”. Live connections offer the convenience of real-time updates, with any changes in the data source reflected in Tableau. Live connections also rely on the database for all queries and, unlike extracts, databases are not always optimized for fast performance. With live connections, your data queries are only as fast as the database itself.

When using this option to access Denodo, a query is sent to Denodo and the Denodo optimizer is in charge of sending the queries to the data sources or of leveraging other Denodo optimizations such as caching and summaries.

Data Extracts

Extract files are a local copy of a subset or of the entire data set that you can use to share data with others, to work offline, and to improve performance. When creating a new data source in Tableau, it is possible to use this option or the “Live connection” explained above.

Working with Large Tableau Data Extracts

A very common problem with Tableau dashboards occurs when users try to extract very large data sets from their sources into a Tableau Extract. This can lead to the generation of a complex query for the data source to handle and may become a performance issue.

Denodo’s recommendation is that if the “Live connection” performance is satisfactory, there is no need to use Extracts. Also, before using Extracts it is possible to leverage other Denodo optimizations, as mentioned in the section Avoiding extracts using summaries, so the optimization can be reused by client applications other than Tableau.

If a Tableau Extract is really needed, Denodo recommends generating an Extract using Tableau’s Aggregate Extracts based only on the fields that will be used (clicking the “Hide All Unused Fields” button) in combination with the “Aggregate data for visible dimensions” feature in Tableau. It is also possible to add extract filters and create incremental extracts by indicating a field, such as a code or a date, that controls the delta increment. This considerably reduces the amount of data requested from Denodo, so the extracts are generated more efficiently.

More information on this topic can be found here. 

Temporary Tables based on the query

Certain Tableau data sources (like Denodo) support the creation of temporary tables inside the physical data source to improve the performance of a dashboard. An example of when Tableau creates a temporary table is the creation of context filters that are reused in different parts of the dashboard.

Working with Temporary Tables

By default, Tableau’s Denodo connector has this option enabled, although Tableau will automatically switch to a different method if there is any error when creating the table or inserting the data.

As Denodo does not store any data by default (temporary tables are created in the Denodo cache database), in order to leverage this option the cache needs to be enabled for the server or for the virtual database that Tableau connects to. If the cache is disabled, the query will still work as mentioned above, but it will use a different type of query instead of the temporary table feature.

If the cache is enabled in Denodo, Tableau will try to insert data into the temporary table in two different ways. The first is using a SELECT INTO statement, which is faster; the second is running a set of individual INSERT statements, which can become a bottleneck when the amount of data that Tableau needs to insert into the temporary table is very large.

Certain functionalities in Tableau (and how the queries are generated) can be modified by creating a custom TDC file. For this particular scenario, it is possible to disable the creation of temporary tables, so a different strategy is used, by disabling the property CAP_CREATE_TEMP_TABLES.

More information on creating a custom TDC File can be found in the Modifying Denodo queries using TDC section.

As it is not possible to know beforehand whether the amount of data will be large, we cannot provide a rule of thumb to determine when this option should be disabled. This document tries to provide as much information as possible so the reader can take the best approach depending on their scenario.

Denodo options

Real Time

The Denodo optimizer parses the queries that Tableau sends and issues the optimal queries to the different data sources in real time.

Due to Denodo’s first-class optimization techniques, in most cases this is the only option required and no materialization is needed.

Cache

Denodo incorporates a Cache Engine that can store a local copy of the data retrieved from the data sources in an external database such as Oracle, SQL Server, DB2, MySQL, PostgreSQL, Netezza, Teradata, Vertica, Redshift, SQL DW, Athena, Hive, Impala, Presto, SAP HANA, Snowflake, Spark or Yellowbrick. This can reduce the impact of repeated queries hitting the data source and speed up data retrieval, especially with certain types of sources.

Although this option may sound similar to Tableau’s Data Extract functionality, it has many benefits, such as the ability to reuse the cached data across other queries and other client tools.

This option can be used in combination with Tableau Data Extract although the data will be stored in two different places (Tableau Data Extract and Denodo Cache) and will need to be refreshed accordingly.
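Enabling and preloading the cache for a view is done with a couple of VQL statements. This is a minimal sketch: the view name is illustrative, and it assumes the cache engine is already configured for the server or virtual database.

```sql
-- Enable full cache mode on a view (view name is illustrative):
ALTER VIEW sales_by_region CACHE FULL;

-- Preload the cache by querying the view with the cache_preload hint,
-- invalidating any previously cached rows first:
SELECT * FROM sales_by_region
CONTEXT('cache_preload' = 'true', 'cache_invalidate' = 'all_rows');
```

Once loaded, queries from Tableau against the view are served from the cache database instead of hitting the original sources.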

Summaries

With Denodo, a user can store the result of a query in a table of an external database with the purpose of providing smart query acceleration when executing other queries. This feature, called summaries, is similar to the concept of aggregation awareness used by reporting tools (BO, MSTR), materialized views in Oracle or indexed views in SQL Server.

When building the execution plan of a query, the optimizer can identify parts that match the query used to create the summary. Where possible, these matches are substituted with accesses to the summary. With a well-defined summary, the performance of the query to execute is greatly boosted.

One of the main use cases of Summaries is to accelerate the execution of analytical queries like the queries sent by Tableau for most of the dashboards. By creating the right summaries, the Tableau dashboards can benefit from reusing them and improve the performance.
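A summary is defined with a single VQL statement. The sketch below assumes a hypothetical `sales_detail` view and illustrative field names; after the summary is refreshed, a Tableau query that aggregates by the same dimensions can be rewritten by the optimizer to read the summary instead of the detail view.

```sql
-- Sketch of a summary definition in Denodo 8 (names are illustrative):
CREATE SUMMARY s_sales_by_region AS
SELECT region,
       product_category,
       SUM(amount) AS total_amount,
       COUNT(*)    AS num_sales
FROM sales_detail
GROUP BY region, product_category;
```

A Tableau dashboard that sends `SELECT region, SUM(amount) FROM sales_detail GROUP BY region` could then be answered from `s_sales_by_region` transparently, without changing the dashboard.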

The main benefits are:

  • Summaries are transparent to the user. The query optimizer will automatically analyze if it can rewrite the incoming queries to take advantage of the data in the summary or if the detailed view needs to be accessed without the user being aware of its existence.
  • Summaries are source independent: they are not restricted to the data source configured for caching.
  • All client tools will be able to leverage them. There is no need to introduce extract techniques or product-specific configurations to take advantage of summaries.
  • Denodo has introduced machine learning algorithms that recommend the best summaries based on usage, further decoupling consumption patterns from the underlying data sources.

From a developer perspective there are additional benefits:

  • Full data lineage for the summary view used in its definition.
  • Summary views can be automatically invalidated if the definition of an underlying view changes.
  • Summary view refresh can be done in batch using the Denodo Scheduler.

More information about Summaries including when to use the Denodo Cache or Summaries can be found here.

How to know what Summaries to create for the dashboards

In order to identify which summaries should be created based on the current Tableau Dashboards it is possible to follow these simple steps:

  1. Collect performance statistics with the current Tableau Extracts and Summary Extracts.
  2. Send end user queries against Denodo itself without the use of Extracts.
  • This may require some help from the Business Analysts to make these queries available for testing.
  3. Use the Denodo Summaries Recommendation based on Machine Learning algorithms (a feature available in Denodo 8 update 2; otherwise, infer the best aggregates based on usage patterns).

  4. Implement Summaries based on the recommendations.
  5. Send the same end user queries against Denodo again and verify the overall performance improvement.

Avoiding extracts using summaries

Normally extracts are created for a specific purpose and will not cover all the different usage needs of a data source. It is very common to have duplicated or almost identical extracts with minor differences, such as a few fields or filters. This requires a lot of processing in Tableau Server and many schedules to keep those extracts updated.

For multiple scenarios, it is possible to create a summary in Denodo that will be reused, rather than independent Tableau Data Extracts. This minimizes the processing done by Tableau Server, simplifies the maintenance and maximizes the reusability among Tableau Dashboards and queries coming from other client applications.

Performance considerations

From a high-level perspective, Denodo’s Optimizer core objectives are:

  • Minimize Data transfer through the network and at the same time
  • Maximize “local” data processing

A query sent by the client travels over the network to the Denodo server, which in turn sends queries over the network to the underlying data sources.

Based on this flow, we can highlight some areas where a few possible bottlenecks may take place:

  • Slow network, high latencies:
  • This risk has been reduced over time as networks have significantly improved. It is most often seen when querying across continents or across cloud providers without adequate network links.
  • A possible solution on large implementations is a multi-location Denodo deployment.
  • The easiest way to solve these latencies is to increase the bandwidth between the on-premises and cloud environments.
  • Cannot push delegation to the source, source does not allow for delegation:
  • Typical of object storage without a compute engine: we recommend adding a compute engine on top of the storage. Denodo provides a DenodoConnect component that allows customers to seamlessly use Presto for this purpose (the same engine AWS Athena uses, originally developed by Facebook).
  • Servers reaching capacity or memory settings issues of the Denodo middle tier:
  • Increase number of cores.
  • Increase memory settings in the middle tier.
  • Use of SSD to minimize impact of swapping.
  • Evaluate query plans to ensure that they are optimal in the middle tier.
  • High complexity for a query or set of queries:
  • Cache as an option.
  • Consider use of Denodo Summaries.
  • Sheer volume of data for a query or set of queries:
  • Consider parallelizing extraction, incremental extracts and similar procedures.
  • Suboptimal plan generated or selected by Denodo (for example not delegating a calculation, or choosing a plan with higher cost):
  • This could be related to the Denodo statistics not being correct, to missing associations between the views, or to metadata not being properly defined (e.g. PKs or indexes).
  • Too many resources used by a user or group of users:
  • Implement the Denodo throttling capabilities of the Resource Manager (plans and rules).
  • In analytical cases, consider the use of summaries.

From a methodology standpoint, to detect bottlenecks in a query, a source (data source + network), the Denodo server or the client itself, refer to the following article: Best Practices to Maximize Performance IV: Detecting Bottlenecks in a Query.

Essentially the process to follow is:

  1. Look at the Denodo execution trace:
  • On tree nodes, look for keywords like swapping, memory limit, effective time, etc.
  • On leaf nodes (data source queries), look at the execution time, response time, actual SQL sentence, connection time or call time.
  2. Determine the main or dominant factor in the execution, which may be the client (client + network), Denodo itself or the source (source + network).
  3. For that particular bottleneck, follow the step-by-step instructions described in the article above.
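When checking whether a plan is optimal, the static execution plan can also be inspected from the VQL Shell without running the query. A minimal sketch (the view and field names are illustrative):

```sql
-- Show the static execution plan Denodo would use, without executing it:
DESC QUERYPLAN SELECT region, SUM(amount)
FROM sales_detail
GROUP BY region;
```

The plan shows which parts of the query are delegated to each data source, which is useful to confirm that aggregations or joins are being pushed down as expected.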

Debugging Tableau Dashboards

When using Denodo in combination with Tableau, there are two different ways of debugging what is happening. One is using Tableau’s monitoring tools and the other one is using Denodo’s.

For example, from Tableau’s Performance Recorder it is possible to understand the queries that Tableau is sending to Denodo and the total time the dashboard took to generate. From Denodo’s monitoring tools it is possible to check which queries are sent to the actual data sources and which optimizations Denodo is using to resolve the query sent by Tableau.

Tableau

The first step to detect issues in our Tableau dashboard is to run a performance recording in Tableau to determine the step that is taking more time than expected.

This can be done in two places depending on the tool we are using: Tableau Desktop or Tableau Server.

Tableau Desktop

The first step to debug a dashboard on Tableau Desktop is to create a “blank” sheet, because it has to be the first to load when we open the workbook.

Now you should have a blank sheet as part of your dashboard.

Save the file with the blank sheet and close Tableau Desktop.

Note: This step is very important to ensure that the cache is not being used.

Reopen your Tableau workbook. The blank sheet should be the first thing you see.

Click on Help > Settings and Performance > Start Performance Recording.

Click on the dashboard you want to debug and wait for it to load completely.

After the dashboard loads, stop the performance recorder by clicking on Help > Settings and Performance > Stop Performance Recording.

A new Tableau Desktop window will open containing the recorded trace.

The green bars are the queries sent to Denodo; click on a green bar and the report will show the executed query along with the execution details.

More info on how this process is done on Tableau Desktop can be found here.

Tableau Server

In order to be able to record the performance in Tableau Server, our user must have the proper permissions, and performance recording needs to be enabled in Tableau Server. More details can be found in the link at the end of this section.

These are the steps required to record the performance of our dashboard:

  • At the end of the Dashboard URL add:
  • ?:record_performance=yes&:refresh=yes
  • This configures Tableau to run the query in debug mode and to refresh the data instead of using Tableau Server’s cached copy.

  • Once the dashboard is loaded, a new button will appear.

  • Clicking on that new button will show the execution trace, similar to what is shown in Tableau Desktop as described in the previous section.

More info on how this process is done on Tableau Server can be found here.

Denodo

Denodo provides different tools to understand the queries that are being executed. This section covers them.

Query Monitor

The Query Monitor shows the queries actively running in the server. For the purpose of debugging Tableau dashboards this method will not be sufficient, as queries are removed from the monitor once they finish.

Denodo Monitor

This method stores all the queries executed in the Denodo server in files or in a database, depending on the configuration used. The Denodo Monitor can be enabled in the Solution Manager, or a standalone Monitor can be started, depending on your testing needs and configuration.

Before running the Tableau dashboard that we want to debug, it is useful to execute the following command in Denodo (e.g. using the VQL Shell) in order to mark the beginning of the test:

CALL WRITELOGERROR('Starting Test Dashboard 1')
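The same procedure can be called again once the dashboard has finished loading, so that every query logged between the two markers belongs to the test:

```sql
-- After the dashboard under test has finished loading,
-- mark the end of the test in the Denodo logs:
CALL WRITELOGERROR('Finished Test Dashboard 1');
```

Everything the Denodo Monitor recorded between the two markers can then be attributed to that dashboard run.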

Denodo Diagnostics and Monitoring Tool (DMT)

Although the DMT allows you to check the queries actively running (similar to the Query Monitor), it can also be used to generate a diagnostic and to visualize the logs obtained from the Denodo Monitor.


The combination of both the Denodo Monitor and the DMT will give you a good understanding of the queries executed by the Denodo Server.

Obtaining the full trace plans

For certain scenarios, it will be necessary to obtain the full query plan generated by the Denodo Server. In order to store individual plans for every single query executed, you will have to enable the saveTrace feature as described here.

Understanding Tableau queries to Denodo

When reviewing the Denodo logs, we will find that more than just SELECT / CREATE TEMP TABLE / INSERT commands reach our Denodo server. Those queries include DESC MAP, DESC SESSION, CALL GET_VIEW_COLUMNS, DESC VIEWSTATSSUMMARY, etc. The reason we see these queries in the log is that Tableau sends additional metadata queries to Denodo so it can optimize its queries and obtain the column types, statistics and metadata such as PK/FK information.

This is an example of the queries that will be sent by Tableau to Denodo for a common dashboard

Although it is possible to disable some of the metadata queries using a custom TDC file, the recommendation is not to do so. The reason is that the performance of the Tableau dashboards could be affected by disabling them and it could lead to Tableau generating less optimal queries for Denodo.

Things to Consider when Debugging

Keep in mind that sometimes the problem is not related to the databases or extracts; sometimes the performance is related to the Tableau dashboard itself and how it was designed.

It is important to validate and identify whether the majority of the problems are in the query execution or in the compute-layout phase. Dashboards with many points to render typically have a high compute-layout time. In addition, a high connection time (especially on Server) may mean there are too many network layers between Denodo and Tableau Server, which will impact the performance of the execution. Normally the execution times should be similar on Desktop and on Server; differences can indicate that the server is undersized or has hardware problems.

Authentication methods

The integration between Tableau and Denodo supports different authentication methods. When using Denodo as part of our architecture, there are different authentication processes happening at different levels:

  1. Tableau User to Tableau Server.
  2. Tableau Server to Denodo Server.
  3. Denodo Server to Data Sources.

Tableau User to Tableau Server (Step 1 of the diagram)

As seen in number 1 of the diagram above, a Tableau User first connects to Tableau Server. This can be configured using different mechanisms that can be found in the matrix described here.

To do Single Sign On (SSO) between Tableau Server and Denodo (as described in the next section) the options supported are SAML and Integrated Windows Authentication (Kerberos).

Tableau Server to Denodo Server  (Step 2 of the diagram)

After a user is authenticated in Tableau Server, Tableau Server needs to make a connection to the Denodo data source (number 2 of the diagram above). This can also be done using different methods, such as Kerberos, a service account, or prompting the user for credentials once per session. You can see the different options below, classified by whether they use SSO or not.

Options for Single Sign On

SSO - Integrated Authentication (configuring Kerberos in Tableau)

Kerberos delegation enables Tableau Server to use the Kerberos credentials of the viewer of a workbook or view to execute a query on behalf of the viewer. This will enable SSO to access the Denodo data source.

In order to benefit from this option, we need to use SAML or Kerberos as authentication in the Tableau user to Tableau Server login step.

Note: the user account of Tableau Server in Active Directory needs to have “constrained delegation” enabled (there is an exception described below) and you have to add the Denodo server to the list of allowed services. Otherwise, Tableau will not be able to obtain the Kerberos credential to connect to Denodo on behalf of the end user.

It is possible to use “standard delegation” instead of “constrained delegation” in the user account of Tableau when the following circumstances apply:

  • The users log in to Tableau Server with Kerberos authentication.
  • And in Denodo, the JDBC data sources do not use “pass-through credentials”.

Detailed information about this topic can be found here.

Options without Single Sign On

Service Account

It is possible to use a “service account” to authenticate to Denodo from Tableau Server. This option is easier to implement but all users authenticated by Tableau will log in to Denodo under the same username. This is the option used when the Denodo data source uses “Username and Password” as the “Authentication” option as shown in the screenshot below:

Credentials prompt by session

With this method, when a user logs in to a Dashboard for the first time, Tableau will request the username and password to access Denodo. The user only has to enter this information once per session, as Tableau caches the credentials for the duration of the session. This option is configured when publishing a data source separately or embedded in a dashboard, as shown below:

Denodo Server to Data Sources  (Step 3 of the diagram)

Options for Single Sign On

Pass-Through

When using this option in a Denodo data source, the Denodo server will use the credentials of the user to connect to the database. The behavior changes depending on the authentication method used to connect to the Denodo server:

  • If Tableau connects to Denodo using Kerberos authentication, Denodo will request a Kerberos ticket to connect to the database on behalf of this user. Note that for this option to work, the database must also support Kerberos authentication.
  • To be able to do “full pass-through credentials” with Kerberos, the service account for the Denodo server in Active Directory has to have constrained delegation enabled. Otherwise, Tableau will be able to connect to Denodo with Kerberos on behalf of the user but the queries to data sources of Denodo with “pass-through credentials” will fail.
  • If Tableau connects to Denodo using login and password, Denodo will use this login and password to connect to the database.

Example of a full “pass-through” flow:

  1. A Tableau user logs in to Tableau Server with Kerberos (a browser opens and logs in automatically).
  2. Tableau Server gets a Kerberos ticket to connect to Denodo on behalf of this user.
  3. Denodo obtains a Kerberos ticket to connect to the data source e.g. SQL Server on behalf of the user that connected to Tableau.

Options without Single Sign On

Kerberos

Denodo will connect to the data source using a Kerberos ticket generated at the Denodo Server level (not at the Tableau Server level like when using the Pass-Through option explained above).

Login and Password

The Denodo server will connect to the data source using the login and password specified.

Common Errors

This section covers some of the common errors that can be found when integrating Tableau and Denodo together.

Invalid queries in certain reports or dashboards

There could be scenarios where the query generated by Tableau does not work as expected and returns an error when accessing Denodo. In those scenarios it will be useful to provide your Denodo or Tableau Support Agents the queries that are being generated by Tableau, the versions of both products, a VQL file containing the Denodo metadata and a sample of Tableau’s .TWB file if possible.

For these scenarios, one test that can be executed is to use a generic ODBC connection to Denodo instead of the Denodo certified adapter. The purpose of this test is to validate whether the query syntax used by Tableau when selecting the generic ODBC adapter makes the query work, so the Support Agent can pinpoint the issue faster.

Using the generic ODBC connector is only recommended for testing/debugging purposes; Tableau will not provide support to customers if this generic connector is used to access Denodo instead of the official Denodo connector.

Error reusing the session id

It is very common for users to share a Tableau Dashboard URL after it loads (e.g. during a POC or testing phase). Those URLs usually contain an iid parameter with the session id (for example, a hypothetical URL ending in ?:iid=4).

When the iid parameter is included as part of the URL, Tableau may try to reuse a session that no longer exists and will return an error.

Instead, you should remove the iid parameter before sharing the URL, so Tableau generates a new session id and the dashboard loads as expected.

Modifying Denodo queries using TDC

Tableau allows you to modify how queries are sent to Denodo by creating a custom TDC file containing different configuration parameters. By default, Tableau’s Denodo connector has the proper values and there is no need to modify them.

For certain scenarios, such as the creation of temporary tables (as described above), the modification of the fetch size (as explained here) or the debugging of a failing report or dashboard, it can be useful to know how to modify those properties by following the steps found in the Denodo community here or in Tableau’s official documentation.

Although it is not official documentation, this website has a comprehensive list of the parameters that can be used in a TDC file.
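As a sketch of what such a file looks like, the fragment below disables temporary-table creation via CAP_CREATE_TEMP_TABLES (discussed earlier in this document). The `class`, `vendor` and `driver` values must match the ones your Tableau version reports for the Denodo connector; the values shown here are assumptions to verify against Tableau’s TDC documentation.

```xml
<?xml version='1.0' encoding='utf-8' ?>
<!-- Illustrative TDC customization; verify class/vendor/driver names
     against your Tableau version before using. -->
<connection-customization class='denodo' enabled='true' version='8.0'>
  <vendor name='denodo'/>
  <driver name='denodo'/>
  <customizations>
    <!-- Prevent Tableau from creating temporary tables in Denodo -->
    <customization name='CAP_CREATE_TEMP_TABLES' value='no'/>
    <customization name='CAP_SELECT_INTO' value='no'/>
  </customizations>
</connection-customization>
```

The file is placed in Tableau’s Datasources directory and applies to every connection made through the matching connector.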

Issues when incorrectly changing the Data Source 

As mentioned in the section How to migrate a Tableau Dashboard to use Denodo as data source of this document, the recommended way to change a Dashboard’s Data Source is to use the Replace Data Source method: map the new data source and then use the Replace function.

Sometimes users try to edit the original data source and change the connection parameters to point to the new source without following the suggested steps. Doing this may result in errors, as Tableau may use the wrong connector to map the new source, and formulas may not be migrated correctly.

To do this operation safely, always use the Replace Data Source technique described in this document.

Queries that change between Tableau versions

Although Tableau and Denodo work together continuously to minimize the impact of upgrading either product, the way certain queries are generated by Tableau may change between versions. These changes should, in most cases, improve or fix the queries that Tableau sends to Denodo, although there could be certain scenarios where that is not the case. Denodo’s recommendation is to always test your Tableau dashboards after an upgrade (of Tableau or Denodo) following your current testing procedures.

In the same way, an issue with a dashboard or report may already have been fixed in a newer version of Tableau (or Denodo), so when debugging an issue it is always recommended to test on the latest Tableau/Denodo version if possible.

If you find any type of issue in your dashboards or reports, always specify to your Denodo or Tableau Support Agents the exact versions of the products that worked / stopped working when reporting a support case, in order to identify the issue faster.

Additional information

Additional information about Tableau and Denodo can be found in the links provided below:

  • How to connect Tableau with Denodo can be found here and here.
  • How to optimize row fetching in Denodo can be found here.
