You can translate the question and the replies:

Join with Excel data source

Hi, Is there a way to optimize a join from Excel that has 1.7M registers with other one that has 25M registers. The problem here is it takes longer than 15 minutes so it times out. Any suggestions? Thank you, Best regards.
20-01-2020 08:38:14 -0500

4 Answers

Hi, Because of the nature of [Excel data sources]( there is little optimization that can be done. One of Denodo’s most powerful tools to query optimization is the delegation of certain operations (such as Joins) to the data source. However, this can only be done to sources that allow such operations. As the concept of Join does not exist in Excel sheets it is impossible for Denodo to delegate it to the source and as such the Join has to be performed in real time by the VDP server. I would take two different approaches here depending on whether these Excel sheets are more static (little to no data is added on a regular basis) or dynamic (data is being added and updated all the time). * **Data is static**: I would [configure the server to use a cache]( Then I would enable [full cache]( on both Excel views. This way Denodo will pull the data from the cache and because caches in Denodo have to be stored in a JDBC datasource Join operations can be fully delegated and [performance will be greatly improved]( * **Data is dynamic**: In this case what I would do is increase the 15 minutes query timeout to avoid triggering it. I would use the [CONTEXT clause]( ‘querytimeout’ parameter Hope this helps!
Denodo Team
21-01-2020 14:21:50 -0500
Was very helpful, thano you! Best regards
22-01-2020 04:45:21 -0500
Hi, I didn't explain it well, my bad, it is a join with a Excel datasource and a Oracle Datasource, is your response still the same? Best regards
22-01-2020 10:11:48 -0500
Hi, In those cases what I do to improve performance is make use of the [Data Movement]( feature, you can learn all about this in the documentation linked. Basically enabling this I make Denodo make a temporary copy of the smaller tables (your 1.7M rows Excel) in the data source with the biggest one (your 25M rows Oracle). This way the join can be fully delegated to the Oracle data source (improving performance) and the only the Excel table and the results from the join are moved across the network (improving performance even more). Hope this helps!
Denodo Team
29-01-2020 19:03:14 -0500
You must sign in to add an answer. If you do not have an account, you can register here