Incremental Full Cache of Large Dataset

I need to implement a solution for one of my end-user teams to query a large volume of transactional data on a daily basis from a production data warehouse (SQL Server 12 on RDS). The end users want to run their own queries, but due to the complexity and volume of the data they are prohibited from running them against the production DWH (data warehouse). So I was exploring whether I can use a Denodo full cache (incremental) to cache the data on Redshift and use the incremental option to re-cache only the changed data on a daily basis. Is this feasible? For background: the data warehouse holds historic fact and dimension data populated via Change Data Capture (CDC) functionality, and the fact table contains 2 billion rows, with on average 100K rows added per day. Will the incremental cache scale for this use case?
07-08-2019 07:22:59 -0400

3 Answers

Hi,

In Denodo, it is possible to process billions of rows by moving very little data over the network and choosing the right optimization techniques. For your use case, you could take one of the following approaches:

* Enable full cache mode on the views and merge the changes into the cache database using incremental mode. To do this, you could refer to the sections "Full Mode" and "Incremental Mode" of the Virtual DataPort Administration Guide.
* Alternatively, when dealing with a huge volume of data, you could use an MPP as the cache engine and configure one of the Hadoop-based MPP systems such as Impala, Spark, or Presto for this purpose. To set this up, have a look at the Knowledge Base article "How to configure MPP Query Acceleration in Denodo".

Hope this helps!
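To see why incremental mode scales for this case, here is a minimal Python sketch (with hypothetical field names such as `last_modified` and `id`; Denodo performs this merge internally against the cache database) showing that each refresh only moves the rows changed since the last load, not the full 2-billion-row fact table:

```python
from datetime import datetime, timedelta

def incremental_refresh(cache, source_rows, last_refresh):
    """Merge into `cache` only the source rows modified after `last_refresh`.

    cache: dict mapping primary key -> row dict (stands in for the Redshift
    cache table); source_rows: the full source table as a list of row dicts.
    Returns the number of rows actually transferred.
    """
    delta = [r for r in source_rows if r["last_modified"] > last_refresh]
    for row in delta:
        cache[row["id"]] = row  # insert new rows, overwrite updated ones
    return len(delta)

# Simulated source: 1,000 historic rows plus 5 rows added after the last load.
last_load = datetime(2019, 8, 8)
source = [{"id": i, "amount": i * 10.0,
           "last_modified": last_load - timedelta(days=1)}
          for i in range(1000)]
source += [{"id": 1000 + i, "amount": 1.0,
            "last_modified": last_load + timedelta(hours=i + 1)}
           for i in range(5)]

cache = {r["id"]: dict(r) for r in source[:1000]}  # cache loaded previously
moved = incremental_refresh(cache, source, last_refresh=last_load)
print(moved)       # only the 5 new rows are transferred
print(len(cache))  # the cache now holds all 1005 rows
```

At your scale this means a daily refresh transfers roughly the 100K changed rows rather than re-copying 2 billion.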
Denodo Team
08-08-2019 06:11:40 -0400
Thanks, this was very useful. I am exploring going with "Incremental Mode" by caching the fact table (2 billion rows) on Redshift. Since only new rows are ever added to the fact table, is there a way to insert only the newly added records into the cache at a pre-defined frequency, rather than refreshing the whole base view on a regular basis?
09-08-2019 03:07:02 -0400
Hi,

Yes, you can perform an incremental data load that merges the cache table with the most recent data retrieved from the source. I would perform the following steps:

* In the Virtual DataPort Administration Tool, select your base view, click Options, and in the Cache tab choose the 'Full' cache mode.
* Check "Incremental" and add the condition last_modified > '@LASTCACHEREFRESH'. Ensure the base view has a field (here, 'last_modified') that indicates when a row was last inserted or updated. This condition selects the rows from the original data source that have been added or updated since the last cache refresh. Save the base view.
* In the Denodo Scheduler Administration Tool, create a new VDPCache job to perform the cache load. To do so, you could refer to the section "Non Incremental Load Process".
* Choose the data source and view, and create the load process as a Non-Incremental Load Process.
* For the parameter **Invalidate cache before load**, choose the option 'Matching rows'. This invalidates only the cached rows that match the rows returned by the query, rather than the whole cache, before the load.
* Test the job and schedule it for your preferred time window under the Trigger section.

For more information, you could refer to the Knowledge Base article "Incremental Queries in Denodo". Hope this helps!
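The 'Matching rows' invalidation in the steps above behaves like an upsert: cached rows whose keys appear in the incoming delta are removed first, then the delta is inserted, so updated rows are replaced rather than duplicated. A small Python sketch of that semantics (hypothetical `id`/`amount` fields; Denodo applies this against the cache tables itself):

```python
def load_with_matching_rows_invalidation(cache, delta_rows, key="id"):
    """Sketch of a cache load with 'Matching rows' invalidation.

    cache: list of row dicts (the cached copy of the view);
    delta_rows: the rows the incremental condition returned from the source.
    Rows in the cache whose key appears in the delta are invalidated
    (removed) before the delta is inserted.
    """
    delta_keys = {r[key] for r in delta_rows}
    survivors = [r for r in cache if r[key] not in delta_keys]  # invalidate matches
    return survivors + list(delta_rows)                         # then load the delta

cache = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
delta = [{"id": 2, "amount": 25},   # an updated row
         {"id": 3, "amount": 30}]   # a newly added row
cache = load_with_matching_rows_invalidation(cache, delta)
print(sorted(r["id"] for r in cache))                     # [1, 2, 3]
print(next(r["amount"] for r in cache if r["id"] == 2))   # 25
```

Since your fact table is append-only, the delta will normally contain only new keys, so each scheduled run simply appends the new rows to the cache.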
Denodo Team
16-08-2019 08:40:20 -0400