
Need Talend performance

We have a requirement where we read data from three different files and join them on different columns within the same job. Each file is around 25-30 GB, but our system has only 16 GB of RAM. We are doing the joins with tMap. Talend keeps all the lookup (reference) data in physical memory; since I cannot provide that much memory, the job fails with an out-of-memory error. If I use the join-on-disk (temp data) option in tMap, the job becomes extremely slow.

Please help me with these questions:

- How does Talend process data larger than RAM?
- Does Talend provide pipeline parallelism? Am I missing anything in the job to enable it?
- tUniqRow and join operations run in physical memory, which makes the job very slow; the disk option handles this, but it is too slow. How can performance be improved without pushing the data to the database (ELT)?
- Can Talend handle data in the millions of rows with this small amount of RAM?

Thanks
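For background, the standard technique for joining inputs larger than RAM is a sort-merge join: sort each input on its join key (in chunks, spilling sorted runs to disk), then stream both sorted inputs and merge them, holding only one key group in memory at a time. This is independent of Talend; a minimal sketch of the merge phase in Python (the tuples and key functions are illustrative, not Talend's API):

```python
from itertools import groupby

def merge_join(left, right, lkey, rkey):
    """Inner-join two row streams that are ALREADY SORTED on their join
    keys. Only one key group from the left side is buffered at a time,
    so memory use is bounded by the largest key group, not the file size."""
    lg = groupby(left, key=lkey)
    rg = groupby(right, key=rkey)
    lk, lrows = next(lg, (None, None))
    rk, rrows = next(rg, (None, None))
    while lrows is not None and rrows is not None:
        if lk == rk:
            lbuf = list(lrows)          # buffer one matching key group only
            for r in rrows:
                for l in lbuf:
                    yield l, r          # emit pairs as they are found
            lk, lrows = next(lg, (None, None))
            rk, rrows = next(rg, (None, None))
        elif lk < rk:
            lk, lrows = next(lg, (None, None))   # skip unmatched left keys
        else:
            rk, rrows = next(rg, (None, None))   # skip unmatched right keys
```

The same idea is what "join on disk" options implement internally; the cost moves from RAM to sorting I/O, which is why the disk option is slower but survives inputs far larger than memory.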
10-08-2018 09:08:55 -0400

1 Answer

Hi,

With the Denodo Platform, I have joined data coming from different large datasets. When Denodo Platform retrieves data from the different sources, it joins the data and starts returning rows to the consuming application as soon as they are ready. Because Denodo returns results while it combines them, you do not need to hold the full result set in memory. You can also use the swapping policy that Virtual DataPort provides to avoid memory overflows when dealing with huge datasets. Have a look at the "Configuring the Memory Usage and Swapping Policy" section of the Virtual DataPort Administration Guide for more information.

Hope this helps!
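The streaming behavior described above can be illustrated in general terms: build an in-memory index only on the smaller input, then stream the larger input and yield each joined row immediately, so the full result never exists in memory at once. A minimal sketch (file paths and column indices are hypothetical; this shows the streaming principle, not Denodo's internal implementation):

```python
def stream_csv(path):
    """Yield one parsed row at a time; the file is never fully loaded."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n").split(",")

def hash_join_streaming(small_path, big_path, small_key, big_key):
    """Build a hash index on the SMALLER input only, then stream the
    larger input and emit matches immediately, without buffering the
    result set. Memory use is bounded by the smaller input."""
    index = {}
    for row in stream_csv(small_path):
        index.setdefault(row[small_key], []).append(row)
    for row in stream_csv(big_path):
        for match in index.get(row[big_key], []):
            yield match + row   # consumer receives rows as they are produced
```

When even the smaller side does not fit in RAM, engines fall back to spilling the index to disk, which is what a swapping policy automates.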
Denodo Team
28-08-2018 02:29:56 -0400