We have a requirement to read data from three different files and join them on different columns within the same job.
Each file is around 25-30 GB, while our system has only 16 GB of RAM. We do the joins with tMap, and Talend keeps all of the lookup (reference) data in physical memory. We cannot provide that much memory, so the job fails with an out-of-memory error. If we enable the temp-disk option ("Store temp data") in tMap instead, the job runs but is dead slow.
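For context, the only memory tuning applied so far is the JVM heap arguments in the job's Run tab (Advanced settings); the exact values below are just illustrative of what we experimented with, not a recommendation:

    -Xms1024M
    -Xmx12288M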
Please help me with these questions.
1. How does Talend process data that is larger than the available RAM?
2. Is pipeline parallelism in place with Talend? Am I missing anything in the job design to accomplish it? (The first sketch below shows the kind of streaming behavior we mean.)
tuniq & Join operations was done in physical memory,causing the job to run dead slow. Disk option is available to handle these functionality, but it was too slow.
4. How can performance be improved without pushing the data to a database (ELT)? Can Talend handle huge data volumes (millions of rows), and can this kind of data be handled with a smaller amount of RAM? (The second sketch below shows the kind of disk-based approach we have in mind.)
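To make question 2 concrete, here is a minimal Java sketch (not Talend-generated code) of the streaming behavior we are hoping the job could achieve: a reader thread and a transform thread connected by a bounded queue, so memory use stays flat regardless of file size. The input file name and the process() placeholder are illustrative assumptions.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class PipelineSketch {
        // Sentinel compared by reference to mark the end of the stream.
        private static final String EOF = new String("EOF");

        public static void main(String[] args) throws Exception {
            // Bounded queue: the reader blocks when the transformer falls
            // behind, so at most 10,000 rows are ever buffered in memory.
            BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);

            Thread reader = new Thread(() -> {
                // "fileA.csv" is an assumed input file.
                try (BufferedReader in = new BufferedReader(new FileReader("fileA.csv"))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        queue.put(line);   // blocks when the queue is full
                    }
                    queue.put(EOF);        // signal end of stream
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });

            Thread transformer = new Thread(() -> {
                try {
                    String row;
                    while ((row = queue.take()) != EOF) {
                        process(row);      // stand-in for the tMap transformation
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            reader.start();
            transformer.start();
            reader.join();
            transformer.join();
        }

        private static void process(String row) {
            // Placeholder: parse the row, apply the join, write output.
        }
    }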
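And for questions 3 and 4, this is the kind of disk-friendly technique we are wondering whether Talend can apply internally: a sort-merge join over inputs that have already been sorted on the join key (e.g. with the Unix sort utility, which spills to disk on its own), so only one row from each file is held in memory at a time. The file names, comma delimiter, and key column are assumptions for illustration.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class MergeJoinSketch {
        public static void main(String[] args) throws IOException {
            // Both files are assumed pre-sorted on the first column.
            try (BufferedReader a = new BufferedReader(new FileReader("fileA_sorted.csv"));
                 BufferedReader b = new BufferedReader(new FileReader("fileB_sorted.csv"))) {
                String rowA = a.readLine();
                String rowB = b.readLine();
                while (rowA != null && rowB != null) {
                    String keyA = rowA.split(",")[0];  // join key = first column
                    String keyB = rowB.split(",")[0];
                    int cmp = keyA.compareTo(keyB);
                    if (cmp == 0) {
                        // Matched pair; emit the joined row.
                        System.out.println(rowA + "," + rowB);
                        rowB = b.readLine();
                    } else if (cmp < 0) {
                        rowA = a.readLine();           // A is behind; advance A
                    } else {
                        rowB = b.readLine();           // B is behind; advance B
                    }
                }
                // Note: this sketch assumes keys are unique in file A;
                // an N:M join would need to buffer each matching key group.
            }
        }
    }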
Thanks