
How does Denodo S3ParquetFileWrapper exactly work?

We are investigating using Parquet files as a data source for an upcoming service, and we see that Denodo can handle those files using the HDFS custom wrapper that connects to S3. Since Parquet is well known to offer better query performance than row-based file formats, we would like to know the following:

- How does Denodo query the files from S3? Are they first downloaded and then queried, or does it "push" queries down to AWS S3?
- Are the files first downloaded and then cached? This would be the optimal approach if there are files in the TB range.
- Could Denodo use other services like AWS Athena or S3 Select?
- How is it different from HDFSParquetFileWrapper? Is it especially more efficient for S3?

Thanks in advance.
user
13-09-2022 05:30:39 -0400

1 Answer

Hello,

When a query is executed to extract data from files on S3, the VDP server connects to the data source in real time and retrieves the requested data. Denodo does not download or store a copy of the data for any purpose. Query execution is pushed down to AWS S3 for query optimization. Refer to the document [Denodo Distributed File System Custom Wrapper](https://community.denodo.com/docs/html/document/denodoconnects/6.0/Denodo%20Distributed%20File%20System%20Custom%20Wrapper%20-%20User%20Manual) and scroll down to the section **S3ParquetFileWrapper > Query Optimization** for details on the query optimizations provided for Parquet files on S3.

By default, Denodo does not cache the data of a view; you have to enable the cache manually for each view. There are two cache modes: Full cache mode and Partial cache mode. Refer to the document [Cache Module](https://community.denodo.com/docs/html/browse/8.0/en/vdp/administration/cache_module/cache_module) to learn more about the cache mechanism in Denodo and how to configure the cache for your view.

Denodo can also be integrated with AWS Athena. Please refer to the document [Connect Denodo to Amazon Athena](https://community.denodo.com/kb/en/view/document/How%20to%20connect%20to%20Amazon%20Athena%20from%20Denodo?tag=Data+Sources) for more information.

The VDP wrapper **HDFSParquetFileWrapper** is a general custom wrapper for reading Parquet files from HDFS. Similarly, the VDP wrapper **S3ParquetFileWrapper** is a custom wrapper for reading Parquet files in S3. It has the same behavior as HDFSParquetFileWrapper, but it accesses S3 exclusively and is much easier to configure.

Hope this helps!
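The **Query Optimization** section of the user manual is the authoritative description of what the wrapper actually does. As a rough illustration only, the sketch below shows two optimizations that Parquet readers commonly apply: reading only the requested columns (projection) and skipping whole row groups whose min/max statistics cannot satisfy a filter (predicate pushdown). It is plain Python with a hypothetical in-memory `RowGroup` type standing in for a Parquet file; it is not Denodo's implementation and makes no calls to S3.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class RowGroup:
    # Hypothetical stand-in for a Parquet row group (an assumption for illustration):
    # column name -> values, plus per-column (min, max) statistics.
    columns: Dict[str, List[Any]]
    stats: Dict[str, Tuple[Any, Any]]


def may_match(rg: RowGroup, column: str, lower: Any, upper: Any) -> bool:
    """True if the row group's [min, max] range for `column` overlaps [lower, upper]."""
    col_min, col_max = rg.stats[column]
    return col_max >= lower and col_min <= upper


def scan(row_groups: List[RowGroup], projection: List[str],
         filter_col: str, lower: Any, upper: Any) -> Tuple[List[tuple], int]:
    """Return projected rows with lower <= filter_col <= upper, and how many row groups were read."""
    results: List[tuple] = []
    groups_read = 0
    for rg in row_groups:
        # Predicate pushdown: decide from the statistics alone whether to read this group.
        if not may_match(rg, filter_col, lower, upper):
            continue
        groups_read += 1
        # Projection: touch only the requested columns plus the filter column.
        needed = set(projection) | {filter_col}
        cols = {name: rg.columns[name] for name in needed}
        for i, value in enumerate(cols[filter_col]):
            if lower <= value <= upper:
                results.append(tuple(cols[name][i] for name in projection))
    return results, groups_read


row_groups = [
    RowGroup({"id": [1, 2, 3], "amount": [10, 20, 30], "region": ["eu", "us", "eu"]},
             {"id": (1, 3)}),
    RowGroup({"id": [4, 5, 6], "amount": [40, 50, 60], "region": ["us", "us", "eu"]},
             {"id": (4, 6)}),
]

rows, groups_read = scan(row_groups, projection=["id", "amount"],
                         filter_col="id", lower=2, upper=3)
print(rows, groups_read)  # [(2, 20), (3, 30)] 1
```

In this toy example the second row group is never read because its `id` range (4 to 6) cannot contain values between 2 and 3, and the `region` column is never touched. In a real columnar reader, skipping data this way reduces how much of each file has to be read.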
Denodo Team
13-09-2022 19:43:37 -0400