
Using Denodo 8 to connect to an HDFS Parquet file

Hi team, I am using this manual https://community.denodo.com/docs/html/document/denodoconnects/7.0/Denodo%20Distributed%20File%20System%20Custom%20Wrapper%20-%20User%20Manual to let Denodo 8 connect to an HDFS Parquet file. I imported the "Denodo Distributed File System Custom Wrapper" extension through the VDP client and created an HDFS data source, but I get this error when creating a base view:

There was an error while creating this base view: Error while executing custom wrapper method 'getSchemaParameters': Call From jpn00123456/10.200.128.55 to sdcert234.company.com:5181 failed on connection exception: java.net.ConnectException: Connection refused: no further information; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

What is the reason? Some settings of my data source:

- Class name: com.denodo.connect.dfs.wrapper.DFSParquetFileWrapper
- Select Jar: denodo-dfs-customwrapper
- File system URI: hdfs://sdclxd00918.nomura.com:5181

For the wrapper, I just have the Parquet file path, like below:

/data/mapr/store/parquet/ll/UAT/uuu/jhkyi/20210405_0/part-00000-7e3cfdc3-5cca-4441-9dd1-d3c81c2240f9-c000.snappy.parquet

Also, what is the Kerberos option for? We actually need credentials to access our Hadoop cluster; how can we set those up? Thanks.
user
15-07-2021 06:01:24 -0400

3 Answers

Hi,

Based on the exception you're getting, it sounds like there could be some misconfiguration when trying to connect and authenticate to your Hadoop cluster. I was able to set up a connection to a Parquet file with the following steps:

- **Click File > Extension Management**
  - Import the denodo-dfs-customwrapper or ensure that it is there.
- **Create a New > Data Source > Custom**
  - Select Jars > select the denodo-dfs-customwrapper.
  - Select Class name: com.denodo.connect.dfs.wrapper.DFSParquetFileWrapper
  - Refresh.
- **Enter the input parameters of your data source**
  - File system URI: in your case, hdfs://sdclxd00918.nomura.com:5181
  - Set up the **core-site.xml** to authenticate to your Hadoop cluster. As I understand it, you have not set this up yet. To do this, go into the conf folder of the downloaded DFS custom wrapper, for example: **/denodo-dfs-customwrapper-8.0-20210503/conf/core-site.xml**
  - In the **core-site.xml** file, you will find templates for entering credentials for various data sources (e.g. Azure Data Lake Storage, Google Cloud, AWS S3A). You only need to uncomment the properties for your data source and enter the credentials that let you authenticate to it (an illustrative snippet follows this answer). More information on the core-site.xml file can be [found here](https://community.denodo.com/docs/html/document/denodoconnects/8.0/en/Denodo%20Distributed%20File%20System%20Custom%20Wrapper%20-%20User%20Manual).
  - Save the core-site.xml, then back in the data source in VDP, set the parameter "Custom core-site.xml file" to local and declare the path to the file.
  - As for your question on Kerberos, you would only need to enable it if you want to authenticate to your HDFS data source with Kerberos credentials; setting up the core-site.xml alone should be sufficient.
- **Afterwards, you can try creating a base view.**
  - Your Parquet file path looks fine, but you need to define the file name pattern too. I would use this: **(.*)part-(.*)\\.snappy\\.parquet**

So in summary, try setting up the core-site.xml to authenticate to your HDFS data source, and define the Parquet file name pattern as well. Hope this helps!
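For reference, a minimal core-site.xml sketch for an HDFS cluster secured with Kerberos might look like the following. The property names are standard Hadoop settings, but the values shown are placeholders and the exact set of properties depends entirely on your cluster, so treat this only as an illustration and confirm the details with your Hadoop administrators.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative core-site.xml sketch for a Kerberos-secured HDFS cluster.
     Property names are standard Hadoop settings; values are placeholders
     and must match your cluster's actual configuration. -->
<configuration>
  <!-- Default file system; host and port must match the NameNode RPC endpoint -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://sdclxd00918.nomura.com:5181</value>
  </property>
  <!-- Switch authentication from the default "simple" mode to Kerberos -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
```

If your cluster does not use Kerberos, the security properties above are not needed; in that case the templates shipped with the custom wrapper (uncommented and filled in for your storage type) are usually enough.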
Denodo Team
16-07-2021 14:28:54 -0400
Hi Team, thanks for your answers. Two questions here:

1. How can I know the port of the HDFS URI? Is it a common one, or does it vary case by case?
2. For core-site.xml, I saw it has descriptions for Azure Data Lake Storage, Google Cloud, and AWS S3A, but we just have Parquet files on HDFS/MapR. How do we set up the credentials for that?

Thanks.
user
19-07-2021 02:43:26 -0400
Hi,

1. **How can I know the port of the HDFS URI? Is it a common one or case by case?**
   a. The port of your HDFS URI is something you would have configured, so it is defined on a case-by-case basis. Based on the HDFS URI you previously shared, hdfs://sdclxd00918.nomura.com:5181, I assume the port you are using is 5181. However, since you have been receiving the ConnectionRefused exception, it is possible that this port is incorrect or is not open on the server. From a terminal on your server sdclxd00918.nomura.com, I would try running "telnet localhost <port>" to see whether the port is open there. Also, upon reviewing the HDFS documentation, I see that the default port number for the Hadoop NameNode service is 8020, so you may want to try that too if you have not already.

2. **For core-site.xml, I saw it has descriptions for Azure Data Lake Storage, Google Cloud, and AWS S3A; we just have Parquet files on HDFS/MapR. How do we set up the credentials for that?**
   a. The core-site.xml provided with the custom wrapper just contains templates for some data sources, as you have pointed out. Since you are accessing a Parquet file located in your Hadoop cluster, you could instead use the core-site.xml provided by your Hadoop cluster itself. It is typically located in the "conf" directory of your Hadoop installation directory. After finding this core-site.xml, copy the file to your <DENODO_HOME>/conf folder and restart your VDP servers. Afterwards, while configuring the HDFS data source, point to this copy of the core-site.xml. (The cluster's core-site.xml also shows the NameNode port; see the snippet after this answer.)
   b. Finally, if you use Kerberos to authenticate to your Hadoop cluster, you could use those credentials when configuring the data source.

Hope this helps!
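As an illustration of point 1, the NameNode host and port are normally visible in the cluster's own core-site.xml under the standard Hadoop property fs.defaultFS (older clusters may use the deprecated fs.default.name). A minimal sketch, with placeholder values, might look like this:

```xml
<!-- Excerpt of a cluster-side core-site.xml (illustrative, placeholder values).
     The port in fs.defaultFS is the NameNode RPC port to use in the
     "File system URI" of the Denodo data source. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- 8020 is the common Hadoop default; your cluster may use a different port -->
    <value>hdfs://sdclxd00918.nomura.com:8020</value>
  </property>
</configuration>
```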
Denodo Team
27-07-2021 13:23:22 -0400