Hadoop-Compatible Storage¶
There are several use cases that might require access to object storage. For example:

- Accessing data in Parquet or Delta format using the embedded MPP.
- Accessing DF/JSON/XML/Excel data sources.
- Configuring that storage for Bulk Data Load.
If the object storage is not HDFS or S3 but is compatible with the Hadoop API, you can still access it:

1. Select HDFS.
2. Specify the appropriate Hadoop properties.
The following section describes the configuration required to access Azure storage accounts. For other file systems, visit the section Hadoop compatible file systems of the Hadoop documentation.

For DF/JSON/XML/Excel data sources using HDFS paths, select None in the authentication section.
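As an illustration of the generic Hadoop-compatible case, Google Cloud Storage can be reached the same way: select HDFS, use the connector's own URI scheme, and add the properties that the GCS connector requires. This is a minimal sketch, assuming the gcs-connector jar and its dependencies have been imported into the platform; property names can vary between connector versions, so check the connector documentation for the authoritative list.

URI syntax:

gs://<bucket>/<path>/<file_name>

Name: fs.gs.impl
Value: com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem

Name: google.cloud.auth.service.account.json.keyfile
Value: <path to the service account key file>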
Azure Storage¶
Warning
TLS 1.0 and 1.1 support will be removed for new and existing Azure storage accounts starting in November 2024. In addition, the recent TLS 1.3 support in Azure storage accounts may cause connections to Azure storage over SSL/TLS to fail. In that case, include the following JVM parameters to specify the TLS versions Virtual DataPort should allow, excluding version 1.3. For instance:

-Dhttps.protocols="TLSv1,TLSv1.1,TLSv1.2" -Djdk.tls.client.protocols="TLSv1,TLSv1.1,TLSv1.2"
Visit the official Azure documentation for more information about this issue.
For Denodo 8 updates older than 8.0u20230914, follow these additional steps first:

1. Download the hadoop-azure, jetty-util and jetty-util-ajax jars.
2. Import the jars into the platform:
   - Click the menu File > Extension Management. Then, in the tab Libraries, click Import.
   - Select jar as the resource type and click Add to add the jars.
3. Restart Virtual DataPort.
The following steps describe how to configure the access depending on the Azure storage and the authentication method selected:
Azure Data Lake Storage Gen 2 with shared key authentication
URI syntax:
abfs://<container>\@<account_name>.dfs.core.windows.net/<path>/<file_name>

Note
For DF/JSON/XML/Excel data sources using HDFS paths, the “@” character in the URI must be escaped as shown in the example, to avoid confusion with an environment variable. This does not apply if the configuration is for Bulk Data Load or Object Storage data in Parquet Format.
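For instance, with a hypothetical storage account named mycorpstore and a container named sales, a Parquet file would be addressed as:

abfs://sales\@mycorpstore.dfs.core.windows.net/data/customers.parquet

(with the “@” escaped as required for the HDFS-path data sources).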
Configure the following Hadoop properties. The Hadoop documentation describes the available authentication methods and the properties that configure them. Here is an example with a shared key:
Name: fs.azure.account.key.<account_name>.dfs.core.windows.net
Value: <Access key>

Name: fs.azure.always.use.ssl
Value: false
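These values can also be sanity-checked outside Virtual DataPort with a small standalone Hadoop client program. This is a minimal sketch, not part of the Denodo configuration: it assumes hadoop-common, hadoop-azure and their dependencies on the classpath; the account and container names are hypothetical, and the access key is read from a hypothetical environment variable so it is not hard-coded.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AbfsSharedKeyCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical names; replace with your storage account and container.
        String account = "mycorpstore";
        String container = "sales";

        Configuration conf = new Configuration();
        // The same property shown in the table above: the account access key.
        conf.set("fs.azure.account.key." + account + ".dfs.core.windows.net",
                 System.getenv("AZURE_STORAGE_KEY"));

        // abfss:// forces TLS, so fs.azure.always.use.ssl is not needed here.
        URI root = URI.create("abfss://" + container + "@" + account
                + ".dfs.core.windows.net/");
        try (FileSystem fs = FileSystem.get(root, conf)) {
            // Listing the container root proves the key and the URI are valid.
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }
}

If the listing succeeds, the same key and URI should work from the data source configuration.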
Note

SSL usage can be triggered by setting the property fs.azure.always.use.ssl to true, or by accessing the resource through a route like this: abfss://<container>\@<account_name>.dfs.core.windows.net/<path>/<file_name> (in this alternative, the property should be removed).

Azure Data Lake Storage Gen 2 with OAuth2 client credentials
URI syntax:
abfs://<container>\@<account_name>.dfs.core.windows.net/<path>/<file_name>

Note
For DF/JSON/XML/Excel data sources using HDFS paths, the “@” character in the URI must be escaped as shown in the example, to avoid confusion with an environment variable. This does not apply if the configuration is for Bulk Data Load or Object Storage data in Parquet Format.
Configure the following Hadoop properties. The Hadoop documentation describes the available authentication methods and the properties that configure them. Here is an example with OAuth2 client credentials:
Name: fs.azure.account.auth.type
Value: OAuth

Name: fs.azure.account.oauth.provider.type
Value: org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider

Name: fs.azure.account.oauth2.client.endpoint
Value: https://login.microsoftonline.com/<directory (tenant) ID>/oauth2/token

Name: fs.azure.account.oauth2.client.id
Value: <Application (client) ID>

Name: fs.azure.account.oauth2.client.secret
Value: <Application (client) secret>
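As with the shared key case, these values can be verified outside Virtual DataPort with a small standalone check. This is a sketch under the same classpath assumptions as above; the tenant ID, client ID and secret are read from hypothetical environment variables, and the account and container names are hypothetical.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AbfsOAuthCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The same four properties shown in the table above.
        conf.set("fs.azure.account.auth.type", "OAuth");
        conf.set("fs.azure.account.oauth.provider.type",
                 "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider");
        conf.set("fs.azure.account.oauth2.client.endpoint",
                 "https://login.microsoftonline.com/"
                 + System.getenv("AZURE_TENANT_ID") + "/oauth2/token");
        conf.set("fs.azure.account.oauth2.client.id",
                 System.getenv("AZURE_CLIENT_ID"));
        conf.set("fs.azure.account.oauth2.client.secret",
                 System.getenv("AZURE_CLIENT_SECRET"));

        // abfss:// so the OAuth token travels over TLS.
        URI root = URI.create("abfss://sales@mycorpstore.dfs.core.windows.net/");
        try (FileSystem fs = FileSystem.get(root, conf)) {
            System.out.println("Connected: " + fs.exists(new Path("/")));
        }
    }
}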
Note

SSL usage can be triggered by accessing the resource through a route like this: abfss://<container>\@<account_name>.dfs.core.windows.net/<path>/<file_name>

Azure Blob Storage
URI syntax:
wasb://<container>\@<account_name>.blob.core.windows.net/<path>/<file_name>

Note
For DF/JSON/XML/Excel data sources using HDFS paths, the “@” character in the URI must be escaped as shown in the example, to avoid confusion with an environment variable. This does not apply if the configuration is for Bulk Data Load or Object Storage data in Parquet Format.
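For instance, with the same hypothetical account mycorpstore and container sales, a delimited file would be addressed as:

wasb://sales\@mycorpstore.blob.core.windows.net/raw/orders.csv

(or wasbs:// for a connection over TLS).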
Configure the following Hadoop properties. The Hadoop documentation describes the available authentication methods and the properties that configure them. Here is an example with a shared key:
Name: fs.azure.account.key.<account_name>.blob.core.windows.net
Value: <Access key>

Name: fs.azure.always.use.ssl
Value: false
Note

SSL usage can be triggered by setting the property fs.azure.always.use.ssl to true, or by accessing the resource through a route like this: wasbs://<container>\@<account_name>.blob.core.windows.net/<path>/<file_name> (in this alternative, the property should be removed).