
Hadoop-Compatible Storage

There are several use cases that might require accessing an object storage system.

If the object storage is different from HDFS or S3 but is compatible with the Hadoop API, you can still access it:

  1. Select HDFS

  2. Specify the appropriate Hadoop properties.

    • The following section describes the configuration required to access Azure storage accounts.

    • For other file systems, visit the Hadoop compatible file systems section of the Hadoop documentation.

  3. For DF/JSON/XML/Excel data sources using HDFS paths, select None in the authentication section.

Azure Storage

Warning

TLS 1.0 and 1.1 support will be removed for new and existing Azure storage accounts starting November 2024. In addition, the recent TLS 1.3 support in Azure storage accounts may cause SSL/TLS connections to Azure storage to fail. In that case, include the following JVM parameters to specify which TLS versions Virtual DataPort should allow, excluding version 1.3. For instance:

-Dhttps.protocols="TLSv1,TLSv1.1,TLSv1.2" -Djdk.tls.client.protocols="TLSv1,TLSv1.1,TLSv1.2"

Visit the official Azure documentation for more information about this issue.
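
To double-check the effect of these parameters, you can run a minimal Java sketch like the one below. It is only an illustration (not part of the Denodo platform); launched with the flags above, it prints the TLS protocol versions the JVM will offer to servers:

  import java.util.Arrays;
  import javax.net.ssl.SSLContext;

  // Launch with the same flags, e.g.:
  //   java -Djdk.tls.client.protocols="TLSv1,TLSv1.1,TLSv1.2" TlsProtocolCheck
  public class TlsProtocolCheck {
      public static void main(String[] args) throws Exception {
          SSLContext context = SSLContext.getDefault();
          // Protocol versions enabled by default for client SSL/TLS connections.
          System.out.println(Arrays.toString(
              context.getDefaultSSLParameters().getProtocols()));
      }
  }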

For Denodo 8 updates older than 8.0u20230914, follow these additional steps first:

  1. Download the hadoop-azure, jetty-util and jetty-util-ajax jar files.

  2. Import the jars in the platform:

    1. Click the menu File > Extension Management. Then, in the Libraries tab, click Import.

    2. Select jar as the resource type and click Add to add the jar files.

    3. Restart Virtual DataPort.

The following steps describe how to configure access depending on the Azure storage type and the selected authentication method:

  1. Azure Data Lake Storage Gen 2 with shared key authentication

    1. URI syntax: abfs://<container>\@<account_name>.dfs.core.windows.net/<path>/<file_name>.

    Note

    For DF/JSON/XML/Excel data sources using HDFS paths, the “@” character in the URI must be escaped as shown in the example, to avoid confusion with an environment variable. This does not apply if the configuration is for Bulk Data Load or Object Storage data in Parquet Format.

    2. Configure the following Hadoop properties. You can check the available authentication methods and the properties that configure them in the Hadoop documentation. Here is an example with shared key (a standalone code sketch follows the note below):

      Name: fs.azure.account.key.<account_name>.dfs.core.windows.net
      Value: <Access key>

      Name: fs.azure.always.use.ssl
      Value: false

      Note

      SSL can be enabled by setting the property fs.azure.always.use.ssl to true or by accessing the resource through a URI like abfss://<container>\@<account_name>.dfs.core.windows.net/<path>/<file_name> (in this alternative, remove the property).
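
      Outside Virtual DataPort, you can verify this configuration with a small standalone Hadoop client. The following Java sketch is only an illustration: the account, container and key are hypothetical placeholders, and it assumes the hadoop-common and hadoop-azure jars are on the classpath:

        import java.net.URI;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class AbfsSharedKeyCheck {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Shared key authentication (placeholders; use your account and key).
                conf.set("fs.azure.account.key.myaccount.dfs.core.windows.net",
                    "<Access key>");
                conf.set("fs.azure.always.use.ssl", "false");
                // The "@" is not escaped here; escaping only applies to Denodo
                // HDFS paths in DF/JSON/XML/Excel data sources.
                FileSystem fs = FileSystem.get(
                    URI.create("abfs://mycontainer@myaccount.dfs.core.windows.net/"),
                    conf);
                for (FileStatus status : fs.listStatus(new Path("/"))) {
                    System.out.println(status.getPath());
                }
            }
        }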

  2. Azure Data Lake Storage Gen 2 with OAuth2 client credentials

    1. URI syntax: abfs://<container>\@<account_name>.dfs.core.windows.net/<path>/<file_name>.

    Note

    For DF/JSON/XML/Excel data sources using HDFS paths, the “@” character in the URI must be escaped as shown in the example, to avoid confusion with an environment variable. This does not apply if the configuration is for Bulk Data Load or Object Storage data in Parquet Format.

    2. Configure the following Hadoop properties. You can check the available authentication methods and the properties that configure them in the Hadoop documentation. Here is an example with OAuth 2.0 client credentials (a standalone code sketch follows the note below):

      Name: fs.azure.account.auth.type
      Value: OAuth

      Name: fs.azure.account.oauth.provider.type
      Value: org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider

      Name: fs.azure.account.oauth2.client.endpoint
      Value: https://login.microsoftonline.com/<directory (tenant) ID>/oauth2/token

      Name: fs.azure.account.oauth2.client.id
      Value: <Application (client) ID>

      Name: fs.azure.account.oauth2.client.secret
      Value: <Application (client) secret>

      Note

      SSL can be enabled by accessing the resource through a URI like abfss://<container>\@<account_name>.dfs.core.windows.net/<path>/<file_name>.
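
      Likewise, a standalone Java sketch (placeholders only; it assumes the hadoop-azure jar is on the classpath) shows the same OAuth 2.0 client credentials properties applied programmatically:

        import java.net.URI;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class AbfsOAuthCheck {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // OAuth 2.0 client credentials (all values are placeholders).
                conf.set("fs.azure.account.auth.type", "OAuth");
                conf.set("fs.azure.account.oauth.provider.type",
                    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider");
                conf.set("fs.azure.account.oauth2.client.endpoint",
                    "https://login.microsoftonline.com/<tenant ID>/oauth2/token");
                conf.set("fs.azure.account.oauth2.client.id", "<client ID>");
                conf.set("fs.azure.account.oauth2.client.secret", "<client secret>");
                // abfss:// enables SSL for the connection.
                FileSystem fs = FileSystem.get(
                    URI.create("abfss://mycontainer@myaccount.dfs.core.windows.net/"),
                    conf);
                System.out.println(fs.exists(new Path("/")));
            }
        }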

  3. Azure Blob Storage (WASB)

    1. URI syntax: wasb://<container>\@<account_name>.blob.core.windows.net/<path>/<file_name>.

    Note

    For DF/JSON/XML/Excel data sources using HDFS paths, the “@” character in the URI must be escaped as shown in the example, to avoid confusion with an environment variable. This does not apply if the configuration is for Bulk Data Load or Object Storage data in Parquet Format.

    2. Configure the following Hadoop properties. You can check the available authentication methods and the properties that configure them in the Hadoop documentation. Here is an example with shared key (a standalone code sketch follows the note below):

      Name: fs.azure.account.key.<account_name>.blob.core.windows.net
      Value: <Access key>

      Name: fs.azure.always.use.ssl
      Value: false

      Note

      SSL can be enabled by setting the property fs.azure.always.use.ssl to true or by accessing the resource through a URI like wasbs://<container>\@<account_name>.blob.core.windows.net/<path>/<file_name> (in this alternative, remove the property).
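
      A similar standalone Java sketch (placeholders only; wasb additionally requires the azure-storage jar on the classpath) shows the shared key configuration for Blob storage, using wasbs:// so the fs.azure.always.use.ssl property is not needed:

        import java.net.URI;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class WasbSharedKeyCheck {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Shared key authentication (placeholders; use your account and key).
                conf.set("fs.azure.account.key.myaccount.blob.core.windows.net",
                    "<Access key>");
                // wasbs:// already enables SSL, so fs.azure.always.use.ssl is omitted.
                FileSystem fs = FileSystem.get(
                    URI.create("wasbs://mycontainer@myaccount.blob.core.windows.net/"),
                    conf);
                for (FileStatus status : fs.listStatus(new Path("/"))) {
                    System.out.println(status.getPath());
                }
            }
        }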
