Hadoop-Compatible Storage

There are several use cases that might require accessing an object storage.

If the object storage is not HDFS or S3 but is compatible with the Hadoop API, you can still access it by selecting the HDFS option and specifying the appropriate Hadoop properties. For example, you can work with Azure file systems. The steps to use these routes are:

  1. Configure the connection according to the Azure file system type:

    1. Azure Data Lake Gen 1

      1. URI syntax: adl://<account_name>.azuredatalakestore.net/<path>/<file_name>.

      2. Configure the Hadoop properties required for authentication. See the Hadoop documentation for the available authentication methods and the properties that configure them. Here is an example using OAuth 2.0 client credentials:

        Name                                       Value
        ------------------------------------------ -----------------------
        fs.adl.oauth2.refresh.url                  <URL of OAuth endpoint>
        fs.adl.oauth2.credential                   <Credential value>
        fs.adl.oauth2.client.id                    <Client identifier>
        fs.adl.oauth2.access.token.provider.type   ClientCredential
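
        Outside of this dialog, you can verify the same configuration with a plain Hadoop client. Below is a minimal Java sketch, assuming the hadoop-client and hadoop-azure-datalake libraries are on the classpath; the class name, account name, endpoint, and credential values are illustrative placeholders, not values from this manual:

          import java.net.URI;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileStatus;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class AdlGen1Check {
              public static void main(String[] args) throws Exception {
                  Configuration conf = new Configuration();
                  // Same properties as the table above (placeholder values).
                  conf.set("fs.adl.oauth2.access.token.provider.type", "ClientCredential");
                  conf.set("fs.adl.oauth2.refresh.url",
                          "https://login.microsoftonline.com/my-tenant-id/oauth2/token");
                  conf.set("fs.adl.oauth2.client.id", "my-client-id");
                  conf.set("fs.adl.oauth2.credential", "my-client-secret");

                  // "myaccount" stands in for the Data Lake account name.
                  URI root = new URI("adl://myaccount.azuredatalakestore.net/");
                  try (FileSystem fs = FileSystem.get(root, conf)) {
                      for (FileStatus s : fs.listStatus(new Path("/"))) {
                          System.out.println(s.getPath());
                      }
                  }
              }
          }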

    2. Azure Data Lake Storage Gen 2 with shared key authentication

      1. URI syntax: abfs://<container>\@<account_name>.dfs.core.windows.net/<path>/<file_name>.

      Note

      For DF/JSON/XML/Excel data sources using HDFS paths, the “@” character in the URI must be escaped as shown in the example so that it is not interpreted as an environment variable. This does not apply if the configuration is for Bulk Data Load or Object Storage data in Parquet Format.

      2. Configure the following Hadoop properties. See the Hadoop documentation for the available authentication methods and the properties that configure them. Here is an example with a shared key:

        Name                                                       Value
        ---------------------------------------------------------- ------------
        fs.azure.account.key.<account_name>.dfs.core.windows.net   <Access key>
        fs.azure.always.use.ssl                                    false

        Note

        SSL can be enabled either by setting the property fs.azure.always.use.ssl to true or by accessing the resource through a URI of the form abfss://<container>\@<account_name>.dfs.core.windows.net/<path>/<file_name> (in the latter case, remove the property).
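
        As a cross-check, the same shared-key setup can be exercised from a plain Hadoop client. This is a minimal Java sketch, assuming the hadoop-azure library is on the classpath; the class, container, account, and key names are illustrative placeholders. Note that in raw Hadoop code the “@” is written literally; the \@ escape only applies to the data source paths described in the note above:

          import java.net.URI;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class AbfsSharedKeyCheck {
              public static void main(String[] args) throws Exception {
                  Configuration conf = new Configuration();
                  // Shared-key property, keyed by the storage account host (placeholder values).
                  conf.set("fs.azure.account.key.myaccount.dfs.core.windows.net", "my-access-key");
                  conf.set("fs.azure.always.use.ssl", "false");

                  // The "@" is literal here; no escaping outside the data source dialog.
                  URI root = new URI("abfs://mycontainer@myaccount.dfs.core.windows.net/");
                  try (FileSystem fs = FileSystem.get(root, conf)) {
                      System.out.println(fs.exists(new Path("/data")) ? "found /data" : "no /data");
                  }
              }
          }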

    3. Azure Data Lake Storage Gen 2 with OAuth2 client credentials

      1. URI syntax: abfs://<container>\@<account_name>.dfs.core.windows.net/<path>/<file_name>.

      Note

      For DF/JSON/XML/Excel data sources using HDFS paths, the “@” character in the URI must be escaped as shown in the example so that it is not interpreted as an environment variable. This does not apply if the configuration is for Bulk Data Load or Object Storage data in Parquet Format.

      2. Configure the following Hadoop properties. See the Hadoop documentation for the available authentication methods and the properties that configure them. Here is an example with OAuth 2.0 client credentials:

        Name                                       Value
        ------------------------------------------ --------------------------------------------------------------
        fs.azure.account.auth.type                 OAuth
        fs.azure.account.oauth.provider.type       org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
        fs.azure.account.oauth2.client.endpoint    https://login.microsoftonline.com/<directory (tenant) ID>/oauth2/token
        fs.azure.account.oauth2.client.id          <Application (client) ID>
        fs.azure.account.oauth2.client.secret      <Application (client) secret>

        Note

        SSL can be enabled by accessing the resource through a URI of the form abfss://<container>\@<account_name>.dfs.core.windows.net/<path>/<file_name>.
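
        The following minimal Java sketch shows the same OAuth 2.0 client-credential properties on a plain Hadoop client, assuming the hadoop-azure library is on the classpath; the class name and all tenant, client, container, and account values are illustrative placeholders:

          import java.net.URI;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileSystem;

          public class AbfsOAuthCheck {
              public static void main(String[] args) throws Exception {
                  Configuration conf = new Configuration();
                  // OAuth 2.0 client credentials, as in the table above (placeholder values).
                  conf.set("fs.azure.account.auth.type", "OAuth");
                  conf.set("fs.azure.account.oauth.provider.type",
                          "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider");
                  conf.set("fs.azure.account.oauth2.client.endpoint",
                          "https://login.microsoftonline.com/my-tenant-id/oauth2/token");
                  conf.set("fs.azure.account.oauth2.client.id", "my-application-id");
                  conf.set("fs.azure.account.oauth2.client.secret", "my-application-secret");

                  // The abfss:// scheme enables SSL, as described in the note above.
                  URI root = new URI("abfss://mycontainer@myaccount.dfs.core.windows.net/");
                  try (FileSystem fs = FileSystem.get(root, conf)) {
                      System.out.println("Connected to " + fs.getUri());
                  }
              }
          }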

    4. Azure Blob Storage (wasb)

      1. URI syntax: wasb://<container>\@<account_name>.blob.core.windows.net/<path>/<file_name>.

      Note

      For DF/JSON/XML/Excel data sources using HDFS paths, the “@” character in the URI must be escaped as shown in the example so that it is not interpreted as an environment variable. This does not apply if the configuration is for Bulk Data Load or Object Storage data in Parquet Format.

      2. Configure the following Hadoop properties. See the Hadoop documentation for the available authentication methods and the properties that configure them. Here is an example with a shared key:

        Name                                                        Value
        ----------------------------------------------------------- ------------
        fs.azure.account.key.<account_name>.blob.core.windows.net   <Access key>
        fs.azure.always.use.ssl                                     false

        Note

        SSL can be enabled either by setting the property fs.azure.always.use.ssl to true or by accessing the resource through a URI of the form wasbs://<container>\@<account_name>.blob.core.windows.net/<path>/<file_name> (in the latter case, remove the property).
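
        For completeness, here is the analogous minimal Java sketch for the wasb driver with a shared key, assuming the hadoop-azure library and its Azure storage dependency are on the classpath; the class, container, account, and key names are illustrative placeholders:

          import java.net.URI;

          import org.apache.hadoop.conf.Configuration;
          import org.apache.hadoop.fs.FileStatus;
          import org.apache.hadoop.fs.FileSystem;
          import org.apache.hadoop.fs.Path;

          public class WasbSharedKeyCheck {
              public static void main(String[] args) throws Exception {
                  Configuration conf = new Configuration();
                  // Shared-key property for the Blob endpoint (placeholder values).
                  conf.set("fs.azure.account.key.myaccount.blob.core.windows.net", "my-access-key");
                  conf.set("fs.azure.always.use.ssl", "false");

                  URI root = new URI("wasb://mycontainer@myaccount.blob.core.windows.net/");
                  try (FileSystem fs = FileSystem.get(root, conf)) {
                      for (FileStatus s : fs.listStatus(new Path("/"))) {
                          System.out.println(s.getPath());
                      }
                  }
              }
          }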

  2. For DF/JSON/XML/Excel data sources using HDFS paths, select None in the authentication section.
