
Hadoop-Compatible Storage

Some use cases require access to an object storage system other than HDFS, S3 or ADLS Gen2. If that storage is compatible with the Hadoop API, you can still access it by selecting the HDFS option and specifying the appropriate Hadoop properties. For example, you can work with Azure Blob File System or Google Cloud Storage. The steps to use these routes are:

  1. Configure the connection according to the Azure Blob File System type:

    1. URI syntax: wasb://<container>\@<account_name>.blob.core.windows.net/<path>/<file_name>.

    Note

    For DF/JSON/XML/Excel data sources that use HDFS paths, the “@” character in the URI must be escaped, as shown in the example, so that it is not confused with an environment variable. This does not apply to configurations for Bulk Data Load or for Object Storage data in Parquet and Delta format.

    2. Configure the following Hadoop properties. The Hadoop documentation describes the available authentication methods and the properties that configure them. The following example uses a shared key:

      Name                                                        Value
      fs.azure.account.key.<account_name>.blob.core.windows.net   <Access key>
      fs.azure.always.use.ssl                                     false

      Note

      SSL can be enabled either by setting the property fs.azure.always.use.ssl to true or by accessing the resource through a wasbs route such as wasbs://<container>\@<account_name>.blob.core.windows.net/<path>/<file_name>. If you use the wasbs route, remove the fs.azure.always.use.ssl property.

  2. Configure the connection according to the Google Cloud Storage type:

    1. URI syntax: gs://<bucket>/<path>/.

    2. Configure the following Hadoop properties. The Hadoop documentation describes the available authentication methods and the properties that configure them. The following example uses service account authentication with a JSON keyfile:

      Name                                             Value
      google.cloud.auth.service.account.enable         true
      google.cloud.auth.service.account.json.keyfile   <JSON keyfile path>
      fs.gs.impl.disable.cache                         true

  3. For DF/JSON/XML/Excel data sources that use HDFS paths, select None in the authentication section.
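
The URI syntax and property sets described in the steps above can be sketched as a few small helpers, which may be useful when generating these values from a script before entering them in the connection settings. This is only an illustration, not part of the product: the function names, parameter names and sample values are hypothetical, and representing the Hadoop properties as a Python dictionary is an assumption made for the example.

```python
# Illustrative helpers only: all function names and sample values are hypothetical.

def wasb_uri(container, account_name, path, use_wasbs=False, escape_at=True):
    """Build a wasb:// (or wasbs://) URI following the syntax in step 1."""
    scheme = "wasbs" if use_wasbs else "wasb"
    # DF/JSON/XML/Excel data sources need the "@" escaped as "\@";
    # Bulk Data Load and Parquet/Delta object storage use a plain "@".
    at = "\\@" if escape_at else "@"
    return f"{scheme}://{container}{at}{account_name}.blob.core.windows.net/{path.lstrip('/')}"


def azure_shared_key_properties(account_name, access_key,
                                use_wasbs=False, always_use_ssl=False):
    """Return the shared-key Hadoop properties as a name -> value mapping."""
    props = {
        f"fs.azure.account.key.{account_name}.blob.core.windows.net": access_key,
    }
    # With a wasbs:// URI the SSL property should be removed, so it is
    # only included when the plain wasb:// scheme is used.
    if not use_wasbs:
        props["fs.azure.always.use.ssl"] = "true" if always_use_ssl else "false"
    return props


def gcs_uri(bucket, path):
    """Build a gs:// URI following the syntax in step 2."""
    return f"gs://{bucket}/{path.lstrip('/')}"


def gcs_keyfile_properties(keyfile_path):
    """Return the JSON keyfile service account properties from step 2."""
    return {
        "google.cloud.auth.service.account.enable": "true",
        "google.cloud.auth.service.account.json.keyfile": keyfile_path,
        "fs.gs.impl.disable.cache": "true",
    }


# Example with placeholder values (not real accounts or keys).
print(wasb_uri("mycontainer", "myaccount", "reports/sales.csv"))
for name, value in azure_shared_key_properties("myaccount", "<Access key>").items():
    print(name, "=", value)
print(gcs_uri("mybucket", "reports/"))
```

Passing use_wasbs=True to both Azure helpers produces a wasbs:// URI and leaves out fs.azure.always.use.ssl, which matches the alternative described in the SSL note.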
