Hadoop-Compatible Storage

There are several use cases that might require access to an object storage other than HDFS, S3 or ADLS Gen2.

If the object storage is compatible with the Hadoop API, you can still access it by selecting the HDFS option and specifying the appropriate Hadoop properties. For example, you can work with the Azure Blob File System using the WASB driver, or with Google Cloud Storage. The steps to use Google Cloud Storage routes are:

  1. Configure the connection for Google Cloud Storage:

    1. URI syntax: gs://<bucket>/<path>/.

    2. Configure the following Hadoop properties. The Hadoop documentation describes the available authentication methods and the properties that configure them. The example below uses JSON keyfile service account authentication (a Java sketch of the same configuration follows these steps):

      Name                                             Value
      google.cloud.auth.service.account.enable         true
      google.cloud.auth.service.account.json.keyfile   <JSON keyfile path>
      fs.gs.impl.disable.cache                         true

  2. For DF/JSON/XML/Excel data sources that use HDFS paths, select None in the authentication section.
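
The following minimal Java sketch is an illustration, not part of the product configuration: the bucket name, keyfile path, and file name are hypothetical placeholders. It applies the same three Hadoop properties programmatically and reads a file through the gs:// scheme, assuming the GCS connector jar (which provides the gs:// filesystem implementation) is on the classpath.

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.nio.charset.StandardCharsets;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class GcsReadSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();

          // Same properties as in the table above; the keyfile path is a placeholder.
          conf.set("google.cloud.auth.service.account.enable", "true");
          conf.set("google.cloud.auth.service.account.json.keyfile", "/secrets/keyfile.json");
          conf.set("fs.gs.impl.disable.cache", "true");

          // If the gs:// scheme is not auto-registered by the connector,
          // it can be declared explicitly:
          // conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");

          // Hypothetical bucket and file name.
          Path path = new Path("gs://my-bucket/data/sample.csv");
          FileSystem fs = path.getFileSystem(conf);

          // Read the object line by line through the Hadoop FileSystem API.
          try (BufferedReader reader = new BufferedReader(
                  new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
              String line;
              while ((line = reader.readLine()) != null) {
                  System.out.println(line);
              }
          }
      }
  }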

Note

For DF/JSON/XML/Excel data sources that use HDFS paths, the "@" character in the URI must be escaped with a backslash (\@) so that it is not mistaken for an environment variable reference. This does not apply when the configuration is for Bulk Data Load or for Object Storage Data in Parquet, Delta or Iceberg format.
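
For illustration (the container and account names are hypothetical), an Azure Blob path containing "@" would be written as:

  wasb://mycontainer\@myaccount.blob.core.windows.net/data/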
