Hadoop-Compatible Storage
There are several use cases that might require accessing object storage. For example:
- Access data in Parquet or Delta format using the embedded MPP
- Access DF/JSON/XML/Excel data sources
- Configure that storage for Bulk Data Load
If the object storage is different from HDFS, S3, or ADLS Gen2 but is compatible with the Hadoop API, you can still access it by selecting the HDFS option and specifying the appropriate Hadoop properties. For example, you can work with the Azure Blob File System using the WASB driver (a sketch appears after the note at the end of this section) or with Google Cloud Storage. The steps to use Google Cloud Storage routes are:
1. Configure the connection according to the Google Cloud Storage type:
URI syntax:
gs://<bucket>/<path>/
2. Configure the following Hadoop properties. The Hadoop documentation describes the available authentication methods and the properties that configure them. Here is an example with JSON keyfile service account authentication (a verification sketch follows these steps):
Name                                             Value
google.cloud.auth.service.account.enable         true
google.cloud.auth.service.account.json.keyfile   <JSON keyfile path>
fs.gs.impl.disable.cache                         true
3. For DF/JSON/XML/Excel data sources using HDFS paths, select None in the authentication section.
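As an optional check, the same properties can be exercised with a small Hadoop client program. The following is a minimal sketch, assuming the GCS connector jar (which provides the gs:// scheme) is on the classpath; the bucket name and keyfile path are placeholders:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GcsAccessCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same properties as in the table above; the keyfile path is a placeholder.
        conf.set("google.cloud.auth.service.account.enable", "true");
        conf.set("google.cloud.auth.service.account.json.keyfile", "/path/to/keyfile.json");
        conf.set("fs.gs.impl.disable.cache", "true");
        // The gs:// scheme is resolved by the GCS connector; listing the bucket
        // verifies that the properties above and the authentication work.
        FileSystem fs = FileSystem.get(URI.create("gs://my-bucket/"), conf);
        for (FileStatus status : fs.listStatus(new Path("gs://my-bucket/"))) {
            System.out.println(status.getPath());
        }
    }
}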
Note
For DF/JSON/XML/Excel data sources using HDFS paths, the “@” character in the URI must be escaped with a backslash (\@) to avoid confusion with an environment variable. This does not apply if the configuration is for Bulk Data Load or for Object Storage Data in Parquet, Delta, or Iceberg format.
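The same approach works for the other Hadoop-compatible stores mentioned above. As an illustration for the Azure Blob File System case, here is a minimal sketch, assuming the hadoop-azure module (which provides the wasb:// scheme) is on the classpath and that storage account key authentication is used; the account, container, and key values are placeholders. The \@ escape from the note above applies only to paths typed into DF/JSON/XML/Excel data sources, not to code like this:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WasbAccessCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Storage account key authentication; the account name and key are placeholders.
        conf.set("fs.azure.account.key.myaccount.blob.core.windows.net", "<storage account key>");
        // wasb:// URIs embed the container and account:
        // wasb://<container>@<account>.blob.core.windows.net/<path>
        FileSystem fs = FileSystem.get(
                URI.create("wasb://mycontainer@myaccount.blob.core.windows.net/"), conf);
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}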