Azure Data Lake Storage (ADLS) Data Sources

Azure Data Lake Storage (ADLS) data sources provide access to Azure Data Lake Storage. Some exporters require a data source of this type in order to export documents to the cloud.

When creating an Azure Data Lake Storage (ADLS) data source, the following parameters must be specified:

  • Account Name. The name of the Azure storage account where your data is stored.

  • Container. The ADLS container name.

  • Authentication. There are three ways to configure the ADLS credentials.

    • Shared key: the simplest authentication mechanism, based on the account name and a password. You must provide the password (the Shared key).

    • OAuth 2.0 client credentials: specify the Token endpoint, the Client identifier, and the Client secret. Scheduler uses the client credentials you provide to obtain OAuth 2.0 tokens from the Token endpoint.

    • Azure managed identity: Scheduler automatically obtains the Azure Data Lake Storage credentials from the Azure Virtual Machine where the Scheduler server is running. The OAuth 2.0 tokens are issued by a special endpoint that is only accessible from within that Virtual Machine (http://169.254.169.254/metadata/identity/oauth2/token). Optionally, you can also specify a Client identifier, a Tenant identifier, or a Token endpoint. To use this authentication method, the Scheduler server must be running on an Azure Virtual Machine with a managed identity that is configured to allow access to your Azure Data Lake Storage container. You can find more information about this in the Azure documentation.

  • Custom properties. Set the same properties that you would put in Hadoop configuration files, such as core-site.xml, to configure the ABFS Hadoop connector.
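
If you already keep ABFS settings in a core-site.xml file, the following Python sketch may help you list the name/value pairs to enter as Custom properties. It is only an illustration under several assumptions: the XML fragment and its values are hypothetical, the helper name abfs_properties is invented for this example, and it assumes that the connector settings of interest use the fs.azure. prefix. It does not call Scheduler in any way.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml fragment; the values are illustrative only.
CORE_SITE_XML = """\
<configuration>
  <property>
    <name>fs.azure.io.retry.max.retries</name>
    <value>5</value>
  </property>
  <property>
    <name>fs.azure.read.request.size</name>
    <value>4194304</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop</value>
  </property>
</configuration>
"""


def abfs_properties(core_site_xml):
    """Return the fs.azure.* name/value pairs found in a core-site.xml document.

    Assumption: properties relevant to the ABFS connector start with "fs.azure.".
    """
    root = ET.fromstring(core_site_xml)
    properties = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name.startswith("fs.azure."):
            properties[name] = value
    return properties


if __name__ == "__main__":
    # Print each pair so it can be copied into the Custom properties parameter.
    for name, value in abfs_properties(CORE_SITE_XML).items():
        print(f"{name} = {value}")
```

Properties outside the assumed prefix, such as hadoop.tmp.dir in the fragment above, are skipped. Adjust the filter if your configuration relies on other property names.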
