ADLS Path (ABFS)¶
Use this type of path to obtain data from a file or a set of files located in a container in Azure Data Lake Storage. Denodo connects to Azure Data Lake Storage Gen2 (ADLS Gen2) using the Azure Blob File System (ABFS) driver.
The following sections provide details about the configuration of this path and the available authentication methods. For information about the Filters tab, see Compressed or Encrypted Data Sources; filters work the same way for any type of path (local, HTTP, FTP…).
ADLS Configuration¶
In URI, enter the path you want to obtain the data from. It can point to a file or a directory and you can use interpolation variables (see section Paths and Other Values with Interpolation Variables).
The Azure Blob File System (ABFS) driver requires the following URI syntax:
abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>
Visit the Microsoft documentation for details about this syntax.
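For example, a URI pointing to a CSV file in a file system named sales inside a storage account named mystorageaccount could look like the following (all names here are placeholders, not real values):

abfss://sales@mystorageaccount.dfs.core.windows.net/invoices/2023/data.csv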
In Custom properties, you can set the same properties that you would define in the Hadoop configuration files (such as core-site.xml) to configure the ABFS Hadoop connector.
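For instance, the properties below tune the size of the read and write requests the ABFS driver issues. The property names come from the hadoop-azure connector documentation; the values are only illustrative:

```
fs.azure.read.request.size = 1048576
fs.azure.write.request.size = 1048576
```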
Paths Pointing to a Directory¶
When you create a base view over a data source that points to a directory, Virtual DataPort infers the schema of the new view from the first file in the directory and it assumes that all the other files have the same schema.
Only for delimited-file data sources: if the path points to a directory and you enter a value in File name pattern, the data source will only process the files whose name matches the regular expression entered in this box. For example, to process only the files with the extension .log, enter (.*)\.log.
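To illustrate how such a pattern filters the files in a directory, the sketch below applies the (.*)\.log expression to a hypothetical list of file names, assuming the whole name must match the expression:

```python
import re

# File name pattern as it would be entered in the data source
pattern = re.compile(r"(.*)\.log")

# Hypothetical contents of the directory the path points to
files = ["app.log", "app.log.1", "notes.txt", "server.log"]

# fullmatch: the entire file name must match the regular expression,
# so "app.log.1" is excluded even though it contains ".log"
matched = [f for f in files if pattern.fullmatch(f)]
print(matched)  # ['app.log', 'server.log']
```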
Note
For XML data sources, if a Validation file has been provided, all files in the directory have to match that Schema or DTD.
ADLS Authentication¶
There are three ways to configure the ADLS credentials:
Azure Shared Key: this is the simplest authentication mechanism, based on an account name and a password. The account name is inferred from the ADLS container URI; you must provide the password (the “shared key”).
OAuth 2.0 Client Credentials: specify the Token endpoint, the Client ID, and the Client Secret. VDP will use the endpoint to get the OAuth 2.0 tokens using the client credentials you provide.
Azure Managed Identity: automatically obtain the Azure Data Lake Storage credentials from the Azure Virtual Machine where this Virtual DataPort server is running. The OAuth 2.0 tokens are issued by a special endpoint only accessible from the executing Virtual Machine (http://169.254.169.254/metadata/identity/oauth2/token). Optionally, you can also specify a Client ID, a Tenant ID, or a Token endpoint. In order to use this authentication method, the Virtual DataPort server must be running on an Azure Virtual Machine with a managed identity configured to allow access to your Azure Data Lake Storage container. You can find more information about this in the Azure documentation.
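The OAuth 2.0 Client Credentials option corresponds to the standard client credentials grant: the server sends a POST request with the client ID and client secret to the token endpoint and receives an access token in return. The sketch below shows how the body of that request is formed; the endpoint, client ID, and client secret are placeholders, and the scope shown is the one commonly used for Azure Storage:

```python
import urllib.parse

# Placeholder token endpoint (the value you would enter in Token endpoint)
token_endpoint = "https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token"

# Form-encoded body of the POST request sent to the token endpoint
body = urllib.parse.urlencode({
    "grant_type": "client_credentials",
    "client_id": "<client-id>",          # the Client ID from the configuration
    "client_secret": "<client-secret>",  # the Client Secret from the configuration
    "scope": "https://storage.azure.com/.default",
})
```

The response to this request contains the OAuth 2.0 access token that the server then presents to ADLS on each request.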
Note that if the authentication method is OAuth 2.0, the connection will always use TLS, even if the URI does not specify the extra ‘s’ in abfss.