
Azure Data Lake Gen 2

Before deploying the Denodo Embedded MPP on Azure Kubernetes Service, check the Denodo Embedded MPP Azure Checklist to make sure you have everything you need.

There are three ways to deploy a Denodo Embedded MPP that accesses Data Lake Storage Gen2 datasets:

  1. (Recommended) Provide no credentials to the cluster.sh deploy command.

    cluster.sh deploy --credstore-password xxx
    

    Use this option when the Denodo Embedded MPP runs in Azure Kubernetes Service and accesses Data Lake Storage Gen2 using Azure Managed Identities.

    To do so, add the following properties to both presto/conf/catalog/core-site.xml and hive-metastore/conf/core-site.xml before deploying the Embedded MPP:

    core-site.xml using Azure Managed Identities
    <property>
      <name>fs.azure.account.auth.type</name>
      <value>OAuth</value>
    </property>
    
    <property>
      <name>fs.azure.account.oauth.provider.type</name>
      <value>org.apache.hadoop.fs.azurebfs.oauth2.MsiTokenProvider</value>
    </property>
    
    <property>
      <name>fs.azure.account.oauth2.msi.tenant</name>
      <value>MSI Tenant ID</value>
    </property>
    
    <property>
      <name>fs.azure.account.oauth2.msi.endpoint</name>
      <value>http://169.254.169.254/metadata/identity/oauth2/token</value>
    </property>
    
    <property>
      <name>fs.azure.account.oauth2.client.id</name>
      <value>Client ID</value>
    </property>
    
  2. Provide no credentials to the cluster.sh deploy command.

    cluster.sh deploy --credstore-password xxx
    

    In this case, you must provide the OAuth2 client credentials in both core-site.xml files, presto/conf/catalog/core-site.xml and hive-metastore/conf/core-site.xml, before deploying the Embedded MPP:

    core-site.xml using OAuth2 client credentials
    <property>
      <name>fs.azure.account.auth.type</name>
      <value>OAuth</value>
    </property>
    
    <property>
      <name>fs.azure.account.oauth.provider.type</name>
      <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
    </property>
    
    <property>
      <name>fs.azure.account.oauth2.client.endpoint</name>
      <value>https://login.microsoftonline.com/<directory_id>/oauth2/token</value>
    </property>
    
    <property>
      <name>fs.azure.account.oauth2.client.id</name>
      <value>Client ID</value>
    </property>
    
    <property>
      <name>fs.azure.account.oauth2.client.secret</name>
      <value>Secret</value>
    </property>
    
  3. Provide the Azure credentials for the Shared Key authentication method to the cluster.sh deploy command:

    cluster.sh deploy --abfs-storage-account xxx --abfs-storage-key yyy --credstore-password zzz
    
    • --abfs-storage-account: the name of the Storage Account

    • --abfs-storage-key: the access key that protects access to your Storage Account. If this access key is not specified on the command line, cluster.sh deploy prompts for it, which keeps the access key out of the bash history.
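
Both OAuth-based options require the same properties in two separate core-site.xml files, so a quick check before deploying can catch a file that was missed. The following is a minimal sketch and is not part of cluster.sh. The function names missing_props and check_core_sites and the variables MSI_PROPS and CLIENT_CREDS_PROPS are hypothetical. It assumes it is run from the directory that contains the presto and hive-metastore folders, and it only recognizes <name> elements written on a single line with no whitespace inside the tags.

```shell
# Hypothetical pre-deployment check; not shipped with the Denodo Embedded MPP.

# Property names required by the Managed Identities option.
MSI_PROPS="fs.azure.account.auth.type
fs.azure.account.oauth.provider.type
fs.azure.account.oauth2.msi.tenant
fs.azure.account.oauth2.msi.endpoint
fs.azure.account.oauth2.client.id"

# Property names required by the OAuth2 client credentials option.
CLIENT_CREDS_PROPS="fs.azure.account.auth.type
fs.azure.account.oauth.provider.type
fs.azure.account.oauth2.client.endpoint
fs.azure.account.oauth2.client.id
fs.azure.account.oauth2.client.secret"

# Usage: missing_props FILE PROP...
# Prints every PROP that has no <name>PROP</name> line in FILE.
# An unreadable or absent FILE reports all properties as missing.
# The body runs in a subshell so its variables do not leak.
missing_props() (
  file="$1"
  shift
  for prop in "$@"; do
    grep -qF -- "<name>${prop}</name>" "$file" 2>/dev/null || printf '%s\n' "$prop"
  done
)

# Usage: check_core_sites PROP...
# Checks both core-site.xml files and returns non-zero if either lacks a PROP.
check_core_sites() (
  status=0
  for file in presto/conf/catalog/core-site.xml hive-metastore/conf/core-site.xml; do
    missing="$(missing_props "$file" "$@")"
    if [ -n "$missing" ]; then
      printf '%s is missing:\n%s\n' "$file" "$missing" >&2
      status=1
    fi
  done
  exit "$status"
)
```

With these definitions loaded, a guarded deployment for the Managed Identities option could look like `check_core_sites $MSI_PROPS && cluster.sh deploy --credstore-password xxx`. The property list is left unquoted on purpose so the shell splits it into one argument per name. The check only confirms that the property names are present, not that their values are valid.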
