How to connect to Azure Blob Storage from Denodo

Applies to: Denodo 7.0
Last modified on: 10 Dec 2019
Tags: OAuth DF data sources Azure Cloud Blob Storage

Goal

This document describes how to connect to Azure Blob Storage from Denodo Virtual DataPort.

Content

Azure Blob storage is Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data, such as text or binary data.

Virtual DataPort can connect to Azure Blob storage in order to use it as a data source and to import information.

Prerequisites

To connect to Azure Blob Storage from Denodo, complete the following prerequisites in the Azure Portal:

  1. Create an App and enable the necessary settings to connect from external applications via the OAuth authentication mechanism.
  2. Create a Storage account for Azure Blob Storage.

Create an App

  • To create the App, navigate to the “Azure Active Directory” pane in the Azure Portal and select “App registrations”. Click on “New Registration” and provide the required details.

  • Once the App is created, the portal shows its Application (client) ID and Directory (tenant) ID, as shown in the image below:

  • In addition to this, the “Endpoints” tab of the registered App provides you with the endpoint URL with respect to the OAuth version. This URL will be used in the Data Source configuration of the Denodo Platform.

  • The “Authentication” section of the App shows the “Redirect URI”, as shown in the image below:

  • Make sure to enable the “ID Tokens” option, which allows the Client Secret value generated in the next step to be used in the OAuth configuration of the Denodo Platform.

  • Now, navigate to the “Certificates & secrets” tab to create a new Client Secret value. To do so, click on “Certificates & secrets > New client secret” and provide the Description and the Expiration Date.

  • Then, click on “API permissions” to allow Azure AD users to access Azure Storage from external applications. Click on “API permissions > Add permission”, choose “Azure Storage” from the list of APIs and enable the “user_impersonation” option.

  • The “Grant Consent” option grants consent to connect to Azure Storage from external applications. To enable this option, the current user must have Admin privileges. If you do not have Admin privileges, contact your administrator to provide access.

  • The “Expose an API” section allows setting the scope for a user such as reading the contents of the blob storage. To add a scope, click on “Add a scope”. When a new scope is added, the respective client application is added to the “Authorized client applications” section.
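
For orientation, the OAuth 2.0 endpoint URLs listed under the “Endpoints” tab usually follow a fixed pattern built from the Directory (tenant) ID. The following minimal sketch assumes the v2.0 endpoints of the public Azure cloud; the helper name oauth_endpoints and the tenant ID are placeholders for illustration. Always copy the actual values from the portal.

```python
# Sketch: compose the Azure AD OAuth 2.0 (v2.0) endpoint URLs for a tenant.
# Assumption: the public-cloud authority host; other clouds use other hosts.
AUTHORITY_HOST = "https://login.microsoftonline.com"


def oauth_endpoints(tenant_id: str) -> dict:
    """Return the authorization and token endpoint URLs for a tenant."""
    base = f"{AUTHORITY_HOST}/{tenant_id}/oauth2/v2.0"
    return {
        "authorization_url": f"{base}/authorize",
        "token_url": f"{base}/token",
    }


if __name__ == "__main__":
    # "my-tenant-id" is a placeholder, not a real Directory (tenant) ID.
    for name, url in oauth_endpoints("my-tenant-id").items():
        print(name, url)
```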

Create a Storage account

  • After configuring the App, click on the “Storage accounts” section in the left pane. To add new storage, click on “Add” and choose the “Account Kind” as BLOB STORAGE from the dropdown list.

  • The newly created Storage account dashboard looks like the image shown below:

  • Now, create a new Container in the storage account to upload files by clicking on the “+Container” option. A container is a folder-like structure used to organize the data.

  • Finally, to access the files stored in the blob storage created above, use the URL of the file, which is unique for each file, as shown in the image below:
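
As an illustration of how such a file URL is structured, the sketch below builds it from the storage account, container and blob names. It assumes the default public-cloud endpoint suffix blob.core.windows.net; the function name and the sample values are hypothetical.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the HTTPS URL of a blob (public Azure cloud endpoint assumed)."""
    # Percent-encode the blob name but keep "/" so virtual folders survive.
    encoded_name = quote(blob_name, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{encoded_name}"


if __name__ == "__main__":
    # Placeholder account, container and file names.
    print(blob_url("myaccount", "data", "reports/sales 2019.csv"))
```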

Connecting to Azure Blob Storage from Denodo

After completing the above steps, follow the steps in this section to connect to Azure Blob Storage from the Denodo Platform.

 

  • In the Virtual DataPort Administration Tool, navigate to New > Data Source in the contextual menu and select the data source type that matches the type of file you want to retrieve from Azure Blob storage. In this example, a “Delimited File” data source will be used.

  • Select the “HTTP Client” option as the Data Route parameter.

  • Configure the “HTTP Client” data route to access the Azure Blob storage by clicking the “Configure” button.
  1. HTTP Method: GET.
  2. URL: Specify the URL to retrieve the files from the Azure Blob storage.
  3. Configure the “HTTP Headers”.

Note: To retrieve files from Blob storage, the storage service version must be specified using the x-ms-version HTTP header. When Azure AD/OAuth 2.0 is used to authenticate the connection, the version must be 2017-11-09 or later (for example, 2018-03-28).
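
To make the role of the header concrete, the sketch below builds an equivalent raw HTTP request outside Denodo. It is only an illustration: in Virtual DataPort the OAuth token is added by the data source itself, and the function name, URL and token shown here are placeholders.

```python
import urllib.request


def build_blob_request(url: str, access_token: str,
                       api_version: str = "2018-03-28") -> urllib.request.Request:
    """Build (but do not send) a GET request for a blob."""
    headers = {
        # Storage service version, as required for OAuth 2.0 access.
        "x-ms-version": api_version,
        # Bearer token obtained through the OAuth 2.0 flow.
        "Authorization": f"Bearer {access_token}",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    # Placeholder URL and token; nothing is sent over the network here.
    req = build_blob_request(
        "https://myaccount.blob.core.windows.net/data/file.csv", "TOKEN")
    print(req.get_method(), req.full_url)
    print(req.header_items())
```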

   

  • In the “Authentication” tab, choose the authentication as “OAuth 2.0” in the drop-down list.
  1. Specify the Client Identifier and Client secret generated in the “Create an App” section of Prerequisites.
  2. Launch the OAuth Credentials Wizard by clicking the link.

  • In the “OAuth 2.0 Credentials Wizard”, enter the “Authorization server URL” and the “Token endpoint URL”, which are available in the “Endpoints” section of the registered App.
  • Enter a redirect URI. The recommendation is to use the default (http://localhost:9090/oauth/2.0/redirectURL.jsp).
  • Click on “Generate the authorization URL”. Virtual DataPort generates an encoded URL from the parameters provided and displays it next to the “Open URL” link.

  • Open the URL in any browser. A new response page will be displayed.

  • Copy the generated URL from the response and paste it in the “Paste the authorization response URL” field. Then, click on “Obtain the OAuth 2.0 credentials”.

 

  • Once the OAuth 2.0 credentials have been obtained, click “Ok” to store them.
  • Then, click on “Test Connection” and if the connection is successful, click on “Save”.

  • Once the data source is created, click on “Create base view” to introspect the source metadata available through the data source and create a base view.

  • Click “Save” to create the base view.

  • Now, the base view created on top of the Delimited file stored in Azure Blob Storage is ready to be executed and combined with the rest of the sources.
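
The wizard steps above correspond to the standard OAuth 2.0 authorization code flow. The sketch below shows, under stated assumptions, what an encoded authorization URL looks like and how the authorization code can be read from the response URL. The function names, the state value and the scope https://storage.azure.com/user_impersonation are assumptions made for illustration; the exact parameters that Virtual DataPort sends are not confirmed by this document.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed scope for delegated access to Azure Storage.
STORAGE_SCOPE = "https://storage.azure.com/user_impersonation"


def authorization_url(auth_endpoint: str, client_id: str,
                      redirect_uri: str, state: str = "denodo") -> str:
    """Build an encoded authorization URL for the authorization code flow."""
    params = {
        "client_id": client_id,
        "response_type": "code",
        "redirect_uri": redirect_uri,
        "scope": STORAGE_SCOPE,
        "state": state,
    }
    return f"{auth_endpoint}?{urlencode(params)}"


def extract_code(response_url: str) -> str:
    """Read the authorization code from the URL of the response page."""
    query = parse_qs(urlparse(response_url).query)
    if "error" in query:
        raise ValueError(f"Authorization failed: {query['error'][0]}")
    return query["code"][0]


if __name__ == "__main__":
    # Placeholder endpoint and client ID.
    print(authorization_url(
        "https://login.microsoftonline.com/my-tenant-id/oauth2/v2.0/authorize",
        "my-client-id",
        "http://localhost:9090/oauth/2.0/redirectURL.jsp"))
```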

Connecting to Azure Blob Storage through Denodo Distributed File System Custom Wrapper

Denodo also provides a Distributed File System Custom Wrapper that can access files in various formats stored in Azure Blob Storage.

What is the Denodo Distributed File System Custom Wrapper?

The Distributed File System Custom Wrapper distribution contains five Virtual DataPort custom wrappers capable of reading several file formats stored in Azure Blob Storage, HDFS, Amazon S3, Azure Data Lake Storage, Azure Data Lake Storage Gen 2 and Google Cloud Storage.

Supported formats are:

  • Delimited text files
  • Sequence files
  • Map files
  • Avro files
  • Parquet files

The Denodo Distributed File System Custom Wrapper component is available to download for Denodo support users from the Denodo Connects section of the Denodo Support Site.

Connecting to Azure Blob Storage from the Denodo Platform

As a first step, from the downloaded denodo-hdfs-customwrapper distribution, select the denodo-hdfs-customwrapper-${version}-jar-with-dependencies.jar file and import it to Virtual DataPort. To do so, from the Virtual DataPort Administration tool:

  • Go to “File > Extension management” and create a new item selecting the jar file.

  • Next, create a new Custom data source by clicking “New > Data source > Custom”.

  • As an example, we will consider accessing a Parquet file stored in Azure Blob Storage. For this, choose the class name as “com.denodo.connect.hadoop.hdfs.wrapper.HDFSParquetFileWrapper” while configuring the Custom data source and click “Save”.

  • To access Blob Storage using this Custom wrapper, the access credentials need to be saved in a custom core-site.xml file. Follow the step below to save the authentication properties.

Configuring authentication properties

Place the credentials in the wrapper configuration file Custom core-site.xml. You can use the core-site.xml, located in the conf folder of the distribution, as a guide.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.azure.account.key.<account>.blob.core.windows.net</name>
    <value>YOUR ACCESS KEY</value>
  </property>
</configuration>
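
If the file needs to be generated for several storage accounts, a small script can fill in the placeholders. The following is a minimal sketch using Python's standard library; the function name and the sample account and key are hypothetical, and the stylesheet line is left out for brevity.

```python
import xml.etree.ElementTree as ET


def core_site_xml(account: str, access_key: str) -> str:
    """Render a custom core-site.xml with the storage account access key."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    name = ET.SubElement(prop, "name")
    name.text = f"fs.azure.account.key.{account}.blob.core.windows.net"
    value = ET.SubElement(prop, "value")
    value.text = access_key  # special characters are escaped on output
    body = ET.tostring(configuration, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body


if __name__ == "__main__":
    # Placeholder values; never commit a real access key to version control.
    print(core_site_xml("myaccount", "YOUR ACCESS KEY"))
```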

  • Now, click on “Create base view” and provide the necessary parameters as follows:

  • File system URI: wasb://<container>@<account>.blob.core.windows.net (or wasbs:// for SSL-encrypted HTTPS access)
  • Parquet file path: /<filename.parquet>
  • Custom core-site.xml file: <provide the location of custom core-site.xml file>

  • Click “Ok” and “Save” the base view.

  • Now, the base view created on top of the Parquet file stored in Azure Blob Storage is ready to be executed and combined with the rest of the sources.

  • Similarly, you can access other types of files available in Azure Blob Storage by using the corresponding Class Name in the Data Source Configuration.
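
As a quick illustration of the File system URI format used in the base view parameters, the sketch below assembles the URI from the container and account names. The function name and sample values are hypothetical.

```python
def file_system_uri(container: str, account: str, secure: bool = True) -> str:
    """Build the wasb:// or wasbs:// URI for a Blob Storage container."""
    # wasbs uses SSL-encrypted HTTPS access; wasb uses plain HTTP.
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net"


if __name__ == "__main__":
    # Placeholder container and account names.
    print(file_system_uri("data", "myaccount"))
    print(file_system_uri("data", "myaccount", secure=False))
```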

Note:

The user who is accessing Azure Blob Storage must be a registered Azure Active Directory user.
