
Denodo Distributed File System Custom Wrapper - User Manual

Warning

Although this wrapper is capable of reading files stored in HDFS, Amazon S3, Azure Blob Storage, Azure Data Lake Storage and Google Cloud Storage, most of the technical artifacts of this wrapper include HDFS in their names for legacy compatibility:

  • Jars: denodo-hdfs-custom-wrapper-xxx
  • Wrappers: com.denodo.connect.hadoop.hdfs.wrapper.HDFSxxxWrapper

Introduction

The Distributed File System Custom Wrapper distribution contains five Virtual DataPort custom wrappers capable of reading several file formats stored in HDFS, Amazon S3, Azure Data Lake Storage, Azure Blob Storage, Azure Data Lake Storage Gen 2 and Google Cloud Storage.

Supported formats are:

  • Delimited text files
  • Sequence files
  • Map files
  • Avro files
  • Parquet files

Also, there is a custom wrapper to retrieve information from the distributed file system and display it in a relational way:

  • DFSListFilesWrapper

This wrapper allows you to inspect distributed folders, retrieve lists of files (in a single folder or recursively) and filter files using any part of their metadata (file name, file size, last modification date, etc.).

Delimited Text Files

Delimited text files store plain text and each line has values separated by a delimiter, such as tab, space, comma, etc.
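For example, a small comma-delimited file (an illustrative sample, not part of the distribution) could look like this:

id,name,amount
1,Acme,100.50
2,Initech,75.00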

SequenceFiles

Sequence files are binary record-oriented files, where each record has a serialized key and a serialized value.
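As a reference for how such key/value records are laid out, the following standalone Java sketch reads a sequence file with the standard Hadoop API (org.apache.hadoop.io.SequenceFile); it is illustrative only, not the wrapper's internal code, and the file path is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class ReadSequenceFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Open the sequence file (placeholder path)
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                SequenceFile.Reader.file(new Path("/result/file_1555297166.seq")))) {
            // Instantiate the key and value classes declared in the file header
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            // Iterate over the serialized key/value records
            while (reader.next(key, value)) {
                System.out.println(key + " -> " + value);
            }
        }
    }
}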

MapFiles

A map file is a directory containing two sequence files. The data file (/data) has the same format as a sequence file and contains the data stored as binary key/value pairs. The index file (/index) contains a key/value map with seek positions inside the data file to quickly access the data.

 

Map file format
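For illustration, a keyed lookup against a map file can be done with the standard Hadoop API (org.apache.hadoop.io.MapFile), which uses the /index file to seek inside /data. This is a standalone sketch, not the wrapper's internal code; the path and the Text key/value types are assumptions for the example:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileLookup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The path points to the map directory containing /data and /index (placeholder path)
        try (MapFile.Reader reader = new MapFile.Reader(new Path("/data/points.map"), conf)) {
            Text value = new Text();
            // get() uses the index to seek close to the key and then scans /data
            if (reader.get(new Text("someKey"), value) != null) {
                System.out.println("someKey -> " + value);
            }
        }
    }
}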

Avro Files

Avro data files are self-describing, containing the full schema for the data in the file. An Avro schema is defined using JSON. The schema allows you to define two types of data:

  • primitive data types: null, boolean, int, long, float, double, bytes and string.

  • complex type definitions: a record, an array, an enum, a map, a union or a fixed type.

Avro schema:

{

  "namespace": "example.avro",

   "type": "record",

   "name": "User",

   "fields":

    [

             {"name": "name", "type": "string"},

             {"name": "favorite_number", "type": ["int", "null"]},

             {"name": "favorite_color", "type": ["string", "null"]}

     ]

}
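As an illustration of how such self-describing files are typically consumed with the Apache Avro Java API (a standalone sketch, not the wrapper's internal code; the file name is a placeholder):

import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReadAvroFile {
    public static void main(String[] args) throws Exception {
        // The schema is read from the file itself; no external schema is required
        try (DataFileReader<GenericRecord> reader =
                new DataFileReader<>(new File("users.avro"), new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord user : reader) {
                System.out.println(user.get("name") + " - " + user.get("favorite_color"));
            }
        }
    }
}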

Parquet Files

Parquet is a column-oriented data storage format of the Hadoop ecosystem. It provides per-column data compression and encoding schemes.

The data are described by a schema that starts with the keyword message and contains a group of fields. Each field is defined by a repetition (required, optional, or repeated), a type and a name.

Parquet schema:

message Customer {

required int32 id;

required binary firstname (UTF8);

required binary lastname (UTF8);

}

Primitive types in Parquet are boolean, int32, int64, int96, float, double, binary and fixed_len_byte_array. There is no string type, but there are logical types that allow interpreting binary values as strings, JSON or other types.

Complex types are defined by a group type, which adds a layer of nesting.
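For reference, reading such a file with the parquet-mr Java API looks roughly like the sketch below, using the Customer schema above (illustrative only, not the wrapper's internal code; the file path is a placeholder):

import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.example.GroupReadSupport;

public class ReadParquetFile {
    public static void main(String[] args) throws Exception {
        // Read row by row using the generic Group representation (placeholder path)
        try (ParquetReader<Group> reader =
                ParquetReader.builder(new GroupReadSupport(), new Path("/data/customer.parquet")).build()) {
            Group row;
            while ((row = reader.read()) != null) {
                System.out.println(row.getInteger("id", 0) + " " + row.getString("firstname", 0));
            }
        }
    }
}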

Usage

The Distributed File System Custom Wrapper distribution consists of:

  • /conf: A folder containing a sample core-site.xml file with the properties you might need, commented out.

  • /dist:

  • denodo-hdfs-customwrapper-${version}.jar. The custom wrapper.
  • denodo-hdfs-customwrapper-${version}-jar-with-dependencies.jar. The custom wrapper plus its dependencies. This is the wrapper we recommend using, as it is easier to install in VDP.

  • /doc: A documentation folder containing this user manual.

  • /lib: All the dependencies required by this wrapper, in case you need to use the denodo-hdfs-customwrapper-${version}.jar.

Importing the custom wrapper into VDP

In order to use the Distributed File System Custom Wrapper in VDP, we must configure the Admin Tool to import the extension.

From the Distributed File System Custom Wrapper distribution, we will select the denodo-hdfs-customwrapper-${version}-jar-with-dependencies.jar file and upload it to VDP.

Important

As this wrapper (the jar-with-dependencies version) contains the Hadoop client libraries themselves, increasing the JVM's heap space for the VDP Admin Tool is required to avoid a Java heap space error when uploading the jar to VDP.

No other jars are required as this one will already contain all the required dependencies.

                    Distributed File System extension in VDP

Creating a Distributed File System Data Source

Once the custom wrapper jar file has been uploaded to VDP using the Admin Tool, we can create new data sources for this custom wrapper --and their corresponding base views-- as usual.

Go to New → Data Source → Custom and specify one of the possible wrappers:

  • com.denodo.connect.hadoop.hdfs.wrapper.HDFSDelimitedTextFileWrapper

  • com.denodo.connect.hadoop.hdfs.wrapper.HDFSSequenceFileWrapper

  • com.denodo.connect.hadoop.hdfs.wrapper.HDFSMapFileWrapper

  • com.denodo.connect.hadoop.hdfs.wrapper.HDFSAvroFileWrapper

  • com.denodo.connect.hadoop.hdfs.wrapper.WebHDFSFileWrapper (deprecated)

  • com.denodo.connect.hadoop.hdfs.wrapper.HDFSParquetFileWrapper

  • com.denodo.connect.hadoop.hdfs.wrapper.DFSListFilesWrapper

Also check ‘Select Jars’ and choose the jar file of the custom wrapper.

Distributed File System Data Source

Creating a Base View

Once the custom wrapper has been registered, we will be asked by VDP to create a base view for it.

HDFSDelimitedTextFileWrapper

Custom wrapper for reading delimited text files. Its base views need the following parameters:

  • File system URI: A URI whose scheme and authority identify the file system.

  • HDFS: hdfs://<ip>:<port>. 

  • Amazon S3: s3a://\@<bucket>. For configuring the credentials see Amazon S3 section.

  • Azure Data Lake Storage:

adl://<account name>.azuredatalakestore.net/

           For configuring the credentials see Azure Data Lake Storage section.

  • Azure Blob Storage:

wasb://<container>\@<account>.blob.core.windows.net

For configuring the credentials see Azure Blob Storage section.

  • Azure Data Lake Storage Gen 2:

abfs://<filesystem>\@<account>.dfs.core.windows.net

For configuring the credentials see Azure Data Lake Storage Gen 2 section.

  • Google Cloud Storage:

gs://<bucket>

For configuring the credentials see Google Storage section.

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

E.g. if the URI contains @ you have to enter \@.

  • Path: input path for the delimited file or the directory containing the files.

  • File name pattern (optional): If you want this wrapper to only obtain data from some of the files of the directory, you can enter a regular expression that matches the names of these files, including the sequence of directories they belong to.

For example, if you want the base view to return the data of all the files that follow a pattern in their names, e.g. invoice_jan.csv, invoice_feb.csv, … set the File name pattern to (.*)invoice_(.*)\\.csv (notice that the regular expression is escaped as explained in the note below; a quick matching check is sketched after that note). Files like the following would then be processed by the wrapper:

  • /accounting/invoices/2019/invoice_jan.csv
  • /accounting/invoices/2019/invoice_feb.csv
  • ...

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

E.g. if the File name pattern contains \ you have to enter \\.
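To see how such a pattern behaves, the following standalone Java check (illustrative only, unrelated to the wrapper internals) verifies that the effective regular expression (.*)invoice_(.*)\.csv matches a full file path. Note that the doubled backslash in the Java string literal is ordinary Java escaping, not the VDP interpolation escaping described in the notes above:

import java.util.regex.Pattern;

public class PatternCheck {
    public static void main(String[] args) {
        // Effective regex: (.*)invoice_(.*)\.csv -- matched against the whole path
        Pattern pattern = Pattern.compile("(.*)invoice_(.*)\\.csv");
        System.out.println(pattern.matcher("/accounting/invoices/2019/invoice_jan.csv").matches()); // true
        System.out.println(pattern.matcher("/accounting/invoices/2019/summary.csv").matches());     // false
    }
}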

  • Delete after reading: Requests that the file or directory denoted by the path be deleted when the wrapper terminates.

  • Custom core-site.xml file (optional): configuration file that overrides the default core parameters.

  • Custom hdfs-site.xml file (optional): configuration file that overrides the default HDFS parameters.

  • Separator (optional): delimiter between the values of a row. Default is the comma (,) and cannot be a line break (\n or \r).

            Some “invisible” characters have to be entered in a special way:

Character    Meaning
\t           Tab
\f           Formfeed

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

E.g. if the separator is the tab character \t you have to enter \\t.

  • Quote (optional): Character used to encapsulate values containing special characters. Default is the double quote (").

  • Comment marker (optional): Character marking the start of a line comment. Comments are disabled by default.

  • Escape (optional): Escape character. Escapes are disabled by default.

  • Null value (optional): String used to represent a null value. Default is: none; nulls are not distinguished from empty strings.

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

E.g. if the null value is \N you have to enter \\N.

  • Ignore spaces: Whether spaces around values are ignored. False by default.

  • Header:  If selected, the wrapper considers that the first line contains the names of the fields in this file. These names will be the fields’ names of the base views created from this wrapper. True by default.

  • Ignore matching errors: Whether the wrapper will ignore the lines of the file that do not have the expected number of columns. True by default.

If you clear this check box, the wrapper will return an error if there is a row that does not have the expected structure. When you select this check box, you can check if the wrapper has ignored any row in a query in the execution trace, in the attribute “Number of invalid rows”.

HDFSDelimitedTextFileWrapper base view edition

View schema

The execution of the wrapper returns the values contained in the file, or in the group of files if the Path input parameter denotes a directory.

View results

HDFSSequenceFileWrapper

Custom wrapper for reading sequence files. Its base views need the following parameters:

  • File system URI: A URI whose scheme and authority identify the file system.

  • HDFS: hdfs://<ip>:<port>. 

  • Amazon S3: s3a://\@<bucket>. For configuring the credentials see Amazon S3 section.

  • Azure Data Lake Storage:

adl://<account name>.azuredatalakestore.net/

           For configuring the credentials see Azure Data Lake Storage section.

  • Azure Blob Storage:

wasb://<container>\@<account>.blob.core.windows.net

For configuring the credentials see Azure Blob Storage section.

  • Azure Data Lake Storage Gen 2:

abfs://<filesystem>\@<account>.dfs.core.windows.net

For configuring the credentials see Azure Data Lake Storage Gen 2 section.

  • Google Cloud Storage:

gs://<bucket>

For configuring the credentials see Google Storage section.

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

 

E.g. if the URI contains @ you have to enter \@.

  • Path: input path for the sequence file or the directory containing the files.

  • File name pattern (optional): If you want this wrapper to only obtain data from some of the files of the directory, you can enter a regular expression that matches the names of these files, including the sequence of directories they belong to.

For example, if you want the base view to return the data of all the files that follow a pattern in their names, e.g. file_1555297166.seq, file_1555300766.seq, … set the File name pattern to (.*)file_(.*)\\.seq (notice that the regular expression is escaped as explained in the note below). Files like the following would then be processed by the wrapper:

  • /result/file_1555297166.seq
  • /result/file_1555300766.seq
  • ...

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

E.g. if the File name pattern contains \ you have to enter \\.

  • Delete after reading: Requests that the file or directory denoted by the path be deleted when the wrapper terminates.

  • Custom core-site.xml file (optional): configuration file that overrides the default core parameters.

  • Custom hdfs-site.xml file (optional): configuration file that overrides the default HDFS parameters.

  • Key class: key class name implementing the org.apache.hadoop.io.Writable interface.

  • Value class: value class name implementing the org.apache.hadoop.io.Writable interface.
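For example, a sequence file of word counts could use org.apache.hadoop.io.Text as the Key class and org.apache.hadoop.io.IntWritable as the Value class (an illustrative pairing; use the classes your files were actually written with).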

HDFSSequenceFileWrapper base view edition

View schema

The execution of the wrapper returns the key/value pairs contained in the file, or in the group of files if the Path input parameter denotes a directory.

View results

HDFSMapFileWrapper

Custom wrapper for reading map files. Its base views need the following parameters:

  • File system URI: A URI whose scheme and authority identify the file system.

  • HDFS: hdfs://<ip>:<port>. 

  • Amazon S3: s3a://\@<bucket>. For configuring the credentials see Amazon S3 section.

  • Azure Data Lake Storage:

adl://<account name>.azuredatalakestore.net/

           For configuring the credentials see Azure Data Lake Storage section.

  • Azure Blob Storage:

wasb://<container>\@<account>.blob.core.windows.net

For configuring the credentials see Azure Blob Storage section.

  • Azure Data Lake Storage Gen 2:

abfs://<filesystem>\@<account>.dfs.core.windows.net

For configuring the credentials see Azure Data Lake Storage Gen 2 section.

  • Google Cloud Storage:

gs://<bucket>

For configuring the credentials see Google Storage section.

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

E.g. if the URI contains @ you have to enter \@.

  • Path: input path for the directory containing the map file. The path to the index or data file can also be specified. When using Amazon S3, which is a flat file system with no folder concept, the path to the index or data file should be used.

  • File name pattern (optional): If you want this wrapper to only obtain data from some of the files of the directory, you can enter a regular expression that matches the names of these files, including the sequence of directories they belong to.

For example, if you want the base view to return the data of all the files that follow a pattern in their names, e.g. invoice_jan.whatever, invoice_feb.whatever, … set the File name pattern to (.*)invoice_(.*)\\.whatever (notice that the regular expression is escaped as explained in the note below). Files like the following would then be processed by the wrapper:

  • /accounting/invoices/2019/invoice_jan.whatever
  • /accounting/invoices/2019/invoice_feb.whatever
  • ...

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

E.g. if the File name pattern contains \ you have to enter \\.

  • Delete after reading: Requests that the file or directory denoted by the path be deleted when the wrapper terminates.

  • Custom core-site.xml file (optional): configuration file that overrides the default core parameters.

  • Custom hdfs-site.xml file (optional): configuration file that overrides the default HDFS parameters.

  • Key class: key class name implementing the org.apache.hadoop.io.WritableComparable interface. WritableComparable is used because records are sorted in key order.

  • Value class: value class name implementing the org.apache.hadoop.io.Writable interface.

HDFSMapFileWrapper base view edition

View schema

The execution of the wrapper returns the key/value pairs contained in the file, or in the group of files if the Path input parameter denotes a directory.

View results

HDFSAvroFileWrapper

Custom wrapper for reading Avro files.

Important

We recommend not using the HDFSAvroFileWrapper to directly access Avro files, as this is an internal serialization system mainly meant for use by applications running on the Hadoop cluster. Instead, we recommend using an abstraction layer on top of those files, e.g. Hive, Impala, Spark...

Its base views need the following parameters:

  • File system URI: A URI whose scheme and authority identify the file system.

  • HDFS: hdfs://<ip>:<port>. 

  • Amazon S3: s3a://\@<bucket>. For configuring the credentials see Amazon S3 section.

  • Azure Data Lake Storage:

adl://<account name>.azuredatalakestore.net/

           For configuring the credentials see Azure Data Lake Storage section.

  • Azure Blob Storage:

wasb://<container>\@<account>.blob.core.windows.net

For configuring the credentials see Azure Blob Storage section.

  • Azure Data Lake Storage Gen 2:

abfs://<filesystem>\@<account>.dfs.core.windows.net

For configuring the credentials see Azure Data Lake Storage Gen 2 section.

  • Google Cloud Storage:

gs://<bucket>

For configuring the credentials see Google Storage section.

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

E.g. if the URI contains @ you have to enter \@.

  • File name pattern (optional): If you want this wrapper to only obtain data from some of the files of the directory, you can enter a regular expression that matches the names of these files, including the sequence of directories they belong to.

For example, if you want the base view to return the data of all the files that follow a pattern in their names, e.g. employees_jan.avro, employees_feb.avro, … set the File name pattern to (.*)employees_(.*)\\.avro (notice that the regular expression is escaped as explained in the note below). Files like the following would then be processed by the wrapper:

  • /hr/2019/employees_jan.avro
  • /hr/2019/employees_feb.avro
  • ...

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

E.g. if the File name pattern contains \ you have to enter \\.

  • Delete after reading: Requests that the file denoted by the path be deleted when the wrapper terminates.

  • Custom core-site.xml file (optional): configuration file that overrides the default core parameters.

  • Custom hdfs-site.xml file (optional): configuration file that overrides the default HDFS parameters.

There are also two parameters that are mutually exclusive:

  • Avro schema path: input path for the Avro schema file or

  • Avro schema JSON: JSON of the Avro schema.

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, } in the Avro schema JSON parameter,  you have to escape these characters with \. For example:

\{

  "type": "map",

        "values": \{

        "type": "record",

        "name": "ATM",

        "fields": [

                 \{ "name": "serial_no", "type": "string" \},

                 \{ "name": "location",    "type": "string" \}

        ]

        \}

\}

HDFSAvroFileWrapper base view edition

Content of the /user/cloudera/schema.avsc file:

{"type" : "record",

  "name" : "Doc",

  "doc" : "adoc",

  "fields" : [ {

    "name" : "id",

    "type" : "string"

  }, {

    "name" : "user_friends_count",

    "type" : [ "int", "null" ]

  }, {

    "name" : "user_location",

    "type" : [ "string", "null" ]

  }, {

    "name" : "user_description",

    "type" : [ "string", "null" ]

  }, {

    "name" : "user_statuses_count",

    "type" : [ "int", "null" ]

  }, {

    "name" : "user_followers_count",

    "type" : [ "int", "null" ]

  }, {

    "name" : "user_name",

    "type" : [ "string", "null" ]

  }, {

    "name" : "user_screen_name",

    "type" : [ "string", "null" ]

  }, {

    "name" : "created_at",

    "type" : [ "string", "null" ]

  }, {

    "name" : "text",

    "type" : [ "string", "null" ]

  }, {

    "name" : "retweet_count",

    "type" : [ "int", "null" ]

  }, {

    "name" : "retweeted",

    "type" : [ "boolean", "null" ]

  }, {

    "name" : "in_reply_to_user_id",

    "type" : [ "long", "null" ]

  }, {

    "name" : "source",

    "type" : [ "string", "null" ]

  }, {

    "name" : "in_reply_to_status_id",

    "type" : [ "long", "null" ]

  }, {

    "name" : "media_url_https",

    "type" : [ "string", "null" ]

  }, {

    "name" : "expanded_url",

    "type" : [ "string", "null" ]

  } ] }                        

View schema

The execution of the view returns the values contained in the Avro file specified in the WHERE clause of the VQL sentence:

 SELECT * FROM avro_ds_file

 WHERE avrofilepath = '/user/cloudera/file.avro'

                                                                                                   

View results

After applying a flattening operation, the results are as follows.

    

Flattened results

Field Projection

The recommended way for dealing with projections in HDFSAvroFileWrapper is by means of the JSON schema parameters:

  • Avro schema path or
  • Avro schema JSON

By giving the wrapper a JSON schema containing only the fields we are interested in, the reader used by the HDFSAvroFileWrapper will return only these fields to VDP, making the select operation faster.

If we configure the parameter Avro schema JSON with only some of the fields of the /user/cloudera/schema.avsc file used in the previous example, like in the example below (notice the escaped characters):

Schema with the selected fields:

\{

  "type" : "record",

  "name" : "Doc",

  "doc" : "adoc",

  "fields" : [ \{

    "name" : "id",

    "type" : "string"

  \}, \{

    "name" : "user_friends_count",

    "type" : [ "int", "null" ]

  \}, \{

    "name" : "user_location",

    "type" : [ "string", "null" ]

  \}, \{

    "name" : "user_followers_count",

    "type" : [ "int", "null" ]

  \}, \{

    "name" : "user_name",

    "type" : [ "string", "null" ]

  \}, \{

    "name" : "created_at",

    "type" : [ "string", "null" ]

  \} ]

\}

the base view in VDP will contain a subset of the fields of the previous base view of the example: the ones matching the new JSON schema provided to the wrapper.

Base view with the selected fields

View results with the selected fields

WebHDFSFileWrapper

Warning

WebHDFSFileWrapper is deprecated.

  • For XML, JSON and Delimited files the best alternative is using the VDP standard data sources, using the HTTP Client in their Data route parameter. These data sources offer a better solution for HTTP/HTTPS access as they include proxy access, SPNEGO authentication, OAuth 2.0, etc.

  • For Avro, Sequence, Map and Parquet files the best alternative is using the specific custom wrapper type: HDFSAvroFileWrapper, HDFSSequenceFileWrapper, HDFSMapFileWrapper or HDFSParquetFileWrapper with the webhdfs scheme in their File system URI parameter, and placing the credentials in the XML configuration files.

Custom wrapper for reading delimited text files using WebHDFS.

About WebHDFS

WebHDFS provides HTTP REST access to HDFS. It supports all HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
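For example, reading a file through WebHDFS is a plain HTTP GET against a URL of the form http://<host>:<port>/webhdfs/v1/<path>?op=OPEN; the host and path below are illustrative:

http://namenode.example.com:50070/webhdfs/v1/user/cloudera/file.csv?op=OPEN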

The advantages of WebHDFS are:

  • Version-independent REST-based protocol, which means data can be read from and written to Hadoop clusters regardless of their version.

  • Read and write data in a cluster behind a firewall. A WebHDFS proxy (for example, HttpFS) can be used; it acts as a gateway and is the only system that is allowed to send and receive data through the firewall.

The only difference between using the proxy or not is the host:port pair to which the HTTP requests are issued:

  • Default port for WebHDFS is 50070.

  • Default port for HttpFS is 14000.

Custom wrapper

The base views created from the WebHDFSFileWrapper need the following parameters:

  • Host IP: IP or <bucket>.s3.amazonaws.com for Amazon S3.

  • Host port: HTTP port. Default port for WebHDFS is 50070. For HttpFS is 14000. For Amazon S3 is 80.

  • User: The name of the authenticated user when security is off. If it is not set, the server may either set the authenticated user to a default web user, if there is any, or return an error response.

           When using Amazon S3, <id>:<secret> should be indicated.

  • Path: input path for the delimited file.

  • Separator: delimiter between values. Default is the comma.

  • Quote: Character used to encapsulate values containing special characters. Default is the double quote (").

  • Comment marker: Character marking the start of a line comment. Comments are disabled by default.

  • Escape: Escape character. Escapes are disabled by default.

  • Null value (optional): String used to represent a null value. Default is: none; nulls are not distinguished from empty strings.

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

E.g. if the null value is \N you have to enter \\N.

  • Ignore spaces: Whether spaces around values are ignored. False by default.

  • Header: Whether the file has a header or not. True by default.

  • Delete after reading: Requests that the file or directory denoted by the path be deleted when the wrapper terminates.

WebHDFSFileWrapper base view edition

View schema

The execution of the wrapper returns the values contained in the file.

View results

HDFSParquetFileWrapper

Custom wrapper for reading Parquet files.

Important

We recommend not using the HDFSParquetFileWrapper to directly access Parquet files, as this is an internal columnar data representation mainly meant for use by applications running on the Hadoop cluster. Instead, we recommend using an abstraction layer on top of those files, e.g. Hive, Impala, Spark...

Its base views need the following parameters:

  • File system URI: A URI whose scheme and authority identify the file system.
  • HDFS: hdfs://<ip>:<port>. 

  • Amazon S3: s3a://\@<bucket>. For configuring the credentials see Amazon S3 section.

  • Azure Data Lake Storage:

adl://<account name>.azuredatalakestore.net/

           For configuring the credentials see Azure Data Lake Storage section.

  • Azure Blob Storage:

wasb://<container>\@<account>.blob.core.windows.net

For configuring the credentials see Azure Blob Storage section.

  • Azure Data Lake Storage Gen 2:

abfs://<filesystem>\@<account>.dfs.core.windows.net

For configuring the credentials see Azure Data Lake Storage Gen 2 section.

  • Google Cloud Storage:

gs://<bucket>

For configuring the credentials see Google Storage section.

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

E.g. if the URI contains @ you have to enter \@.

  • Parquet File Path: path of the file that we want to read.

  • File name pattern (optional): If you want this wrapper to only obtain data from some of the files of the directory, you can enter a regular expression that matches the names of these files, including the sequence of directories they belong to.

For example, if you want the base view to return the data of all the files that follow a pattern in their names, e.g. flights_jan.parquet, flights_feb.parquet, … set the File name pattern to (.*)flights_(.*)\\.parquet (notice that the regular expression is escaped as explained in the note below). Files like the following would then be processed by the wrapper:

  • /airport/LAX/2019/flights_jan.parquet
  • /airport/LAX/2019/flights_feb.parquet
  • ...

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

E.g. if the File name pattern contains \ you have to enter \\.

  • Custom core-site.xml file (optional): configuration file that overrides the default core parameters.

  • Custom hdfs-site.xml file (optional): configuration file that overrides the default HDFS parameters.

HDFSParquetFileWrapper base view edition

View schema

The execution of the wrapper returns the values contained in the file.

View results

DFSListFilesWrapper

Custom wrapper to retrieve file information from a distributed file system.

Its base views need the following parameters:

  • File system URI: A URI whose scheme and authority identify the file system.

  • HDFS: hdfs://<ip>:<port>. 

  • Amazon S3: s3a://\@<bucket>. For configuring the credentials see Amazon S3 section.

  • Azure Data Lake Storage:

adl://<account name>.azuredatalakestore.net/

           For configuring the credentials see Azure Data Lake Storage section.

  • Azure Blob Storage:

wasb://<container>\@<account>.blob.core.windows.net

For configuring the credentials see Azure Blob Storage section.

  • Google Cloud Storage:

gs://<bucket>

For configuring the credentials see Google Cloud Storage section.

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

E.g. if the URI contains @ you have to enter \@.

  • Custom hdfs-site.xml file (optional): configuration file that overrides the default HDFS parameters.

DFSListFilesWrapper base view edition

The entry point for querying the wrapper is the parameter parentfolder. The wrapper will list the files that are located in the supplied directory. It is possible to do this recursively, also retrieving the contents of the subfolders, by setting the parameter recursive to true.

Execution panel

The schema of the custom wrapper contains the following columns:

  • parentfolder: the path of the parent directory. The wrapper will list all the files located in this directory

  • This parameter is mandatory in a SELECT operation

  • relativepath: the location of the file relative to the parentfolder. Useful when executing a recursive query.

  • filename: the name of the file or folder, including the extension for files.

  • extension: the extension of the file. It will be null if the file is a directory.

  • fullpath: the full path of the file with the scheme information.

  • pathwithoutscheme: the full path of the file without the scheme information.

  • filetype: either ‘file’ or ‘directory’.

  • encrypted:  true if the file is encrypted, false otherwise.

  • datemodified: the modification time of the file in milliseconds since January 1, 1970 UTC.

  • owner: the owner of the file.

  • group: the group associated with the file.

  • permissions: the permissions of the file, using the symbolic notation (rwxr-xr-x).

  • size: the size of the file in bytes. It will be null for folders.

  • recursive: if false the search for files will be limited to the files that are direct children of the parentfolder. If true, the search will be done recursively, including subfolders of parentfolder.

  • This parameter is mandatory in a SELECT operation

View schema

The following VQL sentence returns the files in the ‘/user/cloudera’ HDFS directory, recursively:

SELECT * FROM listing_dfs

WHERE parentfolder = '/user/cloudera' AND recursive = true

View results

We can filter our query a bit more and retrieve only those files that were modified after '2018-09-01':

SELECT * FROM listing_dfs

WHERE parentfolder = '/user/cloudera' AND recursive = true

AND datemodified > DATE '2018-09-01'

View results

Extending capabilities with the DFSListFilesWrapper

The wrappers of this distribution that read file formats like Parquet, Avro, Delimited Files, Sequence or Map can extend their capabilities when combined with the DFSListFilesWrapper.

As all of these wrappers need an input path for the file or the directory that is going to be read, we can use the DFSListFilesWrapper for retrieving the file paths that we are interested in, according to some attribute value of their metadata, e.g. modification time.

For example, suppose that we want to retrieve the files in the /user/cloudera/df/awards directory that were modified in November.

The following steps explain how to configure this scenario:

  1. Create a DFSListFilesWrapper base view that will list the files of the /user/cloudera/df/awards directory.

  2. Create an HDFSDelimitedTextFileWrapper base view that will read the content of the CSV files.

Parameterize the Path of the base view by adding an interpolation variable to its value, e.g. @path, (@ is the prefix that identifies a value parameter as an interpolation variable). 

By using the variable @path, you do not have to provide the final path value when creating the base view. Instead, the values of the Path parameter will be provided at runtime by the DFSListFilesWrapper view through the join operation (configured in the next step).

  3. Create a derived view joining the two previously created views. The join condition will be:

DFSListFilesWrapper.pathwithoutscheme = HDFSDelimitedTextFileWrapper.path

  4. By executing the join view with these conditions:

SELECT * FROM join:view

WHERE recursive = true

      AND parentfolder = '/user/cloudera/df/awards'

      AND datemodified > DATE '2018-11-1'

we obtain data only from the delimited files that were modified in November.

Amazon S3

The Distributed File System Custom Wrapper can access data stored in Amazon S3 with the following Hadoop FileSystem clients:

  • S3.

It is deprecated and it is not supported by the new version of this custom wrapper, version 7.0, as it was removed from Hadoop 3.x.

  • S3N.

Use S3A instead, as S3A client can read all files created by S3N.

S3N is not supported by the new version of this custom wrapper, version 7.0, as it was removed from Hadoop 3.x.

  • S3A. 

S3A client can read all files created by S3N. It should be used wherever possible.

Configuring S3 authentication properties

Place the credentials in the wrapper configuration file Custom core-site.xml. You can use the core-site.xml, located in the conf folder of the distribution, as a guide.

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

  <name>fs.s3.awsAccessKeyId</name>

  <description>AWS access key ID</description>

  <value>YOUR ACCESS KEY ID</value>

</property>

<property>

  <name>fs.s3.awsSecretAccessKey</name>

  <description>AWS secret key</description>

  <value>YOUR SECRET ACCESS KEY</value>

</property>

</configuration>

Configuring S3N authentication properties

Place the credentials in the wrapper configuration file Custom core-site.xml. You can use the core-site.xml, located in the conf folder of the distribution, as a guide.

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

  <name>fs.s3n.awsAccessKeyId</name>

  <description>AWS access key ID</description>

  <value>YOUR ACCESS KEY ID</value>

</property>

<property>

  <name>fs.s3n.awsSecretAccessKey</name>

  <description>AWS secret key</description>

  <value>YOUR SECRET ACCESS KEY</value>

</property>

</configuration>

Configuring S3A authentication properties

S3A supports several authentication mechanisms. By default the custom wrapper will search for credentials in the following order:

  1. In the Hadoop configuration files.

For using this authentication method, declare the credentials in the wrapper configuration file Custom core-site.xml. You can use the core-site.xml, located in the conf folder of the distribution, as a guide.

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

  <name>fs.s3a.access.key</name>

  <description>AWS access key ID.</description>

  <value>YOUR ACCESS KEY ID</value>

</property>

<property>

  <name>fs.s3a.secret.key</name>

  <description>AWS secret key.</description>

  <value>YOUR SECRET ACCESS KEY</value>

</property>

</configuration>

  2. Then, the environment variables named AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are looked for.

  3. Otherwise, an attempt is made to query the Amazon EC2 Instance Metadata Service to retrieve credentials published to EC2 VMs. This mechanism is available only when running your application on an Amazon EC2 instance, but provides the greatest ease of use and best security when working with Amazon EC2 instances.

Using IAM Assumed Roles

To use assumed roles, the wrapper must be configured to use the Assumed Role Credential Provider, org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider, in the configuration option fs.s3a.aws.credentials.provider in the wrapper configuration file Custom core-site.xml.

This Assumed Role Credential provider will read in the fs.s3a.assumed.role.* options needed to connect to the Session Token Service Assumed Role API:

  1. First authenticating with the full credentials. This means the normal fs.s3a.access.key and fs.s3a.secret.key pair, environment variables, or some other supplier of long-lived secrets.

If you wish to use a different authentication mechanism, other than org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider, set it in the property fs.s3a.assumed.role.credentials.provider.

  2. Then assuming the specific role specified in fs.s3a.assumed.role.arn.

  3. It will then refresh this login at the configured rate in fs.s3a.assumed.role.session.duration.

Below you can see the properties required for configuring IAM Assumed Roles in this custom wrapper, using its configuration file, Custom core-site.xml. You can use the core-site.xml, located in the conf folder of the distribution, as a guide.

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

  <name>fs.s3a.aws.credentials.provider</name>

  <value>org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider</value>

</property>

<property>

  <name>fs.s3a.assumed.role.arn</name>

  <description>

    AWS ARN for the role to be assumed. Required if the      

    fs.s3a.aws.credentials.provider contains

    org.apache.hadoop.fs.s3a.AssumedRoleCredentialProvider

  </description>

  <value>YOUR AWS ROLE</value>

</property>

<property>

  <name>fs.s3a.assumed.role.credentials.provider</name>

  <description>

    List of credential providers to authenticate with the

    STS endpoint and retrieve short-lived role credentials.

    Only used if AssumedRoleCredentialProvider is the AWS credential  

    Provider. If unset, uses  

    "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider".

  </description>

  <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value>

</property>

<property>

  <name>fs.s3a.assumed.role.session.duration</name>

  <value>30m</value>

  <description>

    Duration of assumed roles before a refresh is attempted.

    Only used if AssumedRoleCredentialProvider is the AWS credential

    Provider.

    Range: 15m to 1h

  </description>

</property>

<property>

  <name>fs.s3a.access.key</name>

  <description>AWS access key ID.</description>

  <value>YOUR ACCESS KEY ID</value>

</property>

<property>

  <name>fs.s3a.secret.key</name>

  <description>AWS secret key.</description>

  <value>YOUR SECRET ACCESS KEY</value>

</property>

</configuration>

Signature Version 4 support

When the V4 signing protocol is used, AWS requires the explicit region endpoint to be used —hence S3A must be configured to use the specific endpoint. This is done in the configuration option fs.s3a.endpoint in the Custom core-site.xml of the wrapper. You can use the core-site.xml, located in the conf folder of the distribution, as a guide. Otherwise a Bad Request exception could be thrown.

As an example of configuration, the endpoint for S3 Frankfurt is s3.eu-central-1.amazonaws.com:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

  <name>fs.s3a.endpoint</name>

  <description>AWS S3 endpoint to connect to. An up-to-date list is

   provided in the AWS Documentation: regions and endpoints. Without

   this property, the standard region (s3.amazonaws.com) is assumed.

  </description>

  <value>s3.eu-central-1.amazonaws.com</value>

</property>

</configuration>

You can find the full list of supported signature versions for AWS Regions on the AWS website: Amazon Simple Storage Service (Amazon S3).

Azure Data Lake Storage

The Distributed File System Custom Wrapper can access data stored in Azure Data Lake Storage.

Configuring authentication properties

Place the credentials in the wrapper configuration file Custom core-site.xml. You can use the core-site.xml, located in the conf folder of the distribution, as a guide.

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>

     <name>fs.adl.oauth2.access.token.provider.type</name>

     <value>ClientCredential</value>

 </property>

 <property>

     <name>fs.adl.oauth2.refresh.url</name>

     <value>YOUR TOKEN ENDPOINT</value>

 </property>

 <property>

     <name>fs.adl.oauth2.client.id</name>

     <value>YOUR CLIENT ID</value>

 </property>

 <property>

     <name>fs.adl.oauth2.credential</name>

     <value>YOUR CLIENT SECRET</value>

 </property>

 </configuration>

Azure Blob Storage

The Distributed File System Custom Wrapper can access data stored in Azure Blob Storage.

Configuring authentication properties

Place the credentials in the wrapper configuration file Custom core-site.xml. You can use the core-site.xml, located in the conf folder of the distribution, as a guide.

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <property>

     <name>fs.azure.account.key.<account>.blob.core.windows.net</name>

     <value>YOUR ACCESS KEY</value>

  </property>

</configuration>

Azure Data Lake Storage Gen 2

Since the Distributed File System Custom Wrapper for Denodo 7.0 (this functionality requires Java 8), this wrapper can access data stored in Azure Data Lake Storage Gen 2.

Configuring authentication properties

Place the credentials in the wrapper configuration file Custom core-site.xml. You can use the core-site.xml, located in the conf folder of the distribution, as a guide.

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <property>

     <name>fs.azure.account.key.<account>.dfs.core.windows.net</name>

     <value>YOUR ACCOUNT KEY</value>

  </property>

</configuration>

Google Cloud Storage

Since the Distributed File System Custom Wrapper for Denodo 7.0 (this functionality requires Java 8), this wrapper can access data stored in Google Cloud Storage.

Configuring authentication properties

Place the credentials in the wrapper configuration file Custom core-site.xml. You can use the core-site.xml, located in the conf folder of the distribution, as a guide.

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

   

 <property>

   <name>google.cloud.auth.service.account.enable</name>

   <value>true</value>

   <description>Whether to use a service account for GCS authorization.

   If an email and keyfile are provided then that service account

   will be used. Otherwise the connector will look to see if it is running

   on a GCE VM with some level of GCS access in its service account

   scope, and use that service account.</description>

 </property>

 <property>

   <name>google.cloud.auth.service.account.json.keyfile</name>

   <value>/PATH/TO/KEYFILE</value>

   <description>The JSON key file of the service account used for GCS

   access when google.cloud.auth.service.account.enable is  

   true.</description>

 </property>

</configuration>

Permissions

Wrappers that read file contents from Google Cloud Storage, like HDFSDelimitedTextFileWrapper, HDFSAvroFileWrapper, etc., require access with a member that has storage.objects.get permissions.

The DFSListFilesWrapper, as it lists files from buckets, requires access with a member that has storage.buckets.get permissions.

For more information on roles and permissions see https://cloud.google.com/storage/docs/access-control/iam-roles.

Compressed Files

The Distributed File System Custom Wrapper transparently reads compressed files in any of these compression formats:

  • gzip
  • DEFLATE (zlib)
  • bzip2
  • snappy
  • LZO
  • LZ4
  • Zstandard

Secure cluster with Kerberos

The configuration required for accessing a Hadoop cluster with Kerberos enabled is the same as the one needed to access the distributed file system; additionally, the user must supply the Kerberos credentials.

The Kerberos parameters are:

  • Kerberos enabled: Check it when accessing a Hadoop cluster with Kerberos enabled.

  • Kerberos principal name (optional): Kerberos v5 Principal name, e.g. primary/instance\@realm.

! Note

If you enter a literal that contains one of the special characters used to indicate interpolation variables @, \, ^, {, },  you have to escape these characters with \.

E.g. if the Kerberos principal name contains @ you have to enter \@.

  • Kerberos keytab file (optional): Keytab file containing the key of the Kerberos principal.

  • Kerberos password (optional): Password associated with the principal.

  • Kerberos Distribution Center (optional): Kerberos Key Distribution Center.

The Distributed File System Custom Wrapper provides three ways for accessing a kerberized Hadoop cluster:

  1. The client has a valid Kerberos ticket in the ticket cache, obtained, for example, using the kinit command of the Kerberos client.

In this case only the Kerberos enabled parameter should be checked. The wrapper will use the Kerberos ticket to authenticate itself against the Hadoop cluster.

  2. The client does not have a valid Kerberos ticket in the ticket cache. In this case you should provide the Kerberos principal name parameter and either:

  • the Kerberos keytab file parameter, or
  • the Kerberos password parameter.

In all these three scenarios the krb5.conf file should be present in the file system. Below there is an example of the Kerberos configuration file:

[libdefaults]

  renew_lifetime = 7d

  forwardable = true

  default_realm = EXAMPLE.COM

  ticket_lifetime = 24h

  dns_lookup_realm = false

  dns_lookup_kdc = false

[domain_realm]

  sandbox.hortonworks.com = EXAMPLE.COM

  cloudera = CLOUDERA

[realms]

  EXAMPLE.COM = {

    admin_server = sandbox.hortonworks.com

    kdc = sandbox.hortonworks.com

  }

 CLOUDERA = {

  kdc = quickstart.cloudera

  admin_server = quickstart.cloudera

  max_renewable_life = 7d 0h 0m 0s

  default_principal_flags = +renewable

 }

[logging]

  default = FILE:/var/log/krb5kdc.log

  admin_server = FILE:/var/log/kadmind.log

  kdc = FILE:/var/log/krb5kdc.log

The algorithm to locate the krb5.conf file is the following:

  • If the system property java.security.krb5.conf is set, its value is assumed to specify the path and file name.

  • If that system property value is not set, then the configuration file is looked for in the directory

  • <java-home>\lib\security (Windows)
  • <java-home>/lib/security (Solaris and Linux)
  • If the file is still not found, then an attempt is made to locate it as follows:

  • /etc/krb5/krb5.conf (Solaris)
  • c:\winnt\krb5.ini (Windows)
  • /etc/krb5.conf (Linux)
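If the file is in a non-standard location, the usual way to point the JVM at it is the java.security.krb5.conf system property, added to the VDP Server JVM options; the path below is illustrative:

 -Djava.security.krb5.conf=/path/to/krb5.conf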

There is one exception: if you are planning to create VDP views that use the same Key Distribution Center and the same realm, the Kerberos Distribution Center parameter can be provided instead of having the krb5.conf file in the file system.
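For background, keytab-based Kerberos authentication with the Hadoop client API typically looks like the following standalone Java sketch (illustrative only, not the wrapper's internal implementation; the principal and keytab path are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLogin {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell the Hadoop client that the cluster uses Kerberos authentication
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Log in with a principal and its keytab (placeholder values)
        UserGroupInformation.loginUserFromKeytab("primary/instance@EXAMPLE.COM", "/path/to/user.keytab");
        System.out.println("Logged in as " + UserGroupInformation.getCurrentUser());
    }
}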

View edition

Troubleshooting

Symptom

Error message: “SIMPLE authentication is not enabled.  Available:[TOKEN, KERBEROS]”.

Resolution

You are trying to connect to a Kerberos-enabled Hadoop cluster. You should configure the custom wrapper accordingly. See Secure cluster with Kerberos section for configuring Kerberos on this custom wrapper.

Symptom

Error message: “Cannot get Kerberos service ticket: KrbException: Server not found in Kerberos database (7) ”.

Resolution

Check that nslookup is returning the fully qualified hostname of the KDC. If not, modify the /etc/hosts of the client machine for the KDC entry to be of the form "IP address fully.qualified.hostname alias".

Symptom

Error message: “Invalid hostname in URI s3n://<id>:<secret>@<bucket>”.

Resolution

This method of placing credentials in the URL is discouraged. Configure the credentials in the core-site.xml instead (see the Amazon S3 section).

Symptom

Error message: "Error accessing Parquet file: Could not read footer: java.io.IOException: Could not read footer for file FileStatus{path= hdfs://serverhdfs/apps/hive/warehouse/parquet/.hive-staging_hive_2017-03-06_08-/-ext-10000; isDirectory=true; modification_time=1488790684826; access_time=0; owner=hive; group=hdfs; permission=rwxr-xr-x; isSymlink=false}"

Resolution

Hive can store metadata in a Parquet file folder. You can check in the error message whether the custom wrapper is trying to access any metadata. In the example error you can see that it is accessing a folder called .hive-staging*. The solution is to configure Hive to store its metadata in another location.

Symptom

Error message: “Could not initialize class org.xerial.snappy.Snappy”

Resolution

On Linux platforms, an error may occur when Snappy compression/decompression is enabled even though its native library is available on the classpath.

The native library snappy-<version>-libsnappyjava.so for Snappy compression is included in the snappy-java-<version>.jar file. When the JVM initializes the JAR, the library is added to the default temp directory. If the default temp directory is mounted with noexec option, it results in the above exception.

One solution is to specify a different temp directory that has already been mounted without the noexec option, as follows:

 -Dorg.xerial.snappy.tempdir=/path/to/newtmp

Appendices

How to use the Hadoop vendor’s client libraries

In some cases, it is advisable to use the libraries of the Hadoop vendor you are connecting to (Cloudera, Hortonworks, …), instead of the Apache Hadoop libraries distributed in this custom wrapper.

In order to use the Hadoop vendor libraries, there is no need to import the Distributed File System Custom Wrapper as an extension as explained in the Importing the custom wrapper into VDP section.

You have to create the custom data sources using the Classpath parameter instead of the ‘Select Jars’ option.

Click Browse to select the directory containing the required dependencies for this custom wrapper, that is:

  • The denodo-hdfs-customwrapper-${version}.jar file of the dist directory of this custom wrapper distribution (highlighted in orange in the image below).

  • The contents of the lib directory of  this custom wrapper distribution, replacing the Apache Hadoop libraries with the vendor specific ones (highlighted in blue in the image below, the suffix indicating that they are Cloudera jars).

        

        Here you can find the libraries for Cloudera and Hortonworks Hadoop distributions:

  • Hortonworks repository:

http://repo.hortonworks.com/content/repositories/releases/org/apache/hadoop/

C:\Work\denodo-hdfs-libs directory

Distributed File System Data Source

! Note

When clicking Browse, you will browse the file system of the host where the Server is running and not where the Administration Tool is running.

How to connect to MapR XD (MapR-FS)

From the MapR documentation: “MapR XD Distributed File and Object Store manages both structured and unstructured data. It is designed to store data at exabyte scale, support trillions of files, and combine analytics and operations into a single platform.”

As MapR XD supports an HDFS-compatible API, you can use the DFS Custom Wrapper to connect to the MapR FileSystem. This section explains how to do that.

Install MapR Client

To connect to the MapR cluster you need to install the MapR Client on your client machine (where the VDP server is running):

 

  • Verify that the operating system on the machine where you plan to install the MapR Client is supported, see the MapR Client Support Matrix.

  • Set the $MAPR_HOME environment variable to the directory where the MapR Client was installed. If the MAPR_HOME environment variable is not defined, /opt/mapr is the default path.

Copy mapr-clusters.conf file

Copy mapr-clusters.conf from the MapR cluster to the $MAPR_HOME/conf folder in the VDP machine.

demo.mapr.com secure=true maprdemo:7222

Generate MapR ticket (secure clusters only)

Every user who wants to access a secure cluster must have a MapR ticket (maprticket_<username>) in the temporary directory (the default location).

Use the $MAPR_HOME/maprlogin command line tool to generate one:

C:\opt\mapr\bin>maprlogin.bat password -user mapr

[Password for user 'mapr' at cluster 'demo.mapr.com': ]

MapR credentials of user 'mapr' for cluster 'demo.mapr.com' are written to 'C:\Users\<username>\AppData\Local\Temp/maprticket_<username>'

! Note

If you get an error like

java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty when executing maprlogin

you need to specify a truststore before executing the maprlogin command.

For this, you can copy the /opt/mapr/ssl_truststore file from the MapR cluster to the $MAPR_HOME/conf directory on the local machine.

Add JVM option

Add -Dmapr.library.flatclass to the VDP Server JVM options.

VDP Server JVM options

Otherwise, VDP will throw the exception java.lang.UnsatisfiedLinkError from JNISecurity.SetParsingDone() while executing the custom wrapper.

Create custom data source

In order to use the MapR vendor libraries you should not import the DFS Custom Wrapper into Denodo.

You have to create the custom data source using the ‘Classpath’ parameter instead of the ‘Select Jars’ option. Click Browse to select the directory containing the required dependencies for this custom wrapper:

  • The denodo-hdfs-customwrapper-${version}.jar file of the dist directory of this custom wrapper distribution.

  • The contents of the lib directory of  this custom wrapper distribution, replacing the Apache Hadoop libraries with the MapR ones.

The MapR Maven repository is located at http://repository.mapr.com/maven/. The names of the JAR files that you must use contain the version of Hadoop, Zookeeper and MapR that you are using:

  • hadoop-xxx-<hadoop_version>-<mapr_version>
  • maprfs-<mapr_version>

As the MapRClient native library is bundled in the maprfs-<mapr_version> jar, you should use the maprfs jar that comes with the previously installed MapR Client, as the library is dependent on the operating system.

  • zookeeper-<zookeeper_version>-<mapr_version>
  • json-<version>
  • the other dependencies of the lib directory of  this custom wrapper distribution

! Important

MapR native library is included in these Custom Wrapper dependencies and can be loaded only once.

Therefore, if you plan to access other MapR sources with Denodo, like:

  • MapR Database with HBase Custom Wrapper
  • MapR Event Store with Kafka Custom Wrapper
  • Drill with JDBC Wrapper.

you have to use the same classpath to configure all the custom wrappers and the JDBC driver; see  'C:\Work\MapR Certification\mapr-lib' in the image above.

With this configuration Denodo can reuse the same classloader and load the native library only once.

Configure base view

Configure the DFS wrapper parameters as usual:

MapR base view edition