How to connect to Apache Drill from Denodo

Applies to: Denodo 7.0
Last modified on: 29 Apr 2019
Tags: Apache Drill MapR Security JDBC data sources Kerberos

Download document

Goal

This document explains how to integrate Apache Drill into the Denodo Platform using its JDBC driver.

Overview

Apache Drill is a distributed and schema-free SQL query engine that can connect to data stored in multiple formats (avro, parquet, csv, json,...) and datastores and provide a common SQL interface.

Apache Drill

As Drill was kicked off by MapR (many of the committers and contributors to Drill have come from MapR), and it is included in the MapR Data Platform this document will take MapR as the reference environment.

Drill Setup

Check what storage plugins are configured by visiting the Drill Web Console at https://<HOST>:8047/storage.

For this guide we have enabled the storage plugins that allows MapR Filesystem, MapR Database and Hive to serve as data sources for Drill:

Drill Storage Plugins configuration

You can find here the supported data sources and formats in Drill on MapR:

https://mapr.com/docs/61/Drill/drill_storage_and_format_plugin_support_matrix.html

Creating a JDBC Data Source

From the Virtual DataPort Administration tool, create a new JDBC data source by selecting File > New > Data source > JDBC. This will open a wizard to create a connection with a JDBC driver.

Fill all the required fields:

  1. Database adapter: Generic

  1. JDBC driver. There are two possibilities:

  1. Drill JDBC driver provided by MapR

Driver class: com.mapr.drill.jdbc41.Driver

Driver class path: Copy the driver jars to a folder and fill the wizard field with the folder absolute path. 

! Important

MapR native library is included in the MapR Drill driver and can be loaded only once.

Therefore, if you plan to access to other MapR sources with Denodo, like:

  • MapR FileSystem with HDFS Custom Wrapper
  • MapR Database with HBase Custom Wrapper
  • MapR Event Store with Kafka Custom Wrapper

you have to use the same classpath to configure all the custom wrappers and the Drill JDBC driver.

With this configuration Denodo can reuse the same classloader and load the native library only once.

! Important

The slf4j dependency should be deleted from the MapRDrillJDBC41-1.<version> directory, otherwise Denodo will fail:

  1. The JDBC driver (drill-jdbc-all jar) included with Drill distribution from http://apache.osuosl.org/drill.

This driver supports Kerberos and Plain authentication mechanisms, but does not support the MapR-SASL authentication mechanism.

Driver class: org.apache.drill.jdbc.Driver

Driver class path: Copy the driver jars to the folder <DENODO_HOME>/lib-external/jdbc-drivers/database name - version and fill the wizard field with the value ‘database name - version’

! Important

The slf4j dependency should be deleted from the drill-jdbc-all jar, otherwise, Denodo will fail:

  1. Database URI. There are two possibilities:

  1. Recommended: Connect to the ZooKeeper. ZooKeeper returns to the client the available Drillbits in the cluster to which the query can be submitted.

The URI is of the form:

jdbc:drill:zk=<zk.connect>/<drill_directory_in_zookeeper>/<cluster_ID>

Being the configuration file  <DRILL-HOME>/conf/drill-distrib.conf:

                

the JDBC URI will be:

jdbc:drill:zk=maprdemo:5181/drill/demo_mapr_com-drillbits

  1. Connect directly to a specific Drillbit, though this is generally not recommended due to the hardcoding that can happen in client.

jdbc:drill:drillbit=<host>:<port>

  1. Delimiter. The default setting for identifier quotes in Drill is backticks. Therefore, in the Read & Write tab of the data source, set backticks (`) as the Delimiter Identifier.

Drill Delimiter configuration

  1. Ping query. Configure Select 1 in Connection Pool Configuration > Ping query

Drill Ping Query configuration

  1. Source Configuration. Drill does not support Prepared-statement dynamic parameters. Therefore, set Allow Literal as Parameters to false in the Source Configuration tab of the data source.

Drill Source Configuration

Otherwise, you will receive the following errors when adding a WHERE clause to filter the data returned by a base view:

  • MapR Driver:
  • [*]SQLE [MapR][JDBC](10940) Invalid parameter index: 1

  • [MapR][DrillJDBCDriver](500980) Encountered error while creating prepared statement. Details: PLAN ERROR: Cannot convert RexNode to equivalent Drill expression. RexNode Class: org.apache.calcite.rex.RexDynamicParam, RexNode Digest: ?0

  • Apache driver:

         Received exception with message 'Prepared-statement

dynamic parameters are not supported.'

Drill data source with MapR JDBC Driver

Drill data source with Apache Drill JDBC Driver

Creating base views

Once we have configured our data source pointing to Drill, we can create different views for accessing the desired entities.

Denodo will display a tree with the schemas available in Drill. In this example it will include the Hive and MapR Database schemas, as well as the workspaces configured in the file system storage plugin:

Drill schemas

  • For Hive databases you can inspect the full schema, its tables and their fields, as the metadata is available in the Hive metastore.

Then, you have to select the tables you want to import and click the Create selected button.

  • For MapR Database you can inspect the schema partially. Wide-column NoSQL databases can be schema-less by design; every row has its own set of column name-value pairs in a given column family, and the column value can be of any data type. A MAP complex type represents this variable column name-value structure.

As shown in the image above, day, hour and total are column families and are represented as MAP.

 

Then, you have to select the tables you want to import and click the Create selected button.

  • For MapR Filesystem you cannot inspect the schema as it is discovered on the fly based on a specific query.

In this case, you have to create the base views using the option Create from query:

Drill base view from query

Now, the base views are ready for the execution and to be combined with the rest of the sources.

Drill base view

Drill base view results

Complex types

Drill supports the following composite types:

  • Map
  • Array

Denodo treats these as fields of type text. In the image below, flight is a complex type, although Denodo is detecting it as text:

With the Apache Drill driver Denodo will display NULL values for this complex type detected as of type text: https://redmine.denodo.com/issues/42639.

        

As a workaround, you can replace the text type by the register type for the flight field in the vql.

With the Drill driver provided by MapR Denodo displays the complex type as raw text:

        

As a workaround, you can use the Create from query option for selecting single fields from complex types and also, you can take advantage of the field conversion. This solution works with both JDBC drivers: Apache and MapR.

MapR Security

These are the configuration steps for connecting to Drill with MapR-SASL authentication, the default MapR security configuration:

Denodo Setup

  1. Create or copy from the MapR cluster the mapr-clusters.conf file,  with the MapR cluster name and the list of CLDB nodes, in the $MAPR_HOME/conf directory.

If MAPR_HOME environment variable is not defined /opt/mapr is the default path.

demo.mapr.com secure=true maprdemo:7222

  1. Every user who wants to access a secure cluster must have a MapR user ticket, maprticket_<username>, in the temporary directory (the default location). 

You can either copy a user ticket that was generated on the MapR cluster, or you can install the MapR client and use the maprlogin command line tool to generate one.

The JDBC user must be the same as the user that created the ticket.

C:\opt\mapr\bin>maprlogin.bat password -user mapr

[Password for user 'mapr' at cluster 'demo.mapr.com': ]

MapR credentials of user 'mapr' for cluster 'demo.mapr.com' are written to 'C:\Users\<username>\AppData\Local\Temp/maprticket_<username>'

! Note

If using the maprlogin command you get an error like

java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty when executing maprlogin

you need to specify a truststore.

For this, you can copy the /opt/mapr/ssl_truststore from MapR Cluster to $MAPR_HOME/conf directory in the local machine.

  1. Add the following JVM setting to Denodo server: -Dmapr.library.flatclass. Otherwise, Denodo will throw the exception java.lang.UnsatisfiedLinkError from JNISecurity.SetParsingDone().

  1. Create the JDBC Source following the instructions in the Creating a JDBC Data Source section, but:

  1. Using the Drill JDBC driver provided by MapR. Since the Apache Drill driver does not support the MapR-SASL authentication mechanism.

  1. Adding the property auth=MAPRSASL to the JDBC URI

The URI is of the form:

jdbc:drill:zk=<zk.connect>/<drill_directory_in_zookeeper>/<cluster_ID>;auth=MAPRSASL

                

the JDBC URI will be:

jdbc:drill:zk=maprdemo:5181/drill/demo_mapr_com-drillbits;auth=MAPRSASL

Kerberos Security

These are the configuration steps for connecting to Drill with Kerberos authentication:

Denodo Setup

Create the JDBC Source following the instructions in the Creating a JDBC Data Source section, but:

  1. For this scenario, we chose the Apache Drill driver, as it is not common for a MapR cluster being configured with Kerberos authentication.

! Important

The slf4j, javax.xml and org.w3c dependencies should be deleted from the drill-jdbc-all jar, otherwise Denodo will fail:

  1. Add the properties principal, user and keytab to the JDBC URI

The URI is of the form:

jdbc:drill:zk=<zk.connect>/<drill_directory_in_zookeeper>/<cluster_ID>;principal=primary/instance@REALM;user=client/instance@REALM;keytab=client_keytab

the JDBC URI will be:

jdbc:drill:zk=demo:2181/drill/drillbits1;principal=drill/demo@EXAMPLE;user=client/demo@EXAMPLE;keytab=/work/drill/client.keytab

Questions

Ask a question
You must sign in to ask a question. If you do not have an account, you can register here

Featured content

DENODO TRAINING

Ready for more? Great! We offer a comprehensive set of training courses, taught by our technical instructors in small, private groups for getting a full, in-depth guided training in the usage of the Denodo Platform. Check out our training courses.

Training