Goal
This document explains how to integrate Apache Drill into the Denodo Platform using its JDBC driver.
Overview
Apache Drill is a distributed and schema-free SQL query engine that can connect to data stored in multiple formats (avro, parquet, csv, json,...) and datastores and provide a common SQL interface.
Apache Drill
Drill Setup
Check which storage plugins are configured by visiting the Drill Web Console at https://<HOST>:8047/storage.
For this guide we have enabled the storage plugins that allows JAR files in the Drill classpath, and the local file system or any distributed file system such as S3 or Hadoop as data sources for Drill:
Drill Storage Plugins configuration
You can find here the supported data sources and formats in Drill on HPE EZMERAL which replaced MapR:
Creating a JDBC Data Source
Download the JDBC driver
Before being able to connect to your Apache Drill instance, you need to download the JDBC driver.
The JDBC driver (drill-jdbc-all.jar file) included with Drill distribution from http://apache.osuosl.org/drill
- This driver supports Kerberos and Plain authentication mechanisms, but does not support the MapR-SASL authentication mechanism.
Install the JDBC driver
Prior versions of the Apache Drill driver may include dependencies that conflict with the ones included in the Denodo Platform. These steps are included in case you encounter problems with an older driver. Keep in mind, as of the creation of this document, the recent version of the Apache Drill JDBC driver was 1.21.1. Please consider these notes if you encounter connection problems:
- Apache Drill driver
! Important note |
The org.slf4j, javax.xml, org.w3c and org.xml dependencies need to be deleted from the drill-jdbc-all.jar file, otherwise Denodo will fail with errors like these: |
For deleting these dependencies, you should open the drill-jdbc-all.jar file as an archive and delete the following folders from it:
- org/slf4j
- org/w3c
- org/xml
- javax/xml
Starting from Denodo 8.0, the preferred way of installing the JDBC driver is to upload the JDBC driver via the Denodo Design Studio. This can be done under File > Extension management > Libraries > Import. Select jdbc_other and provide a custom name (Custom version) field for using it later. Import all the necessary jars for connecting to Drill.
Create the JDBC data source
From the Web Design Studio or from the Virtual DataPort Administration tool, create a new JDBC data source by selecting File > New > Data source > search for Generic database (JDBC). This will open a Data source dialog on the Configuration tab to create a connection with a JDBC driver.
Fill all the required fields:
- Database adapter: Generic
- Driver class: You need to specify org.apache.drill.jdbc.Driver (Advanced tab) in the Design Studio for the Apache Drill driver.
- Driver class path: Select the driver class path that you have uploaded previously and fill the wizard field with the value ‘database name - version’
- Database URI. There are two possibilities:
- Recommended: Connect to the ZooKeeper. ZooKeeper returns to the client the available Drillbits in the cluster to which the query can be submitted.
The URI is of the form:
jdbc:drill:zk=<zk.connect>/<drill_directory_in_zookeeper>/<cluster_ID> |
Being the configuration file <DRILL-HOME>/conf/drill-override.conf:
the JDBC URI will be:
jdbc:drill:zk=localhost:2181/drill/drillbits1 |
- Connect directly to a specific Drillbit, though this is generally not recommended due to the hardcoding that can happen in clients.
jdbc:drill:drillbit=<host>:<port> |
- Delimiter. The default setting for identifier quotes in Drill is backticks. Therefore, in the Read & Write tab of the data source configuration, set backticks (`) as the Delimiter Identifier.
Drill Delimiter configuration
- Ping query. Enable Test connections and configure Select 1 in Connection Pool Configuration > Ping query
Drill Ping Query configuration
- Source Configuration. Drill does not support Prepared-statement dynamic parameters. Therefore, set Allow Literal as Parameters to No in the Source Configuration tab of the data source.
Drill Source Configuration
Otherwise, you will receive the following error when adding a WHERE clause to filter the data returned by a base view:
- Apache driver:
Received exception with message 'Prepared-statement
dynamic parameters are not supported.'
Drill data source with Apache Drill JDBC Driver
Creating base views
Once we have configured our data source pointing to Drill, we can create different views for accessing the desired entities.
- For Hive databases you can inspect the full schema, its tables and their fields, as the metadata is available in the Hive metastore.
Then, you have to select the tables you want to import and click the Create selected button.
- For dfs Filesystem you cannot inspect the schema as it is discovered on the fly based on a specific query.
In this case, you have to create the base views using the option Create from query:
Drill base view from query
Now, the base views are ready for the execution and to be combined with the rest of the sources.
Drill base view
Drill base view results
Complex types
Drill supports the following composite types:
- Map
- Array
Denodo treats these as fields of type text. In the image below, flight is a complex type, although Denodo is detecting it as text:
You can use the Create from query option for selecting single fields from complex types and also, you can take advantage of the field conversion.
- Add the following JVM setting to Denodo server: -Dmapr.library.flatclass. Otherwise, Denodo will throw the exception java.lang.UnsatisfiedLinkError from JNISecurity.SetParsingDone().
Kerberos Security
These are the configuration steps for connecting to Drill with Kerberos authentication:
Denodo Setup
Create the JDBC Source following the instructions in the Creating a JDBC Data Source section, but:
- For this scenario, using the Apache Drill driver. Note that you may have to remove the dependencies from the driver as previously explained.
- Add the properties principal, user and keytab to the JDBC URI
The URI is of the form:
jdbc:drill:zk=<zk.connect>/<drill_directory_in_zookeeper>/<cluster_ID>;principal=primary/instance@REALM;user=client/instance@REALM;keytab=client_keytab |
the JDBC URI will be:
jdbc:drill:zk=demo:2181/drill/drillbits1;principal=drill/demo@EXAMPLE;user=client/demo@EXAMPLE;keytab=/work/drill/client.keytab |
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments we recommend you to contact your Denodo Customer Success Manager.