Connecting to other Big Data systems

In this section of the tutorial, we will learn how to access various Big Data systems from Denodo Platform 7.0. As Denodo Platform supports a wide variety of data sources, it is worth listing the best option for connecting to each of these systems from Denodo Platform:

Apache Impala

Create a JDBC data source and select the adapter "Impala-2.3". Then, fill the necessary parameters to connect to Impala database.

Sample URL: jdbc:impala://host:port/database

* Denodo does not distribute the Impala JDBC driver. It can be downloaded from the Cloudera website. Once downloaded, unzip the package and place all the libraries in the folder $DENODO_HOME/lib-external/jdbc-drivers/impala-2.3.

Impala is supported by Denodo Platform 7.0 as MPP Engine and Cache Database (bulk data load).
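If you want to sanity-check the driver and URL outside of Denodo first, a minimal Java sketch can open the same JDBC connection. The driver class name, host and port below are assumptions (21050 is Impala's default JDBC port; check the class name that ships with the driver package you downloaded):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class ImpalaConnectionTest {
        public static void main(String[] args) throws Exception {
            // Assumed driver class for the Cloudera Impala JDBC 4.1 driver.
            Class.forName("com.cloudera.impala.jdbc41.Driver");
            // Hypothetical host and database; 21050 is Impala's default JDBC port.
            String url = "jdbc:impala://impala-host:21050/default";
            try (Connection conn = DriverManager.getConnection(url)) {
                System.out.println("Connected: " + !conn.isClosed());
            }
        }
    }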
HBase

You can access HBase data sources from Virtual DataPort using one of the following options:

  • Denodo HBase Custom Wrapper: a DenodoConnect component that can perform read operations on an HBase database. This component and its documentation are distributed through the Denodo Support Site.
  • HBase REST (Stargate): a Web services interface built into HBase. In the Virtual DataPort Administration Tool, you can create an XML data source and use the URL of the HBase REST API, for instance: http://[server]:[port]/[table]/*. Ensure that the HBase REST service is deployed in your Hadoop installation and started (see the sketch below for a raw call to this API).
  • Hive on top of HBase: you can also access Hive from Denodo Platform, and Hive can in turn work on top of HBase.

* You can refer to the previous tutorial section, Connecting to Apache Hive, for further information about accessing Hive from Virtual DataPort.
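To get a feel for what the Stargate interface returns before wiring it into an XML data source, you can call it directly. The sketch below performs a raw HTTP GET against a hypothetical table (host, port and table name are placeholders; 8080 is the REST server's default port):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class HBaseRestScan {
        public static void main(String[] args) throws Exception {
            // Hypothetical host, port and table; the trailing /* is the
            // Stargate row-glob syntax from the URL pattern above.
            URL url = new URL("http://hbase-host:8080/mytable/*");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            // Request XML explicitly; this is the representation an XML
            // data source in Virtual DataPort would consume.
            conn.setRequestProperty("Accept", "text/xml");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }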
Presto

Create a JDBC data source and select the adapter "Presto 0.1x". Then, fill the necessary parameters to connect to Presto .

Sample URL: jdbc:presto://host:port/catalog/schema

* The Presto JDBC driver is distributed with the Denodo installation.

Presto is supported by Denodo Platform 7.0 as MPP Engine and Cache Database (bulk data load).
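As a rough connectivity check outside of Denodo, the sketch below opens the same kind of JDBC connection and runs a trivial query. Host, port, catalog, schema and user are placeholders (the Presto JDBC driver requires a user property, and 8080 is the coordinator's default port):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;

    public class PrestoConnectionTest {
        public static void main(String[] args) throws Exception {
            // Hypothetical coordinator host/port, catalog and schema.
            String url = "jdbc:presto://presto-host:8080/hive/default";
            Properties props = new Properties();
            props.setProperty("user", "test"); // required by the Presto driver
            try (Connection conn = DriverManager.getConnection(url, props);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT 1")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1));
                }
            }
        }
    }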
SparkSQL

Create a JDBC data source and select the adapter "Spark SQL 1.5" or "Spark SQL 1.6" (depending on the target version). Then, fill the necessary parameters to connect to SparkSQL.

Sample URL: jdbc:hive2://host:port/database

* The SparkSQL JDBC driver is distributed with the Denodo installation.

SparkSQL is supported by Denodo Platform 7.0 as MPP Engine and Cache Database (bulk data load).
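Note that the URL scheme is jdbc:hive2://, because the Spark Thrift Server implements the HiveServer2 protocol. A minimal Java sketch, assuming the standard Hive JDBC driver class and the Thrift server's default port 10000 (host and credentials are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class SparkSqlConnectionTest {
        public static void main(String[] args) throws Exception {
            // The Spark Thrift Server speaks the HiveServer2 protocol,
            // so the Hive JDBC driver class is used (assumption).
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // Hypothetical host; 10000 is the Thrift server's default port.
            String url = "jdbc:hive2://spark-host:10000/default";
            try (Connection conn = DriverManager.getConnection(url, "user", "")) {
                System.out.println("Connected: " + !conn.isClosed());
            }
        }
    }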
HDFS

You can access HDFS from Denodo Platform by using the Denodo HDFS Custom Wrapper. This component and its documentation are distributed through the Denodo Support Site.

* For more details on how to use this custom wrapper, you can refer to the user manual included in its distribution.

This component is capable of reading several file formats stored in HDFS, such as:
  • Delimited text files
  • Sequence files
  • Map files
  • Avro files
  • Parquet files
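The wrapper takes care of parsing these formats for you. For orientation only, this is roughly what reading a delimited text file from HDFS looks like with the plain Hadoop client API (the NameNode URI and file path are hypothetical; 8020 is a common NameNode RPC port):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical NameNode URI and file path.
            Configuration conf = new Configuration();
            try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(fs.open(new Path("/data/example.csv"))))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }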
Splunk

Splunk offers a REST API for accessing, creating, updating and deleting resources. In Denodo Platform, create an XML data source and use the URL of the Splunk REST API.

Sample URL: https://localhost:8089/services/search/jobs/
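In Splunk, a search is created as a job via an HTTP POST to this endpoint, and its results are fetched afterwards. The sketch below creates a search job with HTTP Basic authentication; the credentials and the query are placeholders, and Splunk's default self-signed certificate may require extra trust configuration in a real setup:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class SplunkSearchJob {
        public static void main(String[] args) throws Exception {
            URL url = new URL("https://localhost:8089/services/search/jobs/");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // Hypothetical credentials; the API accepts HTTP Basic authentication.
            String auth = Base64.getEncoder().encodeToString(
                    "admin:changeme".getBytes(StandardCharsets.UTF_8));
            conn.setRequestProperty("Authorization", "Basic " + auth);
            // Form-encoded body; the leading "search" keyword is part of the query.
            String body = "search=" + URLEncoder.encode(
                    "search index=_internal | head 5", "UTF-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            // The XML response contains the ID (sid) of the newly created job.
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }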

MapReduce

You can access MapReduce jobs from Denodo Platform using the Denodo MapReduce Custom Wrapper. This component and its documentation are distributed through the Denodo Support Site.

* For more details on how to use this custom wrapper, you can refer to the user manual included in its distribution.

This custom component (sketched below):
  1. Connects to the Hadoop server via SSH
  2. Executes a MapReduce job
  3. Reads the results from HDFS
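The sketch below illustrates the first two of these steps using the JSch SSH library; the host, credentials, jar, class and paths are all placeholders. Reading the job's output directory afterwards would follow the HDFS reading sketch shown earlier in this section:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import com.jcraft.jsch.ChannelExec;
    import com.jcraft.jsch.JSch;
    import com.jcraft.jsch.Session;

    public class RemoteMapReduceJob {
        public static void main(String[] args) throws Exception {
            // Hypothetical host and credentials.
            JSch jsch = new JSch();
            Session session = jsch.getSession("hadoop", "hadoop-host", 22);
            session.setPassword("secret");
            session.setConfig("StrictHostKeyChecking", "no"); // demo only
            session.connect();

            // Launch the job remotely; jar, class and paths are placeholders.
            ChannelExec channel = (ChannelExec) session.openChannel("exec");
            channel.setCommand("hadoop jar wordcount.jar WordCount /input /output");
            BufferedReader out = new BufferedReader(
                    new InputStreamReader(channel.getInputStream()));
            channel.connect();

            String line;
            while ((line = out.readLine()) != null) {
                System.out.println(line);
            }
            channel.disconnect();
            session.disconnect();
        }
    }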

Check the next section for more details on accessing some specific sources!