Apache Hive is software that facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and to query it using a SQL-like language called HiveQL.
Before we start with the Big Data integration using Hive, we will add a sample file containing a list of clients to our Apache Hadoop distribution.
This file represents the potential client data we obtained from the marketing department. You can find the file under
Feel free to use any Apache Hadoop distribution to follow this tutorial.
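For reference, here is what a minimal newClients.csv could look like. The three columns (id, name, email) are an assumption made for this tutorial; substitute the actual layout of your file:

1,Alice Smith,alice.smith@example.com
2,Bob Jones,bob.jones@example.com
3,Carol White,carol.white@example.com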
To create a Hive table, log in to the system and follow these steps:
$ hadoop fs -copyFromLocal /path/newClients.csv /home/denodo/
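Before loading the file into Hive, you can confirm that it arrived in HDFS:

$ hadoop fs -ls /home/denodo/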
hive > CREATE TABLE prospect (
    id INT,        -- example columns; adjust to match newClients.csv
    name STRING,
    email STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
hive > LOAD DATA INPATH '/home/denodo/newClients.csv' OVERWRITE INTO TABLE prospect;
hive > SELECT * FROM prospect;
It should return a few records.
Once the data is incorporated into the Hive table, accessing it from the Denodo server using Hive's JDBC driver is straightforward:
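When defining the JDBC data source in Denodo, you will be asked for a driver class and a connection URI. Assuming a HiveServer2 instance listening on its default port (10000), the values look roughly like this; replace the hostname with your Hadoop server's:

Driver class: org.apache.hive.jdbc.HiveDriver
Database URI: jdbc:hive2://<hadoop-host>:10000/default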
To incorporate tables into the Denodo Virtual Schema, check the box next to the tables or views you want to import. In this case, check prospect and then click the Create selected button.
You can later query this base view, or combine its data with data from other views.
When the importing process is finished, you will see the new views in the elements tree panel. If you double-click on the view name, the schema of the base view is shown in the workspace.
If you execute a query on the recently created view, Denodo delegates the query to the Hive data source. Hive translates the query into the necessary MapReduce jobs and returns the results to Denodo. The results should match those returned when executing the same query from the Hive command line.
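For example, running the following statement both from Denodo and from the Hive command line should produce identical rows (name and email are the columns assumed in the sample file above):

SELECT name, email FROM prospect WHERE id > 1;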
There you go: your own Hive data source in Denodo!