Apache Hive is software that facilitates querying and managing large datasets residing in distributed storage. Hive provides a mechanism to project structure onto this data and to query it using a SQL-like language called HiveQL.
Before starting the Big Data integration with Hive, we will add a sample file containing a list of clients to our Apache Hadoop distribution.
This file represents the data that is obtained by the marketing department. You can find the file under
Feel free to use any Apache Hadoop distribution to follow this tutorial.
To create a Hive table, log in to the system and follow these steps:
$ hadoop fs -copyFromLocal /path/newClients.csv /home/denodo/
hive> CREATE TABLE prospect (
          -- The column names and types below are examples; adjust them
          -- to match the actual fields in newClients.csv.
          client_id INT,
          name STRING,
          email STRING)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      STORED AS TEXTFILE;
hive> LOAD DATA INPATH '/home/denodo/newClients.csv' OVERWRITE INTO TABLE prospect;
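The actual contents of newClients.csv come from the marketing department. Purely as an illustration of the format the table definition expects (these rows and columns are made up), a comma-delimited file would look like:

```
client_id,name,email
1,Alice Smith,alice@example.com
2,Bob Jones,bob@example.com
```

Each line becomes one row of the prospect table, with fields split on the commas declared in the FIELDS TERMINATED BY ',' clause.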
hive> SELECT * FROM prospect;
It should return a few records.
Once the data has been loaded into the Hive table, it can be accessed from the Denodo server through Hive's JDBC driver.
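To illustrate what that JDBC access looks like, here is a minimal sketch of querying the prospect table through HiveServer2. The host, port, and empty credentials are assumptions (localhost:10000 is the HiveServer2 default); the hive-jdbc driver jar must be on the classpath.

```java
// Sketch: querying the Hive "prospect" table over JDBC.
// Assumes HiveServer2 is running on localhost:10000 with no
// authentication; adjust the URL and credentials for your cluster.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveProspectQuery {

    // Builds a HiveServer2 JDBC URL: jdbc:hive2://host:port/database
    static String hiveUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws Exception {
        String url = hiveUrl("localhost", 10000, "default");
        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM prospect")) {
            while (rs.next()) {
                // Print the first column of each returned row
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

This is the same driver Denodo uses internally when it connects to the Hive data source.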
To incorporate some of the tables into the Denodo virtual schema, check the box next to each table or view you want to import. In this case, check prospect and then click the Create selected base views button.
You can later query this base view or combine its data with data from other views.
When the import process finishes, the new views appear in the elements tree panel. Double-clicking a view name shows the schema of the base view in the workspace.
If you execute a query on the newly created view, Denodo delegates it to the Hive data source. Hive translates the query into the necessary MapReduce jobs and returns the results to Denodo. The results should match those returned when executing the same query from the Hive command line.
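From a client application's perspective, the base view is queried through Denodo's own JDBC interface rather than Hive's. The sketch below assumes Denodo Virtual DataPort is listening on its default port 9999 and that the base view was imported into a virtual database here called bigdata_tutorial (a made-up name); the Denodo JDBC driver jar must be on the classpath.

```java
// Sketch: running the same SELECT against the Denodo base view.
// Host, port, database name, and credentials are assumptions for
// illustration; Denodo delegates the query to Hive behind the scenes.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DenodoProspectQuery {

    // Builds a Denodo Virtual DataPort JDBC URL: jdbc:vdb://host:port/db
    static String denodoUrl(String host, int port, String database) {
        return "jdbc:vdb://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws Exception {
        String url = denodoUrl("localhost", 9999, "bigdata_tutorial");
        try (Connection conn =
                 DriverManager.getConnection(url, "admin", "admin");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM prospect")) {
            while (rs.next()) {
                // Rows come back through Denodo, sourced from Hive
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

The rows printed here should be the same ones the Hive command line returned earlier, since Denodo pushes the query down to Hive.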