HCatalog is a table and storage management layer for Hadoop that makes it easier to read and write data. HCatalog presents users with a relational view of the data in HDFS and ensures that users do not need to worry about where or in what format their data is stored.
In this section of the Big Data tutorial we will learn how to use the
HCatalog REST API to access the information about the tables created
in a Hadoop installation. This API allows you to send HTTP requests to access Hadoop MapReduce, Pig, Hive, and HCatalog DDL resources. These resources are accessed
using the following URL format:

http://<servername>/templeton/v1/<resource>

where <resource> is the name of the HCatalog
resource. You can see the list of HCatalog resources here.
In addition to the base URL you will also have to specify the user name running the request. With the default security settings, you just have to add the
user.name parameter to the URL. For instance, to get the list of databases in HCatalog you would send a request to the following URL:

http://<servername>/templeton/v1/ddl/database?user.name=<your_user_name>
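The URL-building convention described above can be sketched in Python. The host name, WebHCat port, and user name below are placeholder assumptions, not values from this tutorial:

```python
from urllib.parse import urlencode

def hcatalog_url(server, resource, user):
    """Build an HCatalog REST (WebHCat) URL for the given resource,
    appending the user.name query parameter required by the default
    security settings."""
    base = f"http://{server}/templeton/v1/{resource}"
    return f"{base}?{urlencode({'user.name': user})}"

# The request that lists all databases (host and user are assumptions):
print(hcatalog_url("hadoop-host:50111", "ddl/database", "cloudera"))
```

Running this prints the same kind of URL shown above, with the placeholders filled in.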
As you can see if you send this request from a web browser, the output format of the REST API is JSON, so we are going to import the HCatalog resources into Denodo using JSON data sources.
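To make the response shape concrete, here is a short sketch that parses a payload like the one this request returns. The database names in the sample are assumptions for illustration:

```python
import json

# A sample payload in the shape returned for GET /ddl/database
# (the database names themselves are made up for this example).
sample = '{"databases": ["default", "sales", "weblogs"]}'

payload = json.loads(sample)
for name in payload["databases"]:
    print(name)
```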
To import a new JSON data source that gets all the databases in the system:
Click New > Data source > JSON.
Enter http://<servername>/templeton/v1/ddl/database?user.name=<your_user_name> as the URL, replacing <servername> and <your_user_name> with appropriate values.
Because of the hierarchical structure of the JSON returned by the HCatalog REST API, you can see that the new base view has only one top-level field of type array. This array field contains a list of items with a single field (called field_0 by default) that will actually contain the database name.
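The flattening that Denodo applies to that array field can be sketched roughly as follows; the database names are again assumptions, and field_0 is the default column name mentioned above:

```python
import json

# Each string item in the databases array becomes one row whose only
# column (field_0 by default) holds the database name.
response = json.loads('{"databases": ["default", "sales"]}')
rows = [{"field_0": item} for item in response["databases"]]
print(rows)
```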
Now that we have a view with the databases in our Big Data deployment, it would be useful to know the tables created in each database. To get the
list of tables using the HCatalog API you can send a request to the following URL:

http://<servername>/templeton/v1/ddl/database/<databasename>/table?user.name=<your_user_name>

where you have to replace <databasename> with the actual name of the database. If we wanted to import this into Denodo we would have to create a
different data source for each database in the system. But wouldn't it be easier if we could provide the database name as an input parameter when querying
the data source? This is possible by using interpolation variables when defining the data source.
Interpolation variables start with the @ character followed by the variable name, and they can be included as part of the URL when defining
a data source that uses HTTP connections. For instance, we can enter the URL above as:

http://<servername>/templeton/v1/ddl/database/@param_database/table?user.name=<your_user_name>
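The substitution Denodo performs at query time can be sketched like this. The variable name @param_database is the one used in this tutorial; the host, port, user name, and database value are assumptions:

```python
import re

# URL template with an @-prefixed interpolation variable, as entered
# when defining the data source (host and user are assumptions).
template = ("http://hadoop-host:50111/templeton/v1/ddl/database/"
            "@param_database/table?user.name=cloudera")

def resolve(url_template, **params):
    """Replace each @name token in the template with the matching
    keyword argument, mimicking interpolation-variable substitution."""
    return re.sub(r"@(\w+)", lambda m: params[m.group(1)], url_template)

print(resolve(template, param_database="sales"))
```

At query time the variable is replaced with the value supplied as input, yielding a concrete request URL for that database.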
To import a new JSON data source that gets all the tables in a database entered as input parameter:
Click New > Data source > JSON.
Enter http://<servername>/templeton/v1/ddl/database/@param_database/table?user.name=<your_user_name> as the URL, replacing <servername> and <your_user_name> with appropriate values.
With this section, you are finished with the Big Data Tutorial.