HCatalog is a table and storage management layer for Hadoop that makes it easier to read and write data. HCatalog presents users with a relational view of the data in HDFS and ensures that users do not need to worry about where or in what format their data is stored.
In this section of the Big Data tutorial we will learn how to integrate the
HCatalog REST API to access the information of the tables created
in a Hadoop installation. This API allows users to send HTTP requests to access Hadoop MapReduce, Pig, Hive, and HCatalog DDL resources. These resources are accessed
using the following URL format:

http://&lt;servername&gt;/templeton/v1/&lt;resource&gt;

where &lt;resource&gt; is the name of the HCatalog resource. The full list of HCatalog resources is documented in the WebHCat reference.
In addition to the base URL, you also have to specify the user name running the request. With the default security setting, you do this by appending the user.name parameter to the URL. For instance, to get the list of databases in HCatalog you would send a request to the following URL:

http://&lt;servername&gt;/templeton/v1/ddl/database?user.name=&lt;your_username&gt;
As you can see, if you send this request from a web browser, the output of the REST API is in JSON format, so we are going to import the HCatalog resources into Denodo using JSON data sources.
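As a quick check outside of a browser, the same request can be issued from a short script. Here is a minimal Python sketch, assuming the response is a JSON object with a databases array; the server name and user name are placeholders to be replaced with your own values:

```python
import json
from urllib.request import urlopen

def parse_databases(payload: dict) -> list:
    # Assumed response shape: {"databases": ["default", ...]}
    return payload["databases"]

def list_databases(server: str, user: str) -> list:
    """Fetch the database names through the HCatalog REST API."""
    url = "http://%s/templeton/v1/ddl/database?user.name=%s" % (server, user)
    with urlopen(url) as response:
        return parse_databases(json.load(response))
```

Calling list_databases with your server name and user name would return the same list of names that the browser shows as JSON.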
To import a new JSON data source that gets all the databases in the system:

1. Click New > Data source > JSON.
2. Enter http://&lt;servername&gt;/templeton/v1/ddl/database?user.name=&lt;your_username&gt; as the URL, replacing &lt;servername&gt; and &lt;your_username&gt; with the appropriate values.
Because of the hierarchical structure of the JSON returned by the HCatalog REST API, you can see that the new base view has only one top level field of type array. This array field contains a list of items with one single field (called field_0 by default) that actually contains the database name.
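That flattening step can be pictured with a small Python sketch: each element of the array becomes a row with a single column named field_0, mirroring the default name Denodo assigns (the exact response shape is an assumption):

```python
def flatten_database_array(payload: dict) -> list:
    """Turn the top-level array into one row per item, using the
    default field_0 name for the single column."""
    return [{"field_0": name} for name in payload.get("databases", [])]

rows = flatten_database_array({"databases": ["default", "sales"]})
# rows == [{"field_0": "default"}, {"field_0": "sales"}]
```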
Now that we have a view with the databases in our Big Data deployment, it will be interesting to know the different tables created in a database. To get the
list of tables using the HCatalog API, you can send a request to the following URL:

http://&lt;servername&gt;/templeton/v1/ddl/database/&lt;databasename&gt;/table?user.name=&lt;your_username&gt;

where you have to replace &lt;databasename&gt; with the actual name of the database. If we want to import this into Denodo, we would have to create a
different data source for each database in the system. But, wouldn't it be easier if we could provide the database name as an input parameter when querying
the data source? Certainly, and it is possible to do this by using interpolation variables when defining the data source.
Interpolation variables start with the @ character followed by the variable name, and they can be included as part of the URL when defining a data source that uses HTTP connections. For instance, we can enter the URL above as:

http://&lt;servername&gt;/templeton/v1/ddl/database/@param_database/table?user.name=&lt;your_username&gt;
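The expansion Denodo performs at query time amounts to a string substitution, which can be sketched as follows (the expand_url helper and the hadoop.example.com host are hypothetical, used only for illustration; @param_database is the variable name used in this tutorial):

```python
def expand_url(template: str, values: dict) -> str:
    """Replace each @name interpolation variable with its value."""
    for name, value in values.items():
        template = template.replace("@" + name, value)
    return template

url = expand_url(
    "http://hadoop.example.com/templeton/v1/ddl/database/@param_database/table",
    {"param_database": "default"},
)
# url == "http://hadoop.example.com/templeton/v1/ddl/database/default/table"
```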
To import a new JSON data source that gets all the tables in a database entered as an input parameter:

1. Click New > Data source > JSON.
2. Enter http://&lt;servername&gt;/templeton/v1/ddl/database/@param_database/table?user.name=&lt;your_username&gt; as the URL, replacing &lt;servername&gt; and &lt;your_username&gt; with the appropriate values.
As with the previous data source, the list of tables is contained under an array-type field called tables.
If you try to query the new base view, you will see that it has a mandatory input parameter, param_database, and that you will have to provide a value for it as a WHERE condition any time you run a query on the base view.
If you actually run a query, you will see that it returns the tables for the database that you specified as the input parameter.
To navigate to the list of tables you can just double-click on the tables field that is displaying the [Array]... value.
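Putting the pieces together, the parameterized lookup can also be scripted end to end: build the URL for a given database, send the request, and read the array-type tables field mentioned above (server and user are placeholders, and a reachable WebHCat service is required to actually run the fetch):

```python
import json
from urllib.request import urlopen

def parse_tables(payload: dict) -> list:
    # The table names sit in the array-type "tables" field.
    return payload["tables"]

def list_tables(server: str, user: str, database: str) -> list:
    """Fetch the tables of one database through the HCatalog REST API."""
    url = ("http://%s/templeton/v1/ddl/database/%s/table?user.name=%s"
           % (server, database, user))
    with urlopen(url) as response:
        return parse_tables(json.load(response))
```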
With that, you are finished with the Big Data Tutorial!