Accessing HCatalog
HCatalog is a table and storage management layer for Hadoop that makes it easier to read and write data. HCatalog presents users with a relational view of the data in HDFS and ensures that users do not need to worry about where or in what format their data is stored.
In this section of the Big Data tutorial we will learn how to integrate the
HCatalog REST API to access the information of the tables created
in a Hadoop installation. This API allows users to send HTTP requests to access Hadoop MapReduce, Pig, Hive, and HCatalog DDL resources. These resources are accessed
using the following URL format: http://<servername>/templeton/v1/resource
where resource
is the name of the HCatalog
resource. You can see the list of HCatalog resources here
In addition to the base URL, you will also have to specify the username running the request. To do so using the default security setting, you will have to
add the user.name
parameter to the URL. For instance, if you want to get the list of databases in HCatalog you will have to send a request to the
following URL: http://<servername>/templeton/v1/ddl/database?user.name=<your_username>

As you can see, if you send this request from a web browser, the output format of the REST API is JSON, so we are going to import the the HCatalog resources into Denodo using JSON data sources.
To import a new JSON data source that gets all the databases in the system:
- Right-click on the Big Data folder and select
New > Data source > JSON
. - Enter hcatalog_db_list_ds as Name and select HTTP Client as Data Route.
- Click on
Configure
.
- In the Edit HTTP Connection window set with
http://<servername>/templeton/v1/ddl/database?user.name=<your_username>
as the URL, replace <servername> and <your_username> with appropriate values.
- Click
and then Create base view.
- Rename the view hcatalog_db_list.
- Click
again to confirm the creation of the base view.
Because of the hierarchical structure of the JSON returned by the HCatalog REST API, you can see that the new base view has only one top level field of type array. This array field contains a list of items with one single field (called field_0 by default) that actually contains the database name.
Now that we have a view with the databases in our Big Data deployment, it will be interesting to know the different tables created in a database. To get the
list of tables using the HCatalog API you can send a request to the following URL:
http://<servername>/templeton/v1/ddl/database/<databasename>/table?user.name=<your_username>
where you have to replace <databasename> with the actual name of the database. If we want to import this into Denodo, we would have to create a
different data source for each database in the system. But, wouldn't it be easier if we could provide the database name as an input parameter when querying
the data source? Certainly, and it is possible to do this by using interpolation variables when defining the data source.
The interpolation variables start with the @ character followed by the variable name and they can be included as part of the URL when defining
a data source that uses HTTP connections. For instance, we can enter the URL above as:
http://<servername>/templeton/v1/ddl/database/@param_database/table?user.name=<your_username>
To import a new JSON data source that gets all the tables in a database entered as input parameter:
- Right-click on the Big Data folder and select
New > Data source > JSON
. - Enter hcatalog_tables_ds as Name and select HTTP Client as Data Route. Then, click on
Configure
. - In the Edit HTTP Connection window set
http://<servername>/templeton/v1/ddl/database/@param_database/table?user.name=<your_username>
as URL, replacing <servername> and <your_username> with appropriate values. - If you click Ok, you will see a new Edit variable values dialog that will ask for an example value for the interpolation variables
used in the data source definition. In this case, you will have to provide the name of a database that exists in the system. This value will be used to
run a test request against the data source to verify that the connection is working.
- Click Ok to confirm the data source creation and then Create base view.
- After doing this, you will be asked again for an example value for the interpolation variable. In this case, the value will be used to run a request against the data source that processes the format of the REST API response to produce the output schema of the new view. You should enter the same value as in the data source' creation.
- Click Ok and rename the new view hcatalog_tables
- Click
to confirm the creation of the view.
As with the previous datasource, the list of tables is contained under a field of type array called tables.
If you try to query the new base view you will see that it has a mandatory input parameter param_database and that you will have to provide a value for
as a WHERE condition any time that you run a query on the base view.
If you actually run a query, you will see that it returns the tables for the database that you have specified as input parameter.
To navigate to the list of tables you can just double-click on the tables field that is displaying the [Array]... value.
With that, you are finished with the Big Data Tutorial!
Great Job!