One of the goals of defining a virtual layer is to offer data consumers simple, uniform and efficient access to the data sources. The same general goal applies to Data Science workflows: once our ML algorithm is fitted, we need to design and implement the system that gives access to its predictions.
Here, we are going to focus on the on-demand access pattern. Other considerations and techniques apply if batch access is under study; you can get some insight here.
The ML algorithm is published via a REST web service that takes the values of all the input variables the algorithm has been trained on and returns the prediction. Since the final applications and users do not know all these input variables (there may be many of them, and the list can vary over time), we can leverage the fact that all the data transformation steps have already been created in Denodo. We will therefore define a data source on the ML web service and build another web service in Denodo that gets the search parameters sent by the application, retrieves the input variables, passes them to the ML web service and delivers the prediction back to the application.
We will go through the following steps:
- Test the low-level ML web service and review its validity boundaries.
Then, in Web Design Studio:
- Define a JSON data source and a base view over the ML web service.
- Create a selection view on meter_reading_final to prepare all (and only) the predictor fields to be joined with the base view created in the previous step.
- Join the two views and publish a REST web service that will take timestamp_in and building_id as input parameters and will give back the relevant predicted meter reading values with their respective timestamps.
Before starting with the development, let's see how to call the ML web service and what its validity boundaries are.
To emit the building energy consumption predictions, the ML web service expects all the predictors that the model has been trained on. As a reference, recall that the ML algorithm was cross-validated and trained in the previous task, where you'll find the list of predictors.
There are two endpoints available for the ML web service:
- Endpoint v1 (http://mlservice.dstutorial.com:5010/meter_readings/api/v1.0/): each predictor must be passed as a name/value pair, as shown here:
param1=value1&param2=value2&[...]&paramN=valueN
- Endpoint v2 (http://mlservice.dstutorial.com:5010/meter_readings/api/v2.0/): all the predictors must be embedded in a JSON document, and this document passed as the value of the parameter input_data, as shown here:
[{"param1":"value1","param2":"value2", ..., "paramN":"valueN"}]
Both endpoints reply with a JSON document containing the field predicted_meter_reading and its value, e.g. {"predicted_meter_reading":180.186}.
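For example, for endpoint v2 the JSON document for the predictors of our model, filled with illustrative sample values (the same ones used as probe values later in this section), would look as follows; this document is then passed as the value of the input_data parameter:
[{"primary_use":"Education","square_feet":"1000","building_age":"10","dew_temperature":"10.0","monthly_dp01":"10.0","monthly_dp10":"10.0","monthly_emxp":"10.0","monthly_prcp":"10.0","reading_hour":"0.0"}]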
For the rest of the task, we will be using endpoint v1. If you are interested in how to do the integration with endpoint v2, you'll find the equivalent data preparation flow in the database dstutorial_sample_completed.
Regarding the validity boundary of the web service, let's recall the data scope that we used to train our machine learning algorithm:
- a single site, University of Central Florida, Orlando, FL (site_id=0).
- a single meter type, Chilled Water (meter=1).
It is very important to let end users know the validity scope of the prediction service and to design it in such a way that predictions are always consistent with the data scope used at training time.
Moreover, we should keep in mind that we can only retrieve the values of the predictors that are already available; in our specific case, these correspond to the records of the view bv_meter_reading_test. This imposes a condition on the timestamp specified by the data consumer: it cannot be later than 2018-12-31 23:00:00 (SELECT MAX(timestamp_in) FROM bv_meter_reading_test).
With these important considerations in mind, we can now get started with the REST web service development.
The first step is to test the low-level Python ML web service. You can open a browser (or another REST client) and call the v1 endpoint with all the predictors passed as query parameters; you should get back a floating-point number.
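For example, a call to the v1 endpoint with illustrative values for every predictor (the same sample values used as probe values later in this section) looks like this:
http://mlservice.dstutorial.com:5010/meter_readings/api/v1.0/?primary_use=Education&square_feet=1000&building_age=10&dew_temperature=10.0&monthly_dp01=10.0&monthly_dp10=10.0&monthly_emxp=10.0&monthly_prcp=10.0&reading_hour=0.0
It should answer with a JSON document such as {"predicted_meter_reading":180.186} (the actual value depends on the inputs).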
Let's go on now with defining a data source and a base view on this web service.
Let's log in to the Web Design Studio:
User: dstutorial_usr
Password: dstutorial_usr
As a first step, create the following folders:
/01_connectivity/01_data_sources/prediction
/01_connectivity/02_base_views/prediction
/02_integration/prediction/staging
/02_integration/prediction/staging/v1
To create the first folder, select /01_connectivity/01_data_sources, click on New->Folder and enter prediction as the name. The folder is created and visible in the folder tree. Do the same for the remaining folders.
Now select /01_connectivity/01_data_sources/prediction, then click on New->Data Source->JSON.
In the new wizard, set the Name to ds_ml_prediction_service_v1 and the Data Route to HTTP Client. Then expand the Configuration menu and paste the following URL in the URL field:
http://denodo.dstutorial.com:5010/meter_readings/api/v1.0/?primary_use=@primary_use&square_feet=@square_feet&building_age=@building_age&dew_temperature=@dew_temperature&monthly_dp01=@monthly_dp01&monthly_dp10=@monthly_dp10&monthly_emxp=@monthly_emxp&monthly_prcp=@monthly_prcp&reading_hour=@reading_hour
As you can see, we are passing the predictor names and their values as separate parameters (param_name=@param_value).
By prefixing a parameter value with @, we specify that the value is not constant but must be generated dynamically, either from a user query or from another view. To let Denodo probe the web service, we need to enter probe values; based on the web service response, Denodo will create the structure of the base view.
If you now click on Test Connection, you will be prompted to specify the probe values for all the predictors. You can enter the values listed in this table:
| Parameter | Probe value |
|---|---|
| primary_use | Education |
| square_feet | 1000 |
| building_age | 10 |
| dew_temperature | 10.0 |
| monthly_dp01 | 10.0 |
| monthly_dp10 | 10.0 |
| monthly_emxp | 10.0 |
| monthly_prcp | 10.0 |
| reading_hour | 0.0 |
Then Save.
Go on with creating a base view on ds_ml_prediction_service_v1.
Open the data source, if it is not already open, then click on CREATE BASE VIEW and enter the same probe values as above for the parameters. The view is created. Click on EDIT to change:
- the name to bv_ml_prediction_service_v1;
- the folder to /01_connectivity/02_base_views/prediction.
Finally, try executing the view with the same parameter values specified above.
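For example, assuming the interpolation variables are exposed as mandatory input fields of the base view and keep their default text type (drop the quotes if you redefined them as numbers), the following query, based on the probe values of the table above, should return a single row containing the field predicted_meter_reading:
SELECT *
FROM dstutorial.bv_ml_prediction_service_v1
WHERE primary_use = 'Education'
AND square_feet = '1000'
AND building_age = '10'
AND dew_temperature = '10.0'
AND monthly_dp01 = '10.0'
AND monthly_dp10 = '10.0'
AND monthly_emxp = '10.0'
AND monthly_prcp = '10.0'
AND reading_hour = '0.0'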
The data service preparation now continues with the creation of a selection view on meter_reading_final. The goal is to prepare all (and only) the predictor fields.
So let's select the folder /02_integration/prediction/staging/v1, click on New->Selection, and drag and drop the view meter_reading_final in the work area, then:
- In the Output tab, select the fields that are neither predictors nor timestamp_in, building_id, meter, site_id, and click on Remove.
- Redefine each predictor field with the expression COALESCE( field_name, 'null' ), as shown in the sketch after this list. This is needed because we will join this view with the view defined over the ML web service, to dynamically pass all the values that the ML web service needs to emit the prediction. The Denodo optimizer pushes a NOT NULL condition down the view tree on each field that enters the join condition. While this is the expected behaviour in normal situations, it is not what we want in this case: the web service must be called even if some predictor value is missing, and a missing value, if present, will be properly handled by the ML web service.
- Set building_id and timestamp_in as primary keys by clicking on the corresponding key icon (meter can be removed from the primary keys definition, as we filtered on it).
- Rename the view to iv_prediction_service_ondemand_v1.
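As an illustration, taking dew_temperature and primary_use as examples of predictor fields, the redefined field expressions would look as follows (the literal 'null' is the placeholder that, as mentioned above, the ML web service handles as a missing value):
COALESCE( dew_temperature, 'null' )
COALESCE( primary_use, 'null' )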
To test your view you can run the following query:
SELECT *
FROM dstutorial.iv_prediction_service_ondemand_v1
WHERE CAST('localdate',timestamp_in) ='2017-01-01'
AND building_id = 7
It should return 24 rows and 13 columns in about 80 seconds.
Now we have to join the views iv_prediction_service_ondemand_v1 and bv_ml_prediction_service_v1 on all the predictor fields.
So let's select the folder /02_integration/prediction/, click on New->Join, and drag and drop the views iv_prediction_service_ondemand_v1 and bv_ml_prediction_service_v1 in the work area, then:
- Click on iv_prediction_service_ondemand_v1 and drag the generated arrow on top of bv_ml_prediction_service_v1, so that the join condition is defined on all the predictor fields (see the sketch after this list).
- Rename the view to iv_prediction_service_ondemand_v1_final.
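For reference, a rough VQL sketch of the join we have just defined graphically is shown below; the join fields are the predictors used in the data source URL, and the actual column list of your view may differ slightly:
SELECT t.building_id, t.timestamp_in, p.predicted_meter_reading
FROM dstutorial.iv_prediction_service_ondemand_v1 t
JOIN dstutorial.bv_ml_prediction_service_v1 p
ON t.primary_use = p.primary_use
AND t.square_feet = p.square_feet
AND t.building_age = p.building_age
AND t.dew_temperature = p.dew_temperature
AND t.monthly_dp01 = p.monthly_dp01
AND t.monthly_dp10 = p.monthly_dp10
AND t.monthly_emxp = p.monthly_emxp
AND t.monthly_prcp = p.monthly_prcp
AND t.reading_hour = p.reading_hour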
Test the view iv_prediction_service_ondemand_v1_final with the following query:
SELECT *
FROM dstutorial.iv_prediction_service_ondemand_v1_final
WHERE building_id = 7
AND timestamp_in = TIMESTAMP '2017-08-11 00:00:00'
This query must return a single row in a few seconds, with a non-null value for predicted_meter_reading.
We test this query because it is of the same type as the queries that will be generated by our final data service, which we are going to build and deploy now.
Let us now design our building energy consumption service:
- It will take building_id and timestamp_in as input parameters and give back to the requester the prediction (predicted_meter_reading) for the combination of those variables.
- It will be protected with basic authentication, with mlservice_usr and mlservice_usr as login and password respectively.
To create it, follow these instructions:
- Select the folder 05_data_services and create a new REST web service named meterreadingpredondemand_v1, then drag and drop iv_prediction_service_ondemand_v1_final in the Resources table.
- Remove the fields meter and site_id from the published output of iv_prediction_service_ondemand_v1_final by clicking on the corresponding delete icon.
- Enable basic authentication for the service, entering mlservice_usr as login and password.
A description like the following can be added to the web service, so that consumers know its validity scope and expected parameters:
This is the Building Energy Consumption Prediction Service.
The predictions are valid for the Chilled Water meter reading at site University of Central Florida, Orlando, FL.
Input parameters:
- timestamp_in: with format YYYY-MM-DDTHH:MM:SS, it must be in the range 2017-01-01 to 2018-12-31, both included
- building_id: it must be in the range 0 to 104, both included
Finally, save and deploy the web service. Notice that while the requester will need to authenticate with user mlservice_usr, Denodo will run the corresponding query with the user dstutorial_usr.
The Prediction Web Service is now deployed and ready to be consumed by end applications. You can see and change the deployment status of any Denodo REST Web Service by using the Web Service Container menu accessible from the Tools menu, located at the top-right of the screen.
To test the REST web service, you can query it with your preferred client, for example the browser. Here is an example:
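For example, assuming the Denodo web container listens on its default port 9090 on the host denodo.dstutorial.com and that the resource keeps the default name of the view (adjust host, port and path to your environment), a request equivalent to the test query above would be:
http://denodo.dstutorial.com:9090/meterreadingpredondemand_v1/views/iv_prediction_service_ondemand_v1_final?building_id=7&timestamp_in=2017-08-11T00:00:00
The client is asked to authenticate with the service credentials (mlservice_usr / mlservice_usr) and gets back the matching row, including the predicted_meter_reading field.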
As an exercise, you may want to build another REST data service that, instead of the timestamp, requires a date parameter, thus returning all the predictions for a given building and meter for a whole day. To start with, you should define a new field in the view iv_prediction_service_ondemand_v1_final that builds the field date from the field timestamp_in.
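A possible expression for this new field, reusing the same cast already used in the test query of the selection view, is:
CAST('localdate', timestamp_in)
The new service would then take building_id and date as input parameters and return the 24 hourly predictions for that day.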