Task 6 - Making a Prediction Data Service

Data Science and Machine Learning

One of the goals of defining a virtual layer is to offer the data consumers a simple, uniform and efficient access to the data sources. The same general goal applies in Data Science workflows. Once our ML algorithm is fitted, we should design and implement the access system to its predictions.

Here, we are going to focus on the on-demand access pattern. Other considerations and techniques apply if a batch access is under study, you can get some insight here.

The ML algorithm is published via a REST web service that takes the values of all the input variables that the algorithm has been trained on and restitutes the prediction. As the final applications and users do not know all these input variables (their number may be huge and moreover it can vary over time), we can leverage the fact that all the data transformations steps have already been created in Denodo. We will then define a data source on the ML web service and construct another web service in Denodo, that will get the search parameters sent from the application, get the input variables, pass them to the ML web service and deliver the prediction back to the application.

We will go through the following steps:

  • Understand how the ML web service must be called.
  • Identify the validity conditions under which the ML web service gives consistent predictions.

Then, in Web Design Studio:

  • Create the appropriate folder structure.
  • Create a JSON data source abstracting the ML web service.
  • Create a base view on this data source.
  • Create a selection view on meter_reading_final to prepare all (and only) the predictor fields to be joined with the base view created in the previous step.
  • Create a join view between the views just created, with the predictor fields as join keys.
  • Create and deploy a REST data service that will take the input parameters timestamp_in and building_id and will give back the relevant predicted meter readings values with their respective timestamp.

Before starting with the development, let's see how to call the ML web service and what are its validity boundaries.

The ML web service expects all the predictors that the model has been trained on to emit the building energy consumption predictions. For your reference, we recall that the ML has been cross-validated and trained in the previous task where you'll find the predictors list.

There are two endpoints available for the ML web service:

  • v1 (endpoint: http://mlservice.dstutorial.com:5010/meter_readings/api/v1.0/):

    Each predictor must be passed as a couple of name and value, as shown here:

    param1=value1&param2=value2&[...]&paramN=valueN

  • v2 (endpoint: http://mlservice.dstutorial.com:5010/meter_readings/api/v2.0/):

    All the predictors must be embedded in a JSON file and this file passed as value for parameter input_data, as shown here:

    [{"param1":"value1","param2":"value2", ..., "paramN":"valueN"}]

Both endpoints replies with a JSON file containing the field predicted_meter_reading and its value, ex.{"predicted_meter_reading":180.186}.

For the rest of the task, we will be using endpoint v1. If you are interested in how to do the integration with endpoint v2, you'll find the equivalent data preparation flow in database dstutorial_sample_completed.

Regarding the validity boundary of the web service, let's recall the data scope that we used to train our machine learning algorithm:

  • The site is University of Central Florida, Orlando, FL (site_id=0).
  • The meter is chilledwater (meter=1).

It is very important to let end users know the validity scope of the prediction service and to design it in such a way that predictions are always consistent with the data scope used at training time.

Moreover, we should keep in mind that we can retrieve the values for the predictors that are already available, in our specific case these correspond to the records of view bv_meter_reading_test. This imposes a condition on the timestamp to be specified by the data consumer, specifically 2018-12-31 23:00:00 (SELECT MAX(timestamp_in) FROM bv_meter_reading_test) .

After having mentioned these important considerations, we can now get started with the REST web service development.

The first step would be to test the low-level Python ML web service. You can open a browser (or another REST client) and open the following URL.

To open this link, make sure that the name mlservice.dstutorial.com resolves to the IP of the virtual machine. For this, you can add it to your hosts file as you did with denodo.dstutorial.com.

NOTE

http://mlservice.dstutorial.com:5010/meter_readings/api/v1.0/?primary_use=Office&square_feet=27000&building_age=11&dew_temperature=9.4&monthly_dp01=12.6&monthly_dp10=8.1&monthly_emxp=1.783&monthly_prcp=6.390909&reading_hour=0.9876883405951378

You should get back a floating number.

Let's go on now with defining a data source and a base view on this web service.

Let's log in into the Web Design Studio

User: dstutorial_usr
Password: dstutorial_usr

As a first step, create the following folders:

  • /01_connectivity/01_data_sources/prediction
  • /01_connectivity/02_base_views/prediction
  • /02_integration/prediction/staging
  • /02_integration/prediction/staging/v1

To create the first folder, select /01_connectivity/01_data_sources, click on then New->Folder and enter prediction as the name. The folder is created and visible in the folder tree. Do the same for the remaining folders.

Now select /01_connectivity/01_data_sources/prediction, click on , then >New->Data Source->JSON.

In the new wizard, set the Name to ds_ml_prediction_service_v1 and Data Route to HTTP Client.

Then expand the Configuration menu and paste the following url in the URL field:

http://denodo.dstutorial.com:5010/meter_readings/api/v1.0/?primary_use=@primary_use&square_feet=@square_feet&building_age=@building_age&dew_temperature=@dew_temperature&monthly_dp01=@monthly_dp01&monthly_dp10=@monthly_dp10&monthly_emxp=@monthly_emxp&monthly_prcp=@monthly_prcp&reading_hour=@reading_hour

As you can see, we are passing the predictors names and their values as separated parameters (param_name=@param_value).

By prefixing the parameter value with a @ we are specifying that the value we are passing is not constant, but instead must be dynamically generate either from a user or from another view. In order to let Denodo probe the web service we need to enter the probe values. Based on the web service response Denodo will create the structure of the base view.

If you now click on Test Connection, you will be prompted to specify the probe values for all the predictors. You can enter the values listed in this table:

primary_use Education
square_feet 1000
building_age 10
dew_temperature 10.0
monthly_dp01 10.0
monthly_dp10 10.0
monthly_emxp 10.0
monthly_prcp 10.0
reading_hour 0.0

Then Save.

Go on with creating a base view on ds_ml_prediction_service_v1.

Open the data source, if it is not already open, then click on CREATE BASE VIEW.

Enter the same probe values as above for the parameters. The view is created. Click on EDIT to change:

  • The name of the view to bv_ml_prediction_service_v1.
  • The folder to /01_connectivity/02_base_views/prediction.

Finally try executing the view with the same parameters values specified above.

The data service preparation goes on now with the creation of a selection view on meter_reading_final. The goal is to prepare all the predictor fields.

So let's select the folder /02_integration/prediction/staging/v1, then click on , then New->Selection. Drag and drop the view iv_prediction_service_ondemand_v1 in the work area, then:

  1. In Where Conditions, enter input_site_id=0 and meter=1.
  2. In the Output menu, select all the fields but timestamp_in, building_id, meter, site_id and click on Remove.
  3. Create one field per predictor using the COALESCE function:
    COALESCE( field_name, 'null' )

    This is needed because we need to join this view with the view defined over the ML Web service, to dynamically pass all the values that the ML Web Service needs to emit the prediction. The Denodo optimizer will push down the tree view a not null condition on each field that enters the join conditions. While this is the expected behaviour in normal situations, this is not what we want in this case: the web service must be called even if some predictor value is missing. The missing value, if present, will be properly handled by the ML Web Service.

  4. Set building_id and timestamp_in as primary keys by clicking on the corresponding key icon (). The column meter can be removed from the primary keys definition as we filtered on it.
  5. Change the view name to iv_prediction_service_ondemand_v1.

To test your view you can run the following query:

SELECT *
FROM dstutorial.iv_prediction_service_ondemand_v1
WHERE CAST('localdate',timestamp_in) ='2017-01-01'
AND building_id = 7

It should return 24 rows and 13 columns in about 80 seconds.

Now, we have to join the views iv_prediction_service_ondemand_v1 and bv_ml_prediction_service on all the predictor fields.

So let’s select the folder /02_integration/prediction/, then click on , then New->Join. Drag and drop the view iv_prediction_service_ondemand_v1 and bv_ml_prediction_service in the work area, then:

  1. Click on for view iv_prediction_service_ondemand_v1 and drag the generated arrow on top of bv_ml_prediction_service.
  2. In the bottom menu, link all the predictor fields from both views.
  3. In the Output tab, remove all the predictor fields and rename the view as iv_prediction_service_ondemand_v1_final.
  4. Save.

Test the view with the following query iv_prediction_service_ondemand_v1_final:

SELECT *
FROM dstutorial.iv_prediction_service_ondemand_v1_final
WHERE building_id = 7
AND timestamp_in = TIMESTAMP '2017-08-11 00:00:00'

This query must return a single row in a few seconds, with a non null value for predicted_meter_reading.

We test this query as it is of the same type as the queries that will be generated by our final data service, that we are going to build and deploy now.

Let us now design our building energy consumption service:

  • It will have two mandatory parameters building_id and timestamp_in and give back to the requester the prediction (predicted_meter_reading) for the combination of those variables.
  • The output will be JSON-formatted.
  • It will request authentication with user and password mlservice_usr and mlservice_usr respectively.

To create it, follow these instructions:

  • Select folder 05_data_services, click on , then New->Data service->REST Web Service.
  • Enter the name meterreadingpredondemand_v1 then drag and drop iv_prediction_service_ondemand_v1_final in the Resources table
  • Expand the columns of iv_prediction_service_ondemand_v1_final by clicking on then:
  • Remove meter and site_id by clicking on .
  • Replicate the following setup for the parameters:
  • Switch to the Settings tab, set the default representation to JSON set the Authentication to HTTP Basic and enter mlservice_usr as login and password.
  • It is recommended to describe the instructions to call the service as well as validity conditions in the description. For example, you can copy and paste the following text in the Description field, Metadata panel:
This is the Building Energy Consumption Prediction Service.
The predictions are valid for the Chilled Water meter reading at site University of Central Florida, Orlando, FL.
Input parameters:
- timestamp_in: with format YYYY-MM-DDTHH:MM:DD, it must be in the range 2017-01-01 and 2018-12-31 both included
- building_id: must be in range between 0 and 104 both included
  • Save.
  • Click on , confirm the password for dstutorial_usr. Notice that while the requester will need to authenticate with user mlservice_usr, Denodo will run the corresponding query with user dstutorial_usr.

The Prediction Web Service is now deployed and ready to be consumed by end applications. You can see and change the deployment status of any Denodo REST Web Service by using the Web Service Container menu accessible from the Tools menu, located at the top-right of the screen.

To test the REST Web Service, you can query it with your preferred client, for example the browser.

Here an example:

http://denodo.dstutorial.com:9090/server/dstutorial/meterreadingpredondemand_v1/views/iv_prediction_service_ondemand_v1_final?building_id=7&timestamp_in=2017-01-20T10:00:00

As an exercise, you may want to build another REST data service that instead of the timestamp would require the parameter date, thus returning all the predictions for a given building and meter, for a whole day. To start with, you should define a new field in view predicted_meter_reading_ondemand_v1_final, that builds the field date from the field timestamp_in.