Goal
This document describes how to connect to R from Denodo Virtual DataPort.
Introduction
R is an open-source environment for statistical analysis. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible.
R can be used to connect to Denodo Virtual DataPort to retrieve data from underlying sources for statistical analysis. However, in this article we are going to focus on connecting to R from the Denodo Virtual DataPort Server to take advantage of R functions, libraries, packages, and even saved models. We will see how to dynamically invoke the R engine and pass values to R via the Rserve package that are then returned to Denodo Virtual DataPort.
Summary of the Process
To access the R functions from Denodo Virtual DataPort, the following prerequisites are needed:
- R server and client prerequisites.
- Creating a Denodo custom stored procedure to execute R functions from Denodo.
- Deploying the stored procedure into a Denodo Virtual DataPort Server.
- Creating the Stored Procedure in the Design Studio and executing it.
R server and client prerequisites
To access R from Denodo we will write in Java a Denodo Custom Stored Procedure. Before doing so we will have to:
- Install Rserve in the R server.
- Add the R Java client libraries.
Install Rserve in the R server
R provides collections of functions, sample data, and compiled codes as R packages. In the R environment, these default packages are stored under a directory called "library". In our scenario, we are going to install the “Rserve” package. Rserve is a TCP/IP server that allows other programs to use R. Every connection has a separate workspace and working directory.
R-1.5.0 or higher is required to be able to use Rserve. To install the R package, execute the following in the R command line or RStudio:
install.packages("Rserve") |
After installing the Rserve package, let's launch Rserve by executing the following commands in the R command-line tool or from R studio:
library(Rserve) Rserve() |
The above-mentioned commands know how to find Rserve, how to set up the environment, and how to start it, regardless of your platform.
Add the R Java client libraries
To communicate with R from a Java client, Rserve provides client jars that can be used inside a Java program. Hence after installing the Rserve package, download the three JAR files JRI.jar, REngine.jar, and JRIEngine.jar from http://www.rforge.net/JRI/files/ and the RserveEngine.jar from http://www.rforge.net/Rserve/files/.
JRI is a Java/R Interface, which allows running R inside Java applications as a single thread. REngine is a full client suite that allows any Java application to access Rserve.It provides automatic type translation for most objects such as int, double, arrays, string, or vector, and classes for special R objects such as RBool, RList, etc.
We will add these jars to the classpath of the Java client that we are going to create to be able to use R functions. In this document we will use Eclipse as the IDE as an example but it is not a requirement.
Creating a Denodo custom stored procedure to execute R functions from Denodo
NOTE: in this example we explain how to create a custom stored procedure to invoke a simple R function. A Denodo custom function could be used instead in this case. The use of a stored procedure will suit more complex scenarios like invoking data frames from R or cases where additional logic needs to be implemented on top of the information obtained from R.
Denodo Virtual DataPort provides an API to develop custom stored procedures in Java. The Denodo Platform provides examples of stored procedures and their source code. They are located in <DENODO_HOME>/samples/vdp/storedProcedures. The README file in this path contains instructions to compile and install them.
Now, let’s start creating the Java code to invoke an R function. We will use Eclipse in this example but any other tool can be used.
- Launch Eclipse, create a new Java project by navigating to File> New > Project.
- Specify the name for the Java project (r_project) and JavaSE-11 as JRE.
- To develop a stored procedure, add the following jar files to the CLASSPATH of your project:
<DENODO_HOME>/lib/vdp-server-core/denodo-vdp-server.jar
<DENODO_HOME>/lib/vdp-client-core/denodo-vdp-parser.jar
You may want to add the libraries Apache Commons Lang library (<DENODO_HOME>/lib/contrib/commons-lang.jar) and Apache Commons IO (<DENODO_HOME>/lib/contrib/commons-io.jar) to the classpath of your project.
- Click Finish to create the project and now lets create a new package by right-clicking the project > New > Package.
- Specify the name of the package as com.denodo.vdp.storedprocedure and Click on “Finish” to create the package.
- Now lets create a new Java Class as GetProbability.java under the created package.
- Now, to communicate with R from Java we are going to import the downloaded client jars of R. To do that, right-click on the Java project and navigate to “Build Path” > “Configure Build Path...”
- Import the downloaded jar by clicking on the “Add External JARs” option in the Java Build Path and Click on “Apply”. On successful installation, you should see imported jars in the “Referenced Libraries” section of your Java Project.
Let’s implement the logic to invoke the R function. The classes and interfaces for developing stored procedures are located in the package com.denodo.vdb.engine.storedprocedure. This section describes briefly the use of its main classes. See the Javadoc documentation of the API for further details on these classes and their methods.
We are going to import the required classes from package com.denodo.vdb.engine.storedprocedure and from org.rosuda.REngine to create the R function.
- To create a stored procedure, the created Java class needs to extend com.denodo.vdb.engine.storedprocedure.AbstractStoredProcedure. Upon extending this class, it will include all the required methods that need to be implemented.
- In this example we are going to invoke the “pnorm” R function. The pnorm function is also known as the "Cumulative Distribution Function". This function calculates the probability of a normally distributed random number, which is less than the value of a given number. To establish the connection R, we need to use RConnection class that provides a TCP/IP connection to a R server:
RConnection c = new RConnection(RSERVE_HOST, RSERVE_PORT); |
- Here is the logic implemented for invoking “pnorm” function:
package com.denodo.vdp.storedprocedure; import java.sql.ResultSet; import java.sql.Types; import org.rosuda.REngine.REXP; import org.rosuda.REngine.Rserve.RConnection; import org.rosuda.REngine.Rserve.RserveException; import com.denodo.vdb.engine.storedprocedure.AbstractStoredProcedure; import com.denodo.vdb.engine.storedprocedure.DatabaseEnvironment; import com.denodo.vdb.engine.storedprocedure.StoredProcedureException; import com.denodo.vdb.engine.storedprocedure.StoredProcedureParameter; public class GetProbability extends AbstractStoredProcedure {
private DatabaseEnvironment environment;
public GetProbability() {} public void initialize(DatabaseEnvironment theEnvironment) { super.initialize(theEnvironment); this.environment = theEnvironment; } protected void doCall(Object[] inputValues) throws StoredProcedureException { String input = (String) inputValues[0]; //Gets the input parameter String query = "pnorm("+input+")";//Constructs R function call String value = null; ResultSet rs = null;
try { RConnection c = new RConnection("127.0.0.1", 6311); //Connection to R REXP z = c.eval(query);//R function call value = z.asString();//Returned value is a string if(c.isConnected()) { c.close(); } } catch (RserveException e) { getEnvironment().log(LOG_ERROR, e.getMessage()); } catch (Exception e) { getEnvironment().log(LOG_ERROR, e.getMessage()); } try { getProcedureResultSet().addRow(new Object[] {value}); } catch (Exception e) { getEnvironment().log(LOG_ERROR, e.getMessage()); throw new StoredProcedureException(e.getMessage()); } } public String getName() { return GetProbability.class.getName(); } public String getDescription() { return "Returns the probability of a normally distributed random number"; } public int getNumOfAffectedRows() { return 1; } public StoredProcedureParameter[] getParameters() { return new StoredProcedureParameter[] { new StoredProcedureParameter("query", Types.VARCHAR, StoredProcedureParameter.DIRECTION_IN), new StoredProcedureParameter("value", Types.VARCHAR, StoredProcedureParameter.DIRECTION_OUT) }; } } |
- The above code establishes the connection to R Server, executes the R function and then the result is stored in a special object of the class REXP. This class encapsulates any objects received or sent to Rserve. If the type of the returned objects is known in advance, accessor methods can be called to obtain the Java object corresponding to the R value, in our case a regular String.
- We are also using the method public StoredProcedureParameter[] getParameters(). This method defines the input and output parameters of the Stored Procedure.
- If any error occurs, it will be written to the Virtual DataPort log file by invoking the log method of the environment object available in the stored procedure.
Deploying the stored procedure into a Denodo Virtual DataPort Server
We have implemented the logic for our R function in our java code. To utilize this, it has to be deployed into Virtual DataPort. To do this we will create a jar library that can be added to Virtual DataPort.
- Right click on the project r_project and click on “Export” to export it as a jar file.
- Click on “JAR file” to export this project as a jar file to the desired folder location.
- Import this jar into the Virtual DataPort Server. From your Virtual DataPort Administration tool, navigate to File > Extension Management and import the extension which we created.
- The libraries JRI.jar, REngine.jar, JRIEngine.jar and RserveEngine.jar also need to be added for the successful execution of deployed stored procedures.
Creating the Stored Procedure in the Design Studio and executing it
As we have deployed the extension successfully, let’s create the Stored Procedure in the Design Studio, to make use of it.
- To create the Stored Procedure, launch the Design Studio and navigate to the menu File > New > Stored Procedure > Java stored procedure, you can also click on the desired database and navigate to the New > Stored Procedure > Java stored procedure option.
- The new stored procedure dialog will appear in the Design Studio. Provide a desired name for the stored procedure “sp_getprobability” and enable the Select jars option to select the jar file that contains the stored procedure.
- In the “Existing extensions” option select the deployed jar “GetProbability” along with the dependent jars such as JRI.jar, REngine.jar, JRIEngine.jar, and RserveEngine.jar.
- Then, click on “Save” to save the created stored procedure. To execute the stored procedure click on Execute and specify the value for the input parameter “query”.
- Click on “Execute” to run the stored procedure. On successful execution, the query results will be displayed.
- As we have retrieved the results as expected from R, let's utilize this stored procedure to calculate the probability in a real-time scenario. For this, we take a base view from Web service data source that contains “Product details” information and calculate the probability of loss for orders in case it gets canceled.
- To do that we will create a join view “order_loss_report” over the views, “bv_product” from the Web service data source, and the “sp_getprobability” stored procedure.
- On executing this view, it passes the values of unit price from the “bv_product” view to the “pnorm” R function to calculate the probability of loss of an order.
References
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments we recommend you to contact your Denodo Customer Success Manager.