Introduction
This document explains how to integrate Twitter with Denodo and how to use sentiment analysis libraries to classify tweets according to information contained in the text of the tweets.
Sentiment analysis refers to the use of natural language processing techniques to extract subjective information from texts.
These techniques can be used to determine the attitude of a speaker regarding some topic or the overall contextual polarity of a text.
There are multiple libraries available to perform sentiment analysis in Java, such as GATE, OpenNLP and LingPipe, among others.
In this article we will use LingPipe as an example for the reasons listed below, although any other Java library could be integrated with Denodo in a similar way:
- LingPipe is open source (although it is not free for commercial purposes).
- It implements many of the most popular POS (Part-Of-Speech) tagging, NER (Named Entity Recognition) and classification algorithms.
- LingPipe will just classify the tweets into positive, negative or neutral.
An easy way to apply sentiment analysis within Denodo is by creating a custom function that, using a sentiment analysis library, returns the results coming from the library when a specific text is passed as input.
In this article we will explain how to create a custom function to classify tweets coming from a JSON data source using the Twitter API.
Creating a custom function
The library of our choice, LingPipe, uses a classifier to determine the sentiment that an input text conveys (positive, negative or neutral). This classifier needs to be trained before it is used. For performance reasons, the custom function should load an already trained classifier; training the classifier on every function call would be a bad idea.
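The "load once, reuse on every call" idea can be sketched in plain Java. This is a minimal illustration and not Denodo or LingPipe code: `LazyModel` and the loader passed to it are hypothetical stand-ins for deserializing the trained classifier from disk.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper: runs an expensive loader at most once and caches the result. */
final class LazyModel<T> {
    private final Supplier<T> loader;
    private volatile T model;

    LazyModel(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Returns the cached model, loading it on first use (double-checked locking). */
    T get() {
        T local = model;
        if (local == null) {
            synchronized (this) {
                local = model;
                if (local == null) {
                    local = loader.get();
                    model = local;
                }
            }
        }
        return local;
    }
}

final class LazyModelDemo {
    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        // Stand-in for reading the serialized classifier from disk.
        LazyModel<String> model = new LazyModel<>(() -> {
            loads.incrementAndGet();
            return "trained-classifier";
        });
        for (int i = 0; i < 1000; i++) {
            model.get(); // every call reuses the same instance
        }
        System.out.println("loads = " + loads.get()); // prints "loads = 1"
    }
}
```

The custom function shown later in this article gets the same effect by loading the classifier in its constructor instead of inside the method that classifies each tweet.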
Since training a classifier is out of the scope of this article, we are going to use a pre-trained one (see the “Trained classifier for LingPipe” link in the References section). This classifier has been trained using random tweets written in English. It is able to distinguish between positive, negative and neutral tweets with an accuracy of approximately 75%.
Taking into consideration these preconditions, we are going to create a custom function using Eclipse and the Denodo4e plugin available with any Denodo Platform installation.
NOTE: When developing custom functions for Denodo Platform 8.0, use the latest version of the Eclipse IDE (we have tested with version 2020-03) and JDK 11, since Denodo Platform 8.0 supports Java 11.
To install the Denodo4e plugin, review the steps here.
Once the Denodo4e plugin is installed in Eclipse, to create the Denodo Custom function project, go to File > New > Other and type denodo in the textbox.
Select “Denodo Extension Project” and click “Next”. Then enter a project name (e.g. twitterclassifier), select the desired JRE and click “Next”.
The final step is defining the type of Denodo extension:
- Browse to the Denodo Platform Home Directory.
- Select “Denodo VDP/ITP Server” as Denodo application.
- Select “Denodo VDP Custom Function” as Extension.
- Select a Java Package name: e.g. “com.denodo.vdp”
- Finally, select the Name: e.g. “TwitterClassifier” and click “Finish”.
This will create a project with a template of a VDP custom function. The template can be adapted to our needs, in this case to use a sentiment analysis library. To be able to use this library, we have to add it to the project.
To do that, right click on the project and go to “Build path > Configure Build Path”. Select the “Libraries” tab and click on “Add External JARs” to include the LingPipe dependencies.
For this example, we need to import the following JAR files from the LingPipe distribution:
- lingpipe-4.1.0.jar
- log4j.jar
Click “Ok” to include the libraries in the build path.
Back in the code of the custom function, we need to define a constructor in our class that loads the classifier, as follows:
    // Imports used by the snippets in this section
    import java.io.File;
    import org.apache.log4j.Logger;
    import com.aliasi.classify.ConditionalClassification;
    import com.aliasi.classify.LMClassifier;
    import com.aliasi.util.AbstractExternalizable;

    // Fields of the TwitterClassifier class.
    // Replace the value with the path of the trained classifier file.
    private static final String CLASSIFIER_FILE_LOCATION = "<DENODO_HOME>/conf/";
    private static final Logger logger = Logger.getLogger(TwitterClassifier.class);
    private LMClassifier classifier;

    public TwitterClassifier() {
        super();
        try {
            // Deserialize the trained classifier once, when the function is instantiated
            this.classifier = (LMClassifier) AbstractExternalizable.readObject(
                    new File(CLASSIFIER_FILE_LOCATION));
        } catch (Exception e) {
            logger.error("Error in TWITTERSENTIMENT: " + e.getMessage());
        }
    }

Note that we have used the constant:

    private static final String CLASSIFIER_FILE_LOCATION

This constant defines the location of the file that contains the trained classifier.
Note: The file has to be copied to the <DENODO_HOME>/conf folder or to one of its subfolders.
The implementation of the function that classifies an input is as simple as invoking a method of the library:

    @CustomExecutor
    public String execute(String text) {
        // classify() scores the text against every category; bestCategory() returns the top one
        ConditionalClassification classification = classifier.classify(text);
        return classification.bestCategory();
    }
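The snippet above does not cover the case where the classifier could not be loaded or where the input text is NULL. A defensive variant could look like the sketch below. It assumes that returning null is acceptable in both cases; `SafeSentiment` is a hypothetical name, and a `Function<String, String>` stands in for the LingPipe classifier so that the example runs without the library.

```java
import java.util.function.Function;

/** Hypothetical wrapper; the Function stands in for the LingPipe classifier. */
final class SafeSentiment {
    private final Function<String, String> classifier; // null if loading failed

    SafeSentiment(Function<String, String> classifier) {
        this.classifier = classifier;
    }

    /** Returns the best category, or null when there is nothing to classify. */
    String execute(String text) {
        if (classifier == null || text == null || text.trim().isEmpty()) {
            return null; // assumption: callers treat null as "no classification"
        }
        return classifier.apply(text);
    }
}
```

With this guard, a failure to load the classifier file results in NULL values in the output field instead of a NullPointerException for every row.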
You can find more information about how to implement a VDP custom function in the Virtual DataPort Developer Guide, section “DEVELOPING CUSTOM FUNCTIONS”.
Once we have developed the custom function it can be deployed and tested from VDP.
To do that, click on the right side of the Denodo icon in the Eclipse toolbar to open the menu.
Then click on “Denodo4E Configurations”.
Now click on “Denodo Application” and press “New” to create a new Denodo Configuration.
Follow these steps:
- Set a name for the configuration, e.g. “TwitterClassifier”
- Select Denodo VDP/ITP Server as Application.
- Add the project to debug.
- Click on “Apply” and “Debug”.
After clicking on “Debug” your VDP server will start running.
The last step is to deploy the custom function to the VDP Server. Go to the Project Explorer in Eclipse, right click on the Denodo project and select “Deploy Extensions”.
Select the host, port, database, user and password for the VDP server and set a name for the extension containing the custom function. Click on “Finish” to deploy the custom function.
Connecting Twitter and the Denodo Platform
As you can see in the custom function implementation, the classifier expects the text of a tweet in order to classify it as positive, negative or neutral. To provide this text to the function, we will integrate Twitter into Denodo as a JSON data source.
The Twitter REST API provides several resources for listing and querying Twitter data. Denodo Virtual DataPort can use this API through the built-in JSON data source. In addition, the Twitter API requires OAuth authentication.
Twitter OAuth Configuration
This step requires a Twitter account, which will be used to access Twitter data. Once the account is created, register an application using that account. To do so, go to https://apps.twitter.com and click on “Create new App”.
Fill out the following information:
- Name: A name for the application/project that will access Twitter.
- Description: Any descriptive information.
- Website: Any valid website.
- Callback URL: Leave this blank.
Click on “Create your Twitter application” and go to the “Keys and Access Tokens” tab to get the authentication tokens that will be needed to configure the Twitter access on the Denodo side.
JSON data source
Once the application is registered in Twitter, we can use the OAuth tokens to integrate the Twitter API into Denodo.
Using the Virtual DataPort Administration Tool, connect to the database where you want to configure this integration and create a new JSON data source (New > Data source > JSON).
Name the data source, for instance ‘ds_twitter’, and select ‘HTTP Client’ as your Data Route. Click on the ‘Configure’ button:
In the configuration dialog, enter the Twitter REST API URL. In our example we are going to use the search method with the interpolation variables ‘query’ and ‘pagesize’ that will be passed as the ‘q’ and ‘count’ parameters of the API method. These variables will be mandatory parameters in the resulting base view:
https://api.twitter.com/1.1/search/tweets.json?q=@query&lang=en&count=@pagesize
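To make the role of the two interpolation variables concrete, the following sketch builds the same request URL in plain Java. It is only an illustration: the `SearchUrl` class and its `build` method are hypothetical names, and how Denodo itself substitutes and encodes the values is not shown here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that mirrors the URL template configured in the data source. */
final class SearchUrl {
    private static final String BASE = "https://api.twitter.com/1.1/search/tweets.json";

    /** Maps 'query' to the 'q' parameter and 'pageSize' to the 'count' parameter. */
    static String build(String query, int pageSize) {
        // Encode the search text so characters such as '#' or spaces are safe in a URL
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return BASE + "?q=" + q + "&lang=en&count=" + pageSize;
    }
}
```

For instance, `SearchUrl.build("Denodo", 20)` returns `https://api.twitter.com/1.1/search/tweets.json?q=Denodo&lang=en&count=20`.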
In the ‘Authentication’ tab, choose the ‘OAuth 1.0a’ authentication method from the drop-down list, and enter the OAuth information generated in the previous step. The fields ‘Client Identifier’ and ‘Client Shared Secret’ correspond to the ‘Consumer Key’ and ‘Consumer Secret’ fields in Twitter, respectively.
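Denodo signs the requests for you once these values are configured, so no code is required. For readers curious about how the shared secrets are used, the sketch below follows the HMAC-SHA1 signing steps of OAuth 1.0a (RFC 5849). `OAuth1Sketch` and its methods are hypothetical names, the sketch ignores repeated parameter names, and it does not describe Denodo's internal implementation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class OAuth1Sketch {
    /** Percent-encodes per RFC 3986, which is stricter than URLEncoder's form encoding. */
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8)
                .replace("+", "%20").replace("*", "%2A").replace("%7E", "~");
    }

    /** Builds METHOD&encoded-url&encoded-sorted-parameters. */
    static String baseString(String method, String url, Map<String, String> params) {
        TreeMap<String, String> sorted = new TreeMap<>();
        params.forEach((k, v) -> sorted.put(enc(k), enc(v)));
        String joined = sorted.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return method.toUpperCase() + "&" + enc(url) + "&" + enc(joined);
    }

    /** Signs the base string with the key "consumerSecret&tokenSecret". */
    static String sign(String baseString, String consumerSecret, String tokenSecret) {
        try {
            String key = enc(consumerSecret) + "&" + enc(tokenSecret);
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal(baseString.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA1 is not available", e);
        }
    }
}
```

In a real request the parameter map also contains the `oauth_*` parameters (consumer key, token, nonce, timestamp and so on), and the resulting signature is sent in the `Authorization` header.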
Click ‘Ok’ and enter sample values for the ‘query’ (e.g. “Denodo”) and ‘pagesize’ (e.g. 20) parameters. These values are used to test the configured URL to make sure it is valid. Click ‘Ok’ again to create the data source.
In the new data source, click on the ‘Create Base View’ button. Again, enter sample values for the ‘query’ and ‘pagesize’ parameters. These will be used to introspect the data structures that represent the JSON data returned in the Twitter response. Click ‘Ok’ to confirm the creation of the base view. You should see the schema of the new view.
Click ‘OK’ to save the view. You should now be able to execute queries against it.
As the structure returned by Twitter is hierarchical, we can use the flatten operation to create a new view and simplify the output schema, keeping only the relevant information.
Once we have a view with the desired fields, we can add a new field that applies the “twitterclassifier” custom function created earlier to the text of the tweets retrieved from Twitter.
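In Denodo the flattening is done in the view editor, without writing code. Conceptually, it turns the array of tweets nested in the response into one row per tweet, as the following sketch shows. It assumes the response has already been parsed into maps and lists, and it keeps only the tweet text and the author's screen name; `FlattenSketch` is a hypothetical name, while `statuses`, `text`, `user` and `screen_name` are fields of the search response.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FlattenSketch {
    /** Produces one output row per element of the "statuses" array. */
    @SuppressWarnings("unchecked")
    static List<Map<String, String>> flatten(Map<String, Object> response) {
        List<Map<String, String>> rows = new ArrayList<>();
        List<Map<String, Object>> statuses =
                (List<Map<String, Object>>) response.getOrDefault("statuses", List.of());
        for (Map<String, Object> status : statuses) {
            Map<String, Object> user =
                    (Map<String, Object>) status.getOrDefault("user", Map.of());
            // Keep only the fields of interest and drop the rest of the hierarchy
            Map<String, String> row = new LinkedHashMap<>();
            row.put("text", (String) status.get("text"));
            row.put("screen_name", (String) user.get("screen_name"));
            rows.add(row);
        }
        return rows;
    }
}
```

Each resulting row corresponds to one record of the flattened view, and its `text` value is what the custom function receives as input.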
Click on edit, go to the “Output” tab and click on “New Field”.
Name the field “sentiment” and use as field expression “twitterclassifier(text)”, where “text” is the text of the tweet.
Click “Ok” twice to finish editing the view. If you run a query on this view searching for “Denodo” with a page size of 20, you will obtain a classification into positive, negative or neutral, according to the sentiment analysis library, for the 20 tweets obtained from the Twitter API.
References
Virtual DataPort Developer Guide: section “DEVELOPING CUSTOM FUNCTIONS”
Virtual DataPort Developer Guide: section “OAuth Authentication”
Virtual DataPort Administration Guide: section “JSON Sources”
LingPipe documentation: http://alias-i.com/lingpipe/
Trained classifier for LingPipe: http://processingdeveloper.altervista.org/classifier.txt