Applies to: Denodo 8.0
Last modified on: 27 Jan 2022
Tags: Amazon SageMaker, Cloud, Jaydebeapi, ML
This document explains the steps involved in connecting to Denodo Virtual Dataport from AWS SageMaker.
Amazon SageMaker is a fully managed service that provides developers and data scientists with the ability to prepare, build, train, and deploy machine learning (ML) models quickly.
A notebook instance is a machine learning Amazon EC2 compute instance that runs the Jupyter Notebook App. Notebook instances are used to create and manage Jupyter notebooks for preprocessing data and to train and deploy machine learning models.
As a first step we are going to create a notebook instance.
Amazon SageMaker creates Jupyter Notebook instances from which we can create notebooks and store them. AWS offers several pre-built notebooks for Python libraries for AI/ML workloads. For this example, we will create a simple notebook and install the required Python libraries for establishing a connection to Denodo.
This document lists the required libraries, which we must install by accessing the Terminal of the instance. The libraries can also be installed from within a notebook, but the Terminal option offers users more control and persistence.
From the Jupyter page, click on the Upload button to upload Denodo’s JDBC driver.
Once the JAR file has been uploaded, create a new notebook, either by clicking “File > New > Notebook” or directly from the Launcher, and choose conda_python3 to run a notebook with Python pre-installed.
Once the notebook has been created, the next step is to install the jaydebeapi Python library, which is used to establish the connection to Denodo.
To do so, navigate to File > New > Terminal and run the following command:
pip install jaydebeapi
When the library is installed from the Terminal, it persists across restarts, whereas with the other options the library is lost if the notebook is stopped or restarted. Once it is installed, you can run the pip list command to display the libraries installed on the instance.
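As an alternative to scanning the pip list output, the check can also be done from a notebook cell. The following is a minimal sketch using only the Python standard library (Python 3.8 or later); the helper name installed_version is hypothetical and not part of jaydebeapi or SageMaker.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report whether the library needed for the Denodo connection is available.
version = installed_version("jaydebeapi")
if version is None:
    print("jaydebeapi is not installed; run 'pip install jaydebeapi' in the Terminal")
else:
    print("jaydebeapi version:", version)
```

Note that this reports on the environment of the kernel running the cell, which may differ from the environment the Terminal installs into.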
In the next notebook cell, use the following Python code to connect to Denodo and list the results of querying a view (bv_series).
import jaydebeapi as dbdriver

# Import the gethostname function from socket to
# put the client hostname in the useragent variable
from socket import gethostname

# Connection parameters of the Denodo Server that we are connecting to
denodoserver_name = "<hostname>"
denodoserver_jdbc_port = "9999"
denodoserver_database = "admin"
denodoserver_uid = "<username>"
denodoserver_pwd = "<password>"
denododriver_path = "/home/ec2-user/SageMaker/denodo-vdp-jdbcdriver.jar"

client_hostname = gethostname()
useragent = "%s-%s" % (dbdriver.__name__, client_hostname)

conn_uri = "jdbc:vdb://%s:%s/%s?userAgent=%s" % (
    denodoserver_name, denodoserver_jdbc_port, denodoserver_database, useragent)

cnxn = dbdriver.connect(
    "com.denodo.vdp.jdbc.Driver",
    conn_uri,
    driver_args={"user": denodoserver_uid, "password": denodoserver_pwd},
    jars=denododriver_path)

query = "select * from bv_series"

# Define a cursor and execute the query
cur = cnxn.cursor()
cur.execute(query)

# Finally fetch the results. `results` is a list of tuples.
# If you don't want to load all the records in memory,
# you may want to use cur.fetchone() or cur.fetchmany()
results = cur.fetchall()
print(results)

# Close the cursor and the connection after execution
cur.close()
cnxn.close()
Replace the placeholders for the denodoserver_name, denodoserver_uid, and denodoserver_pwd fields accordingly. Notice that the path of the JDBC driver is given as “/home/ec2-user/SageMaker/denodo-vdp-jdbcdriver.jar”. The location where SageMaker stores the files uploaded to the notebook instance (DenodoInstance in this example) can be found using the Terminal of the instance.
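The driver location can also be confirmed from a notebook cell before connecting. The sketch below assumes the upload directory is /home/ec2-user/SageMaker, as in the path used in this article; the helper name find_jars is hypothetical.

```python
import glob
import os


def find_jars(directory):
    """Return the sorted paths of all .jar files directly inside directory."""
    return sorted(glob.glob(os.path.join(directory, "*.jar")))


# Directory used in this article for files uploaded to the notebook instance
# (an assumption; adjust it if your instance stores uploads elsewhere).
upload_dir = "/home/ec2-user/SageMaker"
jars = find_jars(upload_dir)
if jars:
    print("JAR files found:", jars)
else:
    print("No JAR files found in", upload_dir)
```

If the Denodo driver appears in the output, its full path is the value to assign to denododriver_path.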
When the cell is executed, the print command displays the results of the query. Note that we are using the fetchall() method of the cursor, which loads all the results into memory.
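For large views, loading everything with fetchall() may not be desirable. The following is a minimal sketch of reading the results in batches with the standard DB-API fetchmany() method. So that it runs anywhere, it uses an in-memory SQLite database as a stand-in for Denodo; the helper name iter_batches and the sample table contents are hypothetical.

```python
import sqlite3
from contextlib import closing


def iter_batches(cursor, batch_size=1000):
    """Yield lists of rows from a DB-API cursor, batch_size rows at a time."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# Stand-in database so the sketch is self-contained; with Denodo, use the
# jaydebeapi connection (cnxn) from the article instead of sqlite3.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE bv_series (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO bv_series VALUES (?, ?)",
                     [(i, "series %d" % i) for i in range(5)])
    with closing(conn.cursor()) as cur:
        cur.execute("SELECT * FROM bv_series ORDER BY id")
        # Only one batch of rows is held in memory at any time.
        batch_sizes = [len(batch) for batch in iter_batches(cur, batch_size=2)]

print(batch_sizes)
```

The closing() wrappers close the cursor and the connection even if the query raises an error, which serves the same purpose as the explicit close call in the article's code.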
In this way, data from views in the Denodo Platform can be retrieved from AWS SageMaker and used with the various pre-built Python libraries for Machine Learning workloads and AI use cases.
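Many Python libraries expect named columns rather than bare tuples. As a hedged sketch of one way to prepare the results, the code below pairs each row with the column names from the standard DB-API cursor.description attribute. It again uses an in-memory SQLite database as a stand-in for Denodo, and the helper name rows_as_dicts and the sample data are hypothetical.

```python
import sqlite3
from contextlib import closing


def rows_as_dicts(cursor):
    """Fetch all rows from an executed DB-API cursor as column-name dicts."""
    # The first item of each description entry is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Stand-in database; with Denodo, execute the query on a jaydebeapi cursor.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE bv_series (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO bv_series VALUES (?, ?)",
                     [(1, "alpha"), (2, "beta")])
    with closing(conn.cursor()) as cur:
        cur.execute("SELECT id, name FROM bv_series ORDER BY id")
        records = rows_as_dicts(cur)

print(records)
```

If pandas is available in the notebook, a list of dictionaries like this can be passed to pandas.DataFrame(records) to obtain a DataFrame.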