You can translate the document:

Content

This document explains the steps involved in connecting to Denodo Virtual Dataport from AWS SageMaker.

Amazon SageMaker is a fully managed service that provides developers and data scientists with the ability to prepare, build, train, and deploy machine learning (ML) models quickly.

Creating a Notebook Instance

A notebook instance is a machine learning Amazon EC2 compute instance that runs the Jupyter Notebook App. Notebook instances are used to create and manage Jupyter notebooks for preprocessing data and to train and deploy machine learning models.

As a first step we are going to create a notebook instance.

  • Navigate to Amazon SageMaker > Notebook Instances.
  • Click on Create notebook instance.

  • Name the instance, for example DenodoInstance.
  • Choose the required instance type from the available options, for this example, an ml.t2.medium instance is used.

  • You may leave the Additional configuration section options as it is as they are optional.
  • Next, configure the Permissions and encryption for the notebook instance.
  • Choose the IAM Role that has permissions to access other services like S3, EC2 etc. Note that you can create a role or let SageMaker create one for you with the AmazonSageMakerFullAccess IAM policy attached.

  • Based on the requirement, we may decide whether to give root access for the users accessing the notebook.
  • Scroll down to configure Network options. SageMaker allows you to configure the instance within a VPC. This will be useful in case the Notebook instance needs to be accessed only within the VPC. Note that if the VPC is configured to access the internet, then the instance will inherit it. For this example, we choose No VPC and SageMaker will provide internet access directly to the instance.

  • The Git repositories section allows the instance to start Jupyter in the given repository. Leave this section blank.
  • Click on Create Notebook Instance to start the instance creation.

Access Jupyter / JupyterLab

Amazon SageMaker creates Jupyter Notebook instances from which we can create notebooks and store them. AWS offers several pre-built notebooks for python libraries for AI/ML workloads.  For this example, we will create a simple notebook and install the required Python libraries for establishing a connection to Denodo.

This document explains the list of required libraries required and we must install them by accessing the Terminal of the instance. We do have an option to install the libraries from the notebooks, but the Terminal option offers more control for the users and offers persistence.

From the Jupyter page, click on the Upload button to upload Denodo’s JDBC driver.

Once the JAR file has been uploaded, create a new Notebook by clicking “File > New > Notebook, from Launcher directly or choose conda_python3 for running a python pre-installed notebook.

Install jaydebeapi

Once a notebook instance has been created, the next step is to install the jaydebeapi python library for establishing a connection to Denodo.

To do so, navigate to File > New > Terminal and run the following command:

pip install jaydebeapi

Using the Terminal option, the library will be persisted on restart whereas other options will lose the library if the notebook is stopped/restarted. Once installed, you can run a pip list command that retrieves the list of libraries installed on the instance.

Execute the code

On the next cell, use the python code as mentioned below to connect to Denodo and list the results of querying a view (bv_series).

import jaydebeapi as dbdriver

## Importing the gethostname function from socket to

## put the hostname in the useragent variable

from socket import gethostname

# Connection parameters of the Denodo Server that we are connecting to

denodoserver_name = "<hostname>"

denodoserver_jdbc_port = "9999"

denodoserver_database = "admin"

denodoserver_uid = "<username>"

denodoserver_pwd = "<password>"

denododriver_path = "/home/ec2-user/SageMaker/denodo-vdp-jdbcdriver.jar"

client_hostname = gethostname()

useragent = "%s-%s" % (dbdriver.__name__,client_hostname)

conn_uri = "jdbc:vdb://%s:%s/%s?userAgent=%s" % (denodoserver_name,denodoserver_jdbc_port,denodoserver_database,useragent)

                                                                                           

cnxn = dbdriver.connect("com.denodo.vdp.jdbc.Driver",conn_uri,driver_args = {"user": denodoserver_uid,"password": denodoserver_pwd},

jars = denododriver_path)

query = "select * from bv_series"

## Define a cursor and execute the results

cur = cnxn.cursor()

cur.execute(query)

## Finally fetch the results. `results` is a list of tuples,

## If you don't want to load all the records in memory,

## you may want to use cur.fetchone() or cur.fetchmany()

results = cur.fetchall()

print(results)

##To close the cursor after execution to ensure connections are closed

cur.close()

Replace the placeholders for denodoserver_name, denodoserver_uid, denodoserver_pwd fields accordingly. Notice the path of the JDBC driver has been given as “/home/ec2-user/SageMaker/denodo-vdp-jdbcdriver.jar”. The location where the SageMaker stores the files uploaded to the Notebook instance can be found using the Terminal of the instance, DenodoInstance in this example.

After executing the cell, the print command is used to fetch the results of the executed query. Note that we are using a fetchall() cursor to store all the results in memory.

Using Denodo Dialect for SQLAlchemy

SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL. With the Denodo dialect for SQLAlchemy we can connect with Denodo databases and query data from Denodo views. For more information, refer to Denodo Dialect for SQLAlchemy - User Manual

Installation and Execution

This dialect requires the following Python modules:

  • SQLAlchemy version >= 1.4, < 2.0.
  • psycopg2 version 2.7 or higher.

The dialect will be available from two different locations:

The recommended way to install the dialect is by using the PyPi repository.

Installation from PyPi

The sqlalchemy and psycopg2 dependencies do not need to be installed because when you install Denodo Dialect for SQLAlchemy both should be installed automatically. Alternatively, these packages can be manually installed by means of:

pip install sqlalchemy<2.0

pip install psycopg2

Important: note also that psycopg2 has its own requirements for installation which need to be satisfied: https://www.psycopg.org/docs/install.html#prerequisites

Installation from Denodo Support Site

The sqlalchemy and psycopg2 dependencies must be manually installed using:

pip install sqlalchemy<2.0

pip install psycopg2

Important: note also that psycopg2 has its own requirements for installation which need to be satisfied: https://www.psycopg.org/docs/install.html#prerequisites

The Denodo dialect for SQLAlchemy can be downloaded from the Denodo Support Site.

To install dialect just run:

        pip install route/to/denodo_sqlalchemy-20220419-py3-none-any.whl

For SageMaker, upload the .whl file using the SageMaker upload option,

Install the .whl file from a python cell

pip install /home/ec2-user/SageMaker/denodo_sqlalchemy-20240229-py3-none-any.whl

To connect to Denodo VDP Server with SQLAlchemy, the following URL pattern can be used:

denodo://<username>:<password>@<host>:<odbcport>/<database>

Execute the Denodo Dialect for SQLAlchemy

Install the iPython SQL extension

pip install ipython-sql==0.4.1

%load_ext sql

Note that the Denodo dialect does not support SQLAlchemy 2.0 yet so the latest version of ipython-sql cannot be installed.

Connect with a Denodo database using the SQLAlchemy URL pattern for Denodo dialect.

%sql denodo://admin:admin@<hostname>:9996/admin

Run queries to Denodo views.

%sql select * from bv_series

This way the views in Denodo Platform can be retrieved from AWS SageMaker and can also be used with various pre-built python libraries for Machine Learning workloads and AI use cases.

References

Using Notebooks for Data Science with Denodo

How to connect to Denodo from Python - a starter for Data Scientists

Denodo in Data Science and Machine Learning Projects

Disclaimer
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments we recommend you to contact your Denodo Customer Success Manager.

Questions

Ask a question

You must sign in to ask a question. If you do not have an account, you can register here