You can translate the document:

Introduction

This document outlines the procedures for establishing a connection between the Denodo Platform Virtual DataPort (VDP) Server and Google Vertex AI. Vertex AI is a machine learning (ML) engine service offered by Google's Cloud Platform (GCP) used for training and deploying ML models and AI applications. The guide elaborates on connecting Denodo VDP to Vertex AI through Python code utilizing libraries such as the Denodo Dialect for SQLAlchemy and Jaydebeapi. Additionally, reference is made to the Knowledge Base article titled How to connect to Denodo from Python - a starter for Data Scientists, which presents alternative examples involving different Python libraries.

Overview

This section provides a concise overview of the process. Step 1, creating a Notebook instance in Vertex workbench, is obligatory. After its completion, Step 2 is the recommended method for connecting to Denodo, while Step 3 serves as an alternative to Step 2.

  1. Create/access a Notebook instance in the Workbench of Vertex AI.
  2. Using Denodo Dialect for SQLAlchemy (recommended)
  1. Install the required libraries.
  2. Run the Python code to establish the connection to the Denodo VDP server.
  1. Using Jaydebeapi (alternative)
  1. Upload the Denodo JDBC driver into the instance.
  2. Install the required libraries.
  3. Run the Python code to establish the connection to the Denodo VDP server.

Create a Notebook Instance in Workbench

Vertex AI Workbench is a Jupyter Notebook-based development environment. GCP allows you to run Jupyter Notebook on a Compute Engine (CE) virtual machine with your desired configuration. Proceed with the following steps to create a new Notebook instance.

  • After you log into your GCP account, you will find “Vertex AI” under the Artificial Intelligence section of the Navigation Menu (triple bar symbol) found at the left top corner. This opens the dashboard, click on the “Workbench” option found under the tools listed on the left pane. Note that you need to enable the Notebooks API in the Google Cloud project to be able to manage Vertex AI Workbench resources in Google Cloud.
  • Click User-managed notebooks. A user-managed Notebook instance is a Deep Learning virtual machine instance with the latest machine learning and data science libraries preinstalled.
  • Click New notebook, and then select Python 3.
  • The New notebook window appears.

  • In the New notebook window, enter a Notebook name. For example, my-instance.
  • Choose the Region and Zone where you wish to run your resources.
  • The default properties of the Notebook will be displayed. Keep the settings as default and click Create. If you wish to customize the default values, then click on the pencil icon to change the values as per your needs.
  • Once you click on Create, Vertex AI Workbench creates and automatically starts the instance. When the instance is ready to use, Vertex AI Workbench activates an Open JupyterLab link. Click the Open JupyterLab link found next to your user-managed notebooks instance's name.

Using the Denodo Dialect for SQLAlchemy

This document primarily concentrates on connecting to the Denodo VDP server from Python using the Denodo Dialect for SQLAlchemy. We recommend this library for the following reasons:

,

  • You do not need to install Denodo’s JDBC or ODBC driver in the target machine and you do not rely on an external DSN.
  • The higher-level set of functionality it offers and the fact that it makes use of psycopg2 underneath.
  • The SQLAlchemy option also makes it easy to work with pandas data frames.

Install the required libraries

The documentation Denodo Dialect for SQLAlchemy - User Manual can be referred for detailed information on the installation and execution of the Denodo Dialect for SQLAlchemy. Here, we are installing “Denodo Dialect for SQLAlchemy” from Python Package Index(PyPI). In order to do that, navigate to File > New > Terminal and run the following command:

pip install --upgrade denodo-sqlalchemy

Using the Terminal option is the preferred way because the library will be persisted on notebook restart. Other options like running the pip install command inside the notebook itself will require the library to be reinstalled when the notebook is stopped/restarted. Once installed, you can run a pip list command that retrieves the list of libraries installed on the instance to confirm if the library is installed.

Execute the code

The next step is to create a new Notebook by navigating to File > New > Notebook, or from the Launcher directly. Choose “python3” as the kernel of the Notebook.

In one of the cells, use Python code similar to below for establishing a successful connection to Denodo. In this example, a connection to the Denodo VDP server is established, a query is executed over the “bv_students” view, and its results are displayed.

## Importing the main library used to connect to Denodo via sqlalchemy

import sqlalchemy as dbdriver

## Connection parameters to the Denodo VDP Server

denodoserver_name = "<HOSTNAME>"

## Default port for ODBC connections

denodoserver_odbc_port = "9996"

denodoserver_database = "admin"

denodoserver_uid = "<USERNAME>"

denodoserver_pwd = "<PASSWORD>"

## Establishing a connection

cnxn_str = "denodo://%s:%s@%s:%s/%s" %\

       (denodoserver_uid, denodoserver_pwd, denodoserver_name, denodoserver_odbc_port, denodoserver_database)

engine=dbdriver.create_engine(cnxn_str)

## Query to be sent to the Denodo VDP Server

query = "select * from bv_students"

result_set=engine.execute(query)

## Finally fetch the results. `results` is a list of lists.

## If you don't want to load all the records in memory, you may

## want to use cur.fetchone() or cur.fetchmany()

results = result_set.fetchall()

print(results)

Before executing the code, replace the placeholders <HOSTNAME>, <USERNAME>, <PASSWORD> fields accordingly with their respective values. Following a similar approach like above, the data from various views can be fetched from the Denodo Platform and can be used as the source data to train AutoML and custom models on Vertex AI based on your use cases.

Using Jaydebeapi

When using the Jaydebeapi library to establish a successful connection to Denodo, it is necessary to place the Denodo JDBC driver in the machine where the Python interpreter is present. It's important to acknowledge that Jaydebeapi may exhibit performance concerns in the Python-Denodo Configuration, making it a less recommended option unless your specific scenario necessitates its use.

Upload the Denodo JDBC driver

From the JupyterLab page, click on the Upload Files button to upload Denodo’s JDBC driver. Note that the Denodo’s JDBC client driver can be found in the directory <DENODO_HOME>/tools/client-drivers/jdbc/denodo-vdp-jdbcdriver.jar of the Denodo Platform installation or it can be downloaded from the Denodo Community site.

Install the required libraries

In this document, we are going to use Jaydebeapi for connecting to the Denodo VDP server from Python. Therefore, we need to install the Jaydebeapi Python library. In order to do that, navigate to File > New > Terminal and run the following command:

pip install jaydebeapi

Using the Terminal option is the preferred way because the library will be persisted on notebook restart. Other options like running the pip install command inside the notebook itself will require the library to be reinstalled when the notebook is stopped/restarted. Once installed, you can run a pip list command that retrieves the list of libraries installed on the instance to confirm if the library is installed.

If you encounter a JVMNotFoundException error, then it is also necessary to install Java in order to use the JayDeBeApi. To install Java, execute a command similar to below in the same terminal.

sudo apt-get install default-jdk -y

Execute the code

Once the JAR file has been uploaded and the required libraries are installed, create a new Notebook by navigating to File > New > Notebook, from the Launcher directly or opening an existing Python pre-installed notebook. Choose “python3” as the kernel of the Notebook.

In one of the cells, use Python code similar to below to connect to Denodo. In this example, a connection to the Denodo VDP server is established, a query is executed over the “bv_cc_agent” view that is present in the “denodo_training” database and its results are displayed.

#Sample python code for connecting to Denodo using Jaydebeapi

import jaydebeapi as dbdriver

## Importing the gethostname function from socket to

## put the hostname in the useragent variable

from socket import gethostname

# Connection parameters of the Denodo Server that we are connecting to

denodoserver_name = "<hostname>"

denodoserver_jdbc_port = "9999"

denodoserver_database = "admin"

denodoserver_uid = "<username>"

denodoserver_pwd = "<password>"

denododriver_path = "/home/jupyter/denodo-vdp-jdbcdriver.jar"

client_hostname = gethostname()

useragent = "%s-%s" % (dbdriver.__name__,client_hostname)

conn_uri = "jdbc:vdb://%s:%s/%s?userAgent=%s" % (denodoserver_name,denodoserver_jdbc_port,denodoserver_database,useragent)

                                                                                           

cnxn = dbdriver.connect("com.denodo.vdp.jdbc.Driver",conn_uri,driver_args = {"user": denodoserver_uid,"password": denodoserver_pwd},

jars = denododriver_path)

query = "SELECT * FROM denodo_training.bv_cc_agent LIMIT 5"

## Define a cursor and execute the results

cur = cnxn.cursor()

cur.execute(query)

## Finally fetch the results. `results` is a list of tuples,

## If you don't want to load all the records in memory,

## you may want to use cur.fetchone() or cur.fetchmany()

results = cur.fetchall()

print(results)

##To close the cursor after execution to ensure connections are closed

cur.close()

Replace the placeholders <hostname>, <username>, and <password> fields accordingly with their respective values. Notice that the path of the JDBC driver has been given as “/home/jupyter/denodo-vdp-jdbcdriver.jar”. This is the default location where the Google Vertex AI stores the uploaded files to the Notebook instance. If you had uploaded it to a custom location, you shall navigate to that path using cd command and its complete path can be found by executing the command pwd in the Terminal of the instance. Similarly, the data from various views can be fetched from the Denodo Platform and can be used as the source data to train AutoML and custom models on Vertex AI based on your use cases.

References

How to connect to Denodo from Python - a starter for Data Scientists

Introduction to Vertex AI

Denodo in Data Science and Machine Learning Projects

Disclaimer
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments we recommend you to contact your Denodo Customer Success Manager.

Questions

Ask a question

You must sign in to ask a question. If you do not have an account, you can register here