Introduction
This document outlines the procedures for establishing a connection between the Denodo Platform Virtual DataPort (VDP) Server and Google Vertex AI Enterprise Notebooks. Vertex AI is a machine learning (ML) platform offered by Google Cloud Platform (GCP), used for training and deploying ML models and AI applications.
Additionally, as of Denodo 9.1, VDP includes an Arrow Flight SQL interface for applications, such as Vertex AI Notebooks, that can leverage Apache Arrow. Key benefits over legacy industry-standard drivers include:
- Columnar data transfers.
- More efficient memory usage.
- Parallel data transfers.
Although direct connections via the ADBC Flight SQL library are also fully supported, this guide focuses on the Denodo Dialect for SQLAlchemy. Using SQLAlchemy on top of the base driver gives users useful abstractions that simplify VQL construction and execution.
For more information on how to use SQLAlchemy with the Arrow driver, please refer to our dedicated guides on building Python notebooks with Denodo.
The guide elaborates on connecting Denodo VDP to Vertex AI Notebooks through Python utilizing libraries such as the Denodo Dialect for SQLAlchemy. Additionally, reference is made to the Knowledge Base article titled How to connect to Denodo from Python - a starter for Data Scientists, which presents alternative examples involving different Python libraries.
Create a Notebook Instance in Workbench
Vertex AI Workbench is a Jupyter Notebook-based development environment. GCP allows you to run Jupyter Notebook on a Compute Engine (CE) virtual machine with your desired configuration. Proceed with the following steps to create a new Notebook instance.
- After you log into your GCP account, you will find “Vertex AI” under the Products section of the Navigation Menu (triple bar symbol) found at the top left corner. Within the Notebooks section of Vertex AI, click the “Workbench” option listed in the left pane. Note that you need to enable the Notebooks API in your Google Cloud project to manage Vertex AI Workbench resources.
- Click Instances
- Click New notebook, and then select Python 3.
- The New notebook window appears.
- In the New notebook window, enter a Notebook name. For example, my-instance.
- Choose the Region and Zone where you wish to run your resources.
- The default properties of the notebook will be displayed. Keep the default settings and click Create, or click the pencil icon to customize any values as needed.
- Once you click on Create, Vertex AI Workbench creates and automatically starts the instance. When the instance is ready to use, Vertex AI Workbench activates an Open JupyterLab link. Click the Open JupyterLab link found next to your user-managed notebooks instance's name.
Connect Denodo to Google Vertex AI Notebooks
Once a notebook instance has been created, the next step is to install the denodo-sqlalchemy Python library, which is used to establish the connection to Denodo. Start by opening a new terminal window within JupyterLab via File > New > Terminal.
Once the terminal is open, run the following command to install the dependencies:
pip install denodo-sqlalchemy[flightsql]
Connecting to Denodo Flight SQL with SQLAlchemy
Once the required libraries are installed, create a new notebook by navigating to “File > New > Notebook” or from the Launcher directly, or open an existing notebook. Choose “Python 3” as the kernel of the notebook.
Before we can run queries against Denodo from Python, we need to initialize a connection object using the Denodo Dialect for SQLAlchemy on top of the Flight SQL driver. Unlike traditional row-based drivers, ADBC is designed for high-performance columnar data transfer, making it the recommended choice for machine learning and AI workloads in a GCP Vertex AI environment.
To build a connection, we first need to define a connection string that tells the driver how to locate Denodo. The connection string has the following format:
denodo+flightsql://<username>:<password>@<host>:<port>/<database>
By default, Denodo listens for Flight SQL connections on port 9994.
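For illustration, the connection string can be assembled from its parts in Python. Every value below (user, password, host, database) is a hypothetical placeholder, not one taken from this guide:

```python
# Hypothetical connection parameters -- replace with your own server details.
username = "ds_user"
password = "secret"
host = "denodo.example.com"
port = 9994  # default Denodo Flight SQL port
database = "analytics"

# Assemble the SQLAlchemy URI for the Denodo Flight SQL dialect.
uri = f"denodo+flightsql://{username}:{password}@{host}:{port}/{database}"
print(uri)
```

Note that if the password contains characters such as `@` or `/`, it should be URL-encoded (for example with `urllib.parse.quote_plus`) before being embedded in the URI.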
import sqlalchemy
import denodo.sqlalchemy  # registers the Denodo dialect with SQLAlchemy

uri = "denodo+flightsql://<username>:<password>@<host>:9994/<database>"
engine = sqlalchemy.create_engine(uri)
For more information on handling headers, tokens, and configuration properties for the driver, refer to our official documentation.
With the connection established, the next step is to execute a query and fetch the results. To execute a query and load the data into a Pandas DataFrame, insert the following code into your notebook:
import pandas as pd
from sqlalchemy import text

sql_query = text("SELECT * FROM your_denodo_view LIMIT 10")
with engine.connect() as connection:
    df = pd.read_sql(sql_query, connection)
Executing VQL with Magic Cells
For a more interactive notebook experience, you can leverage the Python library JupySQL to execute VQL queries directly within your notebook cells using the magic command %sql.
First, install the jupysql Python package with the following command:
pip install jupysql
Next, load the extension:
%load_ext sql
Now attach the “engine” connection created earlier to the JupySQL extension with the following cell:
%sql engine
Now you can execute queries against Denodo using the %sql cell magic.
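For example, a cell like the following (the view name is a placeholder) runs VQL through the attached engine and renders the result set as a table:

```
%sql SELECT * FROM your_denodo_view LIMIT 5
```

JupySQL also supports capturing results into a Python variable (for example with its `<<` assignment syntax); see the JupySQL documentation for details.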
For more information on JupySQL, including tutorials on pandas integration and multiline cell magic, visit the JupySQL documentation.
Summary
By leveraging Denodo as the enterprise's semantic layer, data scientists on Google Vertex AI ensure that feature engineering and model training are grounded in a single, real-time source of truth. Denodo's Arrow Flight SQL interface bypasses the latency of traditional data movement, enabling high-throughput, parallel, and memory-efficient columnar data transfers via the ADBC standard.
This solution ensures that your workloads in the Vertex AI ecosystem are built on a virtualized, performant, and governance-compliant data fabric, allowing engineers to focus on model logic and deployment rather than on ETL pipelines and replicated data silos.
References
How to connect to Denodo from Python - a starter for Data Scientists
Denodo in Data Science and Machine Learning Projects
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments, we recommend contacting your Denodo Customer Success Manager.

