Denodo has always empowered organizations by transforming data accessibility and usability through the Denodo Platform. Now, we are advancing this mission by connecting data and AI with the Denodo AI SDK, which enables organizations to fully leverage their data for intelligent decision-making.
AI is only as powerful as the data that fuels it. However, making this data accessible to AI systems requires addressing critical challenges:
- Exploding Data Volumes: Organizations face an exponentially increasing wave of data from diverse and complex sources.
- Scattered Knowledge: Semantic understanding and knowledge of data is often dispersed across separate teams and departments.
- Security Puzzles: Varied data systems and data sharing introduce new security protocols and compliance requirements, leading to more difficulty in integrating disparate systems.
The Denodo Platform and AI SDK address these barriers directly. With our solutions, developers can avoid fragmented pipelines and security roadblocks, allowing them to focus on building cutting-edge AI technology that taps into the full breadth of organizational data and semantics—ensuring insights that are not only reliable but also actionable.
With Denodo, organizations gain:
- Unified Data Access: Avoid time-consuming data integrations and get direct, virtualized access to all organizational data. Utilize data storage and retrieval systems optimized for each use case.
- Scalable Security and Compliance: Ensure data access is strictly controlled and compliant, supporting accurate and manageable AI insights.
- Developer-Focused Tools: Streamline data engineering efforts, enabling developers to dedicate more time to building innovative AI capabilities instead of fighting to get the right data.
At Denodo, we're here to help push past data issues and into the AI-powered future.
What will we be setting up?
Below is a simplified architecture diagram of the AI SDK and Denodo Platform:
Note the following components of the diagram:
- The data sources: these are sources of organizational data, and can be stored in many formats (not only the ones listed here).
- The AI model provider(s): these systems provide the computational power to support AI models, and allow AI applications to interact with them through simple API endpoints.
- The Denodo Platform: the data delivery system that unifies data retrieval and security through a single access interface. Also, the Denodo AI SDK sits between the data delivery and AI infrastructures and performs common text-to-SQL operations.
- The chatbot or AI application: these frontend systems sit in front of the data delivery and AI infrastructure systems, and implement the business logic of the system.
Usually, the chatbot or AI application would implement the logic required to integrate the data delivery and AI infrastructure systems; however, the Denodo AI SDK implements the text-to-SQL logic for the user, allowing them to focus on the business needs of the chatbot or AI application. In the following sections, we will set up the Denodo Platform to access sample CSV files that contain information about loans. We will connect the Denodo AI SDK to an AI infrastructure provider and review a sample chatbot leveraging the AI SDK. Later, this will be generalized to other sources of data.
AI Model Provider
The supported AI model providers can be found below, along with the models that performed best in testing:
| Provider | Recommended LLM | Alternative | Embedding Model |
|---|---|---|---|
| OpenAI | gpt-4o | gpt-4o-mini | text-embedding-3-large |
| Bedrock | anthropic.claude-3-5-sonnet-20240620-v1:0 | anthropic.claude-3-haiku-20240307-v1:0 | titan-embed-text-v2 |
| Google | gemini-1.5-pro | gemini-1.5-flash | gecko |
| Mistral | mistral-large-latest | mistral-large-latest | mistral-embed |
| Llama | llama-3.1-nemotron-70b-instruct | llama-3.1-70b-instruct | Depends on provider |
Denodo License
In this example, a Denodo Express license (from page two of the linked Denodo Community Site) will be used, but an Enterprise Plus subscription license and distribution of Denodo can also be used.
Test Environment
The prerequisites for setting up the system itself will depend on the method used to install the Denodo Platform and AI SDK. In this guide, the following two methods will be covered:
- Using Docker containers, which pulls all the necessary components as containers and starts them in the Docker runtime.
- Using a local Python installation, installing the Denodo Platform locally and the chatbot application in a Python virtual environment.
Please note that it is only necessary to complete one of the above sections to install the testing environment. The next two sections of this tutorial will cover both types of installation.
Additional Notes
It is critical to note that:
- In order to increase the accuracy of the system, it is important to provide the AI model with as much semantic information as possible. Please make sure to add thorough descriptions to views and columns to provide this semantic information to the model, and make sure to allow for sample results to be included wherever possible.
- Even with this, AI systems in general are never 100% accurate; however, the AI chatbot provides transparency to the queries executed and the thought process used to generate the response to help users validate the answers.
Setup
In order to start the Denodo Platform and AI SDK in Docker, it will be necessary to perform the following:
- Install Docker. This can be installed using Docker Desktop or the base Docker Engine.
- If using the Docker Engine installation method, install the Docker Compose plugin.
- Authenticate with Denodo's Harbor repository. Click on the authenticated username in the top right corner of the screen, and select "User Profile". Copy the CLI secret, and execute the following command:
docker login harbor.open.denodo.com
For the username, input the username seen earlier, and for the password use the CLI secret.
- Clone the Denodo Community Lab Environment Docker Compose project, which contains a sample environment for the Denodo AI SDK and a sample chatbot application, by running the following command:
git clone https://github.com/denodo/denodocommunity-lab-environment.git
We will use this project later on in these instructions.
To validate that everything is installed correctly:
- Run `docker --version`. This should output a version number; this guide was constructed using Docker Engine version 27.2.1-rd, build cc0ee3e.
- Run `docker compose version` to get the version number of the Docker Compose distribution. This guide used Docker Compose version v2.29.5.
- Execute `docker pull harbor.open.denodo.com/denodo-express/denodo-platform:latest`. If this does not start downloading the image, try step #3 from the previous setup.
- Check that the downloaded Denodo Community Lab Environment project has the following files:
  - The `denodocommunity-lab-environment/lab-environment-containers/build/docker-compose-sample-chatbot.yml` file. This file describes the deployment of the sample system and is referenced by the `docker compose` command.
  - The `denodocommunity-lab-environment/lab-environment-containers/build/.env` file. In this file, the necessary properties to set up the system are listed.
AI SDK Configuration
The next step is configuring the Denodo AI SDK container, so it can connect to the Denodo Platform and to the LLM provider.
Connection to Denodo Platform
The following default configuration for the Denodo Platform works for this tutorial, so for a basic installation you don't need to modify these parameters of the `.env` file. It launches a Denodo Virtual DataPort server, the Design Studio, and the Denodo Data Catalog initialized with several views used for this tutorial.
Connection to the LLM Provider
After setting up the connection with the Denodo Platform, you need to configure the LLM provider and models that will be used. For that, it is necessary to provide the correct authentication information for the AI provider. More information about the available settings for each provider is available in the Denodo AI SDK - User Manual.
For example, in order to connect the AI SDK to an OpenAI model provider, the following properties must be updated in the `.env` file:
- `PROVIDER`. For example, `OpenAI`.
- `LLM_MODEL`, e.g. `gpt-4o`.
- `EMB_MODEL`, e.g. `text-embedding-3-large`.
- `OPENAI_API_KEY`. This is the key used to authenticate to OpenAI.
- `OPENAI_ORG_ID`. This is the OpenAI organization ID used to authenticate to OpenAI (optional).
For other LLM providers, you have to fill in the corresponding values in the `.env` file.
In the case that Google is the provider, a JSON credentials file corresponding to credentials for a service account is needed. Add an additional line in the `docker-compose-sample-chatbot.yml` file mounting this file from the host machine:
volumes:
- "${DENODO_CHATBOT_CONFIG_FILE}:/opt/ai-sdk/sample_chatbot/chatbot_config.env"
- "<HOST_PATH_TO_JSON>:/opt/ai-sdk/google_credentials.json"
And specify the environment variable referencing this file in the `.env` file:
GOOGLE_APPLICATION_CREDENTIALS=/opt/ai-sdk/google_credentials.json
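Before bringing the containers up, it can save a restart cycle to verify that the credentials file on the host is readable JSON. Below is a minimal sketch; the `<HOST_PATH_TO_JSON>` placeholder is the same one used in the volume mount above and must be replaced with your real path:

```python
# Quick sanity check that the Google service-account credentials file
# exists and parses as JSON before starting the containers.
import json
import os

def check_credentials(path):
    if not os.path.isfile(path):
        return "missing"
    try:
        with open(path) as f:
            json.load(f)
        return "ok"
    except ValueError:
        return "invalid-json"

# Replace the placeholder with the host path mounted in the compose file.
print(check_credentials("<HOST_PATH_TO_JSON>"))
```

If this prints anything other than `ok` after you substitute your path, fix the file before running `docker compose up`.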
Deployment
To ensure that the docker-compose-sample-chatbot.yml
file is correctly configured, we recommend executing the following command:
$ cd denodocommunity-lab-environment/lab-environment-containers/build/
$ docker compose -f docker-compose-sample-chatbot.yml config
This will throw errors if anything is missing, and the environment variables passed to the services can be reviewed in the output document.
At this point, the system is ready to be started. This can be done by executing the following command:
$ docker compose -f docker-compose-sample-chatbot.yml pull
$ docker compose -f docker-compose-sample-chatbot.yml up -d
The command will take some time downloading the images from the Harbor repository and waiting for the services to come up in the correct order; this will take a few minutes. Once the services are up, you will see this:
At this point, if everything is working, move to the What can I do with this? section of this tutorial. Other useful information for this installation method is added below:
Troubleshooting
To check which containers are running as part of the Docker Compose project, the following command can be executed:
$ docker ps
To include containers that have stopped running, include the "-a" option in the command:
$ docker ps -a
In the case that unexpected behavior occurs, the logs of the individual systems can be accessed by running the following command:
$ docker logs -f <container_name>
Where `<container_name>` is the name of the container identified in the previous command.
Cleanup
In order to delete the Docker Compose project, run the following command:
$ cd denodocommunity-lab-environment/lab-environment-containers/build/
$ docker compose -f docker-compose-sample-chatbot.yml down
If Docker should also be uninstalled, reference the associated documentation for uninstalling it based on the installation instructions used.
Setup
The Python method of setting up the Denodo Platform and AI SDK involves running the AI SDK in a Python virtual environment, and running the Denodo Platform on the local machine. This requires:
A running Denodo Express installation
- Download the installer from the Denodo Express 9 page. Note that if the target operating system is not Windows or Linux a Java 17 JDK must be installed to support this.
- Run the installer following the steps in the "Using a local installation of Denodo" section of the Installation and Bootstrapping page.
- Execute the Denodo Express binaries following the instructions in the "Launching Denodo in a Local Installation" section of the Installation and Bootstrapping tutorial.
- The Denodo Express installation includes the Denodo AI SDK in this folder: `/samples/ai-sdk`
An installation of Python (make sure this is version 3.12), and some dependencies
- Some of the libraries that will be installed by `pip` require specific compilers to be available:
  - Windows: please make sure to install the "MSVC v143 - VS 2022 C++ x64/x86 build tools" library using the Microsoft C++ Build Tools. After downloading the installer, the "Individual Components" tab can be selected to include just the "MSVC v143 - VS 2022 C++ x64/x86 build tools" library.
  - Linux: Not all C compilers support building the libraries. We recommend using a Linux distribution that includes the `glibc` libraries.
  Additional issues may be resolved by reviewing the steps in the Troubleshooting guide from Chroma.
- Install Python. Installers for Python 3.12 can be found on the Download Python page under "Looking for a specific release?". Choose the latest installer for Python 3.12. Add the Python binary to the PATH variable if possible, to avoid needing to specify the full path to the binary each time.
The installation can be validated by running the following in a terminal:
$ python --version
A Python virtual environment configured, with the correct libraries installed
- Open a terminal and move to the AI SDK install location (included in Denodo Express or downloaded from GitHub; for this tutorial we will assume we are using Denodo Express):
$ cd <DENODO_INSTALL_PATH>/samples/ai-sdk
- Here, we will create the Python virtual environment:
$ python -m venv venv_denodo
- This will create a `venv_denodo` folder inside our AI SDK folder where all the specific dependencies for this project will be installed. The ENTER key may have to be pressed again to return the prompt to the user.
- We now need to activate the virtual environment. Activating the virtual environment differs depending on the OS:
| OS | Command |
|---|---|
| Windows | venv_denodo\Scripts\activate |
| Linux | source venv_denodo/bin/activate |
- Install the dependencies with the following command. Specify the requirements file corresponding to the operating system of the machine:
$ python -m pip install adbc_driver_flightsql==1.3.0 adbc-driver-manager==1.3.0
$ python -m pip install -r requirements.txt
Connect to AI Infrastructure
In order to define the connection to the selected AI infrastructure, we will need to define configuration files for the AI SDK and chatbot:
- Create a file named `chatbot_config.env` in the `/samples/ai-sdk/sample_chatbot` directory, and paste the following content into it:
CHATBOT_LLM_PROVIDER=<provider_name>
CHATBOT_LLM_MODEL=<llm_model>
CHATBOT_EMBEDDINGS_PROVIDER=<provider_name>
CHATBOT_EMBEDDINGS_MODEL=<embeddings_model>
<PROVIDER_AUTH_CONFIG>
CHATBOT_VECTOR_STORE_PROVIDER=Chroma
AI_SDK_HOST=http://localhost:8008
AI_SDK_USERNAME=admin
AI_SDK_PASSWORD=admin
- For the AI SDK, create the `api/utils/sdk_config.env` file and add the following content:
CHAT_PROVIDER=<provider_name>
CHAT_MODEL=<llm_model>
SQL_GENERATION_PROVIDER=<provider_name>
SQL_GENERATION_MODEL=<llm_model>
EMBEDDINGS_PROVIDER=<provider_name>
EMBEDDINGS_MODEL=<embeddings_model>
<PROVIDER_AUTH_CONFIG>
VECTOR_STORE=Chroma
VDB_NAMES=samples_bank
DATA_CATALOG_URL=http://localhost:9090/denodo-data-catalog/
DATA_CATALOG_METADATA_USER=admin
DATA_CATALOG_METADATA_PWD=admin
The parameters (bolded values in brackets) above should be replaced with the following values:
- `<provider_name>`: This should be the name of the AI infrastructure provider being used. For example, `Bedrock`, `OpenAI`, `AzureOpenAI`, `Google`, `Anthropic`, or `Ollama`.
- `<llm_model>`: The name of the LLM model that will be used to generate text and SQL statements. For example, `anthropic.claude-3-5-sonnet-20240620-v1:0`, `gemini-1.5-pro`, `gpt-4o`, etc.
- `<embeddings_model>`: The name of the embeddings model that will generate the vector search index used by the chatbot. Examples are: `text-embedding-3-large`, `amazon.titan-embed-text-v1`, `text-embedding-004`, etc.
- `<PROVIDER_AUTH_CONFIG>`: This should be replaced by the correct environment variable name and value; this configures the authentication for the selected provider, which is generally an API key but could involve other configuration parameters. For OpenAI, this would be replaced by:
OPENAI_API_KEY=asd9f79867aa9s87df87as...
OPENAI_ORG_ID=org-23ds...
But for Azure this might involve:
AZURE_OPENAI_ENDPOINT=...
AZURE_API_VERSION=...
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_PROXY=...
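Typos or unreplaced placeholders in these files are a common source of startup failures. The following sketch (assuming plain `KEY=VALUE` lines, which is how the examples above are written; the required-key list is taken from this tutorial's `sdk_config.env`) reports keys that are missing or still contain an unreplaced `<placeholder>`:

```python
# Minimal .env sanity check: parse KEY=VALUE lines and report tutorial-required
# keys that are missing or still contain an unreplaced <placeholder> value.
REQUIRED = [
    "CHAT_PROVIDER", "CHAT_MODEL", "SQL_GENERATION_PROVIDER",
    "SQL_GENERATION_MODEL", "EMBEDDINGS_PROVIDER", "EMBEDDINGS_MODEL",
    "VECTOR_STORE", "VDB_NAMES", "DATA_CATALOG_URL",
]

def load_env(text):
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            pairs[key.strip()] = value.strip()
    return pairs

def problems(pairs):
    # A key is a problem if it is absent, empty, or still a <placeholder>.
    return [k for k in REQUIRED if not pairs.get(k) or pairs[k].startswith("<")]

sample = "CHAT_PROVIDER=OpenAI\nCHAT_MODEL=gpt-4o\nVDB_NAMES=<placeholder>"
print(problems(load_env(sample)))
```

Point `load_env` at the contents of your real `sdk_config.env` file; an empty list means every required key has a concrete value.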
Deployment
The first time that the AI SDK and sample chatbot are started, the following command should be executed:
$ python run.py both --load-demo --host localhost --grpc-port 9994 --dc-port 9090
If an error like the following is returned:
INTERNAL: [FlightSQL] Error executing command: Unexpected error (Internal; DoGet: endpoint 0: [])
Please make sure the Denodo Platform is running and attempt the above command again.
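One way to confirm the Denodo Platform is reachable before retrying is a small readiness probe against the Data Catalog URL used in this guide. This is an illustrative helper, not part of the AI SDK; any HTTP response (even an error status) means a server is listening:

```python
# Minimal readiness probe: returns True if anything answers HTTP at the URL.
import urllib.error
import urllib.request

def is_up(url, timeout=2.0):
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # An HTTP error (401, 404, ...) still means a server is listening.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused / timeout: nothing is listening yet.
        return False

# Data Catalog URL from this tutorial's configuration.
print(is_up("http://localhost:9090/denodo-data-catalog/"))
```

If this prints `False`, start the Denodo Platform and wait for it to finish booting before re-running `run.py`.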
After starting up the chatbot once, the following command can be used to start the chatbot again:
$ python run.py both
This will simply start the AI SDK and the sample chatbot, assuming that the relevant data sets are already accessible through the Denodo Platform.
Finally, if only the AI SDK is needed, this can be started by specifying the "api" option:
$ python run.py api
At this point, if everything is working, move to the What can I do with this? section of this tutorial. Otherwise, additional information useful for this installation method is added below:
Troubleshooting
When the API is running, log files related to the sample chatbot and AI SDK are written to the `logs` directory of the AI SDK root folder. In order to troubleshoot errors originating from the Denodo Platform, please reference the relevant log files from the Denodo Platform installation folder.
Cleanup
To deactivate the Python virtual environment, the following command can be executed:
$ deactivate
Then, to remove all resources associated with this virtual environment, recursively delete the `venv_denodo` folder.
The Denodo Platform provides an uninstaller in its installation folder.
Uninstalling Python will depend on the installation method, but this can usually be done using the Windows application manager or the package manager used to install Python in Linux and Mac systems.
To start, we'll access the chatbot directly and showcase the functionality that this provides. The setup steps automatically import some sample data related to an organization involved in banking, which we'll introspect through the chatbot.
Accessing the Chatbot
By default, a sample chatbot will be started following the setup that was performed previously in this guide. This chatbot can be accessed by navigating to the following link: http://localhost:9992
For now, log in using the administrator credentials:
- Login: `admin`
- Password: `admin`
However, note that this user is delegated to the Denodo Platform and can be used to restrict access to data on a per-user basis; this will be reviewed in more detail later on in the Securing Data Access section.
The following screen should appear:
Straight into the Data
A sample line of questioning for the chatbot illustrating important features is included below. However, feel free to ask any questions about the sample data.
To make inputting the questions easier, `SHIFT + ENTER` can be used to submit questions to the chatbot.
1. First, we'll start with an introduction
Hello, I would like some help with some questions.
This shows that the chatbot still has full power to answer normal questions using the underlying LLM, which we'll use in more detail later. This also lets the LLM know that we're nice in case of a Skynet event 🙂.
2. To look into the tables accessible through this system, we can simply ask another question
What tables do we have related to mortgages?
The Denodo Platform doesn't just allow access to the underlying data; it also stores important metadata about the data sets in its catalog.
Additionally, the AI system is able to relate our question to data sets that do not explicitly mention mortgages; this is due to the fact that the system is performing a semantic search on the information stored in the Denodo Platform (a vector search in the backend), so additional tables are returned that include information relevant to the question.
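The idea behind this semantic matching can be illustrated with a toy example. Below, a bag-of-words vector stands in for the real embeddings model (the actual SDK uses the embeddings model you configured), but the ranking principle is the same: embed the view metadata, embed the question, and sort by cosine similarity:

```python
# Toy illustration of vector search over view metadata (not the real SDK code).
from collections import Counter
import math

def embed(text):
    # Stand-in "embedding": word counts instead of a learned dense vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical view descriptions, standing in for Data Catalog metadata.
views = {
    "loan": "active mortgage loans with loan amount interest rate and property value",
    "customer": "customer name address and income details",
}
question = embed("tables related to mortgages and property value")
ranked = sorted(views, key=lambda v: cosine(question, embed(views[v])), reverse=True)
print(ranked)
```

Even though the `loan` description never contains the word "mortgages", it still ranks first because of overlapping vocabulary; a real embeddings model goes further and matches on meaning rather than exact tokens.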
3. Now, let's look at actual data
Can you provide me with the list of active mortgages, including loan amount and property value?
This showcases the general data retrieval functionality of the system. It tells us the tables involved in the query, the SQL query executed, and formats the returned data:
To add to this, we also get the following when asking data related questions:
- Clicking on the Denodo icon will show the query executed against the Denodo Platform:
- Clicking on the CSV icon allows the returned data to be downloaded in an XML document...just kidding, it's a CSV file 🙂.
- If data can be returned in the format of a graphic, the data is returned in that format, which allows us to rapidly generate much better insights. This is accessible by clicking on the rising bar graph icon:
(note the outlying high loans on properties with relatively lower value)
- Additional suggested questions are provided for immediate follow-up.
- The tables returned by the contextual search are shown at the bottom, with the tables used in the query highlighted in green.
4. Using the insight we just gained, we can immediately dig deeper into the related data
Could you provide more details about the customers that have loans higher than the property value?
This would be very important information for a business user to be able to figure out to whom these larger loans have been granted:
5. We can also ask questions based on the conversation history
Who were the loan officers that approved those loans?
This demonstrates a powerful aspect of AI systems based on LLMs: the amount of effort necessary to get information from a backend system is significantly reduced, and the interaction between the backend system and the user more closely resembles a normal conversation:
6. The last feature we would like to showcase ties into point #1; we can take the information we've gathered from our data and ask the LLM system (and basically the internet) for more information related to this topic
What are the risks for a bank that has mortgages with a loan higher than the property value?
This allows users to seamlessly transition from interrogating business data to asking questions to the knowledge base accessible to the general public.
What is the AI SDK doing?
The Denodo AI SDK provides an API interface that executes common tasks for an application responsible for querying organizational data. This AI SDK is built on top of the LangChain framework and is written in Python for the following reasons:
- To support many AI infrastructure providers, and not limit users to specific providers.
- To make any necessary customizations of the code easier to implement.
The main tasks that it performs (though not limited to this list) are the following:
Storing and Searching Metadata with a Vector Search Index
In order to be able to find data sets that are relevant to the question asked by a user, it is important for an AI application to be able to search over the metadata (and some sample data) of the data set.
The AI SDK performs this task by retrieving the relevant data from the Denodo Platform (such as column names, types, descriptions, and examples of the data in the column) and storing this into a vector index. Below is a high level diagram illustrating the overall data flow in this operation:
After creating this index, plain text queries can be sent to the `/similaritySearch` endpoint of the Denodo AI SDK, and relevant tables and metadata will be returned.
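As a sketch of what a client call might look like, the snippet below builds a request to `/similaritySearch` using only the standard library. The query-parameter names and the basic-auth scheme are illustrative assumptions; check the Denodo AI SDK - User Manual for the exact request format:

```python
# Sketch of querying the AI SDK's /similaritySearch endpoint (parameter names
# "query"/"n_results" are assumptions; consult the AI SDK User Manual).
import base64
import urllib.parse
import urllib.request

AI_SDK_HOST = "http://localhost:8008"  # default host used in this tutorial

def build_request(question, n_results=5):
    params = urllib.parse.urlencode({"query": question, "n_results": n_results})
    req = urllib.request.Request(f"{AI_SDK_HOST}/similaritySearch?{params}")
    # Credentials are delegated to the Denodo Platform (admin/admin in this demo).
    token = base64.b64encode(b"admin:admin").decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

req = build_request("tables related to mortgages")
print(req.full_url)
# Sending it (requires the AI SDK to be running):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

The response would contain the ranked views and their metadata, which a custom application can feed into its own prompts.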
Answering User Questions
After constructing a searchable vector index with metadata, it is possible to write the logic to answer questions based on the available information.
The AI SDK provides multiple endpoints for answering user questions; the main one is the general `/answerQuestion` endpoint (or `/streamAnswerQuestion` for a streaming response). A high level overview of the AI SDK's logic is the following:
The above takes the user's question and constructs a prompt to the LLM, which adds potentially related tables and other information that would help the AI SDK decide which path it should take in answering the question.
After getting the categorization response from the LLM, the AI SDK then uses the determined category to move onto the next path in its logic:
- If the question is not related to data available in the underlying systems, the LLM includes a direct, LLM-generated response to the question in the first operation. This response is then directed back to the user.
- If the question is related to data available in the Denodo Platform, then the LLM will be prompted to generate a query based on the available views and their schemas. The query is then fed into the Denodo Platform, and another call is made to the LLM to take the response from the Denodo Platform and format it to answer the input question.
- If it is determined that the question can be answered using unstructured data stored in the vector store, then the LLM will provide a search term with which to search the unstructured index. The responses will be sent in another prompt to the LLM in order to provide context to answer the input question.
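The branching described above can be sketched as a small routing loop. The categorizer here is a trivial keyword stand-in for the LLM categorization call, and the three handlers are stubs; the point is the control flow, not the real SDK implementation:

```python
# Toy sketch of the categorize-then-route flow described above (not real SDK code).
def categorize(question, known_tables):
    # Stand-in for the first LLM call: decide which path the question takes.
    q = question.lower()
    if any(table in q for table in known_tables):
        return "sql"
    if "document" in q or "report" in q:
        return "unstructured"
    return "direct"

def answer(question, known_tables):
    category = categorize(question, known_tables)
    if category == "sql":
        # Real flow: LLM generates VQL, Denodo executes it, LLM formats the rows.
        return "generate SQL -> run on Denodo -> format result"
    if category == "unstructured":
        # Real flow: LLM produces a search term, vector store returns context.
        return "search vector store -> answer with retrieved context"
    # Real flow: the categorization prompt already contains a direct answer.
    return "answer directly from the LLM"

print(answer("List active mortgages from the loan table", ["loan", "customer"]))
```

In the actual SDK each branch involves one or more further LLM prompts, but the routing decision itself looks much like this single dispatch on the categorization result.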
How is the data made available?
In order to add data sets to the Denodo Platform, we will need to define connection details to the underlying source of the data in the platform. In this section, we will review the data sets that were automatically imported for the demo to provide a feel for how this works; then, in the next section (How do I integrate my own data?), we will review how to import new data sets.
Connecting to Denodo
The Graphical User Interface for the Denodo Platform is the Design Studio, which allows users to create and manage new data connections. For more information about the use of this tool, please reference section 6 of the Installation and Bootstrapping tutorial (though there is no need to create a database for now). The Design Studio from our installation can be accessed through the following link: http://localhost:9090/denodo-design-studio
The default credentials are:
- Login: `admin`
- Password: `admin`
Integrating Datasets
The data for the lab is all retrieved in real time from these CSV files, yet as a data consumer you do not need to know this. This idea of abstraction and support for many different data sources is central to the value of the Denodo Platform and allows for efficient scaling of data delivery, but let's get back on topic.
Data sources are the elements in the Denodo Platform that reach out to sources of data and parse this information into a relational format; for now, we will review one of the CSV data sources used in the demo (in the next section –How do I integrate my own data?– steps for general data sources will be provided).
Double clicking on the `01 - Data Sources` folder and the `ds_bank_customer` element will show the connection details to the humble CSV file:
This page contains all the details necessary to retrieve the CSV file in real time from wherever it is located. Since we selected a data source that can process CSV files, Denodo will know how to convert the file it retrieves into a relational structure.
Give me the data!
OK, ok, geez. On top of each data source, "base views" can be created that act just like a table, but correspond to physical data located somewhere else. This is the uniform structure on which many different types of data can be ingested; even though many different types of data sources can be created in the Denodo Platform, all the base views have the same relational structure.
Double clicking on the `customer` base view in the `02 - Datasets` folder (this is the base view created on top of the `ds_bank_customer` data source) will show the `Summary` tab, which shows the schema of the data returned from the underlying data source (our CSV file):
To retrieve the data from the CSV data source, double click on the `customer` view and click on `Query`. This will bring up an execution window, in which the `Execute` button can again be selected:
Adding Semantic Information
The Data Catalog is another system built into the Denodo Platform. It is not part of the AI SDK as it is mainly involved in data delivery, but it is heavily used by the AI SDK in order to provide complete semantic information about available data sets.
The goal of the Data Catalog is to streamline the process of searching for data and accelerate data consumption by a wider audience. The Data Catalog does this by simplifying the data access interface and providing additional tools to help usage, as compared to a more developer oriented tool like the Design Studio.
The Data Catalog can be accessed by using the below link: http://localhost:9090/denodo-data-catalog
Again, the default credentials can be used:
- Login: `admin`
- Password: `admin`
After logging in, the following window will be displayed:
This tool allows us to search for relevant views or columns from our views. If we search for "customer" and click on the first result, this will again bring up the `customer` view that we saw previously.
In this tool, we have options to review the semantics of the view, create a query using drag and drop options, review the lineage and relationships of the data, and more:
The descriptions of fields have already been defined in the Data Catalog, but this is the location where Data Stewards would be able to update semantic information about the data sets. Being the central location for business usage, data democratization, and collaboration, this is also from where the AI SDK pulls all semantic information.
The following checklist will provide the chatbot access to additional data sets:
- In the Design Studio, navigate to `Administration > Database management` and click on the `+ New` button. Provide a name for the database to store the new datasets that will be created, and click on `Ok`. This is only necessary the first time this is performed, and will prevent the chatbot from leveraging the sample data sets.
- Click on the three dots next to the new database in the catalog on the left and select `New > Data Source`. In the window that appears, search for the data source connector corresponding to the desired data source.
- After finding the specific connector, please insert the correct connection parameters for the data source. The `Test Connection` button can be used to validate that these parameters are correctly defined.
- To create a logical view corresponding to the physical data, click on `Create Base View`. After filling out any prompts that appear, this will automatically create a view with the schema of the source data in the Denodo Platform's catalog.
- (Optional) Curate the metadata of the view:
  - Rename the view (click on the 3 dots to the right of the created view and select `Rename`).
  - Open the view (by double clicking on it) and select `Edit`. Rename columns and add descriptions.
  - In the `Metadata` tab of the `Edit` dialog, add a view description.
- Log in to the Data Catalog. If descriptions have not been added yet, make sure to add any remaining semantic information to views. Then, synchronize the Data Catalog with the Denodo Platform by navigating to `Administration > Sync with VDP` in the Data Catalog.
- In order to have the chatbot update its vector store index and point to the new data, the chatbot will need to be configured to reference the new database and retrieve the metadata. This can be done by:
  a) Updating the `VDB_NAMES` environment variable with the name of the new database created in step #1. For Docker installations this environment variable is set in the `.env` file, while for Python installations this variable is found in the `/samples/ai-sdk/sample_chatbot/chatbot_config.env` file.
  b) Restarting the chatbot. The startup of the chatbot will automatically construct the vector store index of the metadata. The following commands will do this:
| Install | Commands |
|---|---|
| Docker | `docker compose -f docker-compose-sample-chatbot.yml down` followed by `docker compose -f docker-compose-sample-chatbot.yml up -d` |
| Python | Cancel the running process, and start it again: `python run.py both` |
Note that if additional data sets are added but no new databases are created in Denodo, the chatbot's vector index can also be updated without restarting the chatbot using the following commands:
OS | Command (executed in the terminal) |
Windows |
|
Linux |
|
Make sure to provide the password for the `admin` user when prompted (`admin`).
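The refresh can also be scripted against the AI SDK's REST API. The sketch below is illustrative only: the `getMetadata` endpoint name, the `insert` parameter, the port, and the credentials are assumptions based on a default installation, so verify them against http://localhost:8008/docs before relying on them.

```python
import base64
import urllib.parse
import urllib.request

# Assumed default location of a local AI SDK instance
BASE_URL = "http://localhost:8008"

def build_refresh_request(username: str, password: str) -> urllib.request.Request:
    # HTTP Basic authentication with the Denodo user's credentials
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    # "insert=true" asks the SDK to write the metadata into its vector store
    params = urllib.parse.urlencode({"insert": "true"})
    return urllib.request.Request(
        f"{BASE_URL}/getMetadata?{params}",
        headers={"Authorization": f"Basic {token}"},
    )

req = build_refresh_request("admin", "admin")
# With the AI SDK running: urllib.request.urlopen(req)
```

Sending the request against a running AI SDK (the commented-out `urlopen` call) triggers the same vector index update as the terminal commands above.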
After performing the above, the chatbot can be queried for information about the additional data.
Up until now, this guide has described an example AI chatbot system built on top of the Denodo AI SDK. However, this system is not ready for production use; to move it from an example to a production environment, the following steps should be performed:
Securing Data Access
After creating data sources and views with which to consume data in a standard format, it's then important to decide who should be able to access the data. After all, you don't want everyone seeing personal loan information or the number of times you accidentally inserted the same record into that one table.
The Denodo Platform's security model provides granular security restrictions (down to the row and column level) that can be assigned to users and roles. External Identity Providers can be configured in order to seamlessly integrate into an organization's existing security model. Before we get off track again, the Denodo Security Overview page can be referenced for all the technical details.
For now, we will create a user and review the permissions that can be assigned to them. Clicking on **Administration > User Management** will open the following window:
Clicking on "+ New" will allow a new user to be created.
After creating the user, we can select that user and click on **Edit privileges**; this brings up a dialog box showing the databases available in the catalog:
This allows database-level permissions to be assigned. Clicking on the pencil icon in the far-right **Advanced** column allows permissions to be assigned to specific views in the Denodo Platform's catalog:
And clicking the pencil icon under **Column privileges** and **View restrictions** allows administrators to modify the queryable columns and row restrictions of a view, respectively.
After restricting the user's permissions to specific views, logging out of the chatbot and logging back in as that user will only allow queries against those views to be executed.
Enabling TLS
When configuring an application that will communicate over the internet (or even within an organization), it is essential to make sure that communication between the server and client is encrypted. The AI SDK, sample chatbot, and Denodo Platform all support the configuration of HTTPS.
Requirements
For the AI SDK, a private key and public certificate file must be generated; the Denodo Platform supports multiple certificate formats.
For production use, a network administrator will generally be able to generate these; however, for testing purposes, a self-signed certificate can also be generated. Please ensure that the certificates include the correct Subject Alternative Names corresponding to the hostname that will be used to access these services. The OpenSSL utility can quickly generate a self-signed certificate that can be used by the AI SDK.
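For example, the following OpenSSL invocation (version 1.1.1 or later, for `-addext`) produces a self-signed key and certificate for testing; the hostname `ai-sdk.example.com` and the file names are placeholders that should match your deployment:

```shell
# Generate a self-signed certificate (ai-sdk.crt) and private key (ai-sdk.key).
# -nodes leaves the key unencrypted; -addext sets the Subject Alternative Names
# that must match the hostnames used to reach the service.
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout ai-sdk.key -out ai-sdk.crt \
  -subj "/CN=ai-sdk.example.com" \
  -addext "subjectAltName=DNS:ai-sdk.example.com,DNS:localhost"
```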
Necessary certificates:
| Service | Certificate Format |
| --- | --- |
| Sample chatbot | TLS is not supported; this application is only for testing purposes |
| Denodo AI SDK | Private key + public certificate |
| Denodo Platform | PEM-encoded key and certificates, PKCS12 keystore and public certificates in CER format, or a PKCS12 bundle |
Configuration
After making sure that the necessary certificates are available, the following changes must be made in each component (for which TLS should be enabled):
| Service | Configuration steps (Docker) | Configuration steps (Python) |
| --- | --- | --- |
| Denodo AI SDK | Mount the certificates into the `api` container. Add the following environment variables to the service pointing to the location of these certificates: | Make sure the certificates are available in the machine where the AI SDK is running. Update the |
| Denodo Platform | Mount the certificates into the | Make sure the certificates are in the machine running the Denodo Platform. Use the Denodo SSL/TLS Configurator Script to set this up efficiently. |
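As a rough sketch of the Docker approach, a compose override might mount the certificates and point the AI SDK at them. Note that the service name `api`, the paths, and especially the environment variable names below are placeholders, not the SDK's actual settings; the exact keys are defined in the AI SDK manual:

```yaml
# Illustrative compose override -- variable names are placeholders only.
services:
  api:
    volumes:
      # Mount the key and certificate read-only into the container
      - ./certs/ai-sdk.key:/certs/ai-sdk.key:ro
      - ./certs/ai-sdk.crt:/certs/ai-sdk.crt:ro
    environment:
      # Replace with the TLS settings documented for your AI SDK version
      AI_SDK_SSL_KEY: /certs/ai-sdk.key
      AI_SDK_SSL_CERT: /certs/ai-sdk.crt
```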
Developing AI Applications
The system we have been interacting with is the sample chatbot distributed with the Denodo AI SDK. Note, however, that it is designed to showcase the available functionality and is not a production-ready system. The goal of the AI SDK is to support proprietary AI applications or chatbots: it defines a set of API endpoints that are useful for AI applications retrieving data from Denodo.
To review the complete functionality that this system provides, the AI SDK's documentation can be accessed at http://localhost:8008/docs. This documentation follows the OpenAPI Specification.
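As a starting point for a custom application, a client might call the SDK's `answerQuestion` endpoint. The sketch below assumes a default local installation (port 8008, HTTP Basic auth with Denodo credentials, a JSON body with a `question` field); treat the OpenAPI page above as the authoritative contract:

```python
import base64
import json
import urllib.request

def build_question_request(question: str, username: str = "admin",
                           password: str = "admin") -> urllib.request.Request:
    # HTTP Basic auth token from the Denodo user's credentials
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    # Assumed request payload: a single natural-language question
    body = json.dumps({"question": question}).encode()
    return urllib.request.Request(
        "http://localhost:8008/answerQuestion",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
        method="POST",
    )

req = build_question_request("What is the average loan amount?")
# With the AI SDK running: json.load(urllib.request.urlopen(req))
```

The same pattern applies to the other endpoints listed in the OpenAPI documentation; only the path and payload change.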
Congratulations! You have completed the Building an AI Chatbot with Denodo tutorial.
This is just the first step for using the Denodo AI SDK, an open-source toolkit to help developers create AI-powered applications and agents faster and with fewer obstacles. Now, you are prepared to integrate your structured data into GenAI models, enabling higher accuracy and better performance for your AI applications.