Overview
Memory within artificial intelligence (AI) systems is how large language models (LLMs) recall previous interactions. It is a useful feature that allows question-and-answer systems to feel more “alive” and responsive. When integrating Retrieval-Augmented Generation (RAG) systems like Denodo’s AI SDK, memory becomes crucial for creating a more interactive and intelligent solution. Memory not only makes a chatbot feel more interactive, but is also key to making the AI more contextually aware and capable of delivering personalized and coherent responses.
Although there are several flavors of memory that you could implement, we will focus on conversational buffer memory. This approach brings the conversation history into the prompt, creating a transient memory that retains recent interactions within the session. Because this type of memory does not require any permanent storage, it is less exposed from a security standpoint. In the code walkthrough in this guide, we will demonstrate how to implement it.
Memory In a Chatbot Application
In the following diagram, we have generalized the flow of chatbot memory to show how memory should fit into our application.
The flow begins when a user interacts with a genAI application. If this is the user’s first interaction, the application creates the memory components needed to keep track of their conversation history. In our case, memory is stored in Python data structures, such as lists or dictionaries, that retain the last few chat interactions. Since we are using “contextual memory” here, nothing is saved permanently, and each new conversation starts with an empty memory.
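As a rough illustration of this kind of transient, in-process buffer, the following minimal sketch (not the sample chatbot's actual code; the names and cap size are illustrative) keeps only the most recent question/answer pairs in a plain Python list:

import collections

MAX_TURNS = 4  # keep only the most recent question/answer pairs (illustrative value)

# Created empty when a user session starts; discarded when the session ends.
chat_history = collections.deque(maxlen=MAX_TURNS)

def remember(question: str, answer: str):
    """Append the latest exchange; the deque silently drops the oldest one when full."""
    chat_history.append({"question": question, "answer": answer})

remember("What is the largest loan?", "Loan id 15 is the largest loan with a value of $600,000")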
You could design your genAI system to include the entire memory in each request; however, that would not be very efficient. Instead, before submitting queries to the AI SDK, the application optimizes the interaction by using the LLM itself to summarize the parts of the conversation history that are relevant to the current question. This is critical for reducing token usage, since it avoids repeatedly including the entire interaction, and it also increases the accuracy of the conversation. We will explain this step in more depth in the next section.
Next, we pass all the desired parameters, including the summarized interaction described above, to the API endpoint of the Denodo AI SDK, and a result is returned to the user. Finally, the interaction is saved to the memory object so it can be retrieved in future queries, and any stale entries are removed from the memory object.
For the Denodo AI SDK, sample code for a memory implementation can be found in the sample_chatbot folder. This code demonstrates how memory can be implemented using Langchain and provides a quick and easy way to simplify your testing. In your own code, depending on your chatbot implementation and the orchestration framework in use, the memory data store may look slightly different; however, the steps it follows are generally those of the previous diagram.
Denodo AI SDK Sample Chatbot Application
We have included sample code showing how memory could work with Denodo’s AI SDK in the sample chatbot distributed with it. Take this as an example of how memory could work within your own generative AI application. Before we start, let's get comfortable with the folder structure of the Denodo AI SDK.
The Denodo AI SDK is separated into three main folders: api, utils, and sample_chatbot.
- In the api folder, you can find the code for running the backend server, as well as the code behind each individual API endpoint.
- The utils folder holds general functions and classes used by the API and also by the sample chatbot. These include classes like UniformVectorDB and UniformLLM that wrap the different Langchain classes they inherit from into a standard, easy-to-use interface. You can also leverage this logic from your own application.
- The sample_chatbot folder holds code for our sample chatbot experience. This leverages ReactJS to create a UI for interacting with the APIs created in the api folder.
For our sample code, we have included a memory implementation within the chatbot_engine.py file in the sample_chatbot folder. We will now cover how it works in order to improve the user experience in any chatbot application.
Diving into the code
Our sample user interface follows these steps to maintain session memory:
- Creates a memory object for storing previous interactions
- Summarizes the user query and memory
- Calls the AI SDK
- Updates memory with the previous interaction
Let’s walk through the code snippets that implement each step:
1. Creating a Memory Object for Storing Previous Interactions
In our sample_chatbot folder, the memory object is defined within the ChatbotEngine class in the chatbot_engine.py file. In this class, memory is handled by a list parameter in the ChatbotEngine constructor that is initialized empty and stores the conversation history.
The server initializes an empty session for each user, handled by the ChatbotEngine class and secured by the same username and password used to access the AI SDK.
class ChatbotEngine:
    def __init__(self, llm, system_prompt, tool_selection_prompt, tools, api_host, username, password, vdp_database_names, vector_store_provider, message_history = 4):
        …
        self.chat_history = []
Class ChatbotEngine in chatbot_engine.py.
The ChatbotEngine class is designed to manage a connection with the Denodo AI SDK. Within this class definition, message_history specifies the number of recent messages to retain for a conversation. Its default value is 4, meaning that only the last 4 pairs of user questions and AI responses are kept in memory.
The ChatbotEngine leverages classes from Langchain, such as ChatPromptTemplate, for memory management. This can be seen within the constructor of the ChatbotEngine class with the parameters tools_prompt and answer_with_tool_prompt, which we describe below.
from langchain_core.prompts import MessagesPlaceholder, ChatPromptTemplate
…
self.tools_prompt = ChatPromptTemplate.from_messages([
    ("system", self.system_prompt),
    MessagesPlaceholder("chat_history", n_messages=self.message_history),
    ("human", "{input}" + self.tool_selection_prompt + "{force_tool}"),
])
Class ChatbotEngine in chatbot_engine.py.
ChatPromptTemplate is a Langchain class used to format prompts for chat-based language models. The structure of a ChatPromptTemplate is a list of messages, formatted according to a specified pattern that models like OpenAI's ChatGPT can understand. Another important detail is that we are leveraging the from_messages function of the ChatPromptTemplate object. It takes a list of messages, each carrying a role (such as "system", "user", or "assistant") and its content, which helps the LLM distinguish user inputs from AI-generated responses.
In our sample code, the user’s input is represented by the placeholder {question} under the key “human”. The memory of the chat history is held by the MessagesPlaceholder class from Langchain under the key “chat_history”. Finally, the key “system” carries the system prompt given to the LLM. In the following code snippets, we show how calling ChatPromptTemplate.from_messages works, simulating what the sample chatbot does. These quick examples give you a sense of how the template formats the query and chat history, with and without values in MessagesPlaceholder (that is, with and without memory of the conversation), before everything is submitted to the LLM.
from langchain_core.prompts import MessagesPlaceholder, ChatPromptTemplate

ChatPromptTemplate.from_messages([
    # system prompt
    ("system", "You are a helpful assistant."),
    # placeholder for memory
    MessagesPlaceholder("chat_history"),
    # human query
    ("human", "{question}"),
])
In a conversation where the user wants to follow up on the question “5 + 2” by asking the application to multiply the result by 4, the ChatPromptTemplate would look like the following after being invoked:
ChatPromptValue(messages=[
    # System prompt
    SystemMessage(content="You are a helpful assistant."),
    # Memory (first question)
    HumanMessage(content="what's 5 + 2"),
    # Memory (first answer)
    AIMessage(content="5 + 2 is 7"),
    # Current user query
    HumanMessage(content="now multiply that by 4")])
As we can see, ChatPromptTemplate organizes the chat history into sequential order with enough context that the LLM can distinguish the memory from the current query.
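If you want to reproduce this yourself, a minimal sketch (using the same template shown above, with the history and question from this example) invokes the prompt with the stored history and the new question:

from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.prompts import MessagesPlaceholder, ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder("chat_history"),
    ("human", "{question}"),
])

# The history holds the earlier exchange; the new question refers back to it.
prompt_value = prompt.invoke({
    "chat_history": [
        HumanMessage(content="what's 5 + 2"),
        AIMessage(content="5 + 2 is 7"),
    ],
    "question": "now multiply that by 4",
})
print(prompt_value.messages)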
For more information, visit ChatPromptTemplate - LangChain documentation.
2. Summarizing the User Query and Memory
For our sample_chatbot, main.py is responsible for instantiating the ChatbotEngine object, which holds the memory as covered earlier. ChatbotEngine is the class that takes care of interacting with the AI SDK. Its constructor holds the memory as a list, together with the ChatPromptTemplate wrappers we described earlier, formatted for the session memory.
Before submitting to the AI SDK, the application optimizes its interactions with the APIs by summarizing the conversation history before sending queries to the endpoints. Each new user query, along with the relevant history, is first sent to an LLM using a summarization prompt defined in the chatbot_prompts.env configuration file, in a manner similar to the diagram below:
As we can see in this example, rather than sending the original query (“who is the loan officer that issued that loan”) plus the previous chat history that gives us the context for “that loan”, we use the LLM to produce a new prompt that includes the relevant information: “Who is the loan officer that issued the loan with id 15?”
Below you can find a code snippet that works in the same way as the one provided in sample_chatbot, which you can use to test how the summarizer prepares queries before they are sent to the Denodo AI SDK API endpoints.
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
from langchain_core.output_parsers import StrOutputParser
from langchain.chat_models import ChatOpenAI
from langchain_core.prompts import MessagesPlaceholder, ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "Generate a Question that is more specific"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

llm = ChatOpenAI(model="gpt-4o", temperature=0)
chain = (prompt | llm | StrOutputParser())

chat_history = [
    HumanMessage(content="What is the largest loan?"),
    AIMessage(content="Loan id 15 is the largest loan with a value of $600,000"),
]

chain.invoke({
    "input": "Who is the loan officer that issued that loan?",
    "chat_history": chat_history,
})
We first define a prompt template that holds the different components of our prompt: the system prompt, the chat history, and the user query. We then define an LLM, in this case the Langchain class for OpenAI-based models using GPT-4o. The prompt template and LLM are then wrapped together in a Langchain chain, which handles calling the LLM once the prompt template parameters are passed. Since our prompt template requires the chat history and the user input, we need to pass both when invoking the chain. Finally, we define the chat history as a variable and invoke the chain, passing in the chat template parameters.
The provided code simulates submitting a query to an AI system that already has memory, built with LangChain's core components and OpenAI's GPT models. The example chat history, defined in the chat_history variable, simulates a conversation in which the user asks for the largest approved loan and its value, just as in our last diagram. The user then submits a new query asking who the issuing officer is. Invoking the chain ultimately returns the following prompt, a summary of the memory and the current user query:
Input:  Who is the loan officer that issued that loan?
Output: Who is the loan officer that issued loan id 15?
In the sample chatbot code, this process is slightly different, as the memory summarization is combined with tool selection in a single call. The tool selection allows the sample chatbot to determine whether the question is related to data, metadata, or other additional options. The details can be seen in the file chatbot_tools.py.
3. Calling the AI SDK
For our sample chatbot, the API calls to the Denodo AI SDK use the Langchain tools framework. Tools are functions designed for LLMs to pass parameters to. Tools can be attached to a Langchain orchestration LLM class simply by including them in the constructor (note that you must have a tools-enabled model). In our sample chatbot, we have built a sample tool implementation of our API calls in the chatbot_tools.py file. The general shape of these tools can be seen in the function below.
@timed
def denodo_query(natural_language_query, api_host, username, password, plot = 0, plot_details = ''):
    request_params = {
        'question': natural_language_query,
        'mode': 'data',
        'verbose': True,
        'plot': bool(plot),
        'plot_details': plot_details
    }
    try:
        response = requests.get(
            f'{api_host}/answerQuestion',
            params=request_params,
            auth=(username, password),
            verify=False
        )
        response.raise_for_status()
        json_data = response.json()
        keys_to_remove = {
            'answer', 'tokens', 'sql_execution_time',
            'vector_store_search_time', 'llm_time', 'total_execution_time'
        }
        …
Function denodo_query within chatbot_tools.py
denodo_query is a function that sends a GET request to the AI SDK /answerQuestion API with the question parameter (the user query), the mode, verbose, plot, and plot details. For details on configuring queries to the Denodo AI SDK API, visit the /docs extension of your instance for the OpenAPI documentation.
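As a side note, LangChain also supports binding tools directly to a chat model, which is one way of "appending" tools as mentioned above. The following minimal sketch is not taken from the sample chatbot: it uses the langchain-openai package and a simplified stand-in for denodo_query, purely to illustrate the pattern. The sample chatbot instead drives tool selection through a prompt template, described next.

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def denodo_query(natural_language_query: str, plot: int = 0) -> str:
    """Query the company's database in natural language via the Denodo AI SDK."""
    # Simplified stand-in: a real implementation would call the /answerQuestion endpoint.
    return f"(result for: {natural_language_query})"

llm = ChatOpenAI(model="gpt-4o", temperature=0)
llm_with_tools = llm.bind_tools([denodo_query])

ai_msg = llm_with_tools.invoke("What is the largest loan?")
print(ai_msg.tool_calls)  # the model's proposed tool call(s): tool name and arguments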
The LLM knows how to populate the parameters of denodo_query through the tools prompt template stored in chatbot_prompts.env.
DATABASE_QUERY_TOOL = "- Database Query Tool. I have full access to the company's database and I can query the database of the company of the user texting me. This tool automatically looks for the relevant tables/views where the data is, I don't have to specify the table unless told by the user. I also have the ability to visually plot the data returned by a query and create graphs.
Usage:
<database_query>
    <natural_language_query>Natural language query.</natural_language_query>
    <plot>1 for yes, 0 for no</plot>
    <plot_details>Any extra details of the graph to generate.</plot_details>
</database_query>"
DATABASE_QUERY_TOOL in chatbot_prompts.env
The unique parameters needed by each tool are defined within the Usage section of its prompt. For example, we can see <natural_language_query></natural_language_query> along with the definition of that input parameter.
To generate a response from the Denodo AI SDK, we call the process_tool_query function, passing the tool functions for our AI SDK API calls and the result of the summarization LLM call.
tool_result = process_tool_query(first_input, self.tools)
This will save the values returned from the AI SDK as the variable tool_result.
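Conceptually, this step turns the XML-style tool call produced by the LLM into a tool name and a set of parameters. The sample chatbot has its own implementation of this logic; purely as an illustration of the idea, a simplified, hypothetical parser (the parse_tool_call helper below is not part of the SDK) could look like this:

import re
import xml.etree.ElementTree as ET

def parse_tool_call(llm_output: str):
    """Extract an XML-style tool call such as <database_query>...</database_query> from LLM output."""
    match = re.search(r"<(\w+)>.*?</\1>", llm_output, re.DOTALL)
    if not match:
        return None, {}
    element = ET.fromstring(match.group(0))
    params = {child.tag: (child.text or "").strip() for child in element}
    return element.tag, params

tool_name, params = parse_tool_call(
    "<database_query><natural_language_query>Who is the loan officer that issued loan id 15?"
    "</natural_language_query><plot>0</plot><plot_details></plot_details></database_query>"
)
# tool_name == "database_query"; params holds the natural language query and plot options,
# which could then be passed to a function such as denodo_query.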
Alternatively, given that the AI SDK offers standard REST endpoints, you could call the API using any other method, simply filling in the necessary fields described in the API documentation.
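For example, since the question and mode parameters shown earlier map directly to the /answerQuestion endpoint, a minimal sketch of a direct call might look like the following. The host, credentials, and question are placeholders; check the /docs OpenAPI documentation of your instance for the exact fields.

import requests

# Hypothetical host and credentials; replace with your AI SDK instance details.
api_host = "http://localhost:8008"
response = requests.get(
    f"{api_host}/answerQuestion",
    params={"question": "Who is the loan officer that issued loan id 15?", "mode": "data"},
    auth=("admin", "admin"),
)
response.raise_for_status()
print(response.json())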
4. Updating the Memory
After the Denodo AI SDK generates an answer for the user query, and before saving it to memory, the system formats the question and answer to contextualize them for the next prompt. In our sample chatbot, this formatting happens when the response is returned from the AI SDK, in the function add_to_chat_history, which, through string manipulation, adds context to the question and answer before appending them to the memory object.
At the bottom of process_query, we can see the call to add_to_chat_history that saves our memory.
def process_query(self, query, tool = None):
    …
    # Save result of the LLM to memory
    add_to_chat_history(
        chat_history = self.chat_history,
        human_query = query,
        ai_response = ai_response,
        tool_name = tool_name,
        tool_output = tool_output,
        original_xml_call = original_xml_call
    )
Function process_query within chatbot_engine.py
The implementation of that function, in the case of our chatbot, distinguishes between different kinds of responses (i.e., database query, metadata, etc.) and stores different message patterns relevant to the type of response. It also includes some additional controls to manage the length of the messages stored in memory; for example, it only keeps the first 15 rows of a result set response.
def add_to_chat_history(chat_history, human_query, ai_response, tool_name, tool_output, original_xml_call):
    if tool_name == "database_query":
        execution_result = tool_output.get('execution_result', {})
        if len(execution_result.items()) > 15:
            llm_execution_result = dict(list(execution_result.items())[:15])
            llm_execution_result = str(llm_execution_result) + "... Showing only the first 15 rows of the execution result."
        else:
            llm_execution_result = execution_result
        sql_query = tool_output.get('sql_query', '')
        human_query = f"""{human_query}

## TOOL DETAILS
I used the {tool_name} tool: {original_xml_call}
Output:
SQL Query: {sql_query}
Execution result: {llm_execution_result}
"""
    elif tool_name in ["metadata_query", "kb_lookup"]:
        human_query = f"""{human_query}

## TOOL DETAILS
I used the {tool_name} tool: {original_xml_call}
…
Function add_to_chat_history in chatbot_engine.py
Summary
In summary, adding session memory to a compound AI system like the Denodo AI SDK significantly enhances its ability to deliver personalized and contextually aware responses. In a standard installation, the AI SDK provides an out-of-the-box example of handling memory in a transient way, specific to user sessions. It is important to remember that this memory is transient and does not require permanent storage, reducing potential security risks and compliance overhead.
For users, the difference in usability between an AI system with memory and one without is stark, as can be seen in the following comparison. With memory, the system retains context from previous exchanges, allowing it to provide more cohesive responses by remembering past interactions.
Conversation without Memory:
In this example without memory, the system's ability to maintain context is limited, requiring users to reestablish context with each new interaction; otherwise, queries will fail. We can see this in the third query, where the LLM system does not know what the user is referring to when saying “across all of those”, despite it being a response to the previous question.
Conversation with Memory:
Unlike the previous example, in this conversation the system’s memory enhances the user experience by maintaining continuity and understanding across interactions. We see this in the second and third queries, where the user does not need to include context from the previous conversation to continue investigating their dataset.
For systems without memory, every interaction is independent, meaning the system lacks awareness of past exchanges. This critically reduces context of past interactions and makes prompting much more important.
In conclusion, integrating memory into a RAG system significantly improves its ability to handle complex conversations where building on past interactions is necessary. This leads to a more efficient and human-like user experience, highlighting the crucial role memory plays in compound AI system design.
References
Memory - LangChain documentation
The information provided in the Denodo Knowledge Base is intended to assist our users in advanced uses of Denodo. Please note that the results from the application of processes and configurations detailed in these documents may vary depending on your specific environment. Use them at your own discretion.
For an official guide of supported features, please refer to the User Manuals. For questions on critical systems or complex environments, we recommend that you contact your Denodo Customer Success Manager.