In this deep technical tutorial, you’ll learn how to build an AI chatbot with memory using Python, FastAPI, and LangChain. Unlike basic chatbots, your bot will remember previous messages in the conversation, enabling more natural and context-aware interactions. This guide is designed for developers who want to create a practical, production-ready chatbot with persistent memory features.
For a broader look at modern AI-powered developer tools, see our guide to the best AI-powered API services for developers in 2026.
Prerequisites
- Python 3.10+ (tested with 3.11)
- pip (Python package manager)
- Basic knowledge of Python and REST APIs
- OpenAI API key (for LLM access; you can substitute other LLM APIs if desired)
- Terminal/Command Line access
- Recommended: Visual Studio Code or similar code editor
1. Set Up Your Project Environment
- Create a project directory:

```bash
mkdir ai-chatbot-memory && cd ai-chatbot-memory
```

- Set up a Python virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate
```

- Install required packages:

```bash
pip install fastapi uvicorn langchain openai python-dotenv
```

- Create a `.env` file for your OpenAI API key:

```bash
touch .env
```

Add the following line to `.env` (replace with your key):

```
OPENAI_API_KEY=sk-...
```

- Directory structure overview:

```
ai-chatbot-memory/
├── venv/
├── main.py
└── .env
```
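If you are curious what `load_dotenv()` does when the server starts, it roughly amounts to parsing `KEY=VALUE` lines from `.env` into `os.environ`. Here is a minimal stdlib sketch of that idea; the helper name `load_env_file` is ours, not part of python-dotenv, and it skips quoting and `export` handling that the real library supports:

```python
import os

def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines into os.environ.

    A rough sketch of what python-dotenv's load_dotenv() does;
    comments and blank lines are skipped, existing values win.
    """
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: do not overwrite variables already set in the shell
            os.environ.setdefault(key.strip(), value.strip())
```

After calling `load_env_file()`, the key is available via `os.environ.get("OPENAI_API_KEY")`, just as with the real library.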
2. Build the Chatbot API with FastAPI
- Create `main.py` and import dependencies:

```python
import os

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from dotenv import load_dotenv
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain

load_dotenv()
```

- Initialize FastAPI and set up OpenAI:

```python
app = FastAPI()

openai_api_key = os.getenv("OPENAI_API_KEY")
if not openai_api_key:
    raise ValueError("OPENAI_API_KEY not found in .env")

llm = ChatOpenAI(openai_api_key=openai_api_key, temperature=0.2)
memory = ConversationBufferMemory(return_messages=True)
conversation = ConversationChain(llm=llm, memory=memory)
```

- Define the chat endpoint:

```python
@app.post("/chat")
async def chat(request: Request):
    data = await request.json()
    user_message = data.get("message")
    if not user_message:
        return JSONResponse({"error": "No message provided."}, status_code=400)
    # Generate a response; the chain records both messages in memory
    response = conversation.predict(input=user_message)
    return {"response": response}
```

- Run the FastAPI server:

```bash
uvicorn main:app --reload
```

The API will be available at `http://127.0.0.1:8000`.
3. Test the Chatbot’s Memory Function
- Use `curl` or Postman to chat with your bot:

```bash
curl -X POST http://127.0.0.1:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hi, my name is Alex."}'
```

- Send a follow-up message referencing prior context:

```bash
curl -X POST http://127.0.0.1:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What did I just tell you my name was?"}'
```

The bot should reply with something like:

```json
{"response": "Your name is Alex."}
```

- Try a longer conversation:

```bash
curl -X POST http://127.0.0.1:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Remember that my favorite color is blue."}'

curl -X POST http://127.0.0.1:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "What is my favorite color?"}'
```
Screenshot description:
Terminal window showing a series of curl commands and the chatbot’s JSON responses, confirming that the bot remembers and recalls user-specific information.
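If you prefer scripting these tests from Python rather than the shell, the same requests can be sent with the standard library alone. This is a sketch, not part of the server code; `build_chat_request` and `send_chat` are helper names we are introducing here, and the URL assumes the local server from step 2:

```python
import json
import urllib.request

def build_chat_request(url, message):
    """Build a POST request carrying the JSON payload the /chat endpoint expects."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )

def send_chat(message, url="http://127.0.0.1:8000/chat"):
    """Send one chat message and return the bot's reply text."""
    req = build_chat_request(url, message)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Requires the FastAPI server from step 2 to be running
    print(send_chat("Hi, my name is Alex."))
    print(send_chat("What did I just tell you my name was?"))
```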
4. Persisting Memory Across Sessions (Optional)
By default, ConversationBufferMemory only remembers messages for the current process. To persist memory across server restarts or for multiple users, you’ll want to store conversations in a database.
- Install SQLite support:

```bash
pip install sqlalchemy
```

- Extend your code to store and retrieve conversation history per user. Note that LangChain messages are objects, not plain dicts, so they must pass through `messages_to_dict` and `messages_from_dict` before being written as JSON:

```python
import json

from sqlalchemy import create_engine, Column, Integer, String, Text
from sqlalchemy.orm import declarative_base, sessionmaker
from langchain.schema import messages_from_dict, messages_to_dict

DATABASE_URL = "sqlite:///./chat_memory.db"
engine = create_engine(DATABASE_URL)
Base = declarative_base()
SessionLocal = sessionmaker(bind=engine)

class Conversation(Base):
    __tablename__ = "conversations"
    id = Column(Integer, primary_key=True, index=True)
    user_id = Column(String, index=True)
    history = Column(Text)  # Serialized message list stored as a JSON string

Base.metadata.create_all(bind=engine)

def get_user_memory(user_id):
    db = SessionLocal()
    convo = db.query(Conversation).filter_by(user_id=user_id).first()
    db.close()
    if convo:
        return messages_from_dict(json.loads(convo.history))
    return []

def save_user_memory(user_id, messages):
    db = SessionLocal()
    serialized = json.dumps(messages_to_dict(messages))
    convo = db.query(Conversation).filter_by(user_id=user_id).first()
    if convo:
        convo.history = serialized
    else:
        convo = Conversation(user_id=user_id, history=serialized)
        db.add(convo)
    db.commit()
    db.close()
```

- Modify your `/chat` endpoint to accept a `user_id` and persist memory:

```python
@app.post("/chat")
async def chat(request: Request):
    data = await request.json()
    user_message = data.get("message")
    user_id = data.get("user_id")
    if not user_message or not user_id:
        return JSONResponse({"error": "Message and user_id required."}, status_code=400)

    # Load this user's history into a fresh memory object
    memory = ConversationBufferMemory(return_messages=True)
    memory.chat_memory.messages = get_user_memory(user_id)

    conversation = ConversationChain(llm=llm, memory=memory)
    response = conversation.predict(input=user_message)

    # Save the updated history
    save_user_memory(user_id, memory.chat_memory.messages)
    return {"response": response}
```

Now each user's conversation history is stored and recalled even if you restart the server.
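The persistence pattern itself (serialize the message list to JSON, key it by user, upsert on write) does not depend on SQLAlchemy or LangChain. A minimal sketch of the same idea with stdlib `sqlite3`, using plain dicts in place of LangChain message objects; the table and function names here are illustrative, not from the tutorial code:

```python
import json
import sqlite3

def init_db(conn):
    """Create the conversations table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS conversations ("
        "user_id TEXT PRIMARY KEY, history TEXT)"
    )

def save_history(conn, user_id, messages):
    """Upsert the serialized message list for this user (SQLite >= 3.24)."""
    conn.execute(
        "INSERT INTO conversations (user_id, history) VALUES (?, ?) "
        "ON CONFLICT(user_id) DO UPDATE SET history = excluded.history",
        (user_id, json.dumps(messages)),
    )
    conn.commit()

def load_history(conn, user_id):
    """Return the stored message list, or [] for an unknown user."""
    row = conn.execute(
        "SELECT history FROM conversations WHERE user_id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0]) if row else []
```

The load/append/save cycle per request is the same as in the endpoint above; only the storage layer differs.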
5. Enhancing the Bot: Advanced Memory Types
LangChain supports several memory types, such as ConversationSummaryMemory (summarizes long chats) and VectorStoreRetrieverMemory (retrieves context using embeddings). To switch to summary memory:
- Install `tiktoken` for token counting:

```bash
pip install tiktoken
```

- Update your memory initialization:

```python
from langchain.memory import ConversationSummaryMemory

summary_memory = ConversationSummaryMemory(llm=llm, return_messages=True)
conversation = ConversationChain(llm=llm, memory=summary_memory)
```
Test with longer conversations:
The bot will now summarize previous exchanges, keeping memory concise and contextually relevant.
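To see the problem summarization solves, note that a raw buffer grows without bound, so long chats must eventually be compressed. `ConversationSummaryMemory` uses the LLM itself to write the summary; the toy sketch below only illustrates the shape of the idea by counting messages and collapsing older ones into a stub line. `trim_history` is our own illustrative helper, not a LangChain API:

```python
def trim_history(messages, max_messages=6):
    """Keep the most recent messages and collapse older ones into a stub.

    Toy stand-in for LLM-written summaries: each message is a dict like
    {"type": "human", "content": "..."}.
    """
    if len(messages) <= max_messages:
        return messages
    dropped = len(messages) - max_messages
    stub = {"type": "system", "content": f"[Summary of {dropped} earlier messages]"}
    return [stub] + messages[-max_messages:]
```

A real summary memory replaces the stub with an actual LLM-generated digest, so the model keeps the gist of the dropped turns rather than losing them.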
Common Issues & Troubleshooting
- OpenAI API Key Error: If you see `openai.error.AuthenticationError`, double-check your `.env` file and ensure it is loaded properly.
- Memory Not Persisting: If memory resets between requests, ensure you are storing and retrieving memory per user (see step 4).
- FastAPI Not Running: Make sure you are in the correct directory and your virtual environment is activated before running `uvicorn`.
- High API Costs: Each API call to OpenAI may incur costs. For development, use smaller models like `gpt-3.5-turbo`.
- Rate Limiting: If you hit OpenAI's rate limits, add a `time.sleep()` between requests or request a higher quota.
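For the rate-limiting case, a fixed `time.sleep()` is crude; the usual pattern is to retry with exponentially growing delays. A minimal sketch, where `RuntimeError` is a placeholder for whatever rate-limit exception your client library raises, and the `sleep` parameter is injectable so the retry logic can be tested without waiting:

```python
import time

def with_backoff(fn, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on failure with exponential backoff.

    Delays are base_delay, 2*base_delay, 4*base_delay, ...; the last
    failure is re-raised once retries are exhausted.
    """
    for attempt in range(retries + 1):
        try:
            return fn()
        except RuntimeError:  # substitute your client's rate-limit error type
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

You would wrap the LLM call, e.g. `with_backoff(lambda: conversation.predict(input=user_message))`, catching the appropriate exception type for your client version.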
Next Steps
- Deploy your chatbot to cloud platforms like Heroku, AWS, or Azure for public use.
- Integrate with messaging platforms such as Slack, Discord, or WhatsApp using their APIs.
- Add user authentication to protect and personalize conversations.
- Experiment with other LLM providers (see our guide to the best AI-powered API services for developers in 2026 for more options).
- Implement advanced memory features like semantic search, knowledge base integration, or long-term storage.
By following this tutorial, you’ve learned how to build an AI chatbot with memory—from basic context retention to persistent, user-specific conversation history. This foundation empowers you to create chatbots that are not only intelligent but also contextually aware, opening the door to a new generation of interactive AI applications.
