Retrieval-Augmented Generation (RAG) is transforming how developers build robust, context-aware AI applications. If you’re looking to build a custom RAG pipeline with Haystack v2, you’re in the right place. This hands-on tutorial walks you through every step, from setup to production-ready inference, with practical code, troubleshooting tips, and next steps.
For a broader overview of RAG architectures, their use cases, and deployment strategies, check out our Ultimate Guide to RAG Pipelines: Building Reliable Retrieval-Augmented Generation Systems. Here, we'll dive into the nuts and bolts of building your own pipeline using Haystack v2.
## Prerequisites
- Python: Version 3.8 or higher (3.10 recommended)
- Haystack: Version 2.x
- Basic Python knowledge (functions, classes, virtual environments)
- Familiarity with REST APIs and JSON (optional, for advanced integrations)
- Command-line access (Windows, Linux, or macOS terminal)
- Text editor/IDE (VSCode, PyCharm, etc.)
- OpenAI API key (if using GPT models; or HuggingFace API key for open models)
We’ll use a simple local document store (FAISS) and OpenAI’s GPT-3.5-turbo for demonstration, but you can swap in other vector stores or LLMs as needed.
## 1. Environment Setup
- **Create a Virtual Environment**

  ```bash
  python3 -m venv rag-tutorial-env
  source rag-tutorial-env/bin/activate   # On Windows: rag-tutorial-env\Scripts\activate
  ```
- **Install Haystack v2 and Required Libraries**

  ```bash
  pip install 'farm-haystack[faiss,openai]'
  ```

  This installs Haystack with FAISS (for vector search) and OpenAI (for LLM access). For other backends, see Haystack's `[extras]` options.
- **Set Your API Key (if using OpenAI)**

  ```bash
  export OPENAI_API_KEY=your-openai-api-key   # On Windows: set OPENAI_API_KEY=your-openai-api-key
  ```
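If you want to fail fast when the key is missing, a tiny helper (hypothetical, not part of Haystack) can check the environment before any model calls:

```python
import os

def check_api_key(name="OPENAI_API_KEY"):
    """Return True if the named environment variable is present and non-empty."""
    return bool(os.environ.get(name, "").strip())
```

Call `check_api_key()` at startup and raise a clear error if it returns `False`, rather than hitting an authentication failure mid-pipeline.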
## 2. Prepare Your Data
- **Organize Documents**

  Place your text files in a folder called `data/`. For this tutorial, let's use three sample text files:

  - `data/doc1.txt`: "Haystack is an open-source framework for building search systems powered by language models."
  - `data/doc2.txt`: "Retrieval-Augmented Generation combines retrieval and generation to improve answer accuracy."
  - `data/doc3.txt`: "FAISS is a library for efficient similarity search and clustering of dense vectors."

  (You can use your own corpus; just adjust the file names.)
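If you'd rather script the sample corpus than create the files by hand, a few lines of Python will do it (the file names and contents match the samples above):

```python
from pathlib import Path

# Sample corpus matching the three files described above.
SAMPLES = {
    "doc1.txt": "Haystack is an open-source framework for building search systems powered by language models.",
    "doc2.txt": "Retrieval-Augmented Generation combines retrieval and generation to improve answer accuracy.",
    "doc3.txt": "FAISS is a library for efficient similarity search and clustering of dense vectors.",
}

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)
for name, text in SAMPLES.items():
    (data_dir / name).write_text(text, encoding="utf-8")
```

Run it once from the project root; the indexing step below will pick up whatever `.txt` files it finds in `data/`.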
## 3. Build the Haystack Pipeline
- **Import Required Modules**

  ```python
  from haystack import Pipeline
  from haystack.document_stores import FAISSDocumentStore
  from haystack.nodes import EmbeddingRetriever, PromptNode, PromptTemplate
  from haystack.utils import convert_files_to_docs
  ```
- **Initialize the Document Store (FAISS)**

  ```python
  document_store = FAISSDocumentStore(embedding_dim=384, faiss_index_factory_str="Flat")
  ```

  Tip: `embedding_dim` must match your retriever's embedding model. We'll use `sentence-transformers/all-MiniLM-L6-v2`, which produces 384-dimensional vectors.
- **Index Your Documents**

  ```python
  docs = convert_files_to_docs(dir_path="data/")
  document_store.write_documents(docs)
  ```

  This step parses your text files and writes them into the FAISS document store.
- **Add Embeddings with a Retriever**

  ```python
  retriever = EmbeddingRetriever(
      document_store=document_store,
      embedding_model="sentence-transformers/all-MiniLM-L6-v2",
      model_format="sentence_transformers",
  )
  document_store.update_embeddings(retriever)
  ```

  The retriever embeds your documents so they can be searched semantically.
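Under the hood, semantic search boils down to comparing the query's embedding against each document's embedding. The toy example below mimics this with made-up 3-dimensional vectors standing in for MiniLM's 384-dimensional ones:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors; a real retriever produces 384-dimensional embeddings.
doc_vectors = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.8, 0.2],
    "doc3": [0.0, 0.2, 0.9],
}
query_vector = [0.05, 0.15, 0.95]  # pretend this embeds "What is FAISS used for?"

best = max(doc_vectors, key=lambda name: cosine(query_vector, doc_vectors[name]))
print(best)  # doc3: its vector points in nearly the same direction as the query's
```

FAISS with a `"Flat"` index performs exactly this kind of exhaustive nearest-neighbor comparison, just vectorized and over far larger collections.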
- **Set Up a PromptNode for Generation**

  ```python
  import os

  prompt_template = PromptTemplate(
      prompt="Answer the question based on the following context: {join(documents)}\nQuestion: {query}"
  )
  generator = PromptNode(
      model_name_or_path="gpt-3.5-turbo",
      api_key=os.environ.get("OPENAI_API_KEY"),
      default_prompt_template=prompt_template,
      max_length=256,
  )
  ```

  Note: For open-source LLMs, set `model_name_or_path` to something like `"google/flan-t5-base"` and adjust the `PromptNode` settings accordingly.
- **Assemble the Pipeline**

  ```python
  rag_pipeline = Pipeline()
  rag_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
  rag_pipeline.add_node(component=generator, name="Generator", inputs=["Retriever"])
  ```

  This connects the components: Query → Retriever → Generator.
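To make that data flow concrete, here is a minimal sketch of what Query → Retriever → Generator means. This is not Haystack's implementation: the toy retriever ranks by word overlap and the toy generator just echoes its context, but the shape of the hand-off is the same.

```python
def retrieve(query, corpus, top_k=1):
    """Toy retriever: rank documents by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: len(words & set(doc.lower().split())), reverse=True)
    return ranked[:top_k]

def generate(query, documents):
    """Toy generator: a real LLM would synthesize an answer from this context."""
    return f"Based on: {documents[0]}"

corpus = [
    "Haystack is an open-source framework for building search systems.",
    "FAISS is a library for efficient similarity search of dense vectors.",
]
docs = retrieve("What is FAISS used for?", corpus)
answer = generate("What is FAISS used for?", docs)
print(answer)
```

The pipeline abstraction exists so you can swap either stage (a different retriever, a different LLM) without touching the other.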
## 4. Run Inference: Ask Questions!
- **Query the Pipeline**

  ```python
  query = "What is FAISS used for?"
  result = rag_pipeline.run(query=query)
  print(result["results"])
  ```

  Expected output: the model generates an answer using the retrieved context, e.g.:

  ```
  ['FAISS is a library for efficient similarity search and clustering of dense vectors.']
  ```

  *Screenshot description: terminal output showing the answer generated by the pipeline in response to the user query.*
- **Try Another Query**

  ```python
  query = "What does RAG stand for?"
  result = rag_pipeline.run(query=query)
  print(result["results"])
  ```

  *Screenshot description: terminal output with the pipeline generating the correct expansion of "RAG" and a brief explanation.*
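For repeated queries, a small convenience wrapper keeps the calling code tidy. `ask` is a hypothetical helper of our own, not a Haystack API; it assumes the v1-style `run(query=...)` call returning a `results` list shown above:

```python
def ask(pipeline, query):
    """Run a query and return the first generated answer (hypothetical helper)."""
    result = pipeline.run(query=query)
    answers = result.get("results", [])
    return answers[0] if answers else "No answer generated."
```

This also gives you one place to add logging, retries, or answer post-processing later.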
## 5. Customizing and Extending Your Pipeline
- **Swap in a Different Retriever or LLM**

  You can use other embedding models (e.g., `BAAI/bge-base-en`) or LLMs (e.g., Hugging Face's `tiiuae/falcon-7b-instruct`). Adjust `embedding_dim` and `model_name_or_path` accordingly.
- **Add a Ranker or Filter**

  ```python
  from haystack.nodes import SentenceTransformersRanker

  ranker = SentenceTransformersRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-6-v2")

  # Rebuild the pipeline with the ranker between retrieval and generation.
  rag_pipeline = Pipeline()
  rag_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
  rag_pipeline.add_node(component=ranker, name="Ranker", inputs=["Retriever"])
  rag_pipeline.add_node(component=generator, name="Generator", inputs=["Ranker"])
  ```

  Re-ranking the retrieved documents before generation improves answer relevance.

  For more on scaling and optimizing large RAG deployments, see Scaling RAG for 100K+ Documents: Sharding, Caching, and Cost Control.
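To see why a second scoring stage helps, here is a toy retrieve-then-rerank sketch. Word overlap stands in for the bi-encoder retriever and a length-normalized score stands in for the cross-encoder; the documents and scores are made up for illustration:

```python
def first_stage(query, corpus, top_k=2):
    """Cheap recall stage: raw word overlap (stand-in for vector search)."""
    q = set(query.lower().split())
    return sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:top_k]

def rerank(query, docs):
    """Precision stage: length-normalized overlap (stand-in for a cross-encoder)."""
    q = set(query.lower().split())
    def score(doc):
        tokens = doc.lower().split()
        return len(q & set(tokens)) / len(tokens)
    return sorted(docs, key=score, reverse=True)

corpus = [
    "fast vector search library with lots and lots of extra padding words",
    "fast vector search",
    "weekend cooking recipes",
]
candidates = first_stage("fast vector search", corpus)
best = rerank("fast vector search", candidates)[0]
print(best)  # the short, on-topic document wins after re-ranking
```

The first stage ties the two relevant documents; the re-ranker breaks the tie in favor of the tighter match, which is exactly the job a cross-encoder does with much richer signals.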
- **Experiment with Prompt Engineering**

  Adjust the `PromptTemplate` for your use case, or try techniques from Reducing Hallucinations in RAG Workflows: Prompting and Retrieval Strategies for 2026.
## Common Issues & Troubleshooting
- **ImportError: No module named 'haystack'**
  Solution: Double-check that your virtual environment is activated, then run `pip install 'farm-haystack[faiss,openai]'`.
- **OpenAI authentication errors**
  Solution: Ensure `OPENAI_API_KEY` is set in your environment. On Windows, use `set OPENAI_API_KEY=your-key`.
- **FAISS errors about embedding dimensions**
  Solution: Make sure `embedding_dim` in `FAISSDocumentStore` matches your `EmbeddingRetriever`'s model.
- **Pipeline returns empty or irrelevant answers**
  Solution: Check that your data loaded correctly and embeddings are up to date, then try a different embedding model or prompt template.
- **OutOfMemory or CUDA errors (on GPU)**
  Solution: Use smaller models, batch queries, or run on CPU by setting `CUDA_VISIBLE_DEVICES=""`.
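The CPU fallback from the last item can also be applied inside your script. Note that the variable must be set before `torch` (or any other CUDA-aware library) is imported:

```python
import os

# Hide all GPUs from CUDA-aware libraries. Set this before importing torch,
# transformers, or haystack so CUDA is never initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
```

Setting it in the shell (`export CUDA_VISIBLE_DEVICES=""`) is equivalent and avoids ordering concerns entirely.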
## Next Steps
You’ve now built a working custom RAG pipeline with Haystack v2! From here, you can:
- Scale up with larger document stores, sharding, and caching (see Scaling RAG for 100K+ Documents: Sharding, Caching, and Cost Control).
- Experiment with advanced prompting and retrieval strategies to reduce hallucinations (Reducing Hallucinations in RAG Workflows: Prompting and Retrieval Strategies for 2026).
- Integrate your pipeline into a REST API or chatbot UI.
- Explore other vector stores (Weaviate, Milvus), LLMs, or multi-modal pipelines.
- Dig deeper into RAG architecture and best practices in our Ultimate Guide to RAG Pipelines.
RAG is a fast-moving field—keep experimenting, and join the Haystack and open-source communities to stay up to date!
