Retrieval-Augmented Generation (RAG) is transforming how developers build robust, context-aware AI applications. If you’re looking to build a custom RAG pipeline with Haystack v2, you’re in the right place. This hands-on tutorial walks you through every step, from setup to production-ready inference, with practical code, troubleshooting tips, and next steps.
For a broader overview of RAG architectures, their use cases, and deployment strategies, check out our Ultimate Guide to RAG Pipelines: Building Reliable Retrieval-Augmented Generation Systems. Here, we'll dive into the nuts and bolts of building your own pipeline using Haystack v2.
## Prerequisites
- Python: Version 3.8 or higher (3.10 recommended)
- Haystack: Version 2.x
- Basic Python knowledge (functions, classes, virtual environments)
- Familiarity with REST APIs and JSON (optional, for advanced integrations)
- Command-line access (Windows, Linux, or macOS terminal)
- Text editor/IDE (VSCode, PyCharm, etc.)
- OpenAI API key (if using GPT models; or HuggingFace API key for open models)
We’ll use a simple local document store (FAISS) and OpenAI’s GPT-3.5-turbo for demonstration, but you can swap in other vector stores or LLMs as needed.
## 1. Environment Setup
- **Create a Virtual Environment**

  ```bash
  python3 -m venv rag-tutorial-env
  source rag-tutorial-env/bin/activate   # On Windows: rag-tutorial-env\Scripts\activate
  ```
- **Install Haystack v2 and Required Libraries**

  ```bash
  pip install 'farm-haystack[faiss,openai]'
  ```

  This installs Haystack with FAISS (for vector search) and OpenAI (for LLM access). For other backends, see Haystack's `[extras]` options.
- **Set Your API Key (if using OpenAI)**

  ```bash
  export OPENAI_API_KEY=your-openai-api-key   # On Windows: set OPENAI_API_KEY=your-openai-api-key
  ```
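If you want to fail fast when the key is missing, a tiny helper (hypothetical, not part of Haystack) can check the environment before any model calls:

```python
import os

def check_api_key(name="OPENAI_API_KEY"):
    """Return True if the named environment variable is present and non-empty."""
    return bool(os.environ.get(name, "").strip())
```

Call `check_api_key()` at startup and raise a clear error if it returns `False`, rather than hitting an authentication failure mid-pipeline.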
## 2. Prepare Your Data
- **Organize Documents**

  Place your text files in a folder called `data/`. For this tutorial, let's use three sample text files:

  - `data/doc1.txt`: "Haystack is an open-source framework for building search systems powered by language models."
  - `data/doc2.txt`: "Retrieval-Augmented Generation combines retrieval and generation to improve answer accuracy."
  - `data/doc3.txt`: "FAISS is a library for efficient similarity search and clustering of dense vectors."

  (You can use your own corpus; just adjust the file names.)
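If you'd rather script the sample corpus than create the files by hand, a few lines of Python will do it (the file names and contents match the samples above):

```python
from pathlib import Path

# Sample corpus matching the three files described above.
SAMPLES = {
    "doc1.txt": "Haystack is an open-source framework for building search systems powered by language models.",
    "doc2.txt": "Retrieval-Augmented Generation combines retrieval and generation to improve answer accuracy.",
    "doc3.txt": "FAISS is a library for efficient similarity search and clustering of dense vectors.",
}

data_dir = Path("data")
data_dir.mkdir(exist_ok=True)
for name, text in SAMPLES.items():
    (data_dir / name).write_text(text, encoding="utf-8")
```

Run it once from the project root; the indexing step below will pick up whatever `.txt` files it finds in `data/`.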
## 3. Build the Haystack Pipeline
- **Import Required Modules**

  ```python
  from haystack import Pipeline
  from haystack.document_stores import FAISSDocumentStore
  from haystack.nodes import EmbeddingRetriever, PromptNode, PromptTemplate
  from haystack.utils import convert_files_to_docs
  ```
- **Initialize the Document Store (FAISS)**

  ```python
  document_store = FAISSDocumentStore(embedding_dim=384, faiss_index_factory_str="Flat")
  ```

  Tip: `embedding_dim` must match your retriever's embedding model. We'll use `sentence-transformers/all-MiniLM-L6-v2`, which produces 384-dimensional vectors.
- **Index Your Documents**

  ```python
  docs = convert_files_to_docs(dir_path="data/")
  document_store.write_documents(docs)
  ```

  This step parses your text files and writes them into the FAISS document store.
- **Add Embeddings with a Retriever**

  ```python
  retriever = EmbeddingRetriever(
      document_store=document_store,
      embedding_model="sentence-transformers/all-MiniLM-L6-v2",
      model_format="sentence_transformers",
  )
  document_store.update_embeddings(retriever)
  ```

  The retriever embeds your documents so they can be searched semantically.
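Under the hood, semantic search boils down to comparing the query's embedding against each document's embedding. The toy example below mimics this with made-up 3-dimensional vectors standing in for MiniLM's 384-dimensional ones:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors; a real retriever produces 384-dimensional embeddings.
doc_vectors = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.8, 0.2],
    "doc3": [0.0, 0.2, 0.9],
}
query_vector = [0.05, 0.15, 0.95]  # pretend this embeds "What is FAISS used for?"

best = max(doc_vectors, key=lambda name: cosine(query_vector, doc_vectors[name]))
print(best)  # doc3: its vector points in nearly the same direction as the query's
```

FAISS with a `"Flat"` index performs exactly this kind of exhaustive nearest-neighbor comparison, just vectorized and over far larger collections.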
- **Set Up a PromptNode for Generation**

  ```python
  import os

  prompt_template = PromptTemplate(
      prompt="Answer the question based on the following context: {join(documents)}\nQuestion: {query}"
  )
  generator = PromptNode(
      model_name_or_path="gpt-3.5-turbo",
      api_key=os.environ.get("OPENAI_API_KEY"),
      default_prompt_template=prompt_template,
      max_length=256,
  )
  ```

  Note: For open-source LLMs, set `model_name_or_path` to something like `"google/flan-t5-base"` and adjust the `PromptNode` settings accordingly.
- **Assemble the Pipeline**

  ```python
  rag_pipeline = Pipeline()
  rag_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
  rag_pipeline.add_node(component=generator, name="Generator", inputs=["Retriever"])
  ```

  This connects the components: Query → Retriever → Generator.
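To make that data flow concrete, here is a minimal sketch of what Query → Retriever → Generator means. This is not Haystack's implementation: the toy retriever ranks by word overlap and the toy generator just echoes its context, but the shape of the hand-off is the same.

```python
def retrieve(query, corpus, top_k=1):
    """Toy retriever: rank documents by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: len(words & set(doc.lower().split())), reverse=True)
    return ranked[:top_k]

def generate(query, documents):
    """Toy generator: a real LLM would synthesize an answer from this context."""
    return f"Based on: {documents[0]}"

corpus = [
    "Haystack is an open-source framework for building search systems.",
    "FAISS is a library for efficient similarity search of dense vectors.",
]
docs = retrieve("What is FAISS used for?", corpus)
answer = generate("What is FAISS used for?", docs)
print(answer)
```

The pipeline abstraction exists so you can swap either stage (a different retriever, a different LLM) without touching the other.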
## 4. Run Inference: Ask Questions!
- **Query the Pipeline**

  ```python
  query = "What is FAISS used for?"
  result = rag_pipeline.run(query=query)
  print(result["results"])
  ```

  Expected output: the model generates an answer using the retrieved context, e.g.:

  ```
  ['FAISS is a library for efficient similarity search and clustering of dense vectors.']
  ```

  *Screenshot description: terminal output showing the answer generated by the pipeline in response to the user query.*
- **Try Another Query**

  ```python
  query = "What does RAG stand for?"
  result = rag_pipeline.run(query=query)
  print(result["results"])
  ```

  *Screenshot description: terminal output with the pipeline generating the correct expansion of "RAG" and a brief explanation.*
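For repeated queries, a small convenience wrapper keeps the calling code tidy. `ask` is a hypothetical helper of our own, not a Haystack API; it assumes the v1-style `run(query=...)` call returning a `results` list shown above:

```python
def ask(pipeline, query):
    """Run a query and return the first generated answer (hypothetical helper)."""
    result = pipeline.run(query=query)
    answers = result.get("results", [])
    return answers[0] if answers else "No answer generated."
```

This also gives you one place to add logging, retries, or answer post-processing later.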
## 5. Customizing and Extending Your Pipeline
- **Swap in a Different Retriever or LLM**

  You can use other embedding models (e.g., `BAAI/bge-base-en`) or LLMs (e.g., Hugging Face's `tiiuae/falcon-7b-instruct`). Adjust `embedding_dim` and `model_name_or_path` accordingly.
- **Add a Ranker or Filter**

  ```python
  from haystack.nodes import SentenceTransformersRanker

  ranker = SentenceTransformersRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-6-v2")

  # Rebuild the pipeline with the ranker between retrieval and generation.
  rag_pipeline = Pipeline()
  rag_pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
  rag_pipeline.add_node(component=ranker, name="Ranker", inputs=["Retriever"])
  rag_pipeline.add_node(component=generator, name="Generator", inputs=["Ranker"])
  ```

  Re-ranking the retrieved documents before generation improves answer relevance.

  For more on scaling and optimizing large RAG deployments, see Scaling RAG for 100K+ Documents: Sharding, Caching, and Cost Control.
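To see why a second scoring stage helps, here is a toy retrieve-then-rerank sketch. Word overlap stands in for the bi-encoder retriever and a length-normalized score stands in for the cross-encoder; the documents and scores are made up for illustration:

```python
def first_stage(query, corpus, top_k=2):
    """Cheap recall stage: raw word overlap (stand-in for vector search)."""
    q = set(query.lower().split())
    return sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:top_k]

def rerank(query, docs):
    """Precision stage: length-normalized overlap (stand-in for a cross-encoder)."""
    q = set(query.lower().split())
    def score(doc):
        tokens = doc.lower().split()
        return len(q & set(tokens)) / len(tokens)
    return sorted(docs, key=score, reverse=True)

corpus = [
    "fast vector search library with lots and lots of extra padding words",
    "fast vector search",
    "weekend cooking recipes",
]
candidates = first_stage("fast vector search", corpus)
best = rerank("fast vector search", candidates)[0]
print(best)  # the short, on-topic document wins after re-ranking
```

The first stage ties the two relevant documents; the re-ranker breaks the tie in favor of the tighter match, which is exactly the job a cross-encoder does with much richer signals.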
- **Experiment with Prompt Engineering**

  Adjust the `PromptTemplate` for your use case, or try techniques from Reducing Hallucinations in RAG Workflows: Prompting and Retrieval Strategies for 2026.
## Common Issues & Troubleshooting
- **ImportError: No module named 'haystack'**
  Solution: Double-check that your virtual environment is activated, then run `pip install 'farm-haystack[faiss,openai]'`.
- **OpenAI authentication errors**
  Solution: Ensure `OPENAI_API_KEY` is set in your environment. On Windows, use `set OPENAI_API_KEY=your-key`.
- **FAISS errors about embedding dimensions**
  Solution: Make sure `embedding_dim` in `FAISSDocumentStore` matches your `EmbeddingRetriever`'s model.
- **Pipeline returns empty or irrelevant answers**
  Solution: Check that your data loaded correctly and embeddings are up to date, then try a different embedding model or prompt template.
- **OutOfMemory or CUDA errors (on GPU)**
  Solution: Use smaller models, batch queries, or run on CPU by setting `CUDA_VISIBLE_DEVICES=""`.
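The CPU fallback from the last item can also be applied inside your script. Note that the variable must be set before `torch` (or any other CUDA-aware library) is imported:

```python
import os

# Hide all GPUs from CUDA-aware libraries. Set this before importing torch,
# transformers, or haystack so CUDA is never initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
```

Setting it in the shell (`export CUDA_VISIBLE_DEVICES=""`) is equivalent and avoids ordering concerns entirely.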
## Next Steps
You’ve now built a working custom RAG pipeline with Haystack v2! From here, you can:
- Scale up with larger document stores, sharding, and caching (see Scaling RAG for 100K+ Documents: Sharding, Caching, and Cost Control).
- Experiment with advanced prompting and retrieval strategies to reduce hallucinations (Reducing Hallucinations in RAG Workflows: Prompting and Retrieval Strategies for 2026).
- Integrate your pipeline into a REST API or chatbot UI.
- Explore other vector stores (Weaviate, Milvus), LLMs, or multi-modal pipelines.
- Dig deeper into RAG architecture and best practices in our Ultimate Guide to RAG Pipelines.
RAG is a fast-moving field—keep experimenting, and join the Haystack and open-source communities to stay up to date!
