Category: Builder's Corner
Keyword: finetune llm lora
Fine-tuning large language models (LLMs) on your own data unlocks custom capabilities, domain-specific expertise, and improved performance for your applications. However, full fine-tuning is resource-intensive. Enter LoRA (Low-Rank Adaptation): a parameter-efficient method that makes LLM fine-tuning accessible even on consumer GPUs.
In this tutorial, you'll learn how to fine-tune an LLM using LoRA, leveraging the Hugging Face ecosystem. We'll cover setup, data preparation, configuration, training, and evaluation — all with reproducible code and commands.
Prerequisites

- Hardware:
  - Recommended: NVIDIA GPU with ≥8GB VRAM (e.g., RTX 3060 or higher)
  - Minimum: Modern CPU (fine-tuning will be slow)
- Operating System: Linux (Ubuntu 20.04+), macOS, or Windows (WSL2 recommended)
- Python: 3.9 or 3.10
- Knowledge:
  - Basic Python scripting
  - Familiarity with Hugging Face Transformers and Datasets
  - Understanding of LLMs and fine-tuning concepts
- Software Tools:
  - PyTorch 2.0+
  - `transformers` (v4.30+), `peft` (v0.4+), `datasets`, `accelerate`
  - CUDA toolkit (if using GPU)
- Accounts:
  - Hugging Face account (to access models and datasets)
1. Environment Setup

Create and activate a Python virtual environment:

```bash
python3 -m venv lora-finetune-env
source lora-finetune-env/bin/activate
```

Upgrade pip and install the required packages:

```bash
pip install --upgrade pip
pip install torch transformers datasets peft accelerate
```

Verify CUDA (for GPU acceleration):

```bash
python -c "import torch; print(torch.cuda.is_available())"
```

This should print `True` if CUDA is working.
2. Prepare Your Data

Format your data as JSONL or CSV. For text generation tasks, each example should have an `input` (prompt/context) and an `output` (desired response):

```json
{"input": "What is LoRA?", "output": "LoRA stands for Low-Rank Adaptation, a parameter-efficient fine-tuning method for LLMs."}
{"input": "Explain fine-tuning.", "output": "Fine-tuning adapts a pre-trained model to a specific task or dataset."}
```

Place your data file in your project directory (e.g., `my_data.jsonl`).

Load your dataset using Hugging Face `datasets`:

```python
from datasets import load_dataset

dataset = load_dataset("json", data_files="my_data.jsonl")
print(dataset)
```
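Before loading, it can save a training run to sanity-check the file itself. The sketch below (a hypothetical helper, not part of any library) verifies that every line of a JSONL file parses as JSON and carries the `input`/`output` fields used in this tutorial:

```python
import json

REQUIRED_FIELDS = {"input", "output"}

def validate_jsonl(path):
    """Return a list of (line_number, problem) for malformed records."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append((lineno, "invalid JSON"))
                continue
            missing = REQUIRED_FIELDS - record.keys()
            if missing:
                problems.append((lineno, f"missing fields: {sorted(missing)}"))
    return problems

# Example: validate_jsonl("my_data.jsonl") returns [] when every line is well-formed
```

An empty result means `datasets.load_dataset` should ingest the file without surprises; any reported line is worth fixing first, since a single malformed record can fail the whole load.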
3. Choose a Base Model

Pick a model checkpoint from the Hugging Face Hub.

- Popular options: `tiiuae/falcon-7b`, `meta-llama/Llama-2-7b-hf`, `mistralai/Mistral-7B-v0.1`, etc.
- Browse the Hugging Face Models page for more.

Download and load the model and tokenizer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # or your preferred model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
)
```

Tip: Add `use_auth_token=True` if the model is gated.
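When picking a checkpoint, it helps to estimate whether its weights even fit in your VRAM. A common rule of thumb (an approximation, not from any library) is parameters × bytes per parameter; the helper below is a hypothetical sketch of that arithmetic:

```python
def estimate_weight_vram_gib(n_params_billion, bytes_per_param=2):
    """Rough weight-memory estimate in GiB.

    bytes_per_param: 2 for fp16/bf16, 4 for fp32.
    Real usage is higher: activations, optimizer state, and the
    KV cache all add on top of the weights.
    """
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

# A 7B model in fp16 needs roughly 13 GiB for the weights alone
print(round(estimate_weight_vram_gib(7), 1))
```

This is why a 7B model in fp16 is already a stretch for the 8GB card from the prerequisites; quantized loading or a smaller checkpoint narrows the gap.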
4. Apply LoRA With PEFT

Configure LoRA parameters:

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                                  # Rank of the update matrices
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # Layer names may vary by model
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```

Note: For Llama/Mistral, the target modules are usually `q_proj` and `v_proj`. For other models, check their architecture for the correct target modules.

Wrap the base model with LoRA:

```python
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

This should show only a small number of trainable parameters (the LoRA adapters).
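If you are unsure which `target_modules` names your architecture uses, one way to find candidates is to scan the model's submodules for linear layers and collect their trailing names. The sketch below assumes only that the object exposes `named_modules()` (as Hugging Face models do, since they are PyTorch modules); the list of class names to match is an assumption covering common cases:

```python
def linear_module_suffixes(model):
    """Collect the trailing names of all Linear-like submodules.

    These suffixes are the kind of values LoraConfig.target_modules
    expects (e.g., "q_proj", "v_proj").
    """
    suffixes = set()
    for name, module in model.named_modules():
        # Match plain and quantized linear layers by class name (assumed list)
        if type(module).__name__ in ("Linear", "Linear8bitLt", "Linear4bit"):
            suffixes.add(name.rsplit(".", 1)[-1])
    return sorted(suffixes)

# Usage with the loaded model:
#   print(linear_module_suffixes(model))
```

For a Llama/Mistral-style model this would surface names like `q_proj`, `k_proj`, `v_proj`, `o_proj`, and the MLP projections, from which you can choose your targets.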
5. Preprocess and Tokenize Your Data

Define a prompt formatting function:

```python
def format_prompt(example):
    return f"### Question:\n{example['input']}\n\n### Answer:\n{example['output']}"
```

Apply the function and tokenize:

```python
# Llama/Mistral tokenizers ship without a pad token; reuse EOS for padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(example):
    prompt = format_prompt(example)
    tokens = tokenizer(
        prompt,
        truncation=True,
        max_length=512,
        padding="max_length",
    )
    # For causal LM training, the labels are the input IDs themselves
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

tokenized_dataset = dataset["train"].map(tokenize_function)
```
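To see exactly what string the model will train on, it is worth printing one formatted example. This snippet reuses the same template as above on a sample record (the sample text here is illustrative):

```python
def format_prompt(example):
    # Same template as in the tutorial
    return f"### Question:\n{example['input']}\n\n### Answer:\n{example['output']}"

sample = {"input": "What is LoRA?", "output": "A parameter-efficient fine-tuning method."}
print(format_prompt(sample))
# ### Question:
# What is LoRA?
#
# ### Answer:
# A parameter-efficient fine-tuning method.
```

At inference time you will prompt the model with everything up to and including `### Answer:\n`, so keeping this template identical between training and inference matters.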
6. Configure Training Arguments

Set up training hyperparameters:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    warmup_steps=50,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    output_dir="./lora-finetuned-llm",
    save_strategy="epoch",
    evaluation_strategy="no",
    report_to="none",
)
```
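With the settings above, the batch size the optimizer effectively sees per update is the per-device batch size times the accumulation steps (times the number of GPUs, if you scale out). A quick check of that arithmetic:

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_gpus = 1  # adjust for your setup

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_gpus
)
print(effective_batch_size)  # 16 on a single GPU
```

This is why lowering `per_device_train_batch_size` to escape out-of-memory errors and raising `gradient_accumulation_steps` by the same factor keeps training dynamics comparable.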
7. Launch Fine-Tuning

Initialize the Trainer and start training:

```python
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
)
trainer.train()
```

Training progress and loss will be printed to the terminal.

Screenshot description: The terminal displays training progress, including epoch number, step, and loss values.

After training, save the LoRA adapters:

```python
model.save_pretrained("./lora-finetuned-llm")
tokenizer.save_pretrained("./lora-finetuned-llm")
```
8. Run Inference With Your Fine-Tuned LLM

Load the LoRA-adapted model and generate text:

```python
import torch
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", torch_dtype="auto"
)
lora_model = PeftModel.from_pretrained(base_model, "./lora-finetuned-llm")
lora_model.eval()

prompt = "What is LoRA?"
inputs = tokenizer(prompt, return_tensors="pt").to(lora_model.device)
with torch.no_grad():  # no gradients needed for generation
    outputs = lora_model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Screenshot description: Terminal output shows the fine-tuned model's response to the prompt.
Common Issues & Troubleshooting

- CUDA out of memory:
  - Reduce `per_device_train_batch_size` or `max_length`.
  - Use `gradient_accumulation_steps` to maintain the effective batch size.
  - Try `fp16=True` for mixed precision.
- Model or tokenizer mismatch:
  - Ensure you use the same model and tokenizer checkpoints.
- Incorrect target modules for LoRA:
  - Check your model architecture and set `target_modules` accordingly.
- Dataset mapping errors:
  - Check your JSONL/CSV format and ensure the field names match your code.
- Slow training:
  - Ensure the GPU is being used (`torch.cuda.is_available()` is `True`).
  - Close other GPU-intensive applications.
Next Steps

- Experiment with different LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`).
- Try larger or more specialized base models for better results (if hardware allows).
- Evaluate your fine-tuned model on held-out data or real-world tasks.
- Convert your LoRA adapters to ONNX or other formats for production deployment.
- Explore advanced PEFT techniques (e.g., QLoRA, AdaLoRA) for further efficiency.
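When experimenting with `r`, it helps to know how adapter size scales: each targeted `d_out x d_in` linear layer gains two low-rank matrices, A of shape `r x d_in` and B of shape `d_out x r`, so `r * (d_in + d_out)` trainable parameters. The sketch below works through that arithmetic; the layer shapes are approximate Mistral-7B attention dimensions (an assumption for illustration):

```python
def lora_param_count(shapes, r):
    """Trainable parameters LoRA adds: A is (r x d_in), B is (d_out x r)."""
    return sum(r * (d_in + d_out) for d_out, d_in in shapes)

# Assumed Mistral-7B-like attention shapes: 32 layers, each with
# q_proj 4096 -> 4096 and v_proj 4096 -> 1024 (as (d_out, d_in) pairs)
shapes = [(4096, 4096), (1024, 4096)] * 32

for r in (8, 16, 32):
    print(r, lora_param_count(shapes, r))
```

Parameter count grows linearly in `r`, so doubling the rank doubles the adapter, still only a few million parameters against billions in the base model, which is the number `print_trainable_parameters()` reports.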