Document classification is a foundational step in modern AI-driven workflows, enabling automated routing, compliance checks, and downstream processing. With the rise of large language models (LLMs), prompt engineering has become a critical skill for ensuring high accuracy and reliability in automated document classification. This tutorial provides a hands-on, deep-dive guide to designing, testing, and optimizing prompts for document classification tasks, with actionable code examples and best practices for integration into production workflows.
For a broader context on how document classification fits into end-to-end automation, see our Pillar: The 2026 Guide to Automating AI-Driven Document Workflows Across Industries.
Prerequisites
- Python 3.9+ installed (python --version)
- OpenAI Python SDK (openai), version 1.0.0+
- Pandas for data handling (pip install pandas)
- A valid OpenAI API key (or similar LLM provider)
- Familiarity with basic Python scripting
- Basic understanding of document workflow automation concepts
If you're new to prompt engineering in compliance contexts, check out our Best Practices for Prompt Engineering in Compliance Workflow Automation.
Step 1: Define Your Document Classes and Requirements
- List Target Classes
  Start by defining the exact classes your workflow needs. For example:
  - Invoice
  - Purchase Order
  - Contract
  - Resume
  - Other
  Tip: Be as specific as possible. Overly broad or ambiguous classes lead to low classification accuracy.
- Document Requirements
  Note any requirements, such as minimum accuracy, explainability, or regulatory constraints.
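One way to make the class list enforceable is to keep it in code and check every model response against it. The sketch below is a minimal illustration, not part of the original workflow: the names `ALLOWED_CLASSES` and `normalize_label` are hypothetical, and the cleanup rules (ignoring case, a leading "Class:" prefix, quotes, and trailing periods) are assumptions about how a model might vary its answer.

```python
from typing import Optional

# Allowed classes, copied from the list in this step.
ALLOWED_CLASSES = ("Invoice", "Purchase Order", "Contract", "Resume", "Other")

# Lookup from a lowercased label to its canonical spelling.
_CANONICAL = {name.lower(): name for name in ALLOWED_CLASSES}


def normalize_label(raw: str) -> Optional[str]:
    """Map raw model output to an allowed class, or None if it is not one.

    Hypothetical helper: ignores case, surrounding whitespace, an optional
    leading "Class:" prefix, surrounding quotes, and trailing periods.
    """
    cleaned = raw.strip()
    if cleaned.lower().startswith("class:"):
        cleaned = cleaned[len("class:"):]
    cleaned = cleaned.strip().strip("\"'.").strip()
    return _CANONICAL.get(cleaned.lower())


if __name__ == "__main__":
    print(normalize_label("Class: purchase order"))
    print(normalize_label("Receipt"))
```

Returning `None` for anything outside the list lets the caller decide what to do with unexpected answers, for example sending the document to manual review.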
Step 2: Prepare Example Documents for Prompt Engineering
- Collect Representative Samples
  Gather 5-10 real examples for each class. Store them as plain text files or in a spreadsheet.

  ```
  examples/
    invoice_1.txt
    invoice_2.txt
    contract_1.txt
    resume_1.txt
    ...
  ```

  Why? These samples help you craft realistic prompts and test LLM performance.
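The samples can be loaded together with their labels by reading the class name from each file name. The following is a rough sketch that assumes the `<class>_<index>.txt` naming shown above; the function names `label_from_filename` and `load_examples` are hypothetical, and the handling of multi-word classes (underscores become spaces, as in `purchase_order_1.txt`) is an assumption rather than something the layout above specifies.

```python
from pathlib import Path
from typing import Dict, List


def label_from_filename(filename: str) -> str:
    """Derive a class label from a name like 'purchase_order_2.txt'."""
    stem = Path(filename).stem            # "purchase_order_2"
    name, _, _ = stem.rpartition("_")     # drop the trailing index
    if not name:                          # no underscore in the file name
        name = stem
    return name.replace("_", " ").title()


def load_examples(folder: str = "examples") -> List[Dict[str, str]]:
    """Read every .txt file in the folder into a list of labeled records."""
    rows = []
    for path in sorted(Path(folder).glob("*.txt")):
        rows.append({
            "file": path.name,
            "label": label_from_filename(path.name),
            "text": path.read_text(encoding="utf-8"),
        })
    return rows
```

The resulting list of dictionaries can be passed straight to `pandas.DataFrame` if a tabular view is more convenient.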
Step 3: Engineer and Test Few-Shot Prompts
- Design a Clear, Structured Prompt
  Use few-shot learning by providing labeled examples in your prompt. Here's a template:

  ```
  You are an expert document classifier.
  Classify the following document into one of these classes: Invoice, Purchase Order, Contract, Resume, Other.

  Examples:
  Document: "ABC Corp Invoice #12345 for services rendered on 2023-05-01."
  Class: Invoice

  Document: "Employment Agreement between XYZ Ltd and John Doe, signed 2022-11-15."
  Class: Contract

  Document: "Jane Smith, Software Engineer. Skills: Python, AI, Cloud Computing."
  Class: Resume

  Now, classify this document:
  "{DOCUMENT_TEXT}"
  Class:
  ```

  Best Practice: Always use explicit instructions and a fixed output format.
- Test the Prompt with OpenAI API
  Save your test document as test_doc.txt. Use the following Python script:

  ```python
  import openai  # reads the API key from the OPENAI_API_KEY environment variable

  # Load the document to classify.
  with open("test_doc.txt") as f:
      test_text = f.read()

  # Few-shot prompt: instructions, three labeled examples, then the new document.
  prompt = f"""
  You are an expert document classifier.
  Classify the following document into one of these classes: Invoice, Purchase Order, Contract, Resume, Other.

  Examples:
  Document: "ABC Corp Invoice #12345 for services rendered on 2023-05-01."
  Class: Invoice

  Document: "Employment Agreement between XYZ Ltd and John Doe, signed 2022-11-15."
  Class: Contract

  Document: "Jane Smith, Software Engineer. Skills: Python, AI, Cloud Computing."
  Class: Resume

  Now, classify this document:
  "{test_text}"
  Class:
  """

  response = openai.chat.completions.create(
      model="gpt-3.5-turbo",
      messages=[{"role": "user", "content": prompt}],
      temperature=0,
  )

  # The model's answer is the text of the first choice.
  print(response.choices[0].message.content.strip())
  ```

  Note: Set temperature=0 to make the output as deterministic as possible. Identical inputs can still occasionally produce different completions, so validate the returned class rather than assuming it.
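As the number of few-shot examples grows, it can be easier to keep them as data and assemble the prompt in code. The sketch below rebuilds the same prompt from a list of (document, class) pairs. The names `FEW_SHOT_EXAMPLES`, `CLASSES`, and `build_prompt` are hypothetical; the example texts and class list are taken from the template in this step.

```python
from typing import List, Sequence, Tuple

CLASSES = ["Invoice", "Purchase Order", "Contract", "Resume", "Other"]

# Labeled examples from the template, stored as (document, class) pairs.
FEW_SHOT_EXAMPLES: List[Tuple[str, str]] = [
    ("ABC Corp Invoice #12345 for services rendered on 2023-05-01.", "Invoice"),
    ("Employment Agreement between XYZ Ltd and John Doe, signed 2022-11-15.", "Contract"),
    ("Jane Smith, Software Engineer. Skills: Python, AI, Cloud Computing.", "Resume"),
]


def build_prompt(
    document_text: str,
    examples: Sequence[Tuple[str, str]] = FEW_SHOT_EXAMPLES,
    classes: Sequence[str] = CLASSES,
) -> str:
    """Assemble the few-shot classification prompt as a single string."""
    lines = [
        "You are an expert document classifier.",
        "Classify the following document into one of these classes: "
        + ", ".join(classes) + ".",
        "",
        "Examples:",
    ]
    for text, label in examples:
        lines += [f'Document: "{text}"', f"Class: {label}", ""]
    # End with an open "Class:" line so the model completes it with one label.
    lines += ["Now, classify this document:", f'"{document_text}"', "Class:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("PO-7781: 40 units of part X, deliver by 2024-03-01."))
```

Adding or swapping an example then means editing one list entry instead of a long string literal.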
Step 4: Batch Evaluation and Iteration
- Automate Batch Testing
  Test your prompt on multiple samples to measure accuracy. Example script:

  ```python
  import os

  import openai
  import pandas as pd


  def classify_doc(text, prompt_template):
      # Fill the {DOCUMENT_TEXT} placeholder in the template from Step 3.
      prompt = prompt_template.format(DOCUMENT_TEXT=text)
      response = openai.chat.completions.create(
          model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": prompt}],
          temperature=0,
      )
      return response.choices[0].message.content.strip()


  prompt_template = """You are an expert document classifier. ... (same as above) ..."""

  results = []
  for fname in os.listdir("examples"):
      with open(os.path.join("examples", fname)) as f:
          text = f.read()
      predicted = classify_doc(text, prompt_template)
      # "purchase_order_1.txt" -> "Purchase Order": drop the trailing index,
      # turn underscores into spaces, then title-case.
      true_label = fname.rsplit("_", 1)[0].replace("_", " ").title()
      results.append({"file": fname, "true": true_label, "predicted": predicted})

  df = pd.DataFrame(results)
  # Share of files whose predicted class exactly matches the label.
  accuracy = (df["true"] == df["predicted"]).mean()
  print(df)
  print(f"Accuracy: {accuracy:.2%}")
  ```

  Tip: Analyze misclassifications and refine your prompt or classes.
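A single accuracy figure can hide which classes are being confused. As a rough sketch, the helper below takes a list of records with "true" and "predicted" keys, like the `results` list in the batch script, and returns per-class accuracy along with counts of each (true, predicted) pair. The name `summarize` is hypothetical, and it uses only the standard library so it runs without pandas.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize(results: List[Dict[str, str]]) -> Tuple[Dict[str, float], Counter]:
    """Return per-class accuracy and counts of (true, predicted) pairs."""
    # How often each (true, predicted) combination occurred.
    confusion = Counter((r["true"], r["predicted"]) for r in results)
    # How many documents each true class has.
    totals = Counter(r["true"] for r in results)
    per_class = {
        label: confusion[(label, label)] / count
        for label, count in totals.items()
    }
    return per_class, confusion


def print_misclassifications(confusion: Counter) -> None:
    """List the confused pairs, most frequent first."""
    for (true, predicted), count in confusion.most_common():
        if true != predicted:
            print(f"{true} predicted as {predicted}: {count}")
```

Pairs that show up repeatedly in the misclassification list are good candidates for an extra few-shot example or a sharper class definition.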
Step 5: Integrate with Automated Workflows
- Wrap Classification as a Function
  Create a reusable function for your workflow system:

  ```python
  import openai


  def classify_document(document_text):
      # Same prompt as in Step 3 (elided here), with document_text
      # in place of the test document.
      prompt = f"""You are an expert document classifier. ... """
      response = openai.chat.completions.create(
          model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": prompt}],
          temperature=0,
      )
      return response.choices[0].message.content.strip()
  ```
- Call from Workflow Orchestrator
  Integrate this function into your orchestration system (e.g., Airflow, Zapier, custom Python).

  ```python
  def classify_and_route(**context):
      # Airflow-style task: pull the document text pushed by an upstream task.
      doc_text = context['ti'].xcom_pull(key='document_text')
      doc_class = classify_document(doc_text)
      # Route based on doc_class...
  ```
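The routing step itself can be as simple as a lookup table with a safe default. The sketch below is illustrative only: the queue names in `ROUTES`, the `FALLBACK_QUEUE` value, and the `route_document` helper are all hypothetical. The classifier is passed in as a function so the routing logic can be exercised without calling an API.

```python
from typing import Callable, Dict, Tuple

# Hypothetical mapping from document class to a downstream queue name.
ROUTES: Dict[str, str] = {
    "Invoice": "accounts_payable",
    "Purchase Order": "procurement",
    "Contract": "legal_review",
    "Resume": "recruiting",
}

# "Other" and any unexpected model output go to a human.
FALLBACK_QUEUE = "manual_review"


def route_document(document_text: str,
                   classify: Callable[[str], str]) -> Tuple[str, str]:
    """Classify a document and return (class, queue name)."""
    doc_class = classify(document_text).strip()
    queue = ROUTES.get(doc_class, FALLBACK_QUEUE)
    return doc_class, queue


if __name__ == "__main__":
    # Stand-in classifier for a dry run; swap in the real function in production.
    print(route_document("Invoice #991", lambda text: "Invoice"))
```

Defaulting to a review queue means a malformed or unexpected label never silently drops a document.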
Step 6: Optimize and Monitor Over Time
- Monitor Real-World Performance
  Log classified documents and review misclassifications regularly.
- Iterate Prompt and Examples
  Periodically update your prompt and examples based on new document types or edge cases.
- Consider Prompt Chaining for Complex Workflows
  For multi-stage classification (e.g., first detect "Legal Document" vs "Financial Document," then subclassify), see Prompt Chaining Tactics: Building Reliable Multi-Stage AI Workflows (2026 Best Practices).
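For the monitoring step, one lightweight option is an append-only JSON Lines log. The sketch below is an assumption about how such a log could look, not a prescribed format: the function names, the field names, and the choice to store a SHA-256 hash instead of the raw document text are all illustrative. Records whose predicted class is outside the allowed set are flagged for review.

```python
import hashlib
import json
import os
from datetime import datetime, timezone
from typing import Collection, Dict


def log_prediction(log_path: str, document_text: str, predicted: str,
                   allowed: Collection[str]) -> Dict[str, object]:
    """Append one prediction record to a JSON Lines file and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Store a hash rather than the text, so the log holds no document content.
        "doc_sha256": hashlib.sha256(document_text.encode("utf-8")).hexdigest(),
        "predicted": predicted,
        "needs_review": predicted not in allowed,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def review_rate(log_path: str) -> float:
    """Fraction of logged predictions that were flagged for review."""
    if not os.path.exists(log_path):
        return 0.0
    with open(log_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    if not records:
        return 0.0
    return sum(r["needs_review"] for r in records) / len(records)
```

A rising review rate over time is a simple signal that new document types have appeared and the prompt or examples need updating.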
Common Issues & Troubleshooting
- Model Returns Unexpected Classes
  Solution: Ensure your prompt lists only allowed classes and provides clear, unambiguous examples.
- Inconsistent Output Format
  Solution: Explicitly specify the output format in your prompt (e.g., Class: [class_name]).
- Low Accuracy on Real Documents
  Solution: Add more representative examples, especially for edge cases. Consider splitting classes if needed.
- API Rate Limits or Cost Overruns
  Solution: Batch requests, cache results, or use a smaller/faster model for initial triage.
- Security or Compliance Concerns
  Solution: Mask or redact sensitive data before sending to third-party APIs. Review Optimizing AI Document Workflows for Healthcare: Compliance, Security, and Clinical Outcomes for more.
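Two of the fixes above, caching results and masking sensitive data, can be combined in a thin wrapper around the classifier. The following is a minimal sketch under stated assumptions: the two regular expressions (email addresses and runs of nine or more digits) are illustrative and nowhere near a complete redaction policy, the cache is an in-memory dictionary that does not survive restarts, and the names `redact` and `make_cached_classifier` are hypothetical.

```python
import hashlib
import re
from typing import Callable, Dict

# Illustrative patterns only; real redaction rules depend on your data.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
LONG_NUMBER_RE = re.compile(r"\b\d{9,}\b")


def redact(text: str) -> str:
    """Mask email addresses and long digit runs before text leaves the system."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return LONG_NUMBER_RE.sub("[NUMBER]", text)


def make_cached_classifier(classify: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a classifier so repeated documents trigger only one call."""
    cache: Dict[str, str] = {}

    def cached(text: str) -> str:
        safe_text = redact(text)
        # Key on a hash of the redacted text; identical documents share a result.
        key = hashlib.sha256(safe_text.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = classify(safe_text)
        return cache[key]

    return cached
```

Because the wrapper redacts before hashing, documents that differ only in masked values share a cache entry, which also reduces the number of API calls.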
Next Steps
- Expand to Multi-Modal Inputs: For workflows involving scanned documents or images, see Beyond OCR: Next-Gen IDP Solutions for AI Workflow Automation in 2026 and Mastering Multi-Modal Prompts in Workflow Automation: Best Practices for 2026.
- Automate Downstream Actions: Use class predictions to trigger approval, routing, or review workflows. See Automating Contract Review with AI: Tools, Best Practices, and Workflow Templates (2026).
- Continue Learning: For a deep understanding of prompt engineering in complex, regulated environments, refer to our Pillar: The 2026 Guide to Automating AI-Driven Document Workflows Across Industries.