Prompt Engineering for Document AI: Real-World Templates for Approval and Extraction

Unlock performance and accuracy in document AI workflows with proven prompt templates for extraction and approval tasks.

Document AI is rapidly transforming how organizations handle unstructured information, automating everything from invoice approvals to contract data extraction. As we covered in our complete guide to automating complex document workflows with AI, prompt engineering is the linchpin for unlocking real-world value from these systems. This tutorial provides a practical, hands-on deep dive into prompt engineering for Document AI—focusing on approval and extraction use cases, with reusable templates, code examples, and troubleshooting tips.

Whether you’re integrating LLMs into your document workflow, building custom extraction pipelines, or seeking robust approval automation, this guide delivers actionable steps. For a broader view of available platforms and compliance considerations, see our sibling articles: AI Document Workflow Tools: A 2026 Buyer’s Guide and Automating Document Workflows in Regulated Industries: AI Compliance Techniques That Work.

Prerequisites

Python 3.10+ (tested with Python 3.11)
OpenAI API access (or Azure OpenAI, or Google Vertex AI with Gemini)
openai Python package (v1.2+)
Basic knowledge of prompt engineering concepts (see: Prompt Engineering for Approval Workflows: Patterns, Anti-Patterns, and Real-World Templates)
Familiarity with Python's built-in json module
Sample documents for testing (PDF or plain text)

1. Setting Up Your Environment

Install Python and required packages:

python3 -m venv .venv
source .venv/bin/activate
pip install openai==1.2.3 python-dotenv

If you plan to parse PDFs, also install pypdf:

pip install pypdf

Configure your API key:

Create a .env file in your project directory:

OPENAI_API_KEY=sk-...

Load the key in your Python scripts using python-dotenv:


from dotenv import load_dotenv
import os
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")

2. Extracting Text from Documents

Extract text from a PDF (optional):


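from pypdf import PdfReader

def extract_pdf_text(pdf_path):
    """Return the text of every page in the PDF, each page followed by a newline."""
    reader = PdfReader(pdf_path)
    # Image-only (scanned) pages may yield no text; fall back to "" so that
    # joining never fails. Such pages need OCR, which pypdf does not provide.
    pages = [(page.extract_text() or "") for page in reader.pages]
    return "\n".join(pages) + "\n"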

doc_text = extract_pdf_text("sample_invoice.pdf")

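For plain text files, read the contents inside a with block so the file handle is closed promptly, and pass an explicit encoding such as encoding="utf-8" to open().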
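As a minimal sketch of the plain-text case, the helper below reads a file as UTF-8. The name load_document_text is hypothetical (it does not appear elsewhere in this guide), and the choice to replace undecodable bytes rather than fail is an assumption you may want to change.

```python
from pathlib import Path


def load_document_text(path: str) -> str:
    """Read a plain-text document and return its contents as a string."""
    # read_text opens and closes the file for us; errors="replace" swaps
    # undecodable bytes for U+FFFD instead of raising UnicodeDecodeError.
    return Path(path).read_text(encoding="utf-8", errors="replace")
```

The returned string can be used as doc_text in the prompt templates that follow.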

3. Designing Effective Prompts for Document Extraction

Prompt engineering is all about clarity and structure. For document extraction, we want to guide the LLM to return structured outputs (ideally JSON) and to ignore irrelevant content. Below are real-world prompt templates for extracting key fields from invoices and contracts.

Invoice Extraction Prompt Template:


extraction_prompt = f"""
You are an expert in document data extraction. Extract the following fields from the document below:
- Invoice Number
- Invoice Date
- Vendor Name
- Total Amount

Return your answer as a valid JSON object with keys: invoice_number, invoice_date, vendor_name, total_amount.

Document:
\"\"\"
{doc_text}
\"\"\"
"""

Contract Extraction Prompt Template:


contract_prompt = f"""
Extract the following key terms from the contract below:
- Effective Date
- Termination Clause (copy the full clause)
- Governing Law

Return as JSON with keys: effective_date, termination_clause, governing_law.

Contract:
\"\"\"
{doc_text}
\"\"\"
"""

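Because the templates above are f-strings, doc_text is substituted at the moment each template is defined. One possible way to reuse a template across many documents is to wrap it in a function. The sketch below assumes that approach; build_extraction_prompt and INVOICE_FIELDS are hypothetical names introduced for illustration, and the extra "set its value to null" instruction is borrowed from the troubleshooting advice later in this guide.

```python
# Hypothetical mapping of JSON keys to the human-readable labels used in the prompt.
INVOICE_FIELDS = {
    "invoice_number": "Invoice Number",
    "invoice_date": "Invoice Date",
    "vendor_name": "Vendor Name",
    "total_amount": "Total Amount",
}


def build_extraction_prompt(doc_text: str, fields: dict = INVOICE_FIELDS) -> str:
    """Build a fresh extraction prompt for one document."""
    field_lines = "\n".join(f"- {label}" for label in fields.values())
    keys = ", ".join(fields)
    return (
        "You are an expert in document data extraction. "
        "Extract the following fields from the document below:\n"
        f"{field_lines}\n\n"
        f"Return your answer as a valid JSON object with keys: {keys}.\n"
        "If a field is missing from the document, set its value to null.\n\n"
        # The document is interpolated as a value, so braces inside it are harmless.
        f'Document:\n"""\n{doc_text}\n"""'
    )
```

Passing a different fields mapping (for example the contract keys) would produce the contract variant without duplicating the template text.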
4. Running Extraction with OpenAI GPT

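We'll use the Chat Completions API for structured extraction, which supports models like gpt-4-turbo or gpt-3.5-turbo. In version 1.x of the openai package, the older openai.ChatCompletion.create call no longer works; requests go through a client object instead.

Send the prompt to the API:


from openai import OpenAI

# openai>=1.0: create a client once and reuse it for every request.
client = OpenAI(api_key=api_key)

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant for document data extraction."},
        {"role": "user", "content": extraction_prompt}
    ],
    temperature=0.0,
    max_tokens=512
)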

extracted_json = response.choices[0].message.content
print(extracted_json)

Tip: Use temperature=0.0 for more repeatable outputs. Results can still vary slightly between calls, so keep validating what comes back.

Parse and validate the JSON output:


import json

try:
    data = json.loads(extracted_json)
    print("Extracted fields:", data)
except json.JSONDecodeError:
    print("Output is not valid JSON. Raw output:", extracted_json)
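Valid JSON is not necessarily the JSON you asked for. As a hedged sketch of a stricter parser, the function below strips a Markdown code fence if the model added one, then checks that the expected keys are present. parse_model_json and REQUIRED_KEYS are hypothetical names, and the fence-stripping rule is an assumption about how a model might wrap its output, not documented API behavior.

```python
import json

REQUIRED_KEYS = ("invoice_number", "invoice_date", "vendor_name", "total_amount")
FENCE = "`" * 3  # a Markdown code-fence marker, built this way to keep the listing readable


def parse_model_json(raw: str, required_keys=REQUIRED_KEYS) -> dict:
    """Parse model output into a dict, raising ValueError if it is unusable."""
    text = raw.strip()
    # Some models wrap JSON in a Markdown fence; drop the opening and closing markers.
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

Catching ValueError around this call covers both malformed JSON and missing keys, so the calling code needs only one error path.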

5. Prompt Templates for Approval Automation

Approval workflows often require the LLM to decide if a document meets certain criteria and to provide justification. Here are robust prompt templates for such scenarios, inspired by patterns from our deep dive on generative AI prompt engineering for approval workflow automation.

Approval Decision Prompt Template:


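from datetime import date

# The model has no reliable notion of the current date, so state it explicitly;
# otherwise it cannot judge the "last 90 days" criterion.
today = date.today().isoformat()

approval_prompt = f"""
You are an automated approval assistant. Today's date is {today}. Review the following document and determine if it should be approved based on these criteria:
- The invoice amount is less than $10,000
- The invoice date is within the last 90 days

Return your answer as a single JSON object and nothing else:
{{
  "approved": true/false,
  "justification": "Explain your decision"
}}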

Document:
\"\"\"
{doc_text}
\"\"\"
"""

Send the approval prompt and parse results:


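from openai import OpenAI

# openai>=1.0 replaced openai.ChatCompletion.create with a client object.
client = OpenAI(api_key=api_key)

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": "You are an approval automation assistant."},
        {"role": "user", "content": approval_prompt}
    ],
    temperature=0.0,
    max_tokens=300
)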

approval_result = response.choices[0].message.content
try:
    result = json.loads(approval_result)
    print(f"Approved: {result['approved']}\nJustification: {result['justification']}")
except json.JSONDecodeError:
    print("Approval output is not valid JSON. Raw output:", approval_result)
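Numeric and date thresholds like the two criteria above can also be checked in ordinary code once the fields have been extracted, which makes the decision reproducible and auditable. The sketch below is one possible arrangement rather than a prescribed one: check_invoice_rules is a hypothetical name, and it assumes the extraction step returned invoice_date as an ISO 8601 string (YYYY-MM-DD) and total_amount as a number or a simple currency string.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def check_invoice_rules(fields, today, max_amount=Decimal("10000"), max_age_days=90):
    """Apply the approval criteria to extracted fields and explain any failure."""
    reasons = []

    # Amount rule: strictly less than max_amount. Strip "$" and thousands separators.
    raw_amount = str(fields.get("total_amount")).replace("$", "").replace(",", "")
    try:
        amount = Decimal(raw_amount)
    except InvalidOperation:
        amount = None
    if amount is None or not amount.is_finite():
        reasons.append("total_amount is missing or not numeric")
    elif amount >= max_amount:
        reasons.append(f"total_amount {amount} is not less than {max_amount}")

    # Date rule: assumes an ISO 8601 date string; anything else counts as missing.
    try:
        invoice_date = date.fromisoformat(str(fields.get("invoice_date")))
    except ValueError:
        invoice_date = None
    if invoice_date is None:
        reasons.append("invoice_date is missing or not in YYYY-MM-DD format")
    else:
        age_days = (today - invoice_date).days
        # Future-dated invoices (negative age) are rejected as well.
        if not 0 <= age_days <= max_age_days:
            reasons.append(f"invoice_date {invoice_date} is outside the last {max_age_days} days")

    return {"approved": not reasons, "justification": "; ".join(reasons) or "All criteria met"}
```

The return value has the same approved and justification keys as the prompt-based result, so the two can be compared, with the model's output reserved for criteria that genuinely need judgment.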

6. Advanced Prompt Engineering Patterns

For more complex workflows, consider these enhancements:

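Few-shot prompting: Provide one or two example document/answer pairs to show the model the exact output you expect.
Chain-of-Thought reasoning: Ask the model to reason step by step before giving its final answer. Keep the reasoning inside the JSON object (for example in a dedicated field) so the response still parses with json.loads().
Schema enforcement: Use response_format={"type": "json_object"} (if your LLM supports it) to get syntactically valid JSON. JSON mode does not check that the expected keys are present, so keep validating the parsed result, and make sure the word "JSON" appears in the prompt, which OpenAI requires when this mode is enabled.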



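Chain-of-Thought Approval Prompt Template:

from datetime import date

today = date.today().isoformat()  # reference date for the 90-day criterion

cot_approval_prompt = f"""
You are an approval automation assistant. Today's date is {today}. Review the following document and determine if it should be approved based on these criteria:
- The invoice amount is less than $10,000
- The invoice date is within the last 90 days

Return a single JSON object and nothing else. Fill in "reasoning" first, checking each criterion step by step, then give the decision:
{{
  "reasoning": "Step-by-step check of each criterion",
  "approved": true/false,
  "justification": "One-sentence summary of the decision"
}}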

Document:
\"\"\"
{doc_text}
\"\"\"
"""

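Few-shot prompting can be expressed as alternating user and assistant messages placed before the real document. The sketch below shows one way to assemble such a message list. build_few_shot_messages and FEW_SHOT_EXAMPLES are hypothetical names, and both example invoices are invented purely for illustration; in practice you would substitute examples drawn from your own documents.

```python
import json

# Invented examples: (document text, expected JSON answer). The second one
# demonstrates returning null for fields that are absent.
FEW_SHOT_EXAMPLES = [
    (
        "Invoice No: INV-001\nDate: 2024-01-15\nFrom: Example Supplies Ltd\nTotal: $250.00",
        {"invoice_number": "INV-001", "invoice_date": "2024-01-15",
         "vendor_name": "Example Supplies Ltd", "total_amount": 250.00},
    ),
    (
        "Vendor: Sample Co\nAmount due: 1,200.00 USD",
        {"invoice_number": None, "invoice_date": None,
         "vendor_name": "Sample Co", "total_amount": 1200.00},
    ),
]


def build_few_shot_messages(doc_text, examples=FEW_SHOT_EXAMPLES):
    """Return a chat message list: system prompt, worked examples, then the real document."""
    messages = [{
        "role": "system",
        "content": (
            "You extract invoice fields. Reply with a JSON object only, using the keys "
            "invoice_number, invoice_date, vendor_name, total_amount. "
            "Use null for any field that is not in the document."
        ),
    }]
    for example_doc, expected in examples:
        messages.append({"role": "user", "content": example_doc})
        # json.dumps renders None as null, matching the instruction above.
        messages.append({"role": "assistant", "content": json.dumps(expected)})
    messages.append({"role": "user", "content": doc_text})
    return messages
```

The resulting list can be passed as the messages argument of a chat completion request. Because the system message mentions JSON, it should also be compatible with response_format={"type": "json_object"} on models that support that option.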
7. Testing and Evaluating Your Prompts

Prepare a test corpus: Gather a set of real-world sample documents (invoices, contracts, etc.).

Automate prompt testing:


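# extraction_prompt (step 3) is an f-string that is already filled in, so use a plain template here.
EXTRACTION_TEMPLATE = (
    "Extract invoice_number, invoice_date, vendor_name, and total_amount from the "
    'document below and return them as a valid JSON object.\n\nDocument:\n"""\n{doc_text}\n"""'
)
test_docs = ["sample_invoice1.pdf", "sample_invoice2.pdf"]
for doc_path in test_docs:
    doc_text = extract_pdf_text(doc_path)
    prompt = EXTRACTION_TEMPLATE.format(doc_text=doc_text)
    # ...send to API and evaluate results...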

Evaluate accuracy: Compare extracted fields or approval decisions to ground truth data.
Iterate on prompts: Adjust instructions, add examples, or clarify schema as needed.
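Comparing against ground truth is easier to act on when accuracy is reported per field, since a prompt change often helps one field and hurts another. The sketch below is one simple way to compute that. field_accuracy is a hypothetical name, and the normalization (trim, collapse whitespace, ignore case) is an assumption; values that differ only in formatting, such as 250.0 and "250.00", would still count as a mismatch.

```python
def _normalize(value):
    """Compare values loosely: ignore case and extra whitespace, keep None as None."""
    if value is None:
        return None
    return " ".join(str(value).split()).casefold()


def field_accuracy(predictions, ground_truth, fields):
    """Return the fraction of documents for which each field matches the ground truth."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    scores = {}
    for field in fields:
        correct = sum(
            1
            for pred, truth in zip(predictions, ground_truth)
            if _normalize(pred.get(field)) == _normalize(truth.get(field))
        )
        # An empty test set scores 0.0 rather than dividing by zero.
        scores[field] = correct / len(ground_truth) if ground_truth else 0.0
    return scores
```

Each entry in predictions would be the parsed JSON for one test document, in the same order as the hand-labeled entries in ground_truth.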

Common Issues & Troubleshooting

Model returns unstructured or partial output:
- Use explicit instructions: “Return your answer as a valid JSON object with these keys...”
- Set temperature=0.0 for more deterministic results.
Invalid JSON returned:
- Post-process output with json.loads() and handle exceptions.
- Try schema enforcement or few-shot examples.
Hallucinated data (fields invented):
- Instruct the model: “If a field is missing, set its value to null.”
API rate limits or timeouts:
- Implement retry logic and exponential backoff.
Extraction accuracy varies by document type:
- Segment your prompts by document type, or use model fine-tuning for high-volume use cases.
Security and compliance:
- Never send sensitive documents to public APIs without proper encryption and compliance checks. See our guide to AI compliance in regulated industries.
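The retry advice above can be sketched as a small, library-agnostic wrapper. call_with_backoff and its parameters are hypothetical names introduced here, and the delay values are illustrative defaults, not recommendations from any API provider.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying on the given exceptions with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

In use, fn would typically be a lambda wrapping the API request, and retry_on would be narrowed to transient errors only. With the openai 1.x package, exception classes such as openai.RateLimitError and openai.APITimeoutError are plausible candidates, but check the error types your installed version actually raises.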

Next Steps

You’ve now built a foundation for prompt engineering in Document AI—covering both extraction and approval automation, with practical templates and troubleshooting strategies. For more advanced workflows, explore chaining multiple prompts, integrating with RAG (Retrieval-Augmented Generation), and leveraging specialized document AI platforms (see: AI Document Workflow Tools: A 2026 Buyer’s Guide). For a broader strategic perspective, revisit our pillar article on automating complex document workflows with AI.

To deepen your expertise, check out related guides on prompt engineering for real-time incident response workflows and our explainer on extracting data from unstructured documents with AI-powered workflow solutions.

Prompt engineering is an iterative process. Test, refine, and adapt your templates as your document workflows evolve—and stay tuned for more playbooks from Tech Daily Shot.
