Manual invoice processing is tedious, error-prone, and costly. With advances in AI and automation, businesses can now process invoices faster, more accurately, and at scale. In this tutorial, you’ll learn how to automate invoice processing with AI: extracting data from invoices, validating it, and integrating it with your business systems.
This guide is part of our AI Playbooks series. If you’re looking for a broader overview of how AI is transforming business workflows, see our Definitive Guide to AI Tools for Business Process Automation.
Prerequisites
- Python 3.9+ (tested with 3.10)
- pip (Python package manager)
- Basic Python scripting knowledge
- Familiarity with REST APIs (helpful but not required)
- Sample invoices (PDF or image format, e.g., invoice1.pdf, invoice2.png)
- Google Cloud Platform account (for Document AI), or an alternative OCR API (e.g., AWS Textract, Tesseract OCR)
- Optional: Familiarity with Robotic Process Automation (RPA) tools such as UiPath or Power Automate. For a comparison, see this RPA leaders review.
Overview: What You’ll Build
We’ll build a Python-based workflow that:
- Uploads invoice PDFs/images to an AI-powered OCR service (Google Document AI)
- Extracts structured data (invoice number, date, total, vendor, line items)
- Validates and normalizes the data
- Exports the data to a CSV (or integrates with your ERP system)
You’ll see code snippets, configuration steps, and troubleshooting tips at each stage.
Step 1: Set Up Your Environment
- Install Python and pip
  Ensure Python is installed:
      python3 --version
  If it is not installed, download it from python.org.
- Create a project folder:
      mkdir ai-invoice-processing
      cd ai-invoice-processing
- Create and activate a virtual environment:
      python3 -m venv venv
      source venv/bin/activate
- Install required Python packages:
      pip install google-cloud-documentai pandas
Note: If you want to use AWS Textract or Tesseract, see their respective SDKs and adapt the code accordingly.
Step 2: Set Up Google Document AI
- Create a Google Cloud project:
  Go to the Google Cloud Console. Create a new project (e.g., invoice-ai-demo).
- Enable the Document AI API:
  In your project, navigate to APIs & Services > Enable APIs and Services, search for Document AI API, and enable it.
- Create a service account and download the key:
  - Go to IAM & Admin > Service Accounts
  - Create a new service account (e.g., invoice-processor)
  - Assign the Document AI API User role
  - Click Keys > Add Key > Create new key (choose JSON)
  - Download the JSON key file and save it in your project folder as service-account.json
- Set your Google credentials environment variable:
      export GOOGLE_APPLICATION_CREDENTIALS="service-account.json"
  The relative path resolves against the current working directory, so run the scripts from the project folder or use an absolute path.
- Get your Document AI processor ID:
  - Go to Document AI Processors
  - Create a new processor of type Invoice Parser
  - Copy the processor ID, which is the last segment of the processor’s full resource name (format: projects/PROJECT_ID/locations/LOCATION/processors/PROCESSOR_ID)
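The project ID, location, and processor ID combine into the resource name the API expects, and the regional endpoint has to match the processor’s location. A minimal sketch of those strings with placeholder values; the helper names are illustrative and are not part of the Google client library:

```python
def processor_resource_name(project_id: str, location: str, processor_id: str) -> str:
    """Build the full processor resource name from its three parts."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def regional_endpoint(location: str) -> str:
    """Return the Document AI endpoint for a location such as "us" or "eu"."""
    return f"{location}-documentai.googleapis.com"


def short_processor_id(resource_name: str) -> str:
    """Extract the trailing processor ID from a full resource name."""
    parts = resource_name.strip("/").split("/")
    if len(parts) != 6 or parts[0::2] != ["projects", "locations", "processors"]:
        raise ValueError(f"Unexpected processor resource name: {resource_name!r}")
    return parts[-1]


# Placeholder values, not real identifiers
name = processor_resource_name("your-project-id", "us", "your-processor-id")
print(name)
print(regional_endpoint("us"))
print(short_processor_id(name))
```

If you copied the full resource name from the console, short_processor_id recovers the short ID that the PROCESSOR_ID constant in the next step expects.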
Step 3: Upload and Extract Invoice Data with AI
- Create a Python script for invoice extraction:
  Save the following as extract_invoice.py in your project folder.
      import os
      from google.cloud import documentai_v1 as documentai
      import pandas as pd

      PROJECT_ID = "your-project-id"
      LOCATION = "us"  # or "eu"
      PROCESSOR_ID = "your-processor-id"
      INVOICE_FILES = ["invoice1.pdf", "invoice2.png"]  # Add your invoice files

      # Map file extensions to the MIME types sent to Document AI
      MIME_TYPES = {".pdf": "application/pdf", ".png": "image/png",
                    ".jpg": "image/jpeg", ".jpeg": "image/jpeg"}

      def process_invoice(file_path):
          # The API endpoint must match the processor's location
          client = documentai.DocumentProcessorServiceClient(
              client_options={"api_endpoint": f"{LOCATION}-documentai.googleapis.com"}
          )
          name = f"projects/{PROJECT_ID}/locations/{LOCATION}/processors/{PROCESSOR_ID}"
          with open(file_path, "rb") as f:
              file_content = f.read()
          extension = os.path.splitext(file_path)[1].lower()
          raw_document = documentai.RawDocument(
              content=file_content, mime_type=MIME_TYPES[extension]
          )
          request = documentai.ProcessRequest(name=name, raw_document=raw_document)
          result = client.process_document(request=request)
          document = result.document
          # Extract fields; a repeated entity type (e.g., line items) keeps only its last value
          fields = {}
          for entity in document.entities:
              fields[entity.type_] = entity.mention_text
          return fields

      results = []
      for file in INVOICE_FILES:
          print(f"Processing {file}...")
          data = process_invoice(file)
          results.append(data)

      df = pd.DataFrame(results)
      df.to_csv("invoices_extracted.csv", index=False)
      print("Extraction complete. Data saved to invoices_extracted.csv.")
  Note: Replace your-project-id and your-processor-id with your actual values. The CSV column names are the entity types the processor returns, so confirm them in your own output before relying on specific column names in later steps.
- Run the script:
      python extract_invoice.py
  The script will process your sample invoices and create invoices_extracted.csv with structured data.
- Sample output (CSV):
      InvoiceId,InvoiceDate,VendorName,AmountDue,LineItemDescription,LineItemAmount
      INV-123,2024-06-01,Acme Corp,2500.00,Web Design,2500.00
      ...
Screenshot: Extracted invoice data in CSV format, ready for ERP import.
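Storing entities in a flat dictionary keeps only one value per entity type, which loses data when an invoice has several line items. A minimal sketch of one way to separate header fields from line items, assuming each entity exposes type_, mention_text, and a properties list of child entities; the sample objects and their type names are stand-ins made up for illustration, not real API output:

```python
from types import SimpleNamespace


def entities_to_record(entities):
    """Split entities into header fields and a list of line-item dicts."""
    header = {}
    line_items = []
    for entity in entities:
        children = list(getattr(entity, "properties", None) or [])
        if children:
            # A parent entity (such as a line item) becomes one dict of its children
            line_items.append({c.type_: c.mention_text for c in children})
        else:
            # Keep the first value seen for a header field instead of overwriting it
            header.setdefault(entity.type_, entity.mention_text)
    return header, line_items


# Stand-in objects with illustrative type names, only to show the result shape
sample = [
    SimpleNamespace(type_="invoice_id", mention_text="INV-123", properties=[]),
    SimpleNamespace(
        type_="line_item",
        mention_text="Web Design 2500.00",
        properties=[
            SimpleNamespace(type_="line_item/description", mention_text="Web Design", properties=[]),
            SimpleNamespace(type_="line_item/amount", mention_text="2500.00", properties=[]),
        ],
    ),
]
header, line_items = entities_to_record(sample)
print(header)
print(line_items)
```

Writing header fields and line items to two separate CSV files, joined on the invoice number, is one way to keep the export tabular.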
Step 4: Validate and Normalize Extracted Data
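- Check for missing/invalid fields:
      import pandas as pd

      df = pd.read_csv("invoices_extracted.csv")
      # Flag rows where either key field is empty
      missing = df[df["InvoiceId"].isnull() | df["InvoiceDate"].isnull()]
      if not missing.empty:
          print("Warning: Some invoices are missing key fields:")
          print(missing)
- Normalize date formats and amounts:
      import pandas as pd

      df = pd.read_csv("invoices_extracted.csv")
      # Unparseable dates become missing values instead of raising
      df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"], errors="coerce").dt.strftime("%Y-%m-%d")
      # Strip currency symbols and thousands separators (assumes "." is the decimal separator);
      # unparseable amounts become NaN
      amounts = df["AmountDue"].astype(str).str.replace(r"[^\d.\-]", "", regex=True)
      df["AmountDue"] = pd.to_numeric(amounts, errors="coerce")
      df.to_csv("invoices_cleaned.csv", index=False)
      print("Cleaned data saved to invoices_cleaned.csv.")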
Screenshot: Cleaned and normalized invoice data.
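Extracted text often carries currency symbols, thousands separators, or regional date formats. A minimal standard-library sketch of per-value normalization, assuming "." is the decimal separator and that dates arrive in one of the listed formats; the format list and function names are assumptions to adapt to your invoices:

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation
import re

# Assumed input formats, tried in order; "06/01/2024" is read as month/day/year
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_amount(text):
    """Return a Decimal for strings like "$2,500.00", or None if unparseable."""
    cleaned = re.sub(r"[^\d.\-]", "", str(text))
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def parse_date(text):
    """Return an ISO 8601 date string, or None when no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(str(text).strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(parse_amount("$2,500.00"))
print(parse_date("06/01/2024"))
```

Returning None rather than raising lets the caller route unparseable values to review instead of stopping the whole batch.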
Step 5: Integrate with Your Business System
- Export to CSV for ERP import:
  Most ERP/accounting systems can import CSVs. Check your system’s import requirements.
- Optional: Automate upload with an API
  If your ERP (e.g., SAP, NetSuite, QuickBooks) offers an API, you can automate the upload using Python’s requests library (install it with pip install requests).
      import requests

      API_URL = "https://your-erp.com/api/invoices"
      API_KEY = "your-api-key"

      # Upload the cleaned CSV as a multipart file; the timeout avoids hanging forever
      with open("invoices_cleaned.csv", "rb") as f:
          response = requests.post(
              API_URL,
              headers={"Authorization": f"Bearer {API_KEY}"},
              files={"file": f},
              timeout=30,
          )
      print(response.status_code, response.text)
- Automate the workflow end-to-end:
  For full automation (e.g., watch a folder for new invoices, process, and upload), see RPA platforms like UiPath or Power Automate. For a comparison, see UiPath vs. Power Automate.
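Before adopting an RPA platform, the folder-watching part can be approximated in plain Python by polling. A minimal sketch, assuming invoices are dropped into a single inbox folder; find_new_invoices, watch, and the handle callback are hypothetical names, and handle stands for whatever processing and upload function you use:

```python
from pathlib import Path
import time

INVOICE_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}


def find_new_invoices(inbox, seen):
    """Return invoice files in `inbox` whose names are not in `seen`, then mark them seen."""
    new_files = sorted(
        p for p in Path(inbox).iterdir()
        if p.is_file() and p.suffix.lower() in INVOICE_SUFFIXES and p.name not in seen
    )
    seen.update(p.name for p in new_files)
    return new_files


def watch(inbox, handle, interval_seconds=30, max_cycles=None):
    """Poll `inbox` and call `handle(path)` once per new file.

    Runs until interrupted unless `max_cycles` limits the number of polls.
    """
    seen = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for path in find_new_invoices(inbox, seen):
            handle(path)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_seconds)
```

The seen set lives in memory, so a restart reprocesses everything in the folder; moving handled files to an archive folder is one way around that.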
Common Issues & Troubleshooting
- Google Document AI authentication errors:
  - Make sure GOOGLE_APPLICATION_CREDENTIALS points to the correct JSON key.
  - Check that your service account has the right permissions.
- Invoice fields not extracted correctly:
  - Check that you’re using the Invoice Parser processor, not the generic parser.
  - Some invoice formats are harder to parse; try uploading a clearer PDF or a different sample.
- Pandas errors (e.g., dtype issues):
  - Use errors="coerce" in pd.to_datetime to handle invalid dates.
  - Use pd.to_numeric(df["AmountDue"], errors="coerce") for numeric columns so that invalid values become NaN. By contrast, astype(float, errors="ignore") returns the column unconverted when any value fails to parse.
- API upload failures:
  - Check your API endpoint, credentials, and request format.
  - Review API documentation for required fields and authentication methods.
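Many authentication errors come down to the credentials variable being unset or pointing at the wrong file. A minimal diagnostic sketch that inspects the key file without contacting Google Cloud; check_credentials is a hypothetical helper, and it only checks fields that service account JSON keys normally contain:

```python
import json
import os


def check_credentials(env=None):
    """Return a list of problems with GOOGLE_APPLICATION_CREDENTIALS (empty if none found)."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]
    if not os.path.isfile(path):
        return [f"Key file not found: {os.path.abspath(path)}"]
    try:
        with open(path, encoding="utf-8") as f:
            key = json.load(f)
    except (OSError, ValueError) as exc:
        return [f"Key file is not readable JSON: {exc}"]
    if not isinstance(key, dict):
        return ["Key file does not contain a JSON object"]
    problems = []
    if key.get("type") != "service_account":
        problems.append('Key file "type" is not "service_account"')
    for field in ("project_id", "client_email", "private_key"):
        if not key.get(field):
            problems.append(f"Key file is missing {field}")
    return problems


if __name__ == "__main__":
    for problem in check_credentials() or ["No problems found in the key file"]:
        print(problem)
```

A clean result only means the file looks like a service account key; missing IAM roles still surface as permission errors from the API itself.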
Next Steps
- Expand to other document types: Adapt the workflow for receipts, purchase orders, or contracts.
- Integrate approval workflows: Trigger human review for flagged invoices or missing data.
- Explore more advanced AI: Train custom models for your unique invoice layouts.
- Compare with RPA platforms: For more complex automation, see our RPA leaders comparison.
- Broader automation: For a full strategy, see our Definitive Guide to AI Tools for Business Process Automation.
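As a starting point for an approval workflow, the rule for flagging an invoice can be a small function. A minimal sketch using the column names from the sample CSV and assuming the amount due should equal the sum of the line items, which ignores tax, discounts, and partial payments; review_reasons and its tolerance are illustrative choices:

```python
from decimal import Decimal

REQUIRED_FIELDS = ("InvoiceId", "InvoiceDate", "AmountDue")  # names from the sample CSV


def review_reasons(invoice, line_item_amounts=(), tolerance=Decimal("0.01")):
    """Return the reasons an invoice needs human review (empty list if none)."""
    reasons = [
        f"missing {name}" for name in REQUIRED_FIELDS if invoice.get(name) in (None, "")
    ]
    amount_due = invoice.get("AmountDue")
    if line_item_amounts and amount_due not in (None, ""):
        total = sum(Decimal(str(a)) for a in line_item_amounts)
        difference = abs(Decimal(str(amount_due)) - total)
        if difference > tolerance:
            reasons.append(f"line items differ from AmountDue by {difference}")
    return reasons


invoice = {"InvoiceId": "INV-123", "InvoiceDate": "2024-06-01", "AmountDue": "2500.00"}
print(review_reasons(invoice, ["2500.00"]))
print(review_reasons(invoice, ["2000.00"]))
```

Invoices with a non-empty list of reasons can be written to a separate file or queue for a person to check before upload.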
By following these steps, you can automate invoice processing using AI, freeing up your team from repetitive tasks and reducing costly errors. Experiment with the workflow, adapt it to your needs, and explore how AI can streamline other business processes.