Data enrichment is a cornerstone of modern automated workflows, transforming raw, incomplete, or unstructured data into actionable business insights. With the rise of large language models (LLMs), prompt engineering has become a critical skill for automating and scaling data enrichment tasks across industries. This playbook provides a step-by-step, hands-on guide for designing, implementing, and optimizing data enrichment prompts in your AI-powered workflows.
For a comprehensive overview of prompt engineering’s role in end-to-end AI automation, see The Ultimate AI Workflow Prompt Engineering Blueprint for 2026.
Prerequisites
- Tools & Libraries:
  - Python 3.9+
  - `openai` SDK v1.2+
  - `pandas` v1.5+ (for data manipulation)
  - Command-line interface (Bash, PowerShell, or Terminal)
- Accounts & API Keys:
  - OpenAI API key (or Azure OpenAI key)
- Knowledge:
  - Basic Python scripting
  - Familiarity with REST APIs
  - Understanding of LLM prompt structure and limitations
1. Define Your Data Enrichment Objectives
- Identify Enrichment Goals:
  - What missing or incomplete data do you want to fill? (e.g., company descriptions, product categories, address standardization)
  - What context or sources will the LLM need?
- Example:
  - Enrich a customer database by adding industry categories and LinkedIn URLs based on company names.
- Tip: Map out your input fields and desired outputs in a table for clarity.

| Company Name | Industry Category | LinkedIn URL |
|--------------|-------------------|--------------|
| Acme Corp    | ?                 | ?            |
| DataVision   | ?                 | ?            |
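The same mapping can also be written down in code so later steps can check which outputs are still missing. The sketch below is only an illustration: `EnrichmentSpec` and `missing_outputs` are hypothetical names, not part of any library, and the column names are taken from the example table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrichmentSpec:
    """Hypothetical description of one enrichment job: inputs and desired outputs."""
    input_fields: tuple
    output_fields: tuple

    def missing_outputs(self, record: dict) -> list:
        """Return the output fields that are absent, None, or blank in a record."""
        return [
            name for name in self.output_fields
            if not str(record.get(name) or "").strip()
        ]


spec = EnrichmentSpec(
    input_fields=("Company Name",),
    output_fields=("Industry Category", "LinkedIn URL"),
)

# A row as it might look before enrichment.
row = {"Company Name": "Acme Corp", "Industry Category": "", "LinkedIn URL": None}
print(spec.missing_outputs(row))
```

Keeping the specification separate from the prompt makes it easier to add or remove output fields later without hunting through prompt text.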
2. Prepare and Structure Your Input Data
- Load your data into a DataFrame:

      pip install pandas

  Then, in Python:

      import pandas as pd

      df = pd.read_csv('companies.csv')
      print(df.head())

  Screenshot description: DataFrame preview showing company names with missing fields.
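- Clean and normalize input fields:

      # Trim whitespace and apply title case.
      # Note: .str.title() also lowercases inner capitals
      # (e.g. 'DataVision' becomes 'Datavision'); drop it if casing matters.
      df['Company Name'] = df['Company Name'].str.strip().str.title()

- Sanity-check for nulls and duplicates:

      print(df.isnull().sum())      # missing values per column
      print(df.duplicated().sum())  # number of fully duplicated rows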
3. Design Effective Data Enrichment Prompts
- Prompt Template Example:

      You are a data enrichment assistant.
      Given a company name, return its industry category and official LinkedIn page URL.
      Format your answer as JSON:
      {
        "industry_category": "...",
        "linkedin_url": "..."
      }
      Company Name: {COMPANY_NAME}

- Dynamic Prompt Generation:

      def build_prompt(company_name):
          # Doubled braces ({{ }}) produce literal braces inside an f-string.
          return f"""
      You are a data enrichment assistant.
      Given a company name, return its industry category and official LinkedIn page URL.
      Format your answer as JSON:
      {{
        "industry_category": "...",
        "linkedin_url": "..."
      }}
      Company Name: {company_name}
      """

- Best Practices:
  - Be explicit about output format (e.g., JSON, CSV, key-value pairs).
  - Provide clear instructions and sample outputs.
  - Consider including context (e.g., "Assume the latest public data is available.").
  - For advanced prompt patterns across multi-step data pipelines, see Prompt Engineering for Multi-Step Automated Data Pipelines: Strategies for Accuracy and Speed.
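To illustrate the advice on explicit formats and sample outputs, here is a minimal sketch of a prompt builder that embeds one worked example. The example company, its category, and the instruction to return `null` for uncertain values are assumptions added for illustration; replace them with a record you have verified yourself.

```python
import json

# Hypothetical worked example shown to the model; swap in a verified record.
EXAMPLE_INPUT = "Example Bakery Ltd"
EXAMPLE_OUTPUT = {
    "industry_category": "Food & Beverage",
    "linkedin_url": None,  # serialized as JSON null
}


def build_few_shot_prompt(company_name: str) -> str:
    """Build a prompt that states the format and shows one sample answer."""
    example_json = json.dumps(EXAMPLE_OUTPUT)
    return (
        "You are a data enrichment assistant.\n"
        "Given a company name, return its industry category and official "
        "LinkedIn page URL.\n"
        "Respond with a single JSON object and nothing else. "
        "Use null for any value you are not sure of.\n\n"
        f"Company Name: {EXAMPLE_INPUT}\n"
        f"{example_json}\n\n"
        f"Company Name: {company_name}\n"
    )


prompt = build_few_shot_prompt("Acme Corp")
print(prompt)
```

Generating the sample with `json.dumps` rather than typing it by hand keeps the example in the prompt syntactically valid JSON.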
4. Automate LLM Calls for Batch Data Enrichment
- Install OpenAI SDK:

      pip install openai
- Set your API key as an environment variable:

      export OPENAI_API_KEY="sk-..."

  Windows (Command Prompt; omit the quotes, which `set` would store as part of the value):

      set OPENAI_API_KEY=sk-...

  Windows (PowerShell):

      $env:OPENAI_API_KEY = "sk-..."

- Batch Processing Script:

      import json
      import os
      import time  # needed if you add the time.sleep() throttle described under Rate Limiting

      import pandas as pd
      from openai import OpenAI

      # openai>=1.0 uses a client object; the older openai.ChatCompletion
      # interface is not available in these versions.
      client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

      def build_prompt(company_name):
          return f"""
      You are a data enrichment assistant.
      Given a company name, return its industry category and official LinkedIn page URL.
      Format your answer as JSON:
      {{
        "industry_category": "...",
        "linkedin_url": "..."
      }}
      Company Name: {company_name}
      """

      def enrich_company(row):
          prompt = build_prompt(row['Company Name'])
          try:
              response = client.chat.completions.create(
                  model="gpt-3.5-turbo",
                  messages=[{"role": "user", "content": prompt}],
                  temperature=0,
              )
              output = response.choices[0].message.content
              data = json.loads(output)
              return pd.Series([data.get("industry_category", ""), data.get("linkedin_url", "")])
          except Exception as e:
              # Covers API errors and responses that are not valid JSON.
              print(f"Error for {row['Company Name']}: {e}")
              return pd.Series(["", ""])

      df = pd.read_csv('companies.csv')
      df[['Industry Category', 'LinkedIn URL']] = df.apply(enrich_company, axis=1)
      df.to_csv('companies_enriched.csv', index=False)

  Screenshot description: Terminal output showing progress and sample enriched rows in the DataFrame.
- Rate Limiting: Add `time.sleep(1)` between calls to avoid hitting rate limits.
- Tip: For workflow orchestration and multi-step enrichment, see Prompt Engineering for Complex Multi-Step AI Workflows: Templates and Best Practices.
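A fixed one-second pause is the simplest throttle. A common alternative is to retry with exponential backoff only when a call fails. The sketch below is a generic helper, not part of the OpenAI SDK: `call_with_backoff` and `flaky_call` are hypothetical names, and the demo simulates failures with `TimeoutError`. With the v1 SDK you would likely pass `openai.RateLimitError` as `retry_on`, which is an assumption to check against your installed version.

```python
import random
import time


def call_with_backoff(fn, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(); after a retryable error, wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))


# Demo with a stand-in for an API call that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated rate limit")
    return "ok"


recorded_delays = []  # capture the delays instead of actually sleeping
result = call_with_backoff(flaky_call, retry_on=(TimeoutError,),
                           sleep=recorded_delays.append)
print(result, len(recorded_delays))
```

Passing `sleep` as a parameter keeps the helper easy to test: the demo records the delays rather than waiting for them.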
5. Validate, Clean, and Post-Process LLM Outputs
- Validate Output Structure:

      def is_valid_linkedin_url(url):
          # Missing values arrive as NaN (a float), so check the type first.
          return isinstance(url, str) and url.startswith("https://www.linkedin.com/company/")

      df['Valid LinkedIn'] = df['LinkedIn URL'].apply(is_valid_linkedin_url)
      print(df['Valid LinkedIn'].value_counts())

- Handle Nulls and Errors:

      # The enrichment script writes "" on failure; convert blanks to NA so dropna() removes them.
      cols = ['Industry Category', 'LinkedIn URL']
      df[cols] = df[cols].replace("", pd.NA)
      df = df.dropna(subset=cols)

- Deduplicate Results:

      df = df.drop_duplicates(subset=['Company Name'])

- Export Cleaned Data:

      df.to_csv('companies_enriched_clean.csv', index=False)

  Screenshot description: Cleaned CSV file preview with all enriched fields populated and valid URLs.
- For automated data cleansing prompt templates, see Crafting Effective LLM Prompts for Automated Data Cleansing Workflows.
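Validation can also run on each parsed response before it reaches the DataFrame. The following is a minimal sketch using only the standard library. The function names are hypothetical, the key names follow the prompt template above, and the URL check tests the shape of the address only, not whether the page exists or belongs to the right company.

```python
from urllib.parse import urlparse

REQUIRED_KEYS = ("industry_category", "linkedin_url")


def is_linkedin_company_url(value) -> bool:
    """Check the URL's shape only; it does not confirm that the page exists."""
    if not isinstance(value, str):
        return False  # covers None and the float NaN pandas uses for blanks
    parsed = urlparse(value.strip())
    if parsed.scheme != "https":
        return False
    if parsed.netloc.lower() not in {"linkedin.com", "www.linkedin.com"}:
        return False
    prefix = "/company/"
    return parsed.path.startswith(prefix) and bool(parsed.path[len(prefix):].strip("/"))


def validation_problems(record) -> list:
    """Return human-readable problems; an empty list means the record passed."""
    if not isinstance(record, dict):
        return ["response is not a JSON object"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in record]
    category = record.get("industry_category")
    if "industry_category" in record and not (isinstance(category, str) and category.strip()):
        problems.append("industry_category is blank")
    if "linkedin_url" in record and not is_linkedin_company_url(record["linkedin_url"]):
        problems.append("linkedin_url is not a LinkedIn company URL")
    return problems


good = {"industry_category": "Software",
        "linkedin_url": "https://www.linkedin.com/company/acme"}
bad = {"industry_category": "", "linkedin_url": "http://example.com"}
print(validation_problems(good))
print(validation_problems(bad))
```

Returning a list of problems instead of a single boolean makes it easier to log why a row was rejected.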
6. Integrate Data Enrichment into Automated Workflows
- Connect to Workflow Automation Platforms:
  - Airflow, Prefect, n8n, Zapier, or custom Python scripts
- Example: Airflow DAG for Scheduled Enrichment

      from datetime import datetime

      from airflow import DAG
      from airflow.operators.python import PythonOperator

      def run_enrichment():
          # (Insert batch enrichment script here)
          pass

      with DAG(
          'company_data_enrichment',
          start_date=datetime(2024, 6, 1),
          schedule_interval='@daily',  # Airflow 2.4+ names this parameter `schedule`
          catchup=False               # do not backfill runs between start_date and today
      ) as dag:
          enrich_task = PythonOperator(
              task_id='enrich_companies',
              python_callable=run_enrichment
          )

- Trigger enrichment jobs automatically on new data arrivals.
- For low-code automation approaches, see Prompt Engineering for Low-Code AI Workflow Automation: Templates and Pitfalls.
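When a job is triggered by new data, it usually should not re-enrich companies that were already processed. One simple way to do that, sketched here under the assumption that company names are the matching key, is to compare incoming names against those already enriched. `pending_companies` is a hypothetical helper, not an Airflow or pandas function.

```python
def pending_companies(all_names, enriched_names):
    """Return names that still need enrichment, ignoring case, whitespace, and repeats."""
    done = {name.strip().casefold() for name in enriched_names}
    seen = set()
    pending = []
    for name in all_names:
        key = name.strip().casefold()
        if key in done or key in seen:
            continue  # already enriched, or a duplicate within this batch
        seen.add(key)
        pending.append(name.strip())
    return pending


incoming = ["Acme Corp", "DataVision", "acme corp ", "DataVision"]
already_enriched = ["ACME CORP"]
todo = pending_companies(incoming, already_enriched)
print(todo)
```

Only the names in `todo` would then be passed to the enrichment step, which keeps API usage proportional to the amount of new data.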
Common Issues & Troubleshooting
- LLM Outputs Invalid JSON:
  - Use `json.loads()` with `try`/`except` and fallback parsing.
  - Explicitly instruct the model to output only JSON.
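As one possible form of fallback parsing, the sketch below first tries the whole response and then scans for the first embedded JSON object, which helps when the model wraps its answer in extra prose. `parse_json_object` is a hypothetical helper, and `json.JSONDecoder.raw_decode` is the standard-library call that does the scanning.

```python
import json


def parse_json_object(text: str):
    """Return the first JSON object found in text, or None if there is none."""
    try:
        parsed = json.loads(text)
        return parsed if isinstance(parsed, dict) else None
    except json.JSONDecodeError:
        pass  # fall through to scanning for an embedded object

    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            parsed, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            return parsed
        start = text.find("{", start + 1)
    return None


reply = 'Sure! Here is the data: {"industry_category": "Software", "linkedin_url": null} Hope that helps.'
print(parse_json_object(reply))
print(parse_json_object("I could not find that company."))
```

A `None` result can be treated the same way as an API error: log the row and leave its fields blank for a later retry.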
- Hallucinated or Inaccurate Data:
  - Set `temperature=0` for more consistent outputs. This reduces run-to-run variation but does not by itself prevent fabricated values.
  - Cross-check results with external APIs or reference datasets.
  - See Prompt Engineering to Reduce Hallucinations in Automated Document Workflows for mitigation tactics.
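A cross-check against a reference dataset can be as simple as a dictionary lookup. The sketch below assumes a trusted mapping from lowercased company name to industry category. Both the reference data and the `cross_check` helper are hypothetical and stand in for whatever source you actually trust, such as a CRM export.

```python
def cross_check(enriched: dict, reference: dict) -> dict:
    """Label each enriched category as 'match', 'mismatch', or 'unverified'."""
    report = {}
    for company, category in enriched.items():
        key = company.strip().casefold()
        if key not in reference:
            report[company] = "unverified"  # nothing to compare against
        elif reference[key].casefold() == category.strip().casefold():
            report[company] = "match"
        else:
            report[company] = "mismatch"
    return report


# Hypothetical reference data; keys must already be lowercased.
reference = {"acme corp": "Manufacturing", "datavision": "Software"}
enriched = {"Acme Corp": "manufacturing", "DataVision": "Retail", "Initech": "Software"}
report = cross_check(enriched, reference)
print(report)
```

Rows labeled `mismatch` or `unverified` are good candidates for manual review before the data is used downstream.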
- API Rate Limits:
  - Throttle requests using `time.sleep()` or batch processing.
  - Monitor API usage and handle `429 Too Many Requests` errors gracefully.
- Prompt Token Limit Exceeded:
  - Shorten instructions and input fields.
  - Chunk large datasets and process in smaller batches.
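Chunking can be sketched with a small generator that yields fixed-size batches from any iterable. `chunked` is a hypothetical helper and the batch size of 2 is arbitrary. In practice the size would depend on how many records you place in one prompt or one job.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


companies = ["Acme Corp", "DataVision", "Initech", "Globex", "Umbrella"]
batches = list(chunked(companies, 2))
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

Processing and saving one batch at a time also means a failure partway through loses only the current batch rather than the whole run.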
- Workflow Integration Fails:
  - Check environment variables and API keys in your automation platform.
  - Test scripts outside the workflow engine to isolate errors.
  - For debugging and optimization, see LLM Prompt Debugging: How to Fix and Optimize Broken Workflow Automations.
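A quick preflight check at the start of a job can surface missing configuration before any API call is made. The sketch below is a minimal example. `missing_env_vars` is a hypothetical helper, and the variable name matches the one exported earlier in this guide.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


# Demo with an explicit mapping so it behaves the same on any machine.
fake_env = {"OPENAI_API_KEY": "   ", "PATH": "/usr/bin"}
missing = missing_env_vars(["OPENAI_API_KEY", "PATH"], environ=fake_env)
print(missing)

# In a real job, call missing_env_vars(["OPENAI_API_KEY"]) and stop early
# with a clear message if the returned list is not empty.
```

Running the same check inside and outside the workflow engine helps show whether a failure comes from the script itself or from the platform's environment.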
Next Steps
- Template Library: Build a reusable library of prompt templates for different enrichment scenarios. See How to Build a Robust Prompt Library for Automated AI Workflows.
- Advanced Automation: Explore multi-modal and multi-step enrichment workflows. See Mastering Multi-Modal Prompts in Workflow Automation: Best Practices for 2026.
- Optimization: Experiment with prompt variants, model selection, and post-processing for accuracy and speed. For comparative analysis, see Prompt Engineering vs. Classic Automation Scripting: Which Is Better for 2026 Workflows?.
- Broader Application: Extend data enrichment techniques to marketing, sales, finance, insurance, and other business domains. See Prompt Engineering for AI Marketing Workflows: 2026’s Most Effective Templates and Prompt Engineering for Finance Workflows: Real-World Templates and Optimization Strategies.
- Stay Updated: As LLMs evolve, revisit your prompt strategies regularly. For ongoing best practices, refer to The Ultimate AI Workflow Prompt Engineering Blueprint for 2026.
Related Reading: Prompt Engineering for Workflow Automation: Tips, Templates, and Prompt Libraries (2026) | Prompt Engineering Tactics for Automated Marketing Campaigns in 2026