As AI models become deeply integrated into business and developer workflows, orchestrating multi-stage automation with prompt engineering is critical for reliability and scalability. This tutorial walks you through designing, implementing, and testing prompt engineering patterns that enable robust, multi-step workflow automation using Large Language Models (LLMs) and orchestration frameworks.
Prerequisites
- Python: Version 3.8 or higher
- OpenAI API: Access and API key (or similar LLM provider)
- LangChain: Version 0.0.350+ (LangChain docs)
- Pydantic: For data validation
- Basic knowledge of:
- Prompt engineering concepts
- API usage and Python scripting
- Workflow automation patterns
- Terminal/CLI access
1. Set Up Your Environment
- Create a Python virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate
  ```

- Install the required libraries:

  ```bash
  pip install openai langchain pydantic
  ```

- Configure your OpenAI API key:

  ```bash
  export OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx
  ```

  Replace `sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxx` with your actual API key.

- Verify the installation:

  ```bash
  python -c "import openai, langchain, pydantic; print('All good!')"
  ```
2. Understand Multi-Stage Workflow Patterns
Multi-stage orchestration means chaining several LLM tasks, where the output of one becomes the input for the next. Key prompt engineering patterns include:
- Chain-of-Thought: Guide the LLM to reason step by step.
- Structured Output: Use explicit output formats (e.g., JSON, YAML) for reliable parsing.
- Function Calling: Let the LLM trigger code or API calls at each stage.
- Guardrails & Validation: Validate LLM outputs before passing to the next step.
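These patterns can be illustrated with a minimal chain of stub stages (the stage functions below are placeholders standing in for real LLM calls, not part of any library):

```python
from typing import Dict, List

def stage_extract(text: str) -> List[Dict]:
    # Placeholder for an LLM call returning structured output (Stage 1).
    return [{"task": part.strip()} for part in text.split(",")]

def stage_prioritize(tasks: List[Dict]) -> List[Dict]:
    # Placeholder for a prioritization stage: first task High, rest Low.
    return [
        {**task, "priority": "High" if i == 0 else "Low"}
        for i, task in enumerate(tasks)
    ]

def run_chain(text: str) -> List[Dict]:
    # The output of one stage becomes the input of the next.
    return stage_prioritize(stage_extract(text))

print(run_chain("build login, add reports"))
# [{'task': 'build login', 'priority': 'High'}, {'task': 'add reports', 'priority': 'Low'}]
```

The real workflow below follows the same shape, with LLM calls and validation replacing the stubs.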
3. Design a Sample Multi-Stage Workflow
We'll build a workflow that:
- Extracts tasks from a project description (Stage 1)
- Assigns priorities to each task (Stage 2)
- Generates a summary report (Stage 3)
This is a common pattern in project management automation.
3.1 Define the Workflow Stages
- Stage 1 Prompt: Extract tasks in a structured JSON format.
- Stage 2 Prompt: Assign priorities to each task, again using structured output.
- Stage 3 Prompt: Summarize the prioritized tasks as a project manager.
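One way to make these stage contracts explicit before writing any prompts is with `TypedDict` (a sketch; the type names here are illustrative, not part of the tutorial's code):

```python
from typing import List, TypedDict  # TypedDict is available since Python 3.8

class ExtractedTask(TypedDict):
    task: str
    description: str

class PrioritizedTask(ExtractedTask):
    priority: str  # one of "High", "Medium", "Low"

# Stage 1 returns List[ExtractedTask]; Stage 2 maps it to
# List[PrioritizedTask]; Stage 3 consumes that list and returns a str summary.
example: PrioritizedTask = {
    "task": "Design UI",
    "description": "Create wireframes for the new dashboard.",
    "priority": "High",
}
print(sorted(example))
# ['description', 'priority', 'task']
```

Typed contracts like these make it obvious what each prompt must ask the model to produce.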
4. Implement Stage 1: Task Extraction with Structured Prompts
- Create the extraction prompt:

  ```python
  TASK_EXTRACTION_PROMPT = """
  You are an expert project analyst. Extract the main tasks from the following project description.
  Return the tasks as a JSON list of objects with "task" and "description" fields.

  Project Description:
  {project_description}

  Example Output:
  [
    {{"task": "Design UI", "description": "Create wireframes for the new dashboard."}},
    {{"task": "Set up database", "description": "Initialize PostgreSQL and define schema."}}
  ]
  """
  ```

- Call the LLM and parse the output:

  ```python
  import json

  import openai

  def extract_tasks(project_description):
      prompt = TASK_EXTRACTION_PROMPT.format(project_description=project_description)
      response = openai.ChatCompletion.create(
          model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": prompt}],
          temperature=0.2,
          max_tokens=500,
      )
      # Extract JSON from the response
      text = response.choices[0].message.content.strip()
      try:
          return json.loads(text)
      except json.JSONDecodeError:
          raise ValueError(f"Could not parse JSON from LLM output: {text}")
  ```

- Test with a sample input:

  ```python
  project_description = (
      "Develop a web app for time tracking. Include user authentication, "
      "reporting, and integration with Slack."
  )
  tasks = extract_tasks(project_description)
  print(tasks)
  ```

  Expected Output (example):

  ```json
  [
    {"task": "Develop authentication", "description": "Implement user login and registration."},
    {"task": "Build reporting", "description": "Create reports for tracked time."},
    {"task": "Integrate Slack", "description": "Enable notifications via Slack integration."}
  ]
  ```
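In practice the model may wrap its JSON in commentary or markdown fences. A more forgiving parse (a hedged helper, not part of the original tutorial code) locates the outermost JSON array before decoding:

```python
import json

def parse_json_array(text):
    """Extract and parse the first top-level JSON array in an LLM reply."""
    start = text.find("[")
    end = text.rfind("]")
    if start == -1 or end == -1 or end < start:
        raise ValueError(f"No JSON array found in LLM output: {text!r}")
    return json.loads(text[start:end + 1])

reply = 'Here are the extracted tasks:\n[{"task": "Design UI"}]\nLet me know!'
print(parse_json_array(reply))
# [{'task': 'Design UI'}]
```

Swapping `json.loads(text)` for a helper like this inside `extract_tasks` reduces spurious retries.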
5. Implement Stage 2: Prioritization with Guardrails
- Define a validation schema using Pydantic:

  ```python
  from typing import List

  from pydantic import BaseModel, ValidationError

  class Task(BaseModel):
      task: str
      description: str
      priority: str

  # Note: `__root__` is Pydantic v1 syntax; in Pydantic v2 use `pydantic.RootModel`.
  class TaskList(BaseModel):
      __root__: List[Task]
  ```

- Create the prioritization prompt:

  ```python
  PRIORITIZATION_PROMPT = """
  You are a project manager. Assign a priority ("High", "Medium", "Low") to each task.
  Return the updated list as JSON.

  Tasks:
  {tasks_json}

  Example Output:
  [
    {{"task": "Develop authentication", "description": "Implement user login and registration.", "priority": "High"}},
    {{"task": "Build reporting", "description": "Create reports for tracked time.", "priority": "Medium"}}
  ]
  """
  ```

- Prompt the LLM and validate the output:

  ```python
  def prioritize_tasks(tasks):
      tasks_json = json.dumps(tasks, indent=2)
      prompt = PRIORITIZATION_PROMPT.format(tasks_json=tasks_json)
      response = openai.ChatCompletion.create(
          model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": prompt}],
          temperature=0.2,
          max_tokens=500,
      )
      text = response.choices[0].message.content.strip()
      try:
          prioritized = json.loads(text)
          # Validate using Pydantic before passing downstream
          TaskList.parse_obj(prioritized)
          return prioritized
      except (json.JSONDecodeError, ValidationError) as e:
          raise ValueError(f"Invalid prioritized tasks output: {e}\n{text}")
  ```

- Test the prioritization step:

  ```python
  prioritized_tasks = prioritize_tasks(tasks)
  print(prioritized_tasks)
  ```

  Expected Output (example):

  ```json
  [
    {"task": "Develop authentication", "description": "Implement user login and registration.", "priority": "High"},
    {"task": "Build reporting", "description": "Create reports for tracked time.", "priority": "Medium"},
    {"task": "Integrate Slack", "description": "Enable notifications via Slack integration.", "priority": "Low"}
  ]
  ```
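If you would rather not depend on Pydantic for this check, an equivalent dependency-free guardrail (a sketch under the same schema assumptions) looks like:

```python
VALID_PRIORITIES = {"High", "Medium", "Low"}

def validate_prioritized(tasks):
    """Raise ValueError if the prioritized task list is malformed."""
    if not isinstance(tasks, list):
        raise ValueError("Expected a JSON list of tasks")
    for item in tasks:
        missing = {"task", "description", "priority"} - set(item)
        if missing:
            raise ValueError(f"Task missing fields: {sorted(missing)}")
        if item["priority"] not in VALID_PRIORITIES:
            raise ValueError(f"Invalid priority: {item['priority']!r}")
    return tasks

validate_prioritized([
    {"task": "Develop authentication",
     "description": "Implement user login and registration.",
     "priority": "High"},
])
print("valid")
# valid
```

Either approach works; the important pattern is that bad output is rejected before it reaches the next stage.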
6. Implement Stage 3: Summary Generation with Chain-of-Thought
- Create the summary prompt with explicit reasoning:

  ```python
  SUMMARY_PROMPT = """
  You are a senior project manager. Summarize the prioritized tasks for an executive summary.
  Explain why each task has its assigned priority.

  Prioritized Tasks:
  {prioritized_tasks_json}

  Summary:
  """
  ```

- Call the LLM for summary generation:

  ```python
  def summarize_tasks(prioritized_tasks):
      prioritized_json = json.dumps(prioritized_tasks, indent=2)
      prompt = SUMMARY_PROMPT.format(prioritized_tasks_json=prioritized_json)
      response = openai.ChatCompletion.create(
          model="gpt-3.5-turbo",
          messages=[{"role": "user", "content": prompt}],
          temperature=0.3,
          max_tokens=300,
      )
      return response.choices[0].message.content.strip()
  ```

- Test the summary stage:

  ```python
  summary = summarize_tasks(prioritized_tasks)
  print(summary)
  ```

  Expected Output (example):

  The most critical task is developing authentication, as secure user access forms the foundation of the application ("High" priority). Reporting is assigned "Medium" priority since it provides valuable insights but can be built iteratively. Slack integration is "Low" priority, as it enhances user experience but is not essential for the initial launch.
7. Orchestrate the Multi-Stage Workflow
- Chain the stages together:

  ```python
  def orchestrate_workflow(project_description):
      tasks = extract_tasks(project_description)
      prioritized = prioritize_tasks(tasks)
      summary = summarize_tasks(prioritized)
      return {
          "tasks": tasks,
          "prioritized": prioritized,
          "summary": summary,
      }

  result = orchestrate_workflow(
      "Develop a web app for time tracking. Include user authentication, "
      "reporting, and integration with Slack."
  )
  print(json.dumps(result, indent=2))
  ```

- Observe the output structure:

  ```json
  {
    "tasks": [...],
    "prioritized": [...],
    "summary": "The most critical task is developing authentication..."
  }
  ```
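Because each stage is an ordinary function, the chaining logic can be exercised without any API calls by injecting stubs (the stub outputs below are invented purely for illustration):

```python
def orchestrate(extract, prioritize, summarize, project_description):
    """Same chaining shape as orchestrate_workflow, with stages injected."""
    tasks = extract(project_description)
    prioritized = prioritize(tasks)
    summary = summarize(prioritized)
    return {"tasks": tasks, "prioritized": prioritized, "summary": summary}

# Stub stages standing in for the LLM-backed functions.
def fake_extract(text):
    return [{"task": "Design UI", "description": "Create wireframes."}]

def fake_prioritize(tasks):
    return [{**t, "priority": "High"} for t in tasks]

def fake_summarize(tasks):
    return f"{len(tasks)} task(s); top priority: {tasks[0]['priority']}"

result = orchestrate(fake_extract, fake_prioritize, fake_summarize, "any input")
print(result["summary"])
# 1 task(s); top priority: High
```

This dependency-injection shape also makes it easy to unit test each stage's validation in isolation.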
8. Add Robustness: Error Handling and Re-Prompting
- Implement automatic re-prompting for invalid outputs:

  ```python
  def safe_extract_tasks(project_description, retries=2):
      for attempt in range(retries + 1):
          try:
              return extract_tasks(project_description)
          except ValueError as e:
              if attempt == retries:
                  raise
              print(f"Retrying task extraction (attempt {attempt + 1}): {e}")
  ```

- Log errors and alert on repeated failures. For production, integrate with logging frameworks or monitoring tools for alerting.
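As a starting point before wiring up a full monitoring stack, the stdlib `logging` module can capture stage failures (a minimal sketch; the wrapper and names are illustrative):

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("workflow")

def logged_stage(name, func, *args):
    """Run a workflow stage, logging its success or failure."""
    try:
        result = func(*args)
        logger.info("Stage %r succeeded", name)
        return result
    except Exception:
        logger.exception("Stage %r failed", name)
        raise

# Example with a trivial stand-in stage:
print(logged_stage("extract", str.upper, "three tasks found"))
# THREE TASKS FOUND
```

In production you would route these records to your monitoring tool and alert on repeated failures of the same stage.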
Common Issues & Troubleshooting
- LLM Output Not in JSON:
  - Refine your prompt to be more explicit: "Return only JSON, with no commentary."
  - Set `temperature=0.0` for more deterministic output.
- Validation Errors:
  - Check that your Pydantic schema matches the output structure.
  - Print the raw LLM output for debugging.
- OpenAI API Rate Limits:
  - Implement exponential backoff and retry logic.
  - Monitor your API usage from the provider dashboard.
- Chained Errors:
  - If a previous stage fails, halt the workflow and log the error clearly.
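The backoff advice above can be sketched with the stdlib alone (the wrapper is generic; which exception to catch depends on your OpenAI client version, so `Exception` is used as a placeholder default):

```python
import time

def with_backoff(func, retries=3, base_delay=0.5, exc=Exception):
    """Retry func with exponentially growing delays between attempts."""
    for attempt in range(retries + 1):
        try:
            return func()
        except exc:
            if attempt == retries:
                raise
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s, ...
            time.sleep(delay)

# Example: a function that fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))
# ok
```

Wrapping each stage's API call in a helper like this keeps transient rate-limit errors from killing the whole chain.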
Next Steps
- Experiment with more complex workflows, e.g., incorporating external API calls or database updates within the chain.
- Explore LangChain's advanced chain patterns for production orchestration.
- Integrate user feedback loops and human-in-the-loop validation for critical decision points.
- Consider deploying your workflow as a REST API using frameworks like FastAPI.
- Stay up to date with the latest prompt engineering best practices from sources like OpenAI's Prompt Engineering Guide.
By combining structured prompts, validation, and robust orchestration, you can build reliable multi-stage automation pipelines that leverage the full power of LLMs for real-world business workflows.
