Prompting Techniques for Agentic AI
Introduction
Agentic AI systems don't just respond—they plan, execute, observe, and iterate. Unlike traditional chatbots that answer in a single turn, agents pursue goals over multiple steps, use external tools, maintain state, and make decisions autonomously.
This shift demands a corresponding shift in how we prompt. Prompting an agent isn't conversation—it's programming behavior.
Below are ten proven techniques for engineering prompts that make agentic systems more reliable, grounded, and effective.
1. The ROC Pattern: Role + Objective + Criteria
This is the single most important structure for agentic prompts: define who the agent is, what it must achieve, and how success is measured.
Template
You are a [specific role with expertise].
Objective: [concrete, measurable goal]
Success Criteria:
- [observable condition 1]
- [observable condition 2]
Constraints:
- [boundary 1]
- [boundary 2]
Example
You are an autonomous competitive research agent.
Objective: Produce a feature comparison between products A and B.
Success Criteria:
- Cover at least 5 feature categories
- Include pricing information
- Cite primary sources for each claim
Constraints:
- Do not include promotional language
- Flag uncertainty explicitly
- Maximum 2 pages
Why It Works
Agentic models optimize behavior when goals are explicit and measurable. Vague objectives like "research this topic" produce vague outputs. The ROC pattern creates a clear optimization target.
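The ROC pattern is mechanical enough to generate programmatically. Here is a minimal Python sketch; the function name and argument layout are illustrative, not a standard API:

```python
def build_roc_prompt(role, objective, criteria, constraints):
    """Assemble a Role + Objective + Criteria (ROC) prompt string."""
    lines = [f"You are {role}.", f"Objective: {objective}", "Success Criteria:"]
    lines += [f"- {c}" for c in criteria]
    lines.append("Constraints:")
    lines += [f"- {c}" for c in constraints]
    return "\n".join(lines)

# Rebuild the competitive-research example from above.
prompt = build_roc_prompt(
    "an autonomous competitive research agent",
    "Produce a feature comparison between products A and B.",
    ["Cover at least 5 feature categories", "Cite primary sources for each claim"],
    ["Do not include promotional language", "Maximum 2 pages"],
)
```

Templating like this keeps every agent in a fleet on the same measurable-goal structure, with only the slots varying.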
2. Hierarchical Task Decomposition
Agents fail when they try to solve everything at once. Force planning before execution.
Pattern
Before taking any action:
1. Generate a high-level plan
2. Break the plan into atomic subtasks
3. Identify dependencies between subtasks
4. Execute sequentially, verifying after each step
After each subtask:
- Confirm completion before proceeding
- If blocked, revise the plan
Example
Task: Investigate why API latency increased 40% this week.
Plan:
├── Subtask 1: Pull metrics from monitoring dashboard
├── Subtask 2: Identify which endpoints degraded
├── Subtask 3: Check deployment history for changes
├── Subtask 4: Correlate with infrastructure events
└── Subtask 5: Synthesize findings and root cause
Execute each subtask. After each, verify the output is actionable before continuing.
Why It Works
This mirrors hierarchical reinforcement learning and reduces cognitive load. Errors caught early don't cascade. The agent can course-correct mid-task rather than producing a completely wrong final output.
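The plan-execute-verify loop can be sketched in a few lines of Python. `execute` and `verify` are stand-ins for whatever calls your agent framework actually makes, not real APIs:

```python
def run_plan(subtasks, execute, verify):
    """Execute subtasks in order; halt and report if verification fails."""
    results = []
    for task in subtasks:
        result = execute(task)
        if not verify(task, result):
            # Blocked: surface partial progress so the plan can be revised.
            return {"status": "blocked", "at": task, "results": results}
        results.append(result)
    return {"status": "done", "results": results}

# Stub run of the latency-investigation plan: every subtask "succeeds"
# if it produces non-empty output.
plan = ["pull metrics", "identify degraded endpoints", "check deployment history"]
outcome = run_plan(plan,
                   execute=lambda t: f"report: {t}",
                   verify=lambda t, r: bool(r))
```

The key property: a failed verification stops the cascade immediately, instead of letting a bad subtask output poison everything downstream.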
3. Tool-Use Contracts
Agents hallucinate when they guess instead of verify. Define explicit triggers for tool use.
Pattern
Available tools: [list tools]
Tool usage rules:
- Use [Tool A] when [condition]
- Use [Tool B] when [condition]
- Never guess when a tool can provide the answer
- Always prefer tools over internal knowledge for [category]
Example
Available tools:
- web_search: For current information, news, recent events
- code_interpreter: For calculations, data analysis, plotting
- file_read: For accessing provided documents
Rules:
- If information may have changed since training → web_search
- If numerical accuracy matters → code_interpreter
- If the answer exists in provided files → file_read
- Never synthesize data you cannot verify
Why It Works
Explicit tool contracts reduce hallucination by creating decision trees. The agent doesn't have to infer when to act—it follows a specification.
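A tool contract is effectively a decision function. Here is a minimal sketch, assuming hypothetical boolean flags derived from the query (the flag names are invented for illustration); provided files are checked first, echoing the "prefer tools over internal knowledge" rule:

```python
def choose_tool(query_flags):
    """Pick a tool from explicit trigger conditions, in fixed priority order."""
    if query_flags.get("answer_in_files"):
        return "file_read"          # answer exists in provided documents
    if query_flags.get("needs_current_info"):
        return "web_search"         # information may have changed since training
    if query_flags.get("numerical"):
        return "code_interpreter"   # numerical accuracy matters
    return None  # no tool applies: answer from context, never synthesize data
```

Because the conditions are explicit, the same dispatch logic can run outside the model as a guardrail, checking that the agent's chosen tool matches the contract.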
4. State Management Instructions
Long-horizon tasks require memory management. Tell the agent what to track, update, and discard.
Pattern
Maintain a scratchpad with:
- [tracked item 1]
- [tracked item 2]
Update rules:
- After each action, update relevant entries
- When assumptions change, mark them as invalidated
- Before final output, verify all entries are current
Example
Maintain a research scratchpad:
ASSUMPTIONS:
- User needs enterprise-grade solution
- Budget is flexible
- Timeline: Q2 2026
UNCERTAINTIES:
- Regulatory requirements: unknown
- Integration complexity: estimated
Update this after each research step. If new information contradicts an assumption, mark it INVALID and note the revision.
Why It Works
Without explicit memory management, agents either forget critical context or get overwhelmed by accumulated state. The scratchpad pattern creates structured, updatable memory.
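The scratchpad is easy to make concrete. A minimal Python sketch, with invented method names (`assume`, `invalidate`, `note_uncertainty`), rendering the same ASSUMPTIONS/UNCERTAINTIES layout as the example:

```python
class Scratchpad:
    """Structured, updatable memory for a long-horizon task."""

    def __init__(self):
        self.assumptions = {}     # name -> {"text", "valid", optional "revision"}
        self.uncertainties = []

    def assume(self, name, text):
        self.assumptions[name] = {"text": text, "valid": True}

    def invalidate(self, name, revision):
        """Mark an assumption INVALID and record what replaced it."""
        self.assumptions[name].update(valid=False, revision=revision)

    def note_uncertainty(self, text):
        self.uncertainties.append(text)

    def render(self):
        lines = ["ASSUMPTIONS:"]
        for entry in self.assumptions.values():
            tag = "" if entry["valid"] else f" [INVALID -> {entry['revision']}]"
            lines.append(f"- {entry['text']}{tag}")
        lines.append("UNCERTAINTIES:")
        lines += [f"- {u}" for u in self.uncertainties]
        return "\n".join(lines)
```

Rendering the scratchpad back into the context window after each step is what makes the memory actually persist across turns.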
5. Reflection and Self-Critique Loops
Agents improve when they evaluate their own work before delivering.
Pattern
After completing the main task:
1. Self-Critique:
- What assumptions did I make?
- Where might I be wrong?
- What did I fail to consider?
2. Revision:
- Address the top 2-3 weaknesses
- Produce an improved final answer
3. Confidence Rating:
- Assign confidence: High / Medium / Low
- Explain the rating
Example
[After producing initial analysis]
Self-Critique:
- I assumed the data is complete—need to verify
- I didn't check for seasonal patterns
- My confidence in claim #3 is lower than the others
Revision:
- Added caveat about data completeness
- Flagged seasonal analysis as out of scope
- Softened language on claim #3
Final Confidence: Medium (requires domain expert review)
Why It Works
Self-critique catches errors that single-pass generation misses. It's a lightweight alternative to human-in-the-loop verification.
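The critique-then-revise loop is a small control structure. A sketch, assuming `critique` and `revise` wrap two further model calls (stubbed here with lambdas):

```python
def critique_and_revise(draft, critique, revise, max_rounds=2):
    """Run self-critique passes before delivery; stop when no weaknesses remain."""
    for _ in range(max_rounds):
        weaknesses = critique(draft)
        if not weaknesses:
            break
        draft = revise(draft, weaknesses[:3])  # address only the top weaknesses
    return draft

# Stub critic: flags one missing caveat; stub reviser appends it.
critic = lambda d: [] if "caveat" in d else ["missing data-completeness caveat"]
reviser = lambda d, ws: d + " (caveat: data may be incomplete)"
final = critique_and_revise("Initial analysis.", critic, reviser)
```

Capping both the rounds and the number of weaknesses addressed per round keeps the loop cheap; otherwise self-critique can itself become an infinite loop.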
6. Decision Thresholds and Stop Conditions
Agents can over-optimize or get stuck in loops. Define when to stop and when to escalate.
Pattern
Stopping rules:
- If confidence < [threshold] → state uncertainty and stop
- If [condition] → escalate to human
- If [repetition detected] → summarize progress and ask for guidance
Never:
- Continue past [N] iterations without progress
- Fabricate information when uncertain
Example
Stopping rules:
- Confidence < 70% → "I cannot answer this with sufficient confidence. Here's what I found..."
- Conflicting sources → "Sources disagree. Summarizing both perspectives..."
- No new information after 3 iterations → "I've exhausted available sources. Recommending: [action]."
Escalation triggers:
- Safety-critical domain without verified data
- User request outside defined scope
Why It Works
Explicit thresholds prevent infinite loops and overconfident wrong answers. The agent knows when to say "I don't know."
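The stopping rules reduce to a small check that runs between iterations. A sketch with illustrative reason strings (the thresholds match the 70%-confidence and 3-iteration rules above):

```python
def check_stop(confidence, stalled_iterations, min_confidence=0.7, max_stalls=3):
    """Return a stop reason, or None to let the agent continue."""
    if confidence < min_confidence:
        return "state_uncertainty_and_stop"
    if stalled_iterations >= max_stalls:
        return "summarize_progress_and_ask"
    return None
```

Running this check in the harness, outside the model, means even a model that "wants" to keep going gets cut off deterministically.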
7. Environment and Action Constraints
Restrict the action space to prevent unsafe or irrelevant behavior.
Pattern
Allowed actions:
- [action 1]
- [action 2]
- [action 3]
Forbidden actions:
- [forbidden 1]
- [forbidden 2]
Example
You are a medical information agent.
Allowed:
- Search peer-reviewed literature
- Summarize findings with citations
- Suggest questions to ask a doctor
Forbidden:
- Provide diagnostic conclusions
- Recommend treatments
- Interpret lab results
- Speak with authority on uncertain topics
Why It Works
This mirrors constrained Markov decision processes from control theory and safe reinforcement learning. By limiting the action space, you prevent the agent from taking damaging actions even if it misinterprets the goal.
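An allow/deny list is simplest to enforce outside the prompt, as a guard in the harness. A minimal sketch; the action names echo the medical example and would map onto real tool calls in practice:

```python
ALLOWED = {"search_literature", "summarize_with_citations", "suggest_doctor_questions"}
FORBIDDEN = {"diagnose", "recommend_treatment", "interpret_labs"}

def guard_action(action):
    """Reject any action outside the allowlist before it executes."""
    if action in FORBIDDEN:
        raise PermissionError(f"forbidden action: {action}")
    if action not in ALLOWED:
        raise PermissionError(f"unlisted action: {action}")
    return action
```

Belt and suspenders: the prompt tells the agent the boundaries, and the guard enforces them even if the model ignores the prompt.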
8. Structured Output Schemas
Machine-readable outputs enable agent composition and automated verification.
Pattern
Output must conform to this schema:
{
  "field_1": "type",
  "field_2": "type",
  "field_3": "type"
}
Example
Output schema:
{
  "answer": "string - the direct answer to the question",
  "confidence": "float between 0 and 1",
  "sources": ["array of citation strings"],
  "caveats": ["array of limitations or uncertainties"],
  "follow_up_questions": ["optional: questions that remain"]
}
Why It Works
Schemas force the agent to complete all required fields. They make outputs parseable by downstream systems and enable automated quality checks.
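The automated-check half of this is straightforward with the standard library alone. A sketch validating the example schema (field names taken from above; the type table is an assumption, since the schema is documented only as prose):

```python
import json

# Required fields and expected Python types after json.loads.
# confidence accepts int or float, since JSON has no separate float type.
REQUIRED_FIELDS = {
    "answer": str,
    "confidence": (int, float),
    "sources": list,
    "caveats": list,
}

def validate_output(raw_json):
    """Parse agent output and check required fields and types."""
    data = json.loads(raw_json)
    errors = []
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing: {field}")
        elif not isinstance(data[field], ftype):
            errors.append(f"wrong type: {field}")
    if errors:
        raise ValueError("; ".join(errors))
    return data
```

In production you would likely reach for a schema library (JSON Schema, Pydantic), but even this much turns "malformed output" from a silent failure into a retryable error.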
9. Resource Budgets
Explicit limits align agent behavior with bounded rationality.
Pattern
Budget:
- Maximum [N] tool calls
- Maximum [M] reasoning steps
- Time limit: [duration]
Optimization priority: [accuracy | speed | thoroughness]
Example
Budget:
- 5 web searches maximum
- 15 reasoning steps maximum
- Prioritize accuracy over speed
If budget exhausted without resolution:
- Summarize progress
- State what additional resources would help
- Provide best-available answer with confidence rating
Why It Works
Without budgets, agents can waste resources on diminishing returns. Explicit constraints force prioritization and prevent runaway processes.
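Budgets are enforceable with a small counter object in the harness. A sketch with invented method names, using the limits from the example:

```python
class Budget:
    """Hard caps on tool calls and reasoning steps."""

    def __init__(self, max_tool_calls=5, max_steps=15):
        self.max_tool_calls, self.max_steps = max_tool_calls, max_steps
        self.tool_calls = self.steps = 0

    def spend_tool_call(self):
        """Record one tool call; return True if still within budget."""
        self.tool_calls += 1
        return self.tool_calls <= self.max_tool_calls

    def spend_step(self):
        """Record one reasoning step; return True if still within budget."""
        self.steps += 1
        return self.steps <= self.max_steps

    def exhausted(self):
        return (self.tool_calls >= self.max_tool_calls
                or self.steps >= self.max_steps)
```

When `exhausted()` flips, the harness switches the agent into the wrap-up behavior described above: summarize progress, name the missing resources, give the best-available answer.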
10. Multi-Agent Coordination Patterns
Complex tasks benefit from specialized roles working together.
Pattern
Agent A (Planner): Decomposes tasks, assigns subtasks
Agent B (Executor): Performs assigned work
Agent C (Critic): Reviews outputs, challenges assumptions
Coordination protocol:
1. Planner creates plan
2. Executor works through plan
3. Critic reviews each output
4. If Critic rejects → back to Planner for revision
Example
Research Agent Team:
PLANNER: Break research questions into searchable sub-queries
RESEARCHER: Execute searches, gather sources, extract facts
SYNTHESIZER: Combine findings into coherent answer
CRITIC: Check for gaps, contradictions, weak evidence
Workflow:
Planner → Researcher (x3) → Synthesizer → Critic → [revise if needed]
Why It Works
Multi-agent systems exploit ensemble effects: specialized agents outperform generalists within their own domain, and a dedicated Critic catches blind spots that the producing agents miss.
The Gold Standard Agentic Prompt
Combining these techniques:
You are an autonomous research agent specializing in [domain].
## OBJECTIVE
Produce a [specific deliverable] that answers [question].
## PROCESS
1. Generate a plan with atomic subtasks
2. Execute each subtask using appropriate tools
3. Maintain a scratchpad of assumptions and uncertainties
4. After completion, self-critique and revise
5. Rate confidence and state limitations
## TOOLS
- web_search: for current information
- code_interpreter: for calculations
- file_read: for provided documents
Use tools when:
- Information may be outdated → web_search
- Numerical precision matters → code_interpreter
- Answer exists in provided files → file_read
## CONSTRAINTS
- Cite all claims
- State uncertainty explicitly
- Do not speculate
- Maximum 5 tool calls
- Maximum 10 reasoning steps
## OUTPUT
{
  "answer": "...",
  "confidence": 0.0-1.0,
  "sources": [...],
  "limitations": [...],
  "follow_up": [...]
}
## STOPPING CONDITIONS
- If confidence < 70%: state uncertainty clearly
- If sources conflict: present both perspectives
- If budget exhausted: summarize progress and gaps
Key Takeaways
| Technique | What It Solves |
|---|---|
| ROC Pattern | Vague goals → Measurable targets |
| Task Decomposition | Overwhelm → Manageable steps |
| Tool Contracts | Hallucination → Verified information |
| State Management | Memory failures → Structured tracking |
| Self-Critique | Single-pass errors → Revised outputs |
| Stop Conditions | Infinite loops → Graceful termination |
| Action Constraints | Unsafe behavior → Bounded actions |
| Output Schemas | Unstructured outputs → Composable results |
| Resource Budgets | Runaway costs → Efficient execution |
| Multi-Agent | Blind spots → Specialized expertise |
Final Thought
Prompting agentic AI is systems design, not conversation. You're not asking a question—you're specifying behavior, constraints, and feedback loops.
The best prompts read less like instructions and more like specifications: precise, bounded, and verifiable.
Invest time in the prompt. The agent will repay it.