
AI agent patterns — when to use simple chains, RAG, or full agents

Module 6 covered tools and agents in depth. Before closing, it is worth stepping back and asking the question that should come first: do you even need an agent?

The answer is often no. Most AI features are simpler than they look. Reaching for an agent when a single LLM call would do adds latency, cost, and failure modes without meaningful benefit.

This post is a decision guide.


The complexity spectrum

AI application patterns exist on a spectrum from simple to complex:

Single call → Prompt chain → RAG → RAG + Tools → Full agent
  (cheapest,                                      (most flexible,
   fastest)                                        most complex)

Move right only when the feature genuinely requires it. Each step to the right adds latency, cost, and new failure modes.

Pattern 1 — Single LLM call

Use when: The question can be answered from training data or from text you provide directly in the prompt.

String answer = chatClient.prompt()
    .user("Classify this review as POSITIVE, NEGATIVE, or NEUTRAL: " + reviewText)
    .call()
    .content();

This handles classification, extraction, and short-form generation: anything answerable in one shot, with no retrieval, no tools, and no loop.

A single call is deterministic in structure (though not in content), cheap, fast, and easy to test. Use it whenever possible.
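Because the structure is fixed, a single-call classifier is easy to wrap in a defensive check before the label reaches downstream code. A minimal sketch (the `LabelValidator` class and its fallback behaviour are illustrative, not part of Spring AI):

```java
import java.util.Set;

// Defensive post-processing for a single-call classifier: the model's output
// is variable in form (whitespace, casing, extra words), so normalise it
// before trusting it.
public class LabelValidator {
    private static final Set<String> ALLOWED = Set.of("POSITIVE", "NEGATIVE", "NEUTRAL");

    // Normalise raw model output; fall back to NEUTRAL if it is not a known label.
    public static String normalizeLabel(String raw) {
        if (raw == null) return "NEUTRAL";
        String label = raw.strip().toUpperCase();
        return ALLOWED.contains(label) ? label : "NEUTRAL";
    }
}
```

A check like this also makes the feature testable without calling the model at all: feed it representative raw outputs and assert on the normalised labels.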

Pattern 2 — Prompt chain

Use when: The output of one LLM call should be the input to another — where each step has a clear transformation.

// Step 1: classify the customer message
String category = chatClient.prompt()
    .user("Classify as: ORDER_STATUS | RETURN | PRODUCT_QUESTION | COMPLAINT | OTHER\n\n" + message)
    .call().content().strip();

// Step 2: route to category-specific handler
String response = switch (category) {
    case "ORDER_STATUS" -> orderStatusPrompt(message);
    case "RETURN"       -> returnPolicyPrompt(message);
    default             -> generalSupportPrompt(message);
};

Chains give you explicit control at each step. Each step is independently testable. The flow is deterministic — the same input to step 1 always routes to the same step 2.

When chains beat agents: If you know exactly which steps are needed based on input, a chain is more reliable than an agent. The agent might take a different path each time. The chain is always the same path.

Tip: Prefer chains over agents for classification-and-route patterns. "Classify the intent, then call the right handler" is far more reliable than "let the LLM figure out what to do next." Predictability is a feature.

Pattern 3 — RAG

Use when: The question requires knowledge from documents you control, and that knowledge fits in the context window.

// QuestionAnswerAdvisor handles retrieval + injection automatically
String answer = chatClient.prompt()
    .advisors(new QuestionAnswerAdvisor(vectorStore))
    .user(question)
    .call()
    .content();

RAG is the right pattern for knowledge base Q&A: product documentation, policy questions, FAQs, and anything else answerable from documents you control.

RAG does not need agents. The retrieval is deterministic (same query → same results, roughly). There is no reasoning loop. It is fast and predictable.

Pattern 4 — RAG + single tool

Use when: The question requires both knowledge base content AND one piece of live data.

// RAG provides context; one tool call provides live data
String answer = chatClient.prompt()
    .user("What is the warranty on order TG-9821?")
    .tools(orderTools)    // one tool: getOrderStatus
    .call()
    .content();

This is still relatively predictable. The LLM makes at most one tool call, uses its result alongside the RAG context, and generates an answer. The agent loop is short: at most two LLM calls (one to decide to call the tool, one to generate the final answer).

Pattern 5 — Full agent (multi-tool, multi-step)

Use when: The correct answer requires multiple tool calls in sequence, where the result of one determines whether and how to call the next.

// Agent decides: check order → check refund eligibility → check product warranty
// Each step depends on the result of the previous
String answer = chatClient.prompt()
    .user("My headphones stopped working. What are my options?")
    .tools(orderTools, productTools)
    .call()
    .content();

Full agents are appropriate for multi-step data gathering: tasks where which tools are needed, and in what order, depends on intermediate results.

Full agents are inappropriate for anything a simpler pattern covers: classification, fixed workflows, and plain knowledge-base lookups.

Caution: Agents can loop. If the LLM decides a tool call didn't provide enough information and calls another tool, and that also seems insufficient, it may keep calling tools. Limit the maximum number of agent turns to prevent runaway loops: check your Spring AI version for a configurable limit on tool-calling rounds, or enforce one yourself in the tool class.

Reliability degrades with each step

Each LLM call in a chain or agent loop introduces variability. The LLM might misread the input, pick the wrong tool, pass malformed arguments, or misinterpret a tool's result.

In a single-call scenario, the output is variable but the structure is fixed. In a 5-step agent, variability compounds across steps.

Estimate reliability:

Single LLM call with good prompt:  ~95% correct on clear inputs
2-step chain:                       ~90% (0.95 × 0.95)
3-step agent:                       ~86% (0.95³)
5-step agent:                       ~77% (0.95⁵)

These are rough estimates, not guarantees. The point: design agents to minimise the number of steps, not maximise flexibility.
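The compounding above is just the per-step reliability raised to the number of steps. A quick sketch of the arithmetic:

```java
public class ReliabilityEstimate {
    // Rough end-to-end success estimate: per-step reliability ^ number of steps.
    // Assumes steps fail independently, which is itself an approximation.
    public static double estimate(double perStepReliability, int steps) {
        return Math.pow(perStepReliability, steps);
    }

    public static void main(String[] args) {
        for (int steps : new int[] {1, 2, 3, 5}) {
            System.out.printf("%d step(s): ~%.0f%%%n", steps, estimate(0.95, steps) * 100);
        }
    }
}
```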

The decision framework

Is the answer in training data or provided text?
  Yes → Single LLM call

Is the answer in your knowledge base documents?
  Yes → RAG

Does the answer require one piece of live data?
  Yes → RAG + single tool

Is there a fixed sequence of steps where each is deterministic?
  Yes → Prompt chain

Does the task require multiple tools where the path depends on results?
  Yes → Full agent

Are actions irreversible (send email, process payment, delete data)?
  Yes → Add a human confirmation step before acting
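One way to make the framework concrete is an ordered check that takes the first (simplest) pattern whose condition holds, falling through to a full agent only at the end. A sketch; the enum and parameter names are illustrative, and each question assumes the earlier ones were answered no:

```java
public class PatternChooser {
    enum Pattern { SINGLE_CALL, RAG, RAG_PLUS_TOOL, PROMPT_CHAIN, FULL_AGENT }

    // Mirrors the decision framework above: cheapest viable pattern wins.
    static Pattern choose(boolean answeredByModelOrPrompt,
                          boolean answeredByKnowledgeBase,
                          boolean needsOneLiveDataPoint,
                          boolean fixedDeterministicSteps) {
        if (answeredByModelOrPrompt) return Pattern.SINGLE_CALL;
        if (answeredByKnowledgeBase) return Pattern.RAG;
        if (needsOneLiveDataPoint)   return Pattern.RAG_PLUS_TOOL;
        if (fixedDeterministicSteps) return Pattern.PROMPT_CHAIN;
        return Pattern.FULL_AGENT;
    }
}
```

The irreversible-actions question is deliberately absent: a human confirmation step is something you add to whichever pattern you chose, not a pattern of its own.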

Designing agents that fail gracefully

When you do build agents, plan for failure at every step:

Tool timeouts: Set timeouts on every tool method. If the order service doesn’t respond in 2 seconds, return an error string — don’t leave the LLM waiting indefinitely.

@Tool(description = "Get order status.")
OrderStatus getOrderStatus(String orderId) {
    try {
        return orderService.getStatusWithTimeout(orderId, Duration.ofSeconds(2));
    } catch (TimeoutException e) {
        return new OrderStatus(orderId, "TIMEOUT", "Order system is temporarily unavailable");
    }
}
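The `getStatusWithTimeout` call above is whatever your service layer provides. If it doesn't exist yet, the same behaviour can be sketched with plain `CompletableFuture` (the class and fallback string here are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class TimeoutGuard {
    // Run a potentially slow lookup with a hard deadline. On timeout (or any
    // other failure) return a fallback string the LLM can reason about,
    // instead of blocking the agent loop indefinitely.
    public static String withTimeout(Supplier<String> slowCall, long millis, String fallback) {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(slowCall);
        try {
            return future.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);
            return fallback;
        } catch (Exception e) {
            return fallback;
        }
    }
}
```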

Max tool calls: Prevent infinite loops by limiting how many tool calls can happen per request, either through your Spring AI version's tool-calling configuration or by tracking call counts in the tool class.
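Tracking call counts in the tool class can be as simple as a counter that short-circuits once a per-request budget is exhausted. A sketch (the class name and budget size are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Per-request tool call budget. Thread-safe so concurrent tool calls within
// one request can't overshoot the limit.
public class CallBudget {
    private final int maxCalls;
    private final AtomicInteger calls = new AtomicInteger();

    public CallBudget(int maxCalls) { this.maxCalls = maxCalls; }

    // Returns true while the budget lasts, false once it is exhausted.
    public boolean tryAcquire() {
        return calls.incrementAndGet() <= maxCalls;
    }
}
```

Each tool method would then open with something like `if (!budget.tryAcquire()) return "Tool call limit reached for this request";`, using one `CallBudget` instance per request. Returning an error string (rather than throwing) lets the LLM explain the situation instead of crashing the loop.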

Fallback response: If the agent cannot complete a task (all tools failed, too many retries), return a helpful fallback rather than an exception:

try {
    return chatClient.prompt().user(question).tools(tools).call().content();
} catch (Exception e) {
    log.error("Agent failed for question: {}", question, e);
    return "I'm having trouble accessing live information right now. " +
           "Please contact support@techgadgets.com for immediate assistance.";
}

Important: Always have a fallback path for agent failures. Users should never see a raw exception or an empty response. A graceful "I'm having trouble right now, please contact support" is always better than a 500 error.

What to monitor in production agents

Metric                        | Why it matters                 | Alert when
Tool call count per request   | Detects runaway loops          | > 5 calls/request
Agent response latency        | Each tool call adds ~500 ms+   | P95 > 5 seconds
Tool error rate               | Detects failing integrations   | > 5% errors
Fallback response rate        | Proxy for agent failure rate   | > 2% of requests
Token usage per request       | Cost indicator                 | > 5000 tokens/request
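The per-request signals in that table can be captured in one snapshot and checked against the thresholds before logging or alerting. A sketch (the record and method names are illustrative; the rate-based alerts, P95 latency and error percentages, need aggregation across many requests rather than a per-request check):

```java
import java.util.ArrayList;
import java.util.List;

public class AgentMetrics {
    // One request's worth of agent telemetry.
    public record RequestStats(int toolCalls, long latencyMillis, int tokens, boolean usedFallback) {}

    // Return the per-request alert conditions this request trips,
    // using the thresholds from the monitoring table.
    public static List<String> alerts(RequestStats s) {
        List<String> fired = new ArrayList<>();
        if (s.toolCalls() > 5)        fired.add("runaway tool calls");
        if (s.latencyMillis() > 5000) fired.add("slow response");
        if (s.tokens() > 5000)        fired.add("high token usage");
        if (s.usedFallback())         fired.add("fallback used");
        return fired;
    }
}
```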

Summary: the right tool for the job

Pattern          | Code                                  | Best for
Single call      | chatClient.prompt().user(...).call()  | Classification, extraction, generation
Prompt chain     | Multiple sequential calls             | Fixed multi-step workflows
RAG              | QuestionAnswerAdvisor                 | Knowledge base Q&A
RAG + one tool   | Advisor + .tools(x)                   | Live data + policy questions
Full agent       | .tools(x, y, z), multi-call           | Multi-step data gathering

Start at the top. Move down only when you have a clear reason.

Note: Module 6 is complete. The support assistant is now a full agent: it retrieves knowledge from documents, calls live services, maintains multi-turn memory, streams responses to the browser, and gracefully declines what it cannot answer. Module 7 covers what it takes to ship this to production: observability, cost controls, testing strategies, and safety guardrails.
