Understanding Spring AI's ChatClient — the heart of every AI call

Dev’s first ChatClient call worked. Now came the real questions: how do you set a persistent system prompt? How do you override the temperature for one specific call? What is an advisor? What does call() return compared to stream()?

ChatClient looks simple on the surface — and the basic usage is simple. But it has a well-designed depth that makes it useful for production systems. This post maps all of it.

Table of contents

The ChatClient.Builder — where configuration lives
The call chain — building a request
- .system() — override the default for one call
- .options() — override model options per call
What .call() returns
Advisors — cross-cutting concerns for AI calls
.call() vs .stream() — when to use each
A complete example: the support endpoint
References

The ChatClient.Builder — where configuration lives

ChatClient.Builder is auto-configured by Spring AI. You inject it and call .build() to produce a ChatClient. The builder is where you set defaults that apply to every call made through that client instance.

@Bean
ChatClient chatClient(ChatClient.Builder builder) {
    return builder
            .defaultSystem("You are a helpful assistant.")
            .defaultOptions(OpenAiChatOptions.builder()
                    .model("gpt-4o-mini")
                    .temperature(0.2)
                    .maxTokens(500)
                    .build())
            .build();
}

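You can create multiple ChatClient beans, one per use case, each with different system prompts and options. A support bot and a code-review bot can both be built from the auto-configured ChatClient.Builder and still end up with entirely different personalities. Spring AI registers the builder as a prototype-scoped bean, so each injection point receives its own builder instance and the defaults set for one client do not leak into another.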

@Configuration
class AiConfig {

    @Bean
    @Qualifier("support")
    ChatClient supportClient(ChatClient.Builder builder) {
        return builder
                .defaultSystem("""
                        You are a support assistant for TechGadgets.
                        Answer only product and order questions.
                        """)
                .defaultOptions(OpenAiChatOptions.builder()
                        .model("gpt-4o-mini")
                        .temperature(0.2)
                        .maxTokens(400)
                        .build())
                .build();
    }

    @Bean
    @Qualifier("summarizer")
    ChatClient summarizerClient(ChatClient.Builder builder) {
        return builder
                .defaultSystem("You summarize customer feedback into one sentence.")
                .defaultOptions(OpenAiChatOptions.builder()
                        .model("gpt-4o-mini")
                        .temperature(0.0)
                        .maxTokens(100)
                        .build())
                .build();
    }
}

Tip: Create one ChatClient bean per distinct role in your application — support, summarizer, classifier, code reviewer. Each gets its own system prompt and sensible defaults. This is cleaner than passing a different system prompt on every call.

The call chain — building a request

Every interaction with ChatClient starts with .prompt() and ends with .call() or .stream(). Everything in between shapes the request:

String response = chatClient
        .prompt()                                    // start building a request
        .system("Override the default system prompt for this call only")
        .user("What is the return policy for electronics?")
        .call()                                      // send the request
        .content();                                  // extract the response text

.system() — override the default for one call

If you set a default system prompt in the builder but need a different one for a specific call, .system() on the prompt overrides it:

// Uses the bean's default system prompt
String normalAnswer = chatClient.prompt()
        .user("What is your return policy?")
        .call()
        .content();

// Uses a different system prompt for this call only
String technicalAnswer = chatClient.prompt()
        .system("You are a technical support specialist. Use precise technical language.")
        .user("How does the noise cancellation work on the ProX headphones?")
        .call()
        .content();

.options() — override model options per call

You can override temperature, model, or max tokens for a single call without changing the bean configuration:

String creativeAnswer = chatClient.prompt()
        .user("Write three creative subject lines for our sale email.")
        .options(OpenAiChatOptions.builder()
                .temperature(0.9)
                .maxTokens(200)
                .build())
        .call()
        .content();

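Important: Per-call options take precedence over defaults for that call. How unset fields are filled in deserves a check on your Spring AI version: the object passed to .options() may replace the builder's defaultOptions as a whole rather than merge with them field by field, in which case unset fields fall back to the model's configured properties (for example spring.ai.openai.chat.options.model). If the model name or token limit matters for a call, set it explicitly in the per-call options.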

What .call() returns

.call() returns a CallResponseSpec with several extraction methods depending on what you need:

// Just the text content — most common
String text = chatClient.prompt()
        .user("What is Java?")
        .call()
        .content();

// The full ChatResponse — includes metadata, finish reason, token usage
ChatResponse response = chatClient.prompt()
        .user("What is Java?")
        .call()
        .chatResponse();

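// Token usage from the full response
// (getCompletionTokens() is the Spring AI 1.0 name; earlier milestones used getGenerationTokens())
Usage usage = response.getMetadata().getUsage();
long inputTokens = usage.getPromptTokens();
long outputTokens = usage.getCompletionTokens();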

// Strongly typed entity (next post covers this in detail)
record Answer(String summary, List<String> keyPoints) {}
Answer structured = chatClient.prompt()
        .user("Explain Java records. Respond in JSON matching this schema: {summary: string, keyPoints: string[]}")
        .call()
        .entity(Answer.class);

Use .content() for most cases. Use .chatResponse() when you need token usage, finish reason, or model metadata — such as for logging or cost tracking.
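For cost tracking, the two token counts are all that is needed. A minimal sketch in plain Java, assuming per-million-token prices that you supply yourself; the TokenCost class is a hypothetical helper and the prices in the usage note are placeholders, not real rates:

```java
/** Hypothetical helper: turns token counts into an estimated cost. */
final class TokenCost {

    private TokenCost() {}

    /**
     * @param promptTokens        input tokens, e.g. the value of usage.getPromptTokens()
     * @param completionTokens    output tokens reported for the response
     * @param inputUsdPerMillion  assumed price per 1,000,000 input tokens
     * @param outputUsdPerMillion assumed price per 1,000,000 output tokens
     */
    static double estimateUsd(long promptTokens, long completionTokens,
                              double inputUsdPerMillion, double outputUsdPerMillion) {
        if (promptTokens < 0 || completionTokens < 0) {
            throw new IllegalArgumentException("token counts must be non-negative");
        }
        double inputCost = promptTokens / 1_000_000.0 * inputUsdPerMillion;
        double outputCost = completionTokens / 1_000_000.0 * outputUsdPerMillion;
        return inputCost + outputCost;
    }
}
```

With placeholder prices of 0.15 and 0.60 USD per million tokens, TokenCost.estimateUsd(1200, 300, 0.15, 0.60) comes to about 0.00036 USD. Logging that figure next to each request is usually enough to spot an endpoint whose prompts have grown too large.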

Advisors — cross-cutting concerns for AI calls

Advisors are Spring AI’s equivalent of Spring MVC interceptors — they wrap every call with pre- and post-processing logic. Spring AI ships with several built-in advisors and you can write custom ones.
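The interceptor analogy can be made concrete with a few lines of plain Java. The sketch below is a conceptual model only: Advice and AdviceChain are hypothetical names, not Spring AI's advisor interfaces, and the "model" is just a function from prompt text to response text.

```java
import java.util.List;
import java.util.function.Function;

/** Conceptual stand-in for an advisor: sees the prompt, may call the rest of the chain. */
interface Advice {
    String around(String prompt, Function<String, String> next);
}

final class AdviceChain {

    private AdviceChain() {}

    /** Runs the prompt through every advisor in order, then through the model. */
    static String run(List<Advice> advisors, Function<String, String> model, String prompt) {
        Function<String, String> chain = model;
        // Wrap from the last advisor inward so the first advisor ends up outermost.
        for (int i = advisors.size() - 1; i >= 0; i--) {
            Advice advisor = advisors.get(i);
            Function<String, String> next = chain;
            chain = p -> advisor.around(p, next);
        }
        return chain.apply(prompt);
    }
}
```

An advisor that logs would record the prompt, call next.apply(prompt), record the result, and return it. An advisor that adds retrieved context would call next.apply(context + prompt). Either one can be added or removed without touching the code that makes the call, which is the point of treating these concerns as cross-cutting.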

You register advisors on the builder (applies to all calls) or per call:

// On the builder — applies to every call through this client
@Bean
ChatClient chatClient(ChatClient.Builder builder, VectorStore vectorStore) {
    return builder
            .defaultSystem("You are a support assistant.")
            .defaultAdvisors(
                new QuestionAnswerAdvisor(vectorStore)   // RAG — covered in Module 4
            )
            .build();
}

// Per call — applies to this call only
String answer = chatClient.prompt()
        .user(question)
        .advisors(new SimpleLoggerAdvisor())
        .call()
        .content();

The most important built-in advisors:

| Advisor | What it does | Module |
| --- | --- | --- |
| `QuestionAnswerAdvisor` | Retrieves relevant documents from a vector store and injects them into the prompt (RAG) | 4 |
| `MessageChatMemoryAdvisor` | Injects conversation history into every call for multi-turn conversations | 5 |
| `SimpleLoggerAdvisor` | Logs the request and response for debugging | Available now |

Tip: Add SimpleLoggerAdvisor in your dev profile during early development. It logs the full prompt and response so you can see exactly what the model receives and returns — invaluable for debugging prompt issues.
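SimpleLoggerAdvisor writes at DEBUG level, so nothing shows up until the logger for its package is turned up. A minimal fragment, assuming the default Spring Boot logging setup; the file name application-dev.properties is an assumption about how the dev profile is organised:

```properties
# application-dev.properties (assumed file name for the dev profile)
# SimpleLoggerAdvisor logs requests and responses at DEBUG
logging.level.org.springframework.ai.chat.client.advisor=DEBUG
```

Keeping this in a dev-only profile avoids writing full prompts and responses to production logs.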

.call() vs .stream() — when to use each

.call() waits for the complete response before returning. .stream() returns a StreamResponseSpec, and calling .content() on it gives a Flux<String> that emits chunks of text as the model generates them.

// Blocking — waits for complete response
String complete = chatClient.prompt()
        .user("Explain microservices.")
        .call()
        .content();

// Streaming — returns tokens as they arrive
Flux<String> streamed = chatClient.prompt()
        .user("Explain microservices.")
        .stream()
        .content();

Use .call() for:

- Background processing jobs
- API-to-API calls where the consumer does not need partial results
- Short responses (under ~1 second) where streaming adds no perceived benefit
- Structured output parsing (easier with a complete response)

Use .stream() for:

- Chat interfaces where users see the response being typed out
- Long responses where showing partial results reduces perceived latency
- SSE (Server-Sent Events) endpoints
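For the SSE case, each streamed chunk travels to the browser as a small text frame. A minimal plain-Java sketch of that framing, following the text/event-stream format (one data: line per line of payload, then a blank line). SseFrames is a hypothetical helper for illustration; in a Spring application the framework does this encoding when an endpoint produces text/event-stream.

```java
/** Hypothetical helper that shows what an SSE frame looks like on the wire. */
final class SseFrames {

    private SseFrames() {}

    /** Encodes one chunk of text as a Server-Sent Events frame. */
    static String frame(String chunk) {
        StringBuilder sb = new StringBuilder();
        // A payload containing newlines must be split across several data: lines.
        for (String line : chunk.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        // An empty line terminates the event.
        return sb.append('\n').toString();
    }
}
```

A client reading the stream reassembles the data: lines of each frame, which is why a chat UI can append text as soon as each frame arrives.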

The next post covers streaming in depth with a working SSE endpoint. This post covers the complete call() path. Get comfortable with call() first — you will use it far more often.

Caution: stream() returns a cold Flux, so nothing happens until something subscribes. If you return the Flux from a Spring MVC endpoint with a streaming produces media type such as text/event-stream, the framework subscribes for you. If you call .stream() in a method that discards the result, the LLM call never fires.
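The cold behavior can be observed without Reactor. The sketch below uses the JDK's java.util.concurrent.Flow interfaces, which mirror the Reactive Streams contract that Flux implements. ColdTokens and CollectingSubscriber are hypothetical names, the counter stands in for the LLM request, and demand handling is deliberately simplified.

```java
import java.util.List;
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical cold publisher: the "request" is made only when someone subscribes. */
final class ColdTokens implements Flow.Publisher<String> {

    final AtomicInteger calls = new AtomicInteger(); // stands in for LLM requests sent
    private final List<String> tokens;

    ColdTokens(List<String> tokens) {
        this.tokens = List.copyOf(tokens);
    }

    @Override
    public void subscribe(Flow.Subscriber<? super String> subscriber) {
        calls.incrementAndGet(); // work starts here, not when the publisher is created
        subscriber.onSubscribe(new Flow.Subscription() {
            private int next = 0;
            private boolean done = false;

            @Override
            public void request(long n) {
                // Simplified: emits synchronously and ignores invalid demand (n <= 0).
                while (n-- > 0 && !done && next < tokens.size()) {
                    subscriber.onNext(tokens.get(next++));
                }
                if (!done && next == tokens.size()) {
                    done = true;
                    subscriber.onComplete();
                }
            }

            @Override
            public void cancel() {
                done = true;
            }
        });
    }
}

/** Hypothetical subscriber that requests everything and collects it into a list. */
final class CollectingSubscriber implements Flow.Subscriber<String> {

    private final List<String> sink;

    CollectingSubscriber(List<String> sink) {
        this.sink = sink;
    }

    @Override public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
    @Override public void onNext(String token) { sink.add(token); }
    @Override public void onError(Throwable t) { throw new IllegalStateException(t); }
    @Override public void onComplete() { }
}
```

Creating a ColdTokens leaves calls at 0. Only subscribe(new CollectingSubscriber(...)) raises it to 1 and delivers the tokens, which is the same reason a discarded Flux never reaches the model.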

A complete example: the support endpoint

Here is the support assistant endpoint that combines everything in this post:

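import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@Configuration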
class AiConfig {

    @Bean
    @Qualifier("support")   // must match the qualifier the controller asks for
    ChatClient supportClient(ChatClient.Builder builder) {
        return builder
                .defaultSystem("""
                        You are a customer support assistant for TechGadgets, an online electronics store.
                        Answer only questions about products, orders, and store policies.
                        If you cannot help, say: "I don't have that information right now."
                        Keep responses to 2–4 sentences.
                        """)
                .defaultOptions(OpenAiChatOptions.builder()
                        .model("gpt-4o-mini")
                        .temperature(0.2)
                        .maxTokens(400)
                        .build())
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
    }
}

@RestController
@RequestMapping("/api/support")
class SupportController {

    private final ChatClient chatClient;

    SupportController(@Qualifier("support") ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @PostMapping("/chat")
    SupportResponse chat(@RequestBody SupportRequest request) {
        String answer = chatClient.prompt()
                .user(request.question())
                .call()
                .content();

        return new SupportResponse(answer);
    }
}

record SupportRequest(String question) {}
record SupportResponse(String answer) {}
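To exercise the endpoint above from outside the application, a small client built on the JDK's java.net.http package is enough. This is a sketch under stated assumptions: SupportChatClient is a hypothetical class, the URL assumes the application runs locally on Spring Boot's default port 8080, and the JSON escaping is deliberately minimal.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class SupportChatClient {

    // Assumption: the application runs locally on Spring Boot's default port.
    static final String ENDPOINT = "http://localhost:8080/api/support/chat";

    private SupportChatClient() {}

    /** Minimal escaping (backslash, quote, newline only); a real client would use a JSON library. */
    static String toJson(String question) {
        String escaped = question
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n");
        return "{\"question\":\"" + escaped + "\"}";
    }

    /** Builds the POST request that matches SupportRequest(String question). */
    static HttpRequest buildRequest(String question) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJson(question)))
                .build();
    }

    /** Sends the question and returns the raw JSON response body. */
    static String ask(String question) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(question), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling SupportChatClient.ask("What is the return policy for electronics?") against a running instance returns the serialized SupportResponse, which makes a quick smoke test before any UI exists.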

This is the foundation the support assistant is built on. Every module adds a layer: RAG (Module 4) to ground answers in real data, memory (Module 5) to make conversations stateful, tools (Module 6) to check live order status.

Note: The ChatClient API is deliberately fluent and shallow — you can build a working endpoint in 10 lines. The depth is in advisors, structured output, and streaming, which the next three posts cover one at a time.
