
What are embeddings? A practical explanation for Java developers

Dev was staring at a problem. The support assistant answered general questions well. But when a customer asked “my headphones keep dropping connection”, the assistant had no idea that the answer lived in a knowledge base article titled “Bluetooth pairing troubleshooting for ProX series”. The words did not match. The meaning did, but the words did not.

That is the problem embeddings solve.


Traditional search works by matching words. A query for “connection dropping” finds documents that contain “connection” and “dropping”. It misses documents that use “Bluetooth disconnecting” or “pairing issues” — even though those documents answer the same question.

This is a fundamental limitation. Keywords represent syntax. What users actually want is meaning.

Semantic search works differently: convert text into numbers that capture meaning, then find numbers that are close to each other. Documents with similar meaning end up near each other in number-space, even if they share no words.

Those numbers are embeddings.

What an embedding actually is

An embedding is a fixed-length array of floating-point numbers — a vector — that represents the semantic meaning of a piece of text.

For example, text-embedding-3-small (one of OpenAI’s embedding models) produces vectors of 1536 numbers. Every piece of text — a word, a sentence, a paragraph, a document — maps to exactly 1536 numbers.

"Bluetooth pairing issues"    → [0.021, -0.143, 0.872, 0.054, ..., -0.302]  // 1536 numbers
"my headphones keep dropping" → [0.019, -0.138, 0.861, 0.048, ..., -0.289]  // 1536 numbers
"chocolate chip cookie recipe" → [-0.412, 0.891, -0.203, 0.637, ..., 0.114] // 1536 numbers

The first two sentences are about the same topic. Their vectors are close together — the numbers are similar. The cookie recipe is completely unrelated — its vector is far away.

Tip: You never need to inspect or understand the individual numbers in an embedding. What matters is the distance between vectors. Close vectors mean similar meaning. That distance is what the vector database computes at query time.

How similarity is measured

The standard metric for comparing embeddings is cosine similarity — the angle between two vectors. A cosine similarity of 1.0 means identical direction (same meaning). A cosine similarity near 0 means unrelated.

You do not need to implement this. The vector database does it. You ask “find me the 5 most similar embeddings to this query embedding”, and the database returns them ranked by similarity score.
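Even though the database computes it for you, seeing the formula as code makes the intuition concrete. A minimal sketch in plain Java — the three-dimensional vectors are toy stand-ins for real 1536-dimensional embeddings:

```java
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Shown for intuition only -- in practice the vector database computes this.
public class CosineSimilarity {

    public static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional stand-ins for real embeddings
        float[] bluetooth  = {0.021f, -0.143f, 0.872f};
        float[] headphones = {0.019f, -0.138f, 0.861f};
        float[] cookies    = {-0.412f, 0.891f, -0.203f};

        // The related pair scores near 1.0; the unrelated pair scores low
        System.out.printf("related:   %.3f%n", cosineSimilarity(bluetooth, headphones));
        System.out.printf("unrelated: %.3f%n", cosineSimilarity(bluetooth, cookies));
    }
}
```

Note that cosine similarity can also go negative for vectors pointing in opposing directions; for retrieval you only care about ranking by the score.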

The two-phase workflow

Working with embeddings always involves two distinct phases:

Phase 1 — Indexing (one-time or batch)

For each document in your knowledge base:

  1. Send the document text to an embedding model
  2. Receive back a vector (array of floats)
  3. Store both the vector and the original text in a vector database

Phase 2 — Query time (per user request)

For each user query:

  1. Send the query text to the same embedding model
  2. Receive back a vector
  3. Search the vector database for the closest stored vectors
  4. Return the matching documents

The embedding model must be the same in both phases. You cannot index with text-embedding-3-small and query with a different model — the vector spaces would be incompatible.

Important: Indexing happens once (or in batch when documents change). Querying happens on every user request. Design your system so indexing is a separate, schedulable job — not something that runs on startup every time.
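The two phases can be sketched end to end in plain Java. This is a toy: `embed()` here is a stand-in for the real embedding model call (it just counts word buckets, so it captures word overlap, not meaning), and the `Map` is a stand-in for a vector database. The shape of the flow — embed and store at index time, embed and rank at query time, same model in both phases — is the part that carries over:

```java
import java.util.*;

// Sketch of the two-phase workflow against an in-memory "vector store".
// embed() is a STAND-IN for a real embedding model call (e.g. an HTTP
// request to text-embedding-3-small); its toy vectors just make the
// flow runnable.
public class TwoPhaseWorkflow {

    // text -> vector, stored side by side as in a real vector database
    private final Map<String, float[]> index = new LinkedHashMap<>();

    // STAND-IN: hashes words into buckets. A real model returns a
    // learned 1536-dimensional vector instead.
    static float[] embed(String text) {
        float[] v = new float[64];
        for (String word : text.toLowerCase().split("\\s+")) {
            v[Math.floorMod(word.hashCode(), v.length)] += 1f;
        }
        return v;
    }

    // Phase 1 -- indexing: embed each document, store vector + text
    public void indexDocument(String text) {
        index.put(text, embed(text));
    }

    // Phase 2 -- query: embed the query with the SAME model,
    // return the closest stored documents
    public List<String> search(String query, int topK) {
        float[] q = embed(query);
        return index.entrySet().stream()
                .sorted(Comparator.comparingDouble(e -> -cosine(q, e.getValue())))
                .limit(topK)
                .map(Map.Entry::getKey)
                .toList();
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot == 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

In a real system, the indexing half of this class would live in a separate, schedulable job, exactly as the note above recommends.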

What the embedding model actually does

The embedding model is a neural network — typically a transformer — that was trained on massive amounts of text. During training it learned to represent semantic relationships as geometric relationships in vector space.

After training, it can convert any text into a point in a high-dimensional space such that texts with similar meaning land close together and unrelated texts land far apart.

The training is done. You use the model as a black box: text in, vector out.

Embeddings vs the LLM that answers questions

Your application will use two different models:

| Model type | Purpose | API call |
| --- | --- | --- |
| Embedding model | Converts text to vectors | EmbeddingModel.embed(text) |
| Chat model | Generates answers | ChatClient.prompt(...).call() |

The embedding model does not generate text. It does not answer questions. It only converts text into vectors. The chat model is what generates the final answer — but in a RAG setup, it needs relevant documents first, which the embedding model and vector database provide.

Tip: In Spring AI, the EmbeddingModel interface abstracts the embedding model the same way ChatModel abstracts the chat model. Swap OpenAI embeddings for Ollama embeddings by changing configuration, not code.
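As a sketch of what that configuration swap might look like — the property names below assume the Spring AI OpenAI and Ollama starters and their documented defaults, so check them against your Spring AI version:

```properties
# OpenAI embeddings (spring-ai OpenAI starter)
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.embedding.options.model=text-embedding-3-small

# Or: Ollama embeddings (spring-ai Ollama starter) -- swap the dependency
# and use its properties instead; the Java code stays the same
# spring.ai.ollama.base-url=http://localhost:11434
# spring.ai.ollama.embedding.options.model=nomic-embed-text
```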

What embeddings are good for

Embeddings power a range of features beyond just search:

| Feature | How embeddings help |
| --- | --- |
| Semantic search | Find documents by meaning, not keywords |
| RAG | Retrieve relevant context to ground LLM answers |
| Recommendation | “More like this” — find items similar to what a user viewed |
| Clustering | Group support tickets by topic automatically |
| Anomaly detection | Flag text that is semantically far from expected inputs |
| Deduplication | Detect near-duplicate documents regardless of wording |

All of these reduce to the same operation: compute embeddings, then find nearest neighbours.
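Deduplication, for instance, is just the nearest-neighbour check with a cutoff. A minimal sketch in plain Java — the 0.95 threshold is an assumption for illustration, and in practice it is a tuning parameter that depends on the embedding model:

```java
// Near-duplicate detection reduces to the same similarity computation:
// two texts whose embeddings exceed a threshold are flagged as duplicates.
public class DedupCheck {

    // threshold is an assumed tuning parameter (e.g. 0.95), not a standard value
    public static boolean isNearDuplicate(float[] a, float[] b, double threshold) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb)) >= threshold;
    }
}
```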

What the model’s context window has to do with it

Embedding models have their own input limits — separate from the chat model’s context window.

text-embedding-3-small supports up to 8191 tokens per input. If your document is longer than that, you need to split it before embedding. A 50-page PDF cannot be embedded as one chunk — it must be split into paragraphs or sections first.

This splitting is called chunking, and it is a critical design decision covered in the RAG module (Module 4). For now: know that documents must fit within the embedding model’s token limit, and most production systems split long documents into overlapping chunks of 200–500 tokens.

Caution: A common mistake is embedding the entire document as one unit. If the document is long, the embedding will be an average of all its topics — none of them represented well. Chunk first, then embed each chunk. The chunk size is a tuning parameter that affects retrieval quality significantly.
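To make the idea concrete before the RAG module covers proper splitters, here is a toy chunker in plain Java. It uses words as a rough proxy for tokens (real systems count model tokens, not words), and the overlap keeps a sentence that straddles a boundary represented in both neighbouring chunks:

```java
import java.util.*;

// Illustration of overlapping chunking. Words stand in for tokens here;
// production code should count real model tokens instead.
public class WordChunker {

    public static List<String> chunk(String text, int chunkSize, int overlap) {
        String[] words = text.split("\\s+");
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;  // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + chunkSize, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) break;  // last window reached the end
        }
        return chunks;
    }
}
```

Each resulting chunk is then embedded and stored individually, so a query can land on the one chunk that actually discusses its topic.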

A preview of the Spring AI embedding API

Spring AI wraps the embedding API behind EmbeddingModel. You do not call it directly in most production code — the VectorStore abstraction calls it internally during ingestion and query. But it is useful to know what is happening underneath:

@Autowired
EmbeddingModel embeddingModel;

// Embed a single string
float[] vector = embeddingModel.embed("Bluetooth pairing issues");
// vector.length == 1536 for text-embedding-3-small

// Embed a batch (more efficient than one-at-a-time)
List<float[]> vectors = embeddingModel.embed(
    List.of("Bluetooth pairing issues", "my headphones keep dropping")
);

In practice, when you call vectorStore.add(documents) in the next posts, Spring AI calls embeddingModel.embed() internally for each document. You rarely call EmbeddingModel directly in application code.

The mental model to keep

Think of embedding as a translation layer between human language and a coordinate system that computers can search efficiently.

The result: a user asking “why won’t my headphones connect?” retrieves an article titled “Bluetooth pairing troubleshooting” — even though those phrases share no words.

That is the entire point.

Note: The next post covers vector databases — what they are, how they differ from PostgreSQL, and which options work well with Spring AI. Once you understand the data store, the following posts will set up pgvector locally and build the full indexing pipeline.
