Sổ Tay AI
Technical · Intermediate

What is an Embedding?

A way to represent text, images, or anything else as numerical vectors so machines can compare meaning.

Updated: May 2, 2026 · 2 min read

An embedding is a way to convert something (text, image, audio) into an array of numbers (a vector) such that things with similar meaning end up with similar vectors.

Intuition

Imagine embedding words into a 3D space (real embeddings typically have hundreds to thousands of dimensions, e.g. 1536):

"dog"  → [0.8, 0.2, 0.1]
"cat"  → [0.7, 0.3, 0.2]   ← close to "dog" (both pets)
"car"  → [0.1, 0.9, 0.5]   ← far from "dog" (different topic)

The distance between two vectors approximates semantic distance.
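Cosine similarity is the most common way to measure this. A minimal sketch using the toy vectors above (the numbers are illustrative, not from a real model):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

dog = [0.8, 0.2, 0.1]
cat = [0.7, 0.3, 0.2]
car = [0.1, 0.9, 0.5]

print(cosine_similarity(dog, cat))  # high: both pets
print(cosine_similarity(dog, car))  # much lower: different topic
```

"dog" vs "cat" scores around 0.98, while "dog" vs "car" scores around 0.36, matching the intuition above.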

What embeddings are used for

1. RAG (Retrieval-Augmented Generation)

You have 1000 pages of docs. You can’t fit them all into a prompt.

  • Embed each chunk → store in a vector database
  • Embed the question → find chunks with closest vectors
  • Pass those chunks to the LLM → get an accurate answer
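The retrieval step can be sketched in a few lines. Everything here is a toy stand-in: the "vector database" is a plain dict, and the embeddings are made-up 3-D vectors (a real system would call a provider's embedding API and store thousands of chunks):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "vector database": chunk text -> precomputed embedding.
vector_db = {
    "Refunds are processed within 5 business days.": [0.9, 0.1, 0.0],
    "Our office is closed on public holidays.":      [0.1, 0.8, 0.2],
    "Contact support to cancel your subscription.":  [0.7, 0.2, 0.3],
}

def retrieve(query_vector, top_k=2):
    """Rank stored chunks by similarity to the query vector."""
    ranked = sorted(vector_db.items(),
                    key=lambda kv: cosine(query_vector, kv[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

# Pretend this vector came from embedding "How long do refunds take?"
context = retrieve([0.85, 0.15, 0.05])
# `context` is then pasted into the LLM prompt as grounding material.
```

The refund chunk ranks first because its vector points in almost the same direction as the query's, even though no keyword matching was involved.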

2. Semantic search

Traditional search matches keywords. Embedding search matches meaning:

  • Query “ways to lose weight” also surfaces “fat-burning techniques”

3. Clustering & classification

Embed all customer feedback, group nearby vectors → discover common complaint themes.
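One step of this grouping can be sketched as a nearest-centroid assignment (essentially one iteration of k-means). The feedback embeddings and centroids below are made-up 2-D vectors chosen for readability:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up feedback embeddings (2-D for readability).
feedback = {
    "App crashes on startup":  [0.9, 0.1],
    "Crashes when I log in":   [0.8, 0.2],
    "Shipping took two weeks": [0.1, 0.9],
    "Delivery was very slow":  [0.2, 0.8],
}

# Hand-picked centroids; k-means would learn these iteratively.
centroids = {"crashes": [0.85, 0.15], "shipping": [0.15, 0.85]}

clusters = {name: [] for name in centroids}
for text, vec in feedback.items():
    # Assign each feedback item to its most similar centroid.
    best = max(centroids, key=lambda name: cosine(vec, centroids[name]))
    clusters[best].append(text)
```

Nearby vectors end up in the same bucket, surfacing the two complaint themes without anyone reading every message.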

4. Recommendations

Suggest products whose vectors are close to those of items the user has already bought.
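A simple version: average the vectors of the user's purchases into a "taste" vector, then recommend the closest unpurchased product. The product embeddings below are hypothetical 2-D vectors:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical product embeddings.
products = {
    "trail running shoes": [0.9, 0.1],
    "hiking backpack":     [0.8, 0.3],
    "espresso machine":    [0.1, 0.9],
}

purchased = ["trail running shoes"]

# Average the vectors of purchased items into a single "taste" vector.
bought = [products[p] for p in purchased]
taste = [sum(dim) / len(bought) for dim in zip(*bought)]

# Recommend the most similar product the user doesn't own yet.
candidates = [p for p in products if p not in purchased]
recommendation = max(candidates, key=lambda p: cosine(products[p], taste))
```

Here the outdoorsy purchase pulls the taste vector toward the hiking backpack rather than the espresso machine.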

Popular embedding models

| Provider    | Model                  | Dimensions | Price / 1M tokens |
|-------------|------------------------|------------|-------------------|
| OpenAI      | text-embedding-3-large | 3072       | $0.13             |
| OpenAI      | text-embedding-3-small | 1536       | $0.02             |
| Voyage AI   | voyage-3               | 1024       | $0.06             |
| Cohere      | embed-v3               | 1024       | $0.10             |
| Open source | bge-m3                 | 1024       | Free (self-host)  |

Multimodal embeddings

Models like CLIP embed both images and text into the same vector space — letting you search images by text:

  • Query “a golden retriever running on a beach” → find matches across millions of photos.
Tags: #embedding · #vector · #rag