Sổ Tay AI
Technical · Intermediate

What is an Embedding?

A way to represent text, images, or anything else as numerical vectors so machines can compare meaning.

Updated: May 2, 2026 · 2 min read

An embedding is a way to convert something (text, image, audio) into an array of numbers (a vector) such that things with similar meaning end up with similar vectors.

Intuition

Imagine embedding words into a 3D space (real embeddings typically have hundreds to thousands of dimensions, e.g. 1536):

"dog"  → [0.8, 0.2, 0.1]
"cat"  → [0.7, 0.3, 0.2]   ← close to "dog" (both pets)
"car"  → [0.1, 0.9, 0.5]   ← far from "dog" (different topic)

The distance between two vectors approximates semantic distance.
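Cosine similarity is the most common way to measure this. A minimal sketch using the toy vectors above (the numbers are illustrative, not from a real model):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

dog = [0.8, 0.2, 0.1]
cat = [0.7, 0.3, 0.2]
car = [0.1, 0.9, 0.5]

print(cosine_similarity(dog, cat))  # high: both pets
print(cosine_similarity(dog, car))  # much lower: different topic
```

"dog" vs "cat" scores around 0.98, while "dog" vs "car" scores around 0.36, matching the intuition above.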

What embeddings are used for

1. RAG (Retrieval-Augmented Generation)

You have 1000 pages of docs. You can’t fit them all into a prompt.

  • Embed each chunk → store in a vector database
  • Embed the question → find chunks with closest vectors
  • Pass those chunks to the LLM → get an accurate answer
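The retrieval step can be sketched in a few lines. Everything here is a toy stand-in: the "vector database" is a plain dict, and the embeddings are made-up 3-D vectors (a real system would call a provider's embedding API and store thousands of chunks):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "vector database": chunk text -> precomputed embedding.
vector_db = {
    "Refunds are processed within 5 business days.": [0.9, 0.1, 0.0],
    "Our office is closed on public holidays.":      [0.1, 0.8, 0.2],
    "Contact support to cancel your subscription.":  [0.7, 0.2, 0.3],
}

def retrieve(query_vector, top_k=2):
    """Rank stored chunks by similarity to the query vector."""
    ranked = sorted(vector_db.items(),
                    key=lambda kv: cosine(query_vector, kv[1]),
                    reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

# Pretend this vector came from embedding "How long do refunds take?"
context = retrieve([0.85, 0.15, 0.05])
# `context` is then pasted into the LLM prompt as grounding material.
```

The refund chunk ranks first because its vector points in almost the same direction as the query's, even though no keyword matching was involved.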

2. Semantic search

Traditional search matches keywords. Embedding search matches meaning:

  • Query “ways to lose weight” also surfaces “fat-burning techniques”

3. Clustering & classification

Embed all customer feedback, group nearby vectors → discover common complaint themes.
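One step of this grouping can be sketched as a nearest-centroid assignment (essentially one iteration of k-means). The feedback embeddings and centroids below are made-up 2-D vectors chosen for readability:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up feedback embeddings (2-D for readability).
feedback = {
    "App crashes on startup":  [0.9, 0.1],
    "Crashes when I log in":   [0.8, 0.2],
    "Shipping took two weeks": [0.1, 0.9],
    "Delivery was very slow":  [0.2, 0.8],
}

# Hand-picked centroids; k-means would learn these iteratively.
centroids = {"crashes": [0.85, 0.15], "shipping": [0.15, 0.85]}

clusters = {name: [] for name in centroids}
for text, vec in feedback.items():
    # Assign each feedback item to its most similar centroid.
    best = max(centroids, key=lambda name: cosine(vec, centroids[name]))
    clusters[best].append(text)
```

Nearby vectors end up in the same bucket, surfacing the two complaint themes without anyone reading every message.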

4. Recommendations

Suggest products whose vectors are close to those of items the user has already bought.
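A simple version: average the vectors of the user's purchases into a "taste" vector, then recommend the closest unpurchased product. The product embeddings below are hypothetical 2-D vectors:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical product embeddings.
products = {
    "trail running shoes": [0.9, 0.1],
    "hiking backpack":     [0.8, 0.3],
    "espresso machine":    [0.1, 0.9],
}

purchased = ["trail running shoes"]

# Average the vectors of purchased items into a single "taste" vector.
bought = [products[p] for p in purchased]
taste = [sum(dim) / len(bought) for dim in zip(*bought)]

# Recommend the most similar product the user doesn't own yet.
candidates = [p for p in products if p not in purchased]
recommendation = max(candidates, key=lambda p: cosine(products[p], taste))
```

Here the outdoorsy purchase pulls the taste vector toward the hiking backpack rather than the espresso machine.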

Popular embedding models

| Provider    | Model                  | Dimensions | Price / 1M tokens |
|-------------|------------------------|------------|-------------------|
| OpenAI      | text-embedding-3-large | 3072       | $0.13             |
| OpenAI      | text-embedding-3-small | 1536       | $0.02             |
| Voyage AI   | voyage-3               | 1024       | $0.06             |
| Cohere      | embed-v3               | 1024       | $0.10             |
| Open source | bge-m3                 | 1024       | Free (self-host)  |

Multimodal embeddings

Models like CLIP embed both images and text into the same vector space — letting you search images by text:

  • Query “a golden retriever running on a beach” → find matches across millions of photos.
Tags: #embedding · #vector · #rag