Intermediate · Deep Learning

Understand embeddings - learned dense vector representations that capture semantic meaning for words, items, and entities.

Tags: embeddings, representation-learning, vectors, semantic-search

Embeddings

Embeddings are learned dense vector representations of discrete objects. They map categorical data into continuous vector spaces where similar items are nearby.

What Are Embeddings?

From Sparse to Dense

One-hot encoding (sparse):
"cat" → [1, 0, 0, 0, ..., 0]  (vocab_size dimensions)
"dog" → [0, 1, 0, 0, ..., 0]

Embedding (dense):
"cat" → [0.2, -0.5, 0.8, 0.1]  (embedding_dim dimensions)
"dog" → [0.3, -0.4, 0.7, 0.2]  # Similar to cat!

Key Properties

  • Dense: Most values non-zero
  • Low-dimensional: typically 50-1000 dimensions, versus vocabulary-sized one-hot vectors
  • Learned: Trained from data
  • Semantic: Similar items → similar vectors

How Embeddings Work

Embedding Layer

Simply a lookup table:

import numpy as np

class Embedding:
    def __init__(self, vocab_size, embed_dim):
        # One learnable row per vocabulary entry, randomly initialized
        self.weights = np.random.randn(vocab_size, embed_dim) * 0.01

    def forward(self, indices):
        return self.weights[indices]  # Just a row lookup!
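
A quick usage check for the sketch above (the sizes are arbitrary):

emb = Embedding(vocab_size=10_000, embed_dim=4)
vectors = emb.forward(np.array([2, 7, 2]))
print(vectors.shape)   # (3, 4): one row per index; repeated indices return the same row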

Training

Gradients flow back through the embedding layer:

Loss → ... → Embedding weights updated

Embeddings learn representations useful for the task.
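
A small PyTorch sketch of this: only the rows that were actually looked up receive gradients (the sizes and indices are arbitrary).

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4)
loss = emb(torch.tensor([1, 3])).sum()   # stand-in for a real task loss
loss.backward()

print(emb.weight.grad[1])   # non-zero: row 1 was used in the forward pass
print(emb.weight.grad[0])   # all zeros: row 0 never appeared in the batch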

Types of Embeddings

Word Embeddings

Vector representations of words:

  • Word2Vec: Skip-gram, CBOW
  • GloVe: Global Vectors
  • FastText: Subword embeddings
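
For example, Word2Vec vectors can be trained with gensim; the toy corpus below is made up and far too small to learn anything meaningful:

from gensim.models import Word2Vec

sentences = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)   # sg=1 selects skip-gram

print(model.wv["cat"].shape)               # (50,)
print(model.wv.similarity("cat", "dog"))   # cosine similarity between the two word vectors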

Contextual Embeddings

Same word, different vectors based on context:

  • BERT: Bidirectional context
  • GPT: Left-to-right context

"bank" (river) → [0.2, 0.8, ...]
"bank" (money) → [0.9, 0.1, ...]

Item Embeddings

Products, movies, users:

  • Recommendation systems
  • Learned from interactions
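
A minimal sketch of learning user and item embeddings from interactions, in the style of matrix factorization (all sizes and names are illustrative):

import torch
import torch.nn as nn

class MatrixFactorization(nn.Module):
    def __init__(self, num_users, num_items, embed_dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, embed_dim)
        self.item_emb = nn.Embedding(num_items, embed_dim)

    def forward(self, user_ids, item_ids):
        # Predicted affinity = dot product of the user and item vectors
        return (self.user_emb(user_ids) * self.item_emb(item_ids)).sum(dim=1)

model = MatrixFactorization(num_users=1_000, num_items=5_000)
scores = model(torch.tensor([0, 1]), torch.tensor([10, 20]))   # one score per (user, item) pair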

Graph Embeddings

Nodes in networks:

  • Node2Vec
  • GraphSAGE

Image Embeddings

From CNN encoders:

  • ResNet features
  • CLIP embeddings
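
A common recipe is to reuse a pretrained CNN and drop its classification head; the sketch below uses torchvision's ResNet-50 with random input data, just to show the shapes:

import torch
import torchvision.models as models

resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
encoder = torch.nn.Sequential(*list(resnet.children())[:-1])   # remove the final fc layer
encoder.eval()

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)       # stand-in for a preprocessed image
    embedding = encoder(image).flatten(1)     # (1, 2048) image embedding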

Vector Arithmetic

Embeddings capture relationships:

vec("king") - vec("man") + vec("woman") ≈ vec("queen")
vec("Paris") - vec("France") + vec("Germany") ≈ vec("Berlin")

Similarity Measures

Cosine Similarity

cos_sim(a, b) = (a · b) / (||a|| × ||b||)

Range: -1 to 1. Most common for embeddings.

Euclidean Distance

dist(a, b) = ||a - b||

Smaller = more similar.

Dot Product

a · b = Σ aᵢbᵢ

Fast, but magnitude-sensitive.
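
The three measures in numpy, for reference (a and b are any two embedding vectors of the same dimension):

import numpy as np

def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_dist(a, b):
    return np.linalg.norm(a - b)

def dot_product(a, b):
    return np.dot(a, b)

a, b = np.array([0.2, -0.5, 0.8]), np.array([0.3, -0.4, 0.7])
print(cosine_sim(a, b), euclidean_dist(a, b), dot_product(a, b))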

Training Embeddings

As Part of Model

import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_classes):
        super().__init__()   # required before registering submodules
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        embedded = self.embedding(x)    # [batch, seq, embed]
        pooled = embedded.mean(dim=1)   # [batch, embed], mean-pooled over the sequence
        return self.classifier(pooled)

Pre-trained Embeddings

import torch
import torch.nn as nn

# Load pre-trained vectors (shape: [vocab_size, embed_dim])
embedding = nn.Embedding.from_pretrained(torch.tensor(pretrained_vectors), freeze=False)

# Optionally freeze so the vectors are not updated during training
embedding.weight.requires_grad = False

Contrastive Learning

Train to make similar items close:

Loss = max(0, distance(anchor, positive) - distance(anchor, negative) + margin)
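
PyTorch provides this triplet formulation as nn.TripletMarginLoss; a minimal sketch with random stand-in embeddings:

import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0)
anchor, positive, negative = torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128)
loss = triplet_loss(anchor, positive, negative)   # encourages d(anchor, positive) + margin < d(anchor, negative)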

Embedding Dimension

How to Choose

Application             Typical Dimension
Word embeddings         100-300
Sentence embeddings     384-768
Recommendation          32-128
Large language models   768-4096

Rule of Thumb

embed_dim ≈ 4th root of vocab_size

# Or use power of 2 for efficiency
64, 128, 256, 512
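
A quick worked example of the rule of thumb (the 50,000-word vocabulary is an assumed figure):

vocab_size = 50_000
embed_dim = round(vocab_size ** 0.25)   # ≈ 15
# In practice this would usually be rounded up to a convenient power of 2, e.g. 16 or 32.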

Using Embeddings

Semantic Search

import numpy as np

def search(query, documents):
    # embed() is the embedding model; cosine_sim() as defined under Similarity Measures
    query_emb = embed(query)
    doc_embs = [embed(d) for d in documents]
    similarities = [cosine_sim(query_emb, d) for d in doc_embs]
    return documents[int(np.argmax(similarities))]

Clustering

from sklearn.cluster import KMeans

embeddings = [get_embedding(item) for item in items]
kmeans = KMeans(n_clusters=10).fit(embeddings)

Classification Features

X = np.array([get_embedding(text) for text in texts])   # one embedding vector per text
model.fit(X, labels)                                     # any standard classifier, e.g. logistic regression

Retrieval (RAG)

# Index
index.add(document_embeddings)

# Query
results = index.search(query_embedding, k=5)

Common Issues

Out-of-Vocabulary (OOV)

Unknown words have no embedding.

Solutions:

  • Use <UNK> token
  • Subword tokenization (BPE, WordPiece)
  • Character-level embeddings

Cold Start

New items have no learned embedding.

Solutions:

  • Content-based initial embedding
  • Average of similar items
  • Frequent retraining

Embedding Drift

Meaning changes over time.

Solutions:

  • Periodic retraining
  • Incremental updates

Vector Databases

For efficient similarity search:

  • Pinecone: Managed service
  • Weaviate: Open source
  • FAISS: Facebook's library
  • Chroma: Lightweight
  • Qdrant: Rust-based

import faiss
import numpy as np

embeddings = np.asarray(embeddings, dtype=np.float32)   # FAISS expects float32
index = faiss.IndexFlatIP(embed_dim)                     # inner product (exact search)
index.add(embeddings)                                    # shape: (n_vectors, embed_dim)
query = query_embedding.reshape(1, -1).astype(np.float32)
distances, indices = index.search(query, 10)             # top-10 nearest neighbors

Key Takeaways

  1. Embeddings map discrete items to dense vectors
  2. Similar items have similar vectors
  3. Learned end-to-end or pre-trained
  4. Cosine similarity is the standard measure
  5. Enable semantic search, recommendations, clustering
  6. Vector databases enable efficient retrieval at scale

Practice Questions

Test your understanding with these related interview questions: