Half-Truths About Vector Search That Cost You Money

Embeddings and vector search have become the infrastructure layer beneath a surprising number of AI products—recommendation engines, semantic search bars, document retrieval systems, and the retrieval-augmented generation (RAG) pipelines now powering enterprise chatbots. Because the technology moved from research to production so quickly, a layer of mythology formed just as fast. Practitioners repeat half-truths picked up from blog posts, vendors oversell capabilities, and teams make architecture decisions based on assumptions that don't survive contact with real data.

This article is a direct correction. Each section takes a widely repeated claim about embeddings and vector search, explains why it's wrong or incomplete, and replaces it with the accurate picture. If you're building with these tools, evaluating vendors, or teaching your team to use AI responsibly, this is the ground truth you need before making expensive choices.

One framing note before diving in: embeddings and vector search are not magic. They are mathematical representations and approximate retrieval algorithms with well-understood strengths and failure modes. Understanding those failure modes is the fastest way to build systems that actually work—and to spot the myths before they cost you a relaunch.

Myth 1: Embeddings "Understand" Meaning the Way Humans Do

This is the most consequential misconception in the field. Embeddings are high-dimensional numerical vectors produced by a model trained to place semantically related text near each other in vector space. That's genuinely useful. It is not the same as understanding.

What embeddings actually capture

Embeddings capture statistical co-occurrence patterns from training data. Words and phrases that appear in similar contexts end up geometrically close in vector space. This produces real, exploitable signal—similar concepts cluster, analogies can sometimes be navigated arithmetically—but the representation is brittle in specific, predictable ways:

Negation failures: "The drug is not effective" and "The drug is effective" often land near each other because they share almost all of their vocabulary, and most embedding training objectives do little to reward separating a sentence from its negation.
Domain drift: A general-purpose embedding model trained on web text will misplace technical jargon. "Bond duration" in finance and "bond duration" in chemistry are not the same concept, but a generic model may conflate them.
Rare entity collapse: Proper nouns, product SKUs, and niche terminology with sparse training signal get embeddings that are essentially noise.

The practical implication: if your retrieval system returns wrong results, the instinct to "upgrade the embedding model" is often less effective than auditing whether the failure mode is semantic at all. Many retrieval failures are chunking problems, metadata problems, or query formulation problems—not embedding problems.

Myth 2: Bigger Embedding Dimensions = Better Performance

Vendors highlight dimensionality as a quality signal. A 3,072-dimension embedding model sounds more capable than a 768-dimension one. For many real-world tasks, this intuition is simply wrong.

The diminishing returns problem

Empirically, gains from increasing embedding dimensionality plateau quickly for most retrieval tasks. Going from 256 to 768 dimensions typically produces meaningful quality improvement. Going from 768 to 3,072 dimensions often produces marginal retrieval gains at significant infrastructure cost:

Index size: A 3,072-float vector is 12KB at float32. At one million documents, your index is 12GB before any overhead. Storage and memory costs scale linearly.
Query latency: Approximate nearest neighbor (ANN) algorithms like HNSW and IVF-PQ get slower as dimensionality rises, because each distance computation costs more.
The curse of dimensionality: In very high-dimensional spaces, distances between vectors become increasingly similar to each other, reducing the discriminative power of cosine or dot-product similarity.
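The index-size arithmetic above is easy to reproduce for your own corpus. The following is a minimal sketch, assuming float32 storage (4 bytes per value) and counting only the raw vectors; the function names are illustrative, and real indexes add overhead for graph links, metadata, and replicas.

```python
def raw_index_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage only; ignores ANN structures, metadata, and replicas."""
    return num_vectors * dims * bytes_per_value


def to_gb(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 1e9


if __name__ == "__main__":
    # Compare common embedding widths at one million vectors.
    for dims in (256, 768, 3072):
        size = raw_index_bytes(1_000_000, dims)
        print(f"{dims:>5} dims: {to_gb(size):.2f} GB")
```

Changing bytes_per_value to 2 (float16) or 1 (int8) shows how much quantization alone can recover before you touch dimensionality.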

For most agency-scale document retrieval tasks (corpora under 10 million chunks), a well-tuned 768-dimension model with domain-appropriate fine-tuning will often outperform a raw 3,072-dimension model. Match the model to the task. Benchmark before you commit.

Myth 3: Vector Search Will Replace Keyword Search

This narrative had a good run in 2022–2023. The reality is more nuanced, and the best-performing production systems have largely settled on a hybrid answer.

Where vector search wins

Semantic search genuinely outperforms BM25-style keyword search for:

Paraphrase and synonym retrieval ("affordable housing" vs. "low-income residences")
Cross-lingual retrieval when using multilingual models
Conceptual queries where the user doesn't know the exact terminology in the corpus

Where keyword search still wins

Exact-match retrieval is not a solved problem for vector search:

Product codes, version numbers, and identifiers: "v2.3.1" and "v2.4.0" will land close in vector space despite being meaningfully different.
Rare named entities with limited training data
Boolean logic queries ("documents about X but not Y")

The production standard for serious retrieval systems is hybrid search: run both a vector retrieval pass and a BM25/TF-IDF pass, merge the two result lists with reciprocal rank fusion, and optionally re-rank the merged candidates with a cross-encoder. On standard retrieval benchmarks this typically outperforms either approach alone, often by 10–30%, though the margin depends heavily on the corpus and query mix. If your vendor says you don't need keywords anymore, ask them to show you a head-to-head test on your actual data.
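Reciprocal rank fusion is simple enough to sketch in a few lines. The version below is a minimal illustration, not a production implementation: each document scores 1 / (k + rank) in every list it appears in, and the scores are summed. The constant k = 60 is a commonly used default and should be treated as an assumption to tune; the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant; larger values flatten the advantage of top ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    vector_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector-search results
    keyword_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical BM25 results
    print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

Because the fusion uses only ranks, it needs no score calibration between the vector and keyword passes, which is a large part of its appeal.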

Myth 4: You Can Embed Once and Forget It

Organizations embed their knowledge base during initial setup and treat the embedding index as a static artifact. This creates a slow, invisible quality decay that often goes undiagnosed for months.

The three decay vectors

Model deprecation: Embedding models are updated or retired. If your retrieval pipeline uses embeddings from a model that's been updated, the new query embeddings and the old document embeddings are no longer in the same vector space. Cosine similarity comparisons become meaningless. This is not theoretical—it has broken production systems at multiple companies that weren't monitoring retrieval quality over time.

Distribution shift: Your documents evolve. New terminology enters the corpus; old documents become stale. The embedding of a document written in 2021 may not represent how the same topic would be queried in 2024.

Chunking strategy mismatch: If your team changes how documents are chunked—larger chunks for better context, smaller chunks for better precision—the old embeddings are no longer coherent with the new strategy, and mixing them produces unpredictable results.

Treat your embedding index like a database schema: it requires versioning, migration planning, and monitoring. This is closely related to the broader risk management questions covered in The Hidden Risks of How Generative AI Works (and How to Manage Them).
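One way to make the schema analogy concrete is to store a small manifest next to the index and compare it with the live pipeline's settings before serving queries. This is a sketch under assumptions: the IndexManifest fields and the mismatches helper are hypothetical names chosen for illustration, not part of any vector database's API.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class IndexManifest:
    """Settings that must match between index build time and query time."""
    model_name: str
    model_version: str
    dims: int
    chunk_size: int
    chunk_overlap: int


def mismatches(stored: IndexManifest, current: IndexManifest) -> list:
    """Return a human-readable line for every field that differs."""
    return [
        f"{f.name}: index built with {getattr(stored, f.name)!r}, "
        f"pipeline now uses {getattr(current, f.name)!r}"
        for f in fields(stored)
        if getattr(stored, f.name) != getattr(current, f.name)
    ]


if __name__ == "__main__":
    stored = IndexManifest("example-embedder", "v1", 768, 512, 64)
    current = IndexManifest("example-embedder", "v2", 768, 512, 64)
    for problem in mismatches(stored, current):
        print("Re-embed required:", problem)
```

A non-empty result is a signal to re-embed or migrate rather than to keep querying across incompatible vector spaces.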

Myth 5: Cosine Similarity Is Always the Right Distance Metric

Cosine similarity is the default in most vector database documentation and tutorials. It's a reasonable default. It is not always correct.

When to use what

Cosine similarity measures angular distance. It's appropriate when vector magnitude doesn't carry information—typically true for normalized text embeddings from transformer models.
Dot product is faster to compute and equivalent to cosine similarity when vectors are normalized to unit length. When vectors are not normalized, dot product rewards magnitude, which can bias toward longer documents.
Euclidean distance (L2) matters for image embeddings and other domains where magnitude is semantically meaningful.
Hamming distance becomes relevant if you're working with binary quantized vectors to reduce index size.

The deeper issue is that many teams don't know whether their embedding model produces normalized vectors, which determines which metric is meaningful. Check the model card. If the documentation doesn't specify normalization, run a quick check on your own: compute the L2 norm of several output vectors. If they're all close to 1.0, your vectors are normalized.
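The normalization check described above takes only a few lines. The sketch below uses plain Python lists as stand-ins for model output; in practice you would pass in vectors from your own embedding model. The helper names and the tolerance are assumptions.

```python
import math


def l2_norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def looks_normalized(vectors, tol=1e-3):
    """True if every vector has an L2 norm within tol of 1.0."""
    return all(abs(l2_norm(v) - 1.0) <= tol for v in vectors)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (l2_norm(a) * l2_norm(b))


def normalize(vec):
    """Scale a non-zero vector to unit length."""
    n = l2_norm(vec)
    return [x / n for x in vec]


if __name__ == "__main__":
    a, b = [3.0, 4.0], [4.0, 3.0]          # toy, unnormalized vectors
    print(looks_normalized([a, b]))         # raw vectors
    print(dot(a, b), cosine(a, b))          # dot product is inflated by magnitude
    ua, ub = normalize(a), normalize(b)
    print(looks_normalized([ua, ub]))       # after normalization
    print(dot(ua, ub), cosine(ua, ub))      # the two metrics now agree
```

If the check fails on your model's output, either normalize before indexing or choose cosine explicitly rather than relying on dot product.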

Myth 6: RAG Fixes Hallucination Completely

Retrieval-augmented generation—pairing a language model with a vector search system to fetch relevant documents at inference time—has been widely pitched as a hallucination cure. It reduces hallucination risk. It does not eliminate it.

The failure modes RAG introduces

RAG moves the hallucination surface rather than removing it. New failure modes include:

Retrieval failure: If the relevant document isn't retrieved, the model hallucinates anyway, often confidently citing a retrieved-but-wrong document as if it supports the wrong answer.
Context window overflow: If retrieved chunks are too long or too numerous, the model's attention degrades on the material in the middle. This is the "lost in the middle" phenomenon, which is well documented in research on long-context language models.
Faithfulness failure: The model is given accurate retrieved content but still generates a response that doesn't accurately represent it—substituting plausible-sounding paraphrases that shift meaning.

RAG is a significant improvement over naked generation for knowledge-intensive tasks, but it is an engineering system that requires the same rigorous evaluation as any other software component. Teams that skip retrieval quality evaluation and go straight to LLM output evaluation miss half the failure surface. For a fuller treatment of how these components fit together, see Advanced How Generative AI Works: Going Beyond the Basics.

Myth 7: Off-the-Shelf Embedding Models Work Well Across All Domains

The dominant embedding models—from OpenAI, Cohere, Google, and open-source alternatives like Nomic and BGE—are trained on broad corpora. For general-purpose retrieval of general-purpose text, they perform well. For specialized domains, performance degrades in ways that aren't obvious until you measure.

The domain adaptation gap

In fields like law, medicine, finance, and software engineering, technical vocabulary, citation structures, and reasoning patterns differ substantially from general web text. An embedding model that places "consideration" (legal concept) near "thoughtfulness" because that's the dominant usage in its training data will produce retrieval errors that are invisible until someone builds a test set.

Options for closing the gap, roughly ordered by cost:

Prompt-augmented embeddings: Prepend a short instruction to queries and documents describing the domain. Some models (like E5 and Instructor) support this natively and improve materially.
Fine-tuned retrieval models: Train a bi-encoder on domain-specific query-document pairs. Requires labeled data but produces meaningful gains.
Cross-encoder re-ranking: Use a fine-tuned cross-encoder as a second-pass re-ranker on top of a general bi-encoder. Often the best cost/performance trade-off for specialized retrieval.
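The cheapest option above, prompt-augmented embeddings, amounts to prepending a fixed string before encoding. The sketch below shows only that string handling. The "query: " and "passage: " prefixes follow the convention used by E5-style models, but treat them as an assumption and confirm the exact strings in your model's card; the with_prefix helper is a hypothetical name.

```python
# Assumed prefixes in the style of E5 models; verify against the model card.
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def with_prefix(texts, prefix):
    """Prepend the same instruction or prefix to every text before embedding."""
    return [f"{prefix}{text.strip()}" for text in texts]


if __name__ == "__main__":
    queries = with_prefix(["what counts as consideration in a contract?"], QUERY_PREFIX)
    passages = with_prefix(["  Consideration is something of value exchanged...  "], PASSAGE_PREFIX)
    print(queries)
    print(passages)
```

The important detail is consistency: documents indexed with one prefix and queried with another, or with none, end up compared on unequal terms.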

If you're advising agencies on building AI products—as discussed in Rolling Out How Generative AI Works Across a Team—this is one of the most important calibration points: domain adaptation is almost always necessary and almost always underestimated in project scoping.

Myth 8: Vector Databases Are Just Regular Databases with Extra Steps

The final myth is architectural. Teams familiar with relational or document databases sometimes treat vector databases as a slightly exotic storage option. The operational implications are actually quite different.

Key operational distinctions

ANN, not exact search: Most vector databases return approximate nearest neighbors, not guaranteed exact nearest neighbors. The trade-off between recall and latency is tunable but never fully eliminated. Systems that require exact retrieval for compliance reasons need to account for this.
Index rebuild cost: Some ANN index types require full or partial rebuilds when the corpus changes significantly. IVF-style indexes, for example, depend on centroids trained on the data, and HNSW graphs can degrade after heavy deletion. Rebuilds cost both time and compute.
Filtering complexity: Metadata filtering in vector databases (e.g., "retrieve similar documents, but only from this client's corpus") interacts with the ANN index in counterintuitive ways. Pre-filtering can degrade recall; post-filtering can return fewer results than requested. Architecting this correctly requires understanding how your specific vector database handles hybrid queries.
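The post-filtering shortfall is easy to reproduce on toy data. The sketch below uses an exact brute-force search as a stand-in for an ANN index, with made-up documents and client names. It shows post-filtering returning fewer than k results; it does not reproduce the recall loss that pre-filtering can cause, because that effect comes from ANN traversal and an exact search has none.

```python
def top_k(query, docs, k):
    """Exact top-k by dot product; a stand-in for an ANN index lookup."""
    ranked = sorted(docs, key=lambda d: -sum(q * x for q, x in zip(query, d["vec"])))
    return ranked[:k]


def post_filter(query, docs, k, client):
    """Search first, then drop results that belong to other clients."""
    return [d for d in top_k(query, docs, k) if d["client"] == client]


def pre_filter(query, docs, k, client):
    """Restrict to the client's documents first, then search."""
    return top_k(query, [d for d in docs if d["client"] == client], k)


# Hypothetical two-dimensional corpus shared by two clients.
DOCS = [
    {"id": 1, "client": "acme", "vec": [1.0, 0.0]},
    {"id": 2, "client": "globex", "vec": [0.9, 0.1]},
    {"id": 3, "client": "globex", "vec": [0.8, 0.2]},
    {"id": 4, "client": "acme", "vec": [0.1, 0.9]},
    {"id": 5, "client": "acme", "vec": [0.7, 0.3]},
]

if __name__ == "__main__":
    query = [1.0, 0.0]
    print([d["id"] for d in post_filter(query, DOCS, 3, "acme")])  # fewer than 3 results
    print([d["id"] for d in pre_filter(query, DOCS, 3, "acme")])   # full 3 results
```

A common workaround for post-filtering is to over-fetch (request several times k) and truncate after filtering, at the cost of extra latency.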

These are engineering realities, not showstoppers. But teams that treat vector database selection as equivalent to "which Postgres-compatible database do we use" are in for surprises during scaling and compliance reviews.

Frequently Asked Questions

Are embeddings the same as large language models?

No. Embeddings are typically produced by encoder-only or bi-encoder models specifically trained for representation tasks. Large language models are typically decoder-only and optimized for text generation. Some LLM providers expose embedding APIs as a secondary capability, but the architecture and training objective differ meaningfully from dedicated embedding models.

How do I know if my retrieval system is actually working well?

Build an offline evaluation set: a collection of representative queries paired with known-relevant documents from your actual corpus. Track recall@k (what fraction of relevant documents appear in the top k results) and mean reciprocal rank (MRR). Without this, you're flying blind—intuition about retrieval quality based on manual spot-checking is unreliable.
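Both metrics are short to implement once you have the evaluation set. The following is a minimal sketch, assuming each query's results are a best-first list of document IDs and the relevant set is known; the IDs are made up, and the MRR here uses the rank of the first relevant hit per query.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs, k):
    """runs: list of (retrieved_ids, relevant_ids) pairs, one per query."""
    n = len(runs)
    return {
        "recall_at_k": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }


if __name__ == "__main__":
    runs = [
        (["d3", "d1", "d7"], {"d1", "d2"}),  # one of two relevant docs found, at rank 2
        (["d5", "d9", "d4"], {"d5"}),        # the only relevant doc found, at rank 1
    ]
    print(evaluate(runs, k=3))
```

Running this after every change to chunking, models, or index settings turns retrieval quality into a number you can track rather than an impression.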

Do I need a specialized vector database, or can I use PostgreSQL with pgvector?

For corpora under roughly one million chunks, pgvector on a well-provisioned PostgreSQL instance is often sufficient and operationally simpler. Above that scale, or when you need advanced ANN index types, dedicated vector databases like Qdrant, Weaviate, or Pinecone typically offer better query performance and more index management flexibility.

Can embeddings capture tabular or structured data?

Yes, but not straightforwardly. Serializing structured data to text (e.g., converting a CSV row to a sentence) before embedding often works for retrieval at small scale, but it loses structural signal and doesn't generalize well to complex schemas. Dedicated approaches—such as table-aware pre-training or hybrid retrieval combining keyword and column-level indexing—perform better for structured retrieval at scale.
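The row-to-sentence serialization mentioned above might look like the sketch below. The "column: value" format and the sample CSV are assumptions chosen for illustration; the resulting strings would then be passed to whatever embedding model you use.

```python
import csv
import io


def row_to_text(row):
    """Serialize one record as 'column: value' pairs, skipping empty cells."""
    return "; ".join(f"{col}: {val}" for col, val in row.items() if val not in ("", None))


def csv_to_texts(csv_text):
    """Turn CSV content into one embeddable string per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row_to_text(row) for row in reader]


if __name__ == "__main__":
    sample = "sku,name,price\nA-100,Widget,9.99\nB-200,Gadget,\n"  # hypothetical data
    for text in csv_to_texts(sample):
        print(text)
```

Keeping the column names in the text preserves some structural signal, but relationships across rows and tables are still lost, which is why this approach struggles with complex schemas.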

Is it true that multilingual embeddings work equally well across all languages?

No. Multilingual embedding models are trained on corpora in which text from high-resource languages (English, Spanish, Mandarin) vastly outweighs text from low-resource languages. Retrieval quality for languages underrepresented in training data is typically worse, sometimes dramatically so. Test explicitly on your target languages before committing to a multilingual model for a multilingual product.

How does vector search relate to the skills AI practitioners should be developing?

Understanding how retrieval systems work—and fail—is increasingly a baseline competency for anyone building AI products, not just ML engineers. As covered in How Generative AI Works as a Career Skill: Why It Matters and How to Build It, the practitioners who can reason about the full inference pipeline, including retrieval, are significantly more effective than those who treat embedding systems as black boxes.

Key Takeaways

Embeddings encode statistical patterns, not semantic understanding. Negation, rare entities, and domain-specific jargon are known failure modes.
Dimensionality is a cost-quality trade-off, not a linear quality signal. Benchmark before scaling up.
Hybrid search (vector + keyword + re-ranking) outperforms pure vector search in most production settings.
Embedding indexes decay. Model updates, corpus changes, and chunking strategy changes all require index maintenance and version control.
Cosine similarity is a reasonable default for normalized text embeddings, but verify your model's normalization and choose the metric intentionally.
RAG reduces hallucination risk; it does not eliminate it. Retrieval failures are their own failure mode.
Off-the-shelf embedding models underperform in specialized domains. Plan for domain adaptation—it is almost always required.
Vector databases have distinct operational characteristics around ANN recall, index rebuild costs, and metadata filtering that differ materially from relational database assumptions.

Agency Script Editorial
