

What are Embeddings?

Embeddings are numerical representations of text that capture semantic meaning. They transform words, sentences, or documents into vectors (lists of numbers) in a high-dimensional space where similar meanings are positioned close together.

Why Embeddings Matter for RAG

Traditional keyword search matches exact words. Embeddings understand meaning:
Query: "ML algorithms"

Keyword match:
❌ "machine learning models"  (no match - different words)
✅ "ML algorithms overview"    (exact match)

Embedding match:
✅ "machine learning models"  (0.87 similarity - same concept)
✅ "ML algorithms overview"    (0.92 similarity)
✅ "neural networks"           (0.78 similarity - related)

How Vector Similarity Search Works

Arcana uses cosine similarity to find relevant chunks:
# 1. Embed the query
query = "What is Elixir?"
{:ok, query_embedding} = Embedder.embed(embedder, query, intent: :query)
# => [0.23, -0.45, 0.67, ...] (384 dimensions for bge-small)

# 2. Compare with stored chunk embeddings using cosine similarity
# Cosine similarity measures the angle between vectors (-1 to 1; text embeddings typically score 0 to 1)
#   1.0 = identical meaning
#   0.8+ = highly relevant
#   0.5-0.8 = somewhat relevant
#   under 0.5 = not relevant

# 3. PostgreSQL pgvector computes similarity efficiently
results = VectorStore.search(collection, query_embedding, limit: 5)
# Returns chunks sorted by similarity score

Cosine Similarity Formula

Given two vectors A and B:
similarity = (A · B) / (||A|| × ||B||)

Where:
- A · B = dot product (sum of element-wise products)
- ||A|| = magnitude of vector A
- ||B|| = magnitude of vector B
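The formula translates directly into Elixir. A minimal sketch for intuition (the module name is illustrative; Arcana itself delegates this computation to pgvector):

```elixir
defmodule CosineSketch do
  # A · B: sum of element-wise products
  def dot(a, b), do: Enum.zip_with(a, b, fn x, y -> x * y end) |> Enum.sum()

  # ||A||: Euclidean magnitude of a vector
  def magnitude(v), do: :math.sqrt(dot(v, v))

  # similarity = (A · B) / (||A|| × ||B||)
  def similarity(a, b), do: dot(a, b) / (magnitude(a) * magnitude(b))
end

CosineSketch.similarity([3.0, 4.0], [3.0, 4.0])  # => 1.0 (same direction)
CosineSketch.similarity([1.0, 0.0], [0.0, 1.0])  # => 0.0 (orthogonal)
```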
PostgreSQL implementation (from lib/arcana/vector_store/pgvector.ex:109):
SELECT 
  id, text,
  1 - (embedding <=> query_embedding) AS score
FROM arcana_chunks
ORDER BY embedding <=> query_embedding
LIMIT 10
The <=> operator computes cosine distance. Arcana converts it to similarity with 1 - distance.
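For reference, the same query can be issued by hand through Ecto. A sketch, assuming the pgvector Elixir package is installed and a `MyApp.Repo` exists (both names are illustrative; Arcana's VectorStore does this for you):

```elixir
# Hand-rolled version of the search query above.
# Pgvector.new/1 encodes a plain list of floats as a pgvector parameter.
{:ok, result} =
  Ecto.Adapters.SQL.query(
    MyApp.Repo,
    """
    SELECT id, text, 1 - (embedding <=> $1) AS score
    FROM arcana_chunks
    ORDER BY embedding <=> $1
    LIMIT $2
    """,
    [Pgvector.new(query_embedding), 10]
  )
```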

Embedding Providers

Arcana supports multiple embedding providers with a pluggable architecture:
Local (default): Run models locally with no API costs.
# config/config.exs
# Use the default local model:
config :arcana, embedder: :local

# Or pick a specific model:
config :arcana, embedder: {:local, model: "BAAI/bge-large-en-v1.5"}
Pros:
  • No API costs
  • Data privacy (no external calls)
  • No rate limits
Cons:
  • Requires CPU/GPU resources
  • Slower initial model download
  • Needs Nx backend (EXLA, EMLX, or Torchx)
Popular models (from lib/arcana/embedder/local.ex:33-48):
| Model | Dimensions | Size | Best For |
|-------|------------|------|----------|
| BAAI/bge-small-en-v1.5 | 384 | ~133 MB | Default - balanced speed/quality |
| BAAI/bge-base-en-v1.5 | 768 | ~438 MB | Better quality, slower |
| BAAI/bge-large-en-v1.5 | 1024 | ~1.3 GB | Best quality, slowest |
| intfloat/e5-small-v2 | 384 | ~133 MB | Requires query/passage prefixes |
| sentence-transformers/all-MiniLM-L6-v2 | 384 | ~90 MB | Lightweight, fast |
Setup:
# Add to supervision tree
children = [
  MyApp.Repo,
  {Arcana.Embedder.Local, model: "BAAI/bge-small-en-v1.5"}
]

# Configure Nx backend (required)
config :nx,
  default_backend: EXLA.Backend,
  default_defn_options: [compiler: EXLA]

E5 Models and Query/Passage Prefixes

E5 models from Microsoft require special prefixes to distinguish search queries from document content:
# Query embedding (what the user searches for)
Embedder.embed(embedder, "What is Elixir?", intent: :query)
# Behind the scenes: "query: What is Elixir?" (lib/arcana/embedder/local.ex:141-150)

# Document embedding (content being indexed)
Embedder.embed(embedder, "Elixir is a functional language...", intent: :document)
# Behind the scenes: "passage: Elixir is a functional language..."

Why Prefixes Matter

E5 models were trained with these prefixes to differentiate:
  • Queries = short, question-like text
  • Passages = longer document chunks
Using the wrong prefix significantly reduces retrieval quality. Automatic prefix handling (from lib/arcana/embedder/local.ex:141-151):
def prepare_text(text, model, intent) do
  if MapSet.member?(@e5_models, model) do
    case intent do
      :query -> "query: #{text}"
      :document -> "passage: #{text}"
      nil -> "passage: #{text}"  # default to passage
    end
  else
    text  # Other models don't need prefixes
  end
end
Only E5 models (intfloat/e5-*) require prefixes. BGE, GTE, and Sentence Transformers models do not use them.

Embedding Dimensions Comparison

Dimensions affect:
  • Storage size: More dimensions = larger database
  • Search speed: More dimensions = slower cosine similarity
  • Quality: Generally, more dimensions = better semantic understanding (with diminishing returns)

Storage Calculator

# Each dimension = 4 bytes (float32)
# Example: 10,000 chunks with bge-small (384 dims)

chunk_count = 10_000
dimensions = 384
bytes_per_dim = 4

total_mb = (chunk_count * dimensions * bytes_per_dim) / 1_024 / 1_024
# => ~14.6 MB for embeddings alone

# With text-embedding-3-large (3072 dims):
dimensions = 3_072
total_mb = (chunk_count * dimensions * bytes_per_dim) / 1_024 / 1_024
# => ~117 MB for embeddings
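The same arithmetic as a small reusable helper (module name illustrative):

```elixir
defmodule EmbeddingStorage do
  @bytes_per_dim 4  # float32

  # Approximate size in MB of stored embeddings for a collection
  def megabytes(chunk_count, dimensions) do
    chunk_count * dimensions * @bytes_per_dim / 1_048_576
  end
end

EmbeddingStorage.megabytes(10_000, 384) |> Float.round(1)    # => 14.6
EmbeddingStorage.megabytes(10_000, 3_072) |> Float.round(1)  # => 117.2
```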

Dimension Trade-offs

Low Dimensions (384)

Models: bge-small, e5-small, MiniLM
Pros:
  • Fast search (under 10ms for 100K chunks)
  • Small storage footprint
  • Quick embedding generation
Cons:
  • Slightly lower semantic precision
  • May miss subtle relationships
Best for: High-volume applications, real-time search

Medium Dimensions (768)

Models: bge-base, e5-base
Pros:
  • Balanced quality/speed
  • Good semantic understanding
Cons:
  • 2x storage vs 384 dims
  • Moderate speed impact
Best for: Most production use cases

High Dimensions (1024-1536)

Models: bge-large, text-embedding-3-small
Pros:
  • Excellent semantic precision
  • Better handling of nuanced queries
Cons:
  • 3-4x storage vs 384 dims
  • Slower search on large datasets
Best for: Research, legal, medical domains

Very High Dimensions (3072)

Models: text-embedding-3-large
Pros:
  • State-of-the-art quality
  • Best for complex domains
Cons:
  • 8x storage vs 384 dims
  • Noticeably slower search
  • Higher API costs
Best for: Critical applications where quality > cost

How Embeddings Work with Vector Stores

Ingestion flow (from lib/arcana/ingest.ex:125-156):
# 1. Text is chunked
chunks = Chunker.chunk(chunker_config, text, opts)
# => [%{text: "...", chunk_index: 0, token_count: 342}, ...]

# 2. Each chunk is embedded
Enum.reduce_while(chunks, {:ok, []}, fn chunk, {:ok, acc} ->
  case Embedder.embed(emb, chunk.text, intent: :document) do
    {:ok, embedding} ->
      # 3. Embedding stored with chunk
      chunk_record =
        %Chunk{}
        |> Chunk.changeset(%{
          text: chunk.text,
          embedding: embedding,  # [0.23, -0.45, 0.67, ...]
          chunk_index: chunk.chunk_index,
          document_id: document.id
        })
        |> repo.insert!()
      
      {:cont, {:ok, [chunk_record | acc]}}
      
    {:error, reason} ->
      {:halt, {:error, reason}}
  end
end)
Search flow (from lib/arcana/search.ex:228-246):
# 1. Query is embedded
case Embedder.embed(embedder, query, intent: :query) do
  {:ok, query_embedding} ->
    # 2. pgvector finds similar chunks
    results = VectorStore.search(collection, query_embedding, opts)
    
    # SQL behind the scenes (lib/arcana/vector_store/pgvector.ex:97-114):
    # SELECT id, text,
    #   1 - (embedding <=> $1) AS score
    # FROM arcana_chunks
    # WHERE 1 - (embedding <=> $1) > $2  -- threshold
    # ORDER BY embedding <=> $1
    # LIMIT $3
    
    {:ok, transform_results(results)}
end

Real Embedding Examples

# Start the serving (in supervision tree)
children = [
  {Arcana.Embedder.Local, model: "BAAI/bge-small-en-v1.5"}
]

# Embed a query
embedder = {:local, model: "BAAI/bge-small-en-v1.5"}
{:ok, embedding} = Arcana.Embedder.embed(
  embedder, 
  "How does Phoenix LiveView work?",
  intent: :query
)

# Result:
# {:ok, [0.234, -0.456, 0.678, ...]} (384 floats)
length(embedding)  # => 384

# Behind the scenes (lib/arcana/embedder/local.ex:99-115):
# 1. Text sent to Nx.Serving (Bumblebee model)
# 2. Model computes embedding on EXLA/EMLX backend
# 3. Nx tensor converted to Elixir list
# 4. Telemetry event emitted

Choosing the Right Embedding Model

1. Consider Your Dataset Size

  • Small (under 10K chunks): Use any model, even large ones
  • Medium (10K-100K chunks): Use 384-768 dimensions
  • Large (over 100K chunks): Prefer 384 dimensions for speed
2. Evaluate Quality Requirements

  • General knowledge base: bge-small or MiniLM (384 dims)
  • Technical/domain-specific: bge-base or e5-base (768 dims)
  • Legal/medical/research: bge-large or OpenAI large (1024-3072 dims)
3. Factor in Costs

  • Budget-constrained: Use local models (no API costs)
  • Scale & convenience: OpenAI (pay per usage)
  • Hybrid: different models for different collections (within one collection, queries and documents must be embedded by the same model)
4. Test with Your Data

# Benchmark different models
models = [
  {:local, model: "BAAI/bge-small-en-v1.5"},
  {:local, model: "BAAI/bge-base-en-v1.5"},
  {:openai, model: "text-embedding-3-small"}
]

test_queries = ["query 1", "query 2", "query 3"]

Enum.each(models, fn embedder ->
  # Ingest test data
  # Run test queries
  # Measure precision/recall
  # Compare results
end)
See Evaluation Guide for metrics and testing.

Best Practices

Use Intent Parameter

Always specify :intent for E5 models:
# For queries
embed(embedder, query, intent: :query)

# For documents
embed(embedder, chunk, intent: :document)

Cache Embeddings

Never re-embed the same text. Arcana stores embeddings automatically:
# Embeddings stored in arcana_chunks table
# Only query text needs fresh embedding
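Chunk embeddings are persisted by Arcana, but identical queries still pay for a fresh embedding each time. A minimal ETS-based cache sketch (illustrative, not part of Arcana):

```elixir
defmodule QueryEmbeddingCache do
  # In-memory cache of query embeddings, keyed by the query text.
  def start do
    :ets.new(:query_embedding_cache, [:named_table, :public, :set])
  end

  # Returns the cached embedding, or computes and caches it via embed_fun.
  def fetch(text, embed_fun) do
    case :ets.lookup(:query_embedding_cache, text) do
      [{^text, embedding}] ->
        {:ok, embedding}

      [] ->
        with {:ok, embedding} <- embed_fun.(text) do
          :ets.insert(:query_embedding_cache, {text, embedding})
          {:ok, embedding}
        end
    end
  end
end

# Usage with an Arcana embedder:
# QueryEmbeddingCache.fetch(query, fn q -> Embedder.embed(embedder, q, intent: :query) end)
```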

Monitor Dimensions

Ensure dimensions match across ingestion and search:
# Migration checks dimensions
Embedder.dimensions(embedder)
# Must match for all chunks in collection
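A defensive check along these lines can be sketched as follows (illustrative helper, not Arcana's API):

```elixir
defmodule DimensionCheck do
  # Accept an embedding only if its length matches the collection's dimensions.
  def validate(embedding, expected) when length(embedding) == expected, do: {:ok, embedding}
  def validate(_embedding, _expected), do: {:error, :dimension_mismatch}
end

DimensionCheck.validate([0.1, 0.2], 2)  # => {:ok, [0.1, 0.2]}
DimensionCheck.validate([0.1], 2)       # => {:error, :dimension_mismatch}
```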

Handle Errors

Embedding can fail (API limits, model load):
case Embedder.embed(embedder, text) do
  {:ok, embedding} -> # proceed
  {:error, reason} -> # retry or log
end
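For transient failures such as API rate limits, a generic retry wrapper can help. An illustrative sketch, not part of Arcana's API:

```elixir
defmodule EmbedRetry do
  # Retry a function returning {:ok, _} | {:error, _} with exponential backoff.
  def with_retry(fun, attempts \\ 3, delay_ms \\ 100) do
    case fun.() do
      {:ok, result} ->
        {:ok, result}

      {:error, _reason} when attempts > 1 ->
        Process.sleep(delay_ms)
        with_retry(fun, attempts - 1, delay_ms * 2)

      {:error, reason} ->
        {:error, reason}
    end
  end
end

# Usage (embedder/text as in the examples above):
# EmbedRetry.with_retry(fn -> Embedder.embed(embedder, text, intent: :document) end)
```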

Next Steps

Search Modes

Learn how to search with embeddings using semantic, full-text, and hybrid modes

Chunking Strategies

Optimize how documents are split before embedding

Evaluation

Measure and improve embedding quality with metrics

Getting Started

Set up your first embedding-powered application