## What are Embeddings?
Embeddings are numerical representations of text that capture semantic meaning. They transform words, sentences, or documents into vectors (lists of numbers) in a high-dimensional space where similar meanings are positioned close together.
## Why Embeddings Matter for RAG
### Semantic Understanding

Traditional keyword search matches exact words. Embeddings understand meaning:

```text
Query: "ML algorithms"

Keyword match:
❌ "machine learning models" (no match - different words)
✅ "ML algorithms overview" (exact match)

Embedding match:
✅ "machine learning models" (0.87 similarity - same concept)
✅ "ML algorithms overview" (0.92 similarity)
✅ "neural networks" (0.78 similarity - related)
```

### Cross-lingual Matching

Embeddings can match concepts across languages:

```text
Query (English): "artificial intelligence"

Results:
✅ "intelligence artificielle" (French - 0.85 similarity)
✅ "künstliche Intelligenz" (German - 0.83 similarity)
```

Cross-lingual matching requires a multilingual model such as sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2.

### Synonym Awareness

Embeddings handle synonyms automatically:

```text
Document: "The automobile was fast"
Query: "car speed"
Similarity: 0.82 ✅
# "automobile" ≈ "car", "fast" ≈ "speed"
```
## How Vector Similarity Search Works
Arcana uses cosine similarity to find relevant chunks:
```elixir
# 1. Embed the query
query = "What is Elixir?"
{:ok, query_embedding} = Embedder.embed(embedder, query, intent: :query)
# => [0.23, -0.45, 0.67, ...] (384 dimensions for bge-small)

# 2. Compare with stored chunk embeddings using cosine similarity.
# Cosine similarity measures the angle between vectors (0 to 1):
#   1.0       = identical meaning
#   0.8+      = highly relevant
#   0.5-0.8   = somewhat relevant
#   under 0.5 = not relevant

# 3. PostgreSQL pgvector computes similarity efficiently
results = VectorStore.search(collection, query_embedding, limit: 5)
# Returns chunks sorted by similarity score
```
Given two vectors A and B:

```text
similarity = (A · B) / (||A|| × ||B||)
```

Where:

- `A · B` = dot product (sum of element-wise products)
- `||A||` = magnitude of vector A
- `||B||` = magnitude of vector B
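To make the formula concrete, here is a minimal pure-Elixir sketch of cosine similarity (illustrative only; in practice Arcana delegates this computation to pgvector):

```elixir
defmodule CosineSketch do
  # Dot product: sum of element-wise products
  def dot(a, b) do
    Enum.zip(a, b) |> Enum.map(fn {x, y} -> x * y end) |> Enum.sum()
  end

  # Magnitude ||v|| = sqrt(v · v)
  def magnitude(v), do: :math.sqrt(dot(v, v))

  # similarity = (A · B) / (||A|| × ||B||)
  def similarity(a, b), do: dot(a, b) / (magnitude(a) * magnitude(b))
end

CosineSketch.similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
# => 1.0 (same direction, identical meaning)

CosineSketch.similarity([1.0, 0.0], [0.0, 1.0])
# => 0.0 (orthogonal, unrelated)
```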
PostgreSQL implementation (from lib/arcana/vector_store/pgvector.ex:109):

```sql
SELECT
  id, text,
  1 - (embedding <=> query_embedding) AS score
FROM arcana_chunks
ORDER BY embedding <=> query_embedding
LIMIT 10
```
The `<=>` operator computes cosine distance; Arcana converts it to similarity as `1 - distance`. For example, a chunk at distance 0.13 from the query vector gets a score of 0.87.
## Embedding Providers
Arcana supports multiple embedding providers with a pluggable architecture:
### Local (Bumblebee)

The default. Runs models locally with no API costs.

```elixir
# config/config.exs
config :arcana, embedder: :local

# Or pin a specific model:
config :arcana, embedder: {:local, model: "BAAI/bge-large-en-v1.5"}
```
Pros:

- No API costs
- Data privacy (no external calls)
- No rate limits

Cons:

- Requires CPU/GPU resources
- Slower initial model download
- Needs an Nx backend (EXLA, EMLX, or Torchx)
Popular models (from lib/arcana/embedder/local.ex:33-48):

| Model | Dimensions | Size | Best For |
| --- | --- | --- | --- |
| BAAI/bge-small-en-v1.5 | 384 | ~133 MB | Default - balanced speed/quality |
| BAAI/bge-base-en-v1.5 | 768 | ~438 MB | Better quality, slower |
| BAAI/bge-large-en-v1.5 | 1024 | ~1.3 GB | Best quality, slowest |
| intfloat/e5-small-v2 | 384 | ~133 MB | Requires query/passage prefixes |
| sentence-transformers/all-MiniLM-L6-v2 | 384 | ~90 MB | Lightweight, fast |
Setup:

```elixir
# Add to supervision tree
children = [
  MyApp.Repo,
  {Arcana.Embedder.Local, model: "BAAI/bge-small-en-v1.5"}
]

# Configure Nx backend (required)
config :nx,
  default_backend: EXLA.Backend,
  default_defn_options: [compiler: EXLA]
```
### OpenAI

Use OpenAI's embedding API via Req.LLM.

```elixir
# config/config.exs
config :arcana, embedder: :openai

# Or pin a specific model:
config :arcana, embedder: {:openai, model: "text-embedding-3-large"}
```
Pros:

- No local resources needed
- Fast inference
- State-of-the-art quality

Cons:

- API costs (~$0.02 per 1M tokens for text-embedding-3-small)
- Rate limits
- Data sent to OpenAI
Models (from lib/arcana/embedder/openai.ex:54-62):

| Model | Dimensions | Cost (per 1M tokens) |
| --- | --- | --- |
| text-embedding-3-small | 1536 | $0.02 |
| text-embedding-3-large | 3072 | $0.13 |
| text-embedding-ada-002 | 1536 | $0.10 |

For scale: embedding 10M tokens of documents with text-embedding-3-small costs about $0.20.
Setup:

```elixir
# Add to mix.exs
{:req_llm, "~> 0.3"}
```

```bash
# Set environment variable
export OPENAI_API_KEY="sk-..."
```
### Custom Provider

Implement your own embedding provider (Cohere, Voyage AI, etc.):

```elixir
defmodule MyApp.CohereEmbedder do
  @behaviour Arcana.Embedder

  @impl true
  def embed(text, opts) do
    api_key = opts[:api_key] || System.get_env("COHERE_API_KEY")

    # Call the Cohere API
    body =
      Jason.encode!(%{
        texts: [text],
        model: "embed-english-v3.0",
        input_type: "search_document"
      })

    case Req.post(
           "https://api.cohere.ai/v1/embed",
           headers: [{"Authorization", "Bearer #{api_key}"}],
           body: body
         ) do
      {:ok, %{body: %{"embeddings" => [embedding]}}} ->
        {:ok, embedding}

      {:error, reason} ->
        {:error, reason}
    end
  end

  @impl true
  def dimensions(_opts), do: 1024
end
```

```elixir
# config/config.exs
config :arcana, embedder: {MyApp.CohereEmbedder, api_key: "..."}
```
Required callbacks (from lib/arcana/embedder.ex:56-77):

- `embed/2` - Embed a single text → `{:ok, [float()]}` or `{:error, term()}`
- `dimensions/1` - Return the embedding dimension count
- `embed_batch/2` (optional) - Batch embedding for efficiency
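The optional `embed_batch/2` callback is worth implementing when the provider's API accepts multiple inputs per request. Here is a hedged sketch extending the Cohere example above (Cohere's `/v1/embed` endpoint takes a list of texts; treat the details as illustrative, not a verified integration):

```elixir
defmodule MyApp.BatchingCohereEmbedder do
  @behaviour Arcana.Embedder

  @impl true
  def embed(text, opts) do
    # A single text is just a batch of one
    with {:ok, [embedding]} <- embed_batch([text], opts), do: {:ok, embedding}
  end

  @impl true
  def embed_batch(texts, opts) do
    api_key = opts[:api_key] || System.get_env("COHERE_API_KEY")

    body =
      Jason.encode!(%{
        texts: texts,
        model: "embed-english-v3.0",
        input_type: "search_document"
      })

    case Req.post(
           "https://api.cohere.ai/v1/embed",
           headers: [{"Authorization", "Bearer #{api_key}"}],
           body: body
         ) do
      # One embedding per input text, in order
      {:ok, %{body: %{"embeddings" => embeddings}}} -> {:ok, embeddings}
      {:error, reason} -> {:error, reason}
    end
  end

  @impl true
  def dimensions(_opts), do: 1024
end
```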
## E5 Models and Query/Passage Prefixes
E5 models from Microsoft require special prefixes to distinguish search queries from document content:
```elixir
# Query embedding (what the user searches for)
Embedder.embed(embedder, "What is Elixir?", intent: :query)
# Behind the scenes: "query: What is Elixir?" (lib/arcana/embedder/local.ex:141-150)

# Document embedding (content being indexed)
Embedder.embed(embedder, "Elixir is a functional language...", intent: :document)
# Behind the scenes: "passage: Elixir is a functional language..."
```
### Why Prefixes Matter

E5 models were trained with these prefixes to differentiate:

- Queries = short, question-like text
- Passages = longer document chunks

Using the wrong prefix significantly reduces retrieval quality.
Automatic prefix handling (from lib/arcana/embedder/local.ex:141-151):

```elixir
def prepare_text(text, model, intent) do
  if MapSet.member?(@e5_models, model) do
    case intent do
      :query -> "query: #{text}"
      :document -> "passage: #{text}"
      # Default to passage when no intent is given
      nil -> "passage: #{text}"
    end
  else
    # Other models don't need prefixes
    text
  end
end
```
Only E5 models (`intfloat/e5-*`) require prefixes. BGE, GTE, and Sentence Transformers models do not use them.
## Embedding Dimensions Comparison
Dimensions affect:

- **Storage size**: more dimensions = larger database
- **Search speed**: more dimensions = slower cosine similarity
- **Quality**: generally, more dimensions = better semantic understanding (with diminishing returns)
### Storage Calculator

```elixir
# Each dimension = 4 bytes (float32)
# Example: 10,000 chunks with bge-small (384 dims)
chunk_count = 10_000
dimensions = 384
bytes_per_dim = 4

total_mb = chunk_count * dimensions * bytes_per_dim / 1_024 / 1_024
# => ~14.6 MB for embeddings alone

# With text-embedding-3-large (3072 dims):
dimensions = 3_072
total_mb = chunk_count * dimensions * bytes_per_dim / 1_024 / 1_024
# => ~117 MB for embeddings
```
### Dimension Trade-offs

#### Low Dimensions (384)

Models: bge-small, e5-small, MiniLM

Pros:

- Fast search (under 10 ms for 100K chunks)
- Small storage footprint
- Quick embedding generation

Cons:

- Slightly lower semantic precision
- May miss subtle relationships

Best for: high-volume applications, real-time search

#### Medium Dimensions (768)

Models: bge-base, e5-base

Pros:

- Balanced quality/speed
- Good semantic understanding

Cons:

- 2x storage vs 384 dims
- Moderate speed impact

Best for: most production use cases

#### High Dimensions (1024-1536)

Models: bge-large, text-embedding-3-small

Pros:

- Excellent semantic precision
- Better handling of nuanced queries

Cons:

- 3-4x storage vs 384 dims
- Slower search on large datasets

Best for: research, legal, and medical domains

#### Very High Dimensions (3072)

Models: text-embedding-3-large

Pros:

- State-of-the-art quality
- Best for complex domains

Cons:

- 8x storage vs 384 dims
- Noticeably slower search
- Higher API costs

Best for: critical applications where quality outweighs cost
## How Embeddings Work with Vector Stores
Ingestion flow (from lib/arcana/ingest.ex:125-156):

```elixir
# 1. Text is chunked
chunks = Chunker.chunk(chunker_config, text, opts)
# => [%{text: "...", chunk_index: 0, token_count: 342}, ...]

# 2. Each chunk is embedded
Enum.reduce_while(chunks, {:ok, []}, fn chunk, {:ok, acc} ->
  case Embedder.embed(emb, chunk.text, intent: :document) do
    {:ok, embedding} ->
      # 3. Embedding stored with chunk
      chunk_record =
        %Chunk{}
        |> Chunk.changeset(%{
          text: chunk.text,
          embedding: embedding,  # [0.23, -0.45, 0.67, ...]
          chunk_index: chunk.chunk_index,
          document_id: document.id
        })
        |> repo.insert!()

      {:cont, {:ok, [chunk_record | acc]}}

    {:error, reason} ->
      {:halt, {:error, reason}}
  end
end)
```
Search flow (from lib/arcana/search.ex:228-246):

```elixir
# 1. Query is embedded
case Embedder.embed(embedder, query, intent: :query) do
  {:ok, query_embedding} ->
    # 2. pgvector finds similar chunks
    results = VectorStore.search(collection, query_embedding, opts)

    # SQL behind the scenes (lib/arcana/vector_store/pgvector.ex:97-114):
    #   SELECT id, text,
    #          1 - (embedding <=> $1) AS score
    #   FROM arcana_chunks
    #   WHERE 1 - (embedding <=> $1) > $2  -- threshold
    #   ORDER BY embedding <=> $1
    #   LIMIT $3

    {:ok, transform_results(results)}
end
```
## Real Embedding Examples
### Local (Bumblebee)

```elixir
# Start the serving (in supervision tree)
children = [
  {Arcana.Embedder.Local, model: "BAAI/bge-small-en-v1.5"}
]

# Embed a query
embedder = {:local, model: "BAAI/bge-small-en-v1.5"}

{:ok, embedding} =
  Arcana.Embedder.embed(
    embedder,
    "How does Phoenix LiveView work?",
    intent: :query
  )

# Result:
# {:ok, [0.234, -0.456, 0.678, ...]} (384 floats)
length(embedding)
# => 384

# Behind the scenes (lib/arcana/embedder/local.ex:99-115):
# 1. Text sent to Nx.Serving (Bumblebee model)
# 2. Model computes embedding on EXLA/EMLX backend
# 3. Nx tensor converted to Elixir list
# 4. Telemetry event emitted
```
### OpenAI

```elixir
# Set API key
System.put_env("OPENAI_API_KEY", "sk-...")

# Embed with OpenAI
embedder = {:openai, model: "text-embedding-3-small"}

{:ok, embedding} =
  Arcana.Embedder.embed(
    embedder,
    "Explain vector databases",
    intent: :query
  )

# Result:
# {:ok, [0.123, -0.234, 0.345, ...]} (1536 floats)
length(embedding)
# => 1536

# Behind the scenes (lib/arcana/embedder/openai.ex:40-50):
# 1. Call ReqLLM.embed("openai:text-embedding-3-small", text)
# 2. HTTP POST to OpenAI API
# 3. Parse response JSON
# 4. Return embedding vector
```
### Batch Embedding

```elixir
# Embed multiple texts efficiently
texts = [
  "First document chunk",
  "Second document chunk",
  "Third document chunk"
]

{:ok, embeddings} = Arcana.Embedder.embed_batch(embedder, texts)

# Result:
# {:ok, [
#   [0.12, -0.34, ...],  # 384 floats
#   [0.23, -0.45, ...],  # 384 floats
#   [0.34, -0.56, ...]   # 384 floats
# ]}

# Note: falls back to sequential embedding if the provider
# doesn't implement embed_batch/2 (lib/arcana/embedder.ex:113-126)
```
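The fallback is roughly equivalent to folding `embed/2` over the list and halting at the first error. A sketch of the idea (not Arcana's exact code):

```elixir
# Sequential fallback sketch: embed one text at a time, halt on error
texts
|> Enum.reduce_while({:ok, []}, fn text, {:ok, acc} ->
  case Arcana.Embedder.embed(embedder, text, intent: :document) do
    {:ok, emb} -> {:cont, {:ok, [emb | acc]}}
    {:error, reason} -> {:halt, {:error, reason}}
  end
end)
|> case do
  {:ok, acc} -> {:ok, Enum.reverse(acc)}
  error -> error
end
```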
## Choosing the Right Embedding Model
### Consider Your Dataset Size

- **Small (under 10K chunks)**: use any model, even large ones
- **Medium (10K-100K chunks)**: use 384-768 dimensions
- **Large (over 100K chunks)**: prefer 384 dimensions for speed (see the storage sketch below)
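Extending the storage calculator above to the large end makes the trade-off concrete (float32 at 4 bytes per dimension; 1M chunks is an assumed figure):

```elixir
# 1M chunks: embedding storage at 384 vs 3072 dimensions
chunk_count = 1_000_000

gb = fn dims -> chunk_count * dims * 4 / 1_024 / 1_024 / 1_024 end

gb.(384)
# => ~1.43 GB

gb.(3_072)
# => ~11.44 GB (8x more to store and scan per search)
```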
### Evaluate Quality Requirements

- **General knowledge base**: bge-small or MiniLM (384 dims)
- **Technical/domain-specific**: bge-base or e5-base (768 dims)
- **Legal/medical/research**: bge-large or OpenAI large (1024-3072 dims)
### Factor in Costs

- **Budget-constrained**: use local models (no API costs)
- **Scale and convenience**: OpenAI (pay per usage)
- **Hybrid**: local for documents, OpenAI for queries (rare embeddings)
### Test with Your Data

```elixir
# Benchmark different models
models = [
  {:local, model: "BAAI/bge-small-en-v1.5"},
  {:local, model: "BAAI/bge-base-en-v1.5"},
  {:openai, model: "text-embedding-3-small"}
]

test_queries = ["query 1", "query 2", "query 3"]

Enum.each(models, fn embedder ->
  # Ingest test data
  # Run test queries
  # Measure precision/recall
  # Compare results
end)
```
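For a quick first pass, a minimal latency check with `:timer.tc` can reuse the `models` list above. This sketch measures speed only, not retrieval quality:

```elixir
# Micro-benchmark sketch: embedding latency per provider
Enum.each(models, fn embedder ->
  {micros, {:ok, _embedding}} =
    :timer.tc(fn ->
      Arcana.Embedder.embed(embedder, "What is Elixir?", intent: :query)
    end)

  IO.puts("#{inspect(embedder)}: #{div(micros, 1000)} ms")
end)
```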
See the Evaluation Guide for metrics and testing.
## Best Practices
### Use the Intent Parameter

Always specify `:intent` for E5 models:

```elixir
# For queries
embed(embedder, query, intent: :query)

# For documents
embed(embedder, chunk, intent: :document)
```

### Cache Embeddings

Never re-embed the same text. Arcana stores embeddings automatically in the arcana_chunks table, so only the query text needs a fresh embedding at search time.

### Monitor Dimensions

Ensure dimensions match across ingestion and search:

```elixir
# Migration checks dimensions
Embedder.dimensions(embedder)
# Must match for all chunks in a collection
```

### Handle Errors

Embedding can fail (API rate limits, model load errors):

```elixir
case Embedder.embed(embedder, text) do
  {:ok, embedding} -> # proceed
  {:error, reason} -> # retry or log
end
```
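For transient failures such as rate limits, a small retry wrapper can help. A hypothetical helper (`MyApp.EmbedRetry` is not part of Arcana):

```elixir
defmodule MyApp.EmbedRetry do
  @max_attempts 3

  # Retries with a simple linear backoff: 500 ms, then 1000 ms
  def embed_with_retry(embedder, text, opts \\ [], attempt \\ 1) do
    case Arcana.Embedder.embed(embedder, text, opts) do
      {:ok, embedding} ->
        {:ok, embedding}

      {:error, _reason} when attempt < @max_attempts ->
        Process.sleep(500 * attempt)
        embed_with_retry(embedder, text, opts, attempt + 1)

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```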
## Next Steps
- **Search Modes**: learn how to search with embeddings using semantic, full-text, and hybrid modes
- **Chunking Strategies**: optimize how documents are split before embedding
- **Evaluation**: measure and improve embedding quality with metrics
- **Getting Started**: set up your first embedding-powered application