

The Arcana module is the primary interface for RAG (Retrieval Augmented Generation) operations in Elixir. It provides document ingestion, vector search, and question answering capabilities that integrate with any Phoenix/Ecto application.

Overview

Arcana provides a complete RAG pipeline:
  1. Ingest documents and files with automatic chunking and embedding
  2. Search using semantic, fulltext, or hybrid modes
  3. Ask questions with context-aware LLM responses
  4. Delete documents and their associated chunks
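The four steps above can be sketched as one thin wrapper module. This is a hedged sketch: MyApp.Knowledge, MyApp.Repo, and the "openai:gpt-4o-mini" LLM string are assumed names, not part of Arcana itself.

```elixir
defmodule MyApp.Knowledge do
  @moduledoc """
  Sketch of the full Arcana pipeline. MyApp.Repo and the
  "openai:gpt-4o-mini" LLM string are assumptions.
  """

  # 1. Ingest raw text, tagging it with a source_id for later filtering.
  def add(text, source_id) do
    Arcana.ingest(text, repo: MyApp.Repo, source_id: source_id)
  end

  # 2. Retrieve the five most similar chunks.
  def find(query) do
    Arcana.search(query, repo: MyApp.Repo, limit: 5)
  end

  # 3. Answer a question with retrieved context.
  def answer(question) do
    Arcana.ask(question, repo: MyApp.Repo, llm: "openai:gpt-4o-mini")
  end

  # 4. Remove a document and its chunks.
  def remove(document_id) do
    Arcana.delete(document_id, repo: MyApp.Repo)
  end
end
```

Each wrapper returns the tagged tuples documented for the underlying function, so callers can pattern match on {:ok, …} / {:error, …} directly.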

Core Functions

ingest/2

Ingests text content, creating a document with embedded chunks.
ingest(text, opts) :: {:ok, %Arcana.Document{}} | {:error, term()}
Parameters:
  • text (string, required) - The text content to ingest
  • opts (keyword) - Options for ingestion:
      • repo (module, required) - The Ecto repo to use for database operations
      • source_id (string) - Optional identifier for grouping/filtering documents
      • metadata (map) - Optional metadata to store with the document
      • chunk_size (integer, default: 1024) - Maximum chunk size in characters
      • chunk_overlap (integer, default: 200) - Overlap between chunks in characters
      • collection (string | map, default: "default") - Collection name (string) or a map with :name and :description keys
      • graph (boolean) - Enable GraphRAG entity/relationship extraction (overrides the config default)
On success, returns {:ok, %Arcana.Document{}} with these fields:
  • id - UUID of the document
  • content - Original text content
  • source_id - Optional source identifier
  • metadata - Document metadata
  • status - Processing status (:completed, :failed, etc.)
  • chunk_count - Number of chunks created
  • collection_id - Associated collection UUID
On failure, returns {:error, term()}.
Example:
# Basic ingestion
{:ok, document} = Arcana.ingest(
  "Elixir is a functional programming language.",
  repo: MyApp.Repo
)

# With metadata and custom chunking
{:ok, document} = Arcana.ingest(
  long_article_text,
  repo: MyApp.Repo,
  source_id: "blog_post_123",
  metadata: %{author: "Jane Doe", published: "2024-01-15"},
  chunk_size: 512,
  chunk_overlap: 100,
  collection: %{name: "blog_posts", description: "Company blog articles"}
)

ingest_file/2

Ingests a file by parsing its content and creating a document with embedded chunks.
ingest_file(path, opts) :: {:ok, %Arcana.Document{}} | {:error, term()}
Parameters:
  • path (string, required) - Path to the file to ingest. Supports plain text, markdown, and PDF formats.
  • opts (keyword) - Options (same as ingest/2):
      • repo (module, required) - The Ecto repo to use
      • source_id (string) - Optional identifier for grouping/filtering
      • metadata (map) - Optional metadata map
      • chunk_size (integer, default: 1024) - Maximum chunk size in characters
      • chunk_overlap (integer, default: 200) - Overlap between chunks in characters
      • collection (string, default: "default") - Collection name to organize the document
Example:
# Ingest a markdown file
{:ok, document} = Arcana.ingest_file(
  "/path/to/docs/README.md",
  repo: MyApp.Repo,
  collection: "documentation",
  metadata: %{category: "readme"}
)

# Ingest a PDF with custom chunking
{:ok, document} = Arcana.ingest_file(
  "/path/to/report.pdf",
  repo: MyApp.Repo,
  source_id: "annual_report_2024",
  chunk_size: 2048,
  chunk_overlap: 400
)
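For bulk loading, ingest_file/2 composes naturally with Path.wildcard/1. A hedged sketch follows; the directory layout, MyApp.Repo, and the "docs" collection name are assumptions:

```elixir
defmodule MyApp.BulkIngest do
  # Pure helper: build per-file ingest options. MyApp.Repo and the
  # "docs" collection are hypothetical names for this sketch.
  def opts_for(path) do
    [
      repo: MyApp.Repo,
      source_id: Path.basename(path, Path.extname(path)),
      metadata: %{filename: Path.basename(path)},
      collection: "docs"
    ]
  end

  # Ingest every markdown file under dir, returning one result per file.
  def run(dir) do
    dir
    |> Path.join("**/*.md")
    |> Path.wildcard()
    |> Enum.map(fn path -> Arcana.ingest_file(path, opts_for(path)) end)
  end
end
```

Deriving source_id from the filename keeps re-runs filterable per file; each element of the result list is the {:ok, document} / {:error, reason} tuple documented above.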

search/2

Searches for chunks similar to the query using semantic, fulltext, or hybrid search.
search(query, opts) :: {:ok, [result]} | {:error, term()}
Parameters:
  • query (string, required) - The search query text
  • opts (keyword) - Search options:
      • repo (module, required) - The Ecto repo to use (required for the pgvector backend)
      • limit (integer, default: 10) - Maximum number of results to return
      • mode (atom, default: :semantic) - Search mode: :semantic, :fulltext, or :hybrid
      • source_id (string) - Filter results to a specific source
      • threshold (float, default: 0.0) - Minimum similarity score (0.0 to 1.0)
      • collection (string) - Filter results to a specific collection by name
      • collections (list(string)) - Filter results to multiple collections
      • semantic_weight (float, default: 0.5) - Weight for semantic scores in hybrid mode (0.0 to 1.0)
      • fulltext_weight (float, default: 0.5) - Weight for fulltext scores in hybrid mode (0.0 to 1.0)
On success, returns {:ok, [result]}, where each result map contains:
  • id - Chunk UUID
  • text - The chunk text content
  • document_id - Parent document UUID
  • chunk_index - Position in the document
  • score - Similarity/relevance score
Example:
# Semantic search
{:ok, results} = Arcana.search(
  "What is functional programming?",
  repo: MyApp.Repo,
  limit: 5
)

# Hybrid search with custom weights
{:ok, results} = Arcana.search(
  "elixir concurrency",
  repo: MyApp.Repo,
  mode: :hybrid,
  semantic_weight: 0.7,
  fulltext_weight: 0.3,
  collection: "documentation"
)

IO.inspect(results)
# [
#   %{
#     id: "550e8400-e29b-41d4-a716-446655440000",
#     text: "Elixir provides lightweight processes...",
#     document_id: "660e8400-e29b-41d4-a716-446655440000",
#     chunk_index: 3,
#     score: 0.87
#   },
#   ...
# ]

rewrite_query/2

Rewrites a query using a provided rewriter function to improve retrieval.
rewrite_query(query, opts \\ []) :: {:ok, rewritten_query} | {:error, term()}
Query rewriting can improve retrieval by:
  • Expanding abbreviations
  • Adding synonyms
  • Reformulating the query for better matching
  • Correcting typos
Parameters:
  • query (string, required) - The original query to rewrite
  • opts (keyword) - Rewrite options:
      • rewriter (function, required) - A function that takes a query and returns {:ok, rewritten} or {:error, reason}, with the signature (String.t()) -> {:ok, String.t()} | {:error, term()}
Example:
# Define a rewriter function
rewriter = fn query ->
  # Simple expansion example
  expanded = query <> " programming language development"
  {:ok, expanded}
end

# Rewrite the query
{:ok, rewritten} = Arcana.rewrite_query(
  "Elixir", 
  rewriter: rewriter
)

IO.puts(rewritten)
# => "Elixir programming language development"

# Then use in search
{:ok, results} = Arcana.search(rewritten, repo: MyApp.Repo)
LLM-based rewriter:
rewriter = fn query ->
  llm = Application.get_env(:my_app, :llm)
  
  prompt = """
  Rewrite this search query to be more specific and include related terms.
  Original: #{query}
  Rewritten:
  """
  
  case Arcana.LLM.complete(llm, prompt, [], []) do
    {:ok, rewritten} -> {:ok, String.trim(rewritten)}
    error -> error
  end
end

{:ok, rewritten} = Arcana.rewrite_query("RAG", rewriter: rewriter)
For automatic query rewriting in the Agent pipeline, use Agent.rewrite/2 which provides built-in LLM-based rewriting.

ask/2

Ask questions using RAG (Retrieval Augmented Generation) with context from your knowledge base.
ask(question, opts) :: {:ok, answer, context} | {:error, term()}
Parameters:
  • question (string, required) - The question to ask
  • opts (keyword) - Ask options:
      • repo (module, required) - The Ecto repo to use
      • llm (term, required) - LLM implementing the Arcana.LLM protocol (e.g., "openai:gpt-4o-mini")
      • limit (integer, default: 5) - Maximum number of context chunks to retrieve
      • mode (atom, default: :semantic) - Search mode for context retrieval: :semantic, :fulltext, or :hybrid
      • source_id (string) - Filter context to a specific source
      • threshold (float, default: 0.0) - Minimum similarity score for context chunks
      • collection (string) - Filter context to a specific collection
      • collections (list(string)) - Filter context to multiple collections
      • prompt (function) - Custom prompt function: fn question, context -> system_prompt_string end
On success, returns a three-element tuple {:ok, answer, context}:
  • answer - The LLM’s response string
  • context - List of context chunks used
Example:
# Basic question answering
{:ok, answer, context} = Arcana.ask(
  "What are the benefits of Elixir?",
  repo: MyApp.Repo,
  llm: "openai:gpt-4o-mini"
)

IO.puts(answer)
# "Based on the documentation, Elixir offers several benefits..."

# With custom prompt
{:ok, answer, _context} = Arcana.ask(
  "Summarize the key features",
  repo: MyApp.Repo,
  llm: my_llm_config,
  collection: "product_docs",
  limit: 3,
  prompt: fn question, context ->
    # Interpolate the retrieved chunks (assumed to expose :text, like
    # search/2 results) so the prompt contains the excerpts it refers to
    excerpts = Enum.map_join(context, "\n\n", & &1.text)

    """
    You are a technical writer. Be concise and use bullet points.
    Answer based on the following documentation excerpts.

    #{excerpts}

    Question: #{question}
    """
  end
)
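Because ask/2 also returns the context it used, a caller can surface simple citations alongside the answer. This sketch assumes the context entries carry the same keys as search/2 results (document_id, chunk_index, score); adjust the pattern if your context differs:

```elixir
defmodule MyApp.Citations do
  # Format each context chunk as "document_id (chunk N, score S)".
  # Assumes search/2-shaped maps, which is an assumption of this sketch.
  def format(context) do
    Enum.map(context, fn %{document_id: doc, chunk_index: idx, score: score} ->
      "#{doc} (chunk #{idx}, score #{Float.round(score, 2)})"
    end)
  end
end
```

Usage: after {:ok, answer, context} = Arcana.ask(…), print answer followed by MyApp.Citations.format(context).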

delete/2

Deletes a document and all its associated chunks from the knowledge base.
delete(document_id, opts) :: :ok | {:error, :not_found}
Parameters:
  • document_id (string, required) - UUID of the document to delete
  • opts (keyword):
      • repo (module, required) - The Ecto repo to use
Returns :ok if the document was successfully deleted, or {:error, :not_found} if the document doesn’t exist.
Example:
# Delete a document
:ok = Arcana.delete(
  "550e8400-e29b-41d4-a716-446655440000",
  repo: MyApp.Repo
)

# Handle not found
case Arcana.delete(document_id, repo: MyApp.Repo) do
  :ok -> Logger.info("Document deleted")
  {:error, :not_found} -> Logger.warning("Document not found")
end
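This reference defines no update function, so one way to replace a document's content is to delete the old copy and re-ingest. A hedged sketch, with MyApp.Repo as an assumed repo:

```elixir
defmodule MyApp.Replace do
  # Delete the old document (tolerating :not_found for already-removed
  # documents), then ingest the new text under the given source_id.
  def replace_document(old_id, new_text, source_id) do
    case Arcana.delete(old_id, repo: MyApp.Repo) do
      result when result == :ok or result == {:error, :not_found} ->
        Arcana.ingest(new_text, repo: MyApp.Repo, source_id: source_id)

      other ->
        other
    end
  end
end
```

Treating :not_found as success makes the operation idempotent: retrying after a partial failure re-ingests without erroring on the missing original.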

Configuration Functions

config/0

Returns the current Arcana configuration.
config() :: map()
Example:
config = Arcana.config()
IO.inspect(config)
# %{
#   embedder: {Arcana.Embedders.OpenAI, [model: "text-embedding-3-small"]},
#   chunker: {Arcana.Chunkers.Simple, []},
#   graph_enabled: true
# }

embedder/0

Returns the configured embedder as a {module, opts} tuple.
embedder() :: {module(), keyword()}

chunker/0

Returns the configured chunker as a {module, opts} tuple.
chunker() :: {module(), keyword()}

graph_enabled?/1

Checks whether GraphRAG is enabled for the given options.
graph_enabled?(opts) :: boolean()
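The accessors above can be combined into a quick configuration summary. This sketch assumes only the {module, opts} shapes documented here, and that graph_enabled?/1 accepts the same opts you would pass to ingest/2:

```elixir
defmodule MyApp.ArcanaInfo do
  # Collect the configured embedder, chunker, and graph flag in one map.
  def summary(opts \\ []) do
    {embedder_mod, embedder_opts} = Arcana.embedder()
    {chunker_mod, chunker_opts} = Arcana.chunker()

    %{
      embedder: {embedder_mod, embedder_opts},
      chunker: {chunker_mod, chunker_opts},
      # graph_enabled?/1 is assumed to take ingest/2-style opts
      graph?: Arcana.graph_enabled?(opts)
    }
  end
end
```

Calling MyApp.ArcanaInfo.summary(graph: true) before a bulk ingest is a convenient sanity check that the expected embedder and GraphRAG setting are in effect.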