Query (RAG)
Retrieve relevant context and generate answers using an LLM.
Query memories
POST /v1/memories/query retrieves relevant context and generates an answer using an LLM. Three query modes control the speed/quality tradeoff.
Query modes
| Mode | Retrieval | Best for |
|---|---|---|
| `fast` | Vector only | Low-latency chatbots |
| `balanced` | Vector + reranking | Most use cases (default) |
| `precise` | Vector + full-text + graph + reranking | Knowledge-heavy apps |
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `query` | string | Yes | The question to answer |
| `mode` | string | No | `fast`, `balanced` (default), or `precise` |
| `maxSources` | number | No | Maximum number of sources to use |
| `instructions` | string | No | System instructions for the LLM |
| `userId` | string | No | Scope retrieval to a specific user |
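A minimal request sketch using only the Python standard library. The base URL, API key, and Bearer auth scheme below are placeholder assumptions for illustration; substitute your deployment's values. Optional parameters are omitted from the body when unset so the server applies its documented defaults (e.g. `balanced` mode).

```python
import json
import urllib.request

# Placeholder values -- substitute your deployment's endpoint and API key.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"

VALID_MODES = ("fast", "balanced", "precise")


def build_query_body(query, mode="balanced", max_sources=None,
                     instructions=None, user_id=None):
    """Assemble the JSON body for POST /v1/memories/query.

    Unset optional parameters are left out entirely, so the
    server falls back to its own defaults.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    body = {"query": query, "mode": mode}
    if max_sources is not None:
        body["maxSources"] = max_sources
    if instructions is not None:
        body["instructions"] = instructions
    if user_id is not None:
        body["userId"] = user_id
    return body


def query_memories(**kwargs):
    """Send the query and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/memories/query",
        data=json.dumps(build_query_body(**kwargs)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Bearer auth is an assumption; check your deployment's scheme.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `query_memories(query="What changed last week?", mode="precise", max_sources=5, user_id="user_123")` issues a knowledge-heavy query scoped to one user.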