
# Query (RAG)

Retrieve relevant context and generate answers using an LLM.

## Query memories

`POST /v1/memories/query` retrieves relevant context and generates an answer using an LLM. Three query modes control the speed/quality tradeoff.
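Under the hood the SDK call shown below issues a plain HTTP POST to this endpoint. A sketch of building that request follows; note that the base URL and the bearer-token auth scheme are assumptions for illustration, not documented here.

```ts
// Shape of the request body, mirroring the documented parameters.
interface QueryRequest {
  query: string;
  mode?: "fast" | "balanced" | "precise";
  maxSources?: number;
  instructions?: string;
  userId?: string;
}

// Build the raw request for POST /v1/memories/query.
// The base URL and Authorization header are assumed, not documented.
function buildQueryRequest(apiKey: string, body: QueryRequest) {
  return {
    url: "https://api.memorykit.example/v1/memories/query", // hypothetical base URL
    method: "POST" as const,
    headers: {
      Authorization: `Bearer ${apiKey}`, // assumed auth scheme
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  };
}
```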

### Query modes

| Mode | Retrieval | Best for |
| --- | --- | --- |
| `fast` | Vector only | Low-latency chatbots |
| `balanced` | Vector + reranking | Most use cases (default) |
| `precise` | Vector + full-text + graph + reranking | Knowledge-heavy apps |
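One way to read the table: pick `fast` when latency dominates, `precise` when recall dominates, and stay on `balanced` otherwise. A hypothetical helper encoding that rule (the latency threshold is illustrative, not part of the API):

```ts
type QueryMode = "fast" | "balanced" | "precise";

// Map a latency budget (ms) and a recall-sensitivity flag onto the
// three documented modes. The 300 ms cutoff is an illustrative choice.
function pickMode(latencyBudgetMs: number, needsDeepRecall = false): QueryMode {
  if (needsDeepRecall) return "precise"; // full-text + graph + reranking
  if (latencyBudgetMs < 300) return "fast"; // vector only
  return "balanced"; // vector + reranking (the default)
}
```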
```ts
const answer = await mk.memories.query({
  query: "Summarize our Q4 goals",
  mode: "balanced",
  maxSources: 5,
  instructions: "Be concise. Use bullet points.",
});

console.log(answer.answer);
console.log(answer.sources.length, "sources used");
console.log(answer.usage.tokens_used, "tokens");
```
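The example above touches `answer`, `sources`, and `usage.tokens_used`, so the response appears to have roughly the shape sketched below. Any field beyond those three (e.g. per-source `id` and `score`) is an assumption, not documented here.

```ts
// Inferred from the example above; only `answer`, `sources.length`,
// and `usage.tokens_used` are shown in the docs. Per-source fields are assumed.
interface QueryAnswer {
  answer: string;
  sources: Array<{ id: string; score?: number }>; // assumed source shape
  usage: { tokens_used: number };
}

// Hypothetical helper: summarize a response in one line for logging.
function summarize(res: QueryAnswer): string {
  return `${res.sources.length} sources, ${res.usage.tokens_used} tokens`;
}
```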

### Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `query` | string | Yes | The question to answer |
| `mode` | string | No | `fast`, `balanced` (default), or `precise` |
| `maxSources` | number | No | Maximum number of sources to use |
| `instructions` | string | No | System instructions for the LLM |
| `userId` | string | No | Scope retrieval to a specific user |
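Since only `query` is required and `mode` accepts just three values, a client-side guard can catch bad input before the POST is sent. A minimal sketch, with error messages that are our own rather than the API's:

```ts
const VALID_MODES = new Set(["fast", "balanced", "precise"]);

// Validate query parameters client-side before issuing the request.
// Mirrors the table above: `query` required, `mode` one of three values,
// `maxSources` a positive integer when present.
function validateParams(p: {
  query?: string;
  mode?: string;
  maxSources?: number;
}): string[] {
  const errors: string[] = [];
  if (!p.query || p.query.trim() === "") errors.push("query is required");
  if (p.mode !== undefined && !VALID_MODES.has(p.mode))
    errors.push(`unknown mode: ${p.mode}`);
  if (
    p.maxSources !== undefined &&
    (!Number.isInteger(p.maxSources) || p.maxSources < 1)
  )
    errors.push("maxSources must be a positive integer");
  return errors;
}
```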
