Glossary
Key terms and concepts used throughout the MemoryKit documentation.
RAG (Retrieval-Augmented Generation)
A technique that combines information retrieval with LLM text generation. Instead of relying solely on the model's training data, RAG first searches your stored content for relevant context, then passes that context to an LLM to generate an accurate answer. MemoryKit implements RAG through the Query and Chat endpoints.
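The retrieve-then-generate flow can be sketched in a few lines. Note that `searchChunks` and `answerWithRag` are hypothetical stand-ins, not MemoryKit SDK calls: real retrieval is semantic and the generation step calls an LLM with the retrieved context.

```typescript
type Chunk = { id: string; text: string; score: number };

// Hypothetical retrieval step: rank stored chunks by relevance to the query.
// (A toy keyword match here; real retrieval uses embeddings.)
function searchChunks(query: string, store: Chunk[]): Chunk[] {
  return store
    .map((c) => ({
      ...c,
      score: c.text.toLowerCase().includes(query.toLowerCase()) ? 1 : 0,
    }))
    .filter((c) => c.score > 0);
}

// Hypothetical generation step: in a real system the retrieved context is
// prepended to the question and sent to an LLM.
function answerWithRag(query: string, store: Chunk[]): string {
  const context = searchChunks(query, store)
    .map((c) => c.text)
    .join("\n");
  return `Answer to "${query}" grounded in:\n${context}`;
}
```

The key property is that the answer is grounded in retrieved content rather than only the model's training data.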
Embeddings
Dense numerical vector representations of text. MemoryKit automatically generates embeddings for each chunk of your content, enabling semantic similarity search. You never need to manage embeddings directly — they are created and indexed behind the scenes during ingestion.
Chunks
Segments of a memory's content created during ingestion. Large content is split into smaller, semantically meaningful pieces so that retrieval can return precise, relevant passages rather than entire documents. Chunking is automatic and optimized for retrieval quality.
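As an illustration only, a naive fixed-size chunker with overlap looks like this. MemoryKit's real chunker is semantically aware, so treat this purely as a sketch of the underlying idea.

```typescript
// Split text into windows of `size` characters, each overlapping the
// previous one by `overlap` characters so passages aren't cut mid-thought.
function chunkText(text: string, size = 200, overlap = 40): string[] {
  if (size <= overlap) throw new Error("size must exceed overlap");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Overlap matters: without it, a sentence split across a chunk boundary would be unretrievable from either side.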
Hybrid Search
A retrieval strategy that combines two complementary search methods: vector similarity search (finding semantically similar content) and full-text search (matching exact keywords and phrases). MemoryKit automatically combines both and applies reranking to produce the best results. See the Search guide.
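One common way to merge two ranked lists is reciprocal rank fusion (RRF). MemoryKit performs its fusion and reranking internally; this sketch only illustrates the principle of combining a vector-ranked list with a keyword-ranked list.

```typescript
// Reciprocal rank fusion: each result earns 1 / (k + rank + 1) from every
// list it appears in, so items ranked well by both methods float to the top.
function fuseRankings(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

The constant `k` damps the influence of top ranks so one list cannot dominate the fused ordering.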
Vector Similarity
A search technique that compares the embedding vectors of your query and stored chunks using a similarity metric such as cosine similarity. Content that is semantically similar — even if it uses different words — will have vectors that are close together. This powers the fast retrieval mode.
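Cosine similarity, the metric mentioned above, measures the angle between two vectors and ignores their magnitude:

```typescript
// Cosine similarity: 1 means same direction (very similar),
// 0 means orthogonal (unrelated), -1 means opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Real embedding vectors have hundreds or thousands of dimensions, but the computation is the same.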
Reranking
A second-pass scoring step that re-evaluates search results using a cross-encoder model for higher precision. After the initial retrieval (vector and/or full-text), a reranker scores each result against the original query more carefully. This is used in balanced and precise query modes.
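The two-pass shape can be sketched as follows. Here `crossEncoderScore` is a hypothetical stand-in for a real cross-encoder model; the point is that only the head of the cheap first-pass ranking gets the expensive second scoring.

```typescript
type Scored = { text: string; score: number };

// Re-score only the top-N first-pass results with a more expensive scorer,
// then sort by the new scores.
function rerank(
  query: string,
  firstPass: string[],
  crossEncoderScore: (query: string, doc: string) => number,
  topN = 10,
): Scored[] {
  return firstPass
    .slice(0, topN) // limit the expensive pass to the head of the ranking
    .map((text) => ({ text, score: crossEncoderScore(query, text) }))
    .sort((a, b) => b.score - a.score);
}
```

This is why reranking improves precision at modest cost: the cross-encoder sees query and document together, but only for a handful of candidates.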
SSE (Server-Sent Events)
A web standard for one-way streaming of data from server to client over a single long-lived HTTP connection. MemoryKit uses SSE for real-time streaming of RAG responses. The client receives events (text, sources, usage, done, error) as they are generated. See the Streaming guide.
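On the wire, an SSE stream is plain text: events are separated by blank lines, and each event carries `event:` and `data:` lines. A minimal parser for a raw stream body (simplified relative to the full SSE specification) looks like this:

```typescript
type SseEvent = { event: string; data: string };

// Parse a raw SSE body: split on blank lines, then read the
// `event:` name and collect `data:` lines for each block.
function parseSse(raw: string): SseEvent[] {
  return raw
    .split(/\n\n+/)
    .map((block) => block.trim())
    .filter(Boolean)
    .map((block) => {
      let event = "message"; // SSE's default event name
      const data: string[] = [];
      for (const line of block.split("\n")) {
        if (line.startsWith("event:")) event = line.slice(6).trim();
        else if (line.startsWith("data:")) data.push(line.slice(5).trim());
      }
      return { event, data: data.join("\n") };
    });
}
```

In the browser, `EventSource` (or a streaming `fetch` reader) does this parsing for you.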
Knowledge Graph
An interconnected representation of relationships between memories and their content. When includeGraph: true is passed to a search request, MemoryKit traverses these connections to find related content that might not match the query directly but is contextually relevant. Used in precise retrieval mode.
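The traversal idea can be sketched as a bounded breadth-first walk. The adjacency map here is a toy; MemoryKit builds and stores these connections during ingestion.

```typescript
// Starting from nodes that matched the query directly, follow edges up to
// `depth` hops to pull in related content that didn't match on its own.
function expandViaGraph(
  seeds: string[],
  edges: Map<string, string[]>,
  depth = 1,
): Set<string> {
  const seen = new Set(seeds);
  let frontier = seeds;
  for (let hop = 0; hop < depth; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of edges.get(node) ?? []) {
        if (!seen.has(neighbor)) {
          seen.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return seen;
}
```

Bounding the hop count keeps the expansion focused: each extra hop trades precision for recall.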
Smart Ingestion
MemoryKit's automatic content analysis during memory creation. When you create a memory without specifying title, tags, type, or language, Smart Ingestion uses an LLM to extract these fields automatically. You can override any auto-extracted field by providing your own values.
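The override semantics amount to a simple merge where your fields win. This is a sketch of that rule, not MemoryKit's implementation; the field names match the glossary (title, tags, type, language).

```typescript
type MemoryMeta = {
  title?: string;
  tags?: string[];
  type?: string;
  language?: string;
};

// User-provided fields override auto-extracted ones; fields the user
// left undefined keep their extracted values.
function mergeMetadata(autoExtracted: MemoryMeta, userProvided: MemoryMeta): MemoryMeta {
  const overrides = Object.fromEntries(
    Object.entries(userProvided).filter(([, value]) => value !== undefined),
  ) as MemoryMeta;
  return { ...autoExtracted, ...overrides };
}
```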
Cursor-based Pagination
A pagination method used by all list endpoints (GET /v1/memories, GET /v1/chats). Instead of page numbers, the API returns a cursor token and a has_more boolean. Pass the cursor to the next request to fetch the next page. This approach handles real-time data correctly — new items won't cause duplicates or skipped results.
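The consumption pattern is a loop that keeps passing the returned cursor until has_more is false. `listPage` below is a hypothetical stand-in for a call to GET /v1/memories (and kept synchronous for brevity; a real client would await an HTTP request):

```typescript
type Page<T> = { items: T[]; cursor: string | null; has_more: boolean };

// Fetch every page: start with a null cursor, then feed each response's
// cursor back into the next request until has_more is false.
function listAll<T>(listPage: (cursor: string | null) => Page<T>): T[] {
  const all: T[] = [];
  let cursor: string | null = null;
  do {
    const page = listPage(cursor);
    all.push(...page.items);
    cursor = page.cursor;
    if (!page.has_more) break;
  } while (cursor !== null);
  return all;
}
```

Because the cursor pins your position in the underlying stream, items created while you paginate cannot shift pages under you the way offset-based page numbers can.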
Memory
The core storage unit in MemoryKit. A memory holds text content along with metadata (title, tags, type, custom key-value pairs). During ingestion, the content is automatically chunked, embedded, and indexed for retrieval. See the Memories guide.
Chat Session
A conversational thread managed by MemoryKit. Each chat session maintains message history and uses RAG to generate context-aware responses. Chats can be scoped to a specific user via userId. See the Chats guide.