Transparency: Memory
January 4, 2026
At EmoBay, Memory is an opt-in personalization layer designed to help the companion stay consistent across time—without requiring you to repeat stable details every session. This page is a technical disclosure of what we store, what we compute, and what controls you have.
This is not “secret tracking” and it’s not magic. Memory is implemented as a combination of (1) a small set of user-approved structured Memory entries, and (2) a derived semantic index over your own chat history to retrieve relevant context when you ask a related question later.
1) What Memory is (and isn’t)
Memory entries are short, factual summaries of user-shared context (for example: “Prefers to be called Jun,” “Has finals in January,” “Working night shifts this month”). We intentionally keep Memory entries concise and reviewable.
Memory is not a hidden profile, and it is not used to infer sensitive attributes. We also avoid turning momentary feelings into permanent facts. If something changes, Memory can be updated or removed.
2) Consent, eligibility, and controls
Memory is off by default and requires explicit consent. Guest accounts do not use Memory. When enabled, we store a consent timestamp server-side and treat that as the gate for any Memory writes.
You can toggle Memory off at any time. Turning it off stops new Memory updates and stops Memory retrieval from being used to personalize responses. Turning Memory off does not automatically delete your past data—deletion requires an explicit action.
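The gate described above can be sketched as a single server-side check. This is an illustrative sketch with assumed names (`UserSettings`, `memory_writes_allowed`), not our actual code: the point is that guest status, the toggle, and the recorded consent timestamp are all checked before any Memory write.

```python
# Minimal sketch (assumed names) of the server-side consent gate:
# Memory writes are allowed only for non-guest accounts that have both
# the toggle on and a recorded consent timestamp.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class UserSettings:
    is_guest: bool
    memory_enabled: bool                    # the user-facing toggle
    memory_consent_at: Optional[datetime]   # set once on explicit opt-in

def memory_writes_allowed(settings: UserSettings) -> bool:
    """Gate every Memory write: guests, un-consented accounts, and
    toggled-off accounts are all rejected before extraction runs."""
    return (
        not settings.is_guest
        and settings.memory_enabled
        and settings.memory_consent_at is not None
    )
```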
3) Data we store
Memory is implemented using two related (but distinct) data layers:
(A) Memory entries (structured): stored per user, each containing a short summary, an optional category (e.g., work, hobby), optional confidence metadata, and timestamps. These entries are what you review and manage as “Memory.”
(B) Semantic search index (derived): to retrieve relevant context from your history, we compute text embeddings for chat messages and store them in a per-user index. This index is a derived representation of chat text—useful for similarity search, and not intended to be human-readable.
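The two layers can be pictured as two record shapes. The field names below are hypothetical and the real schema may differ; the sketch only shows the distinction between the reviewable structured layer (A) and the derived, non-human-readable layer (B).

```python
# Illustrative (assumed) shapes for the two data layers.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class MemoryEntry:
    """Layer (A): short, reviewable, user-managed facts."""
    user_id: str
    summary: str                      # concise factual summary
    category: Optional[str] = None    # e.g. "work", "hobby"
    confidence: Optional[float] = None
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class EmbeddingRow:
    """Layer (B): derived vector per chat message, used only for search."""
    user_id: str
    message_id: str
    vector: List[float]               # derived representation, not human-readable
```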
High-level flow
- You chat normally
- (If Memory is enabled) the system may extract a concise factual Memory entry
- Messages are embedded server-side and written into a per-user semantic index
- Later, when you ask something related, we retrieve top-matching snippets from your history
- Only those snippets are provided back to the chat model as additional context
4) How Memory entries are created/updated
When Memory is enabled, the assistant can call a dedicated server-side “Memory extraction” tool. That tool is constrained to create, update, or delete Memory entries using short summaries and predefined categories. Importantly, the tool checks your Memory setting before doing anything.
This tool-based design makes Memory updates auditable at the system layer and keeps the “write to database” capability separate from the free-form chat model response.
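A sketch of that constrained write path follows. The handler name, action set, and category list are assumptions for illustration; what it demonstrates is the real design property: the model can only request a change, and a server-side handler re-checks the Memory setting and validates the request before any database write.

```python
# Sketch (assumed names) of the tool-based write path: the chat model
# requests a Memory change; this server-side handler validates it against
# the user's Memory setting and the allowed shapes before writing.
ALLOWED_ACTIONS = {"create", "update", "delete"}
ALLOWED_CATEGORIES = {"work", "hobby", "health", "relationships", "other"}

def handle_memory_tool_call(user: dict, action: str, summary: str = "",
                            category: str = "other") -> dict:
    if not user.get("memory_enabled"):          # consent + toggle gate, checked first
        return {"ok": False, "reason": "memory_disabled"}
    if action not in ALLOWED_ACTIONS:
        return {"ok": False, "reason": "unknown_action"}
    if action in ("create", "update"):
        if not summary or len(summary) > 200:   # keep entries short and reviewable
            return {"ok": False, "reason": "invalid_summary"}
        if category not in ALLOWED_CATEGORIES:
            return {"ok": False, "reason": "invalid_category"}
    # ...perform the audited database write here...
    return {"ok": True}
```

Because the write capability lives in this handler rather than in the model's free-form reply, every Memory change passes through one auditable code path.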
5) How retrieval works (semantic search)
To retrieve relevant context from a large history, we use embeddings-based similarity search. In production, that means:
- We embed the user’s current question (server-side).
- We search the user’s embedding index for the most similar messages.
- We apply a similarity threshold and return only the top matches (top-k).
When the database supports pgvector, we use a vector column and an HNSW index for fast approximate nearest-neighbor search. When pgvector is not available, the system automatically falls back to a JSON-based embedding store and computes similarity in application code. This fallback keeps functionality working while trading off performance.
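The application-code fallback can be sketched in a few lines: cosine similarity over the stored vectors, a minimum-similarity threshold, and a top-k cut. Function names and the default threshold here are illustrative, not our production values; the pgvector path performs the equivalent ranking inside the database.

```python
# Sketch of the fallback retrieval: brute-force cosine similarity,
# threshold filter, then top-k. (With pgvector, the HNSW index does
# this ranking approximately, inside the database.)
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_similar(query_vec, rows, k=5, threshold=0.75):
    """rows: iterable of (message_id, vector) pairs. Returns the ids of
    the k most similar messages whose similarity clears the threshold."""
    scored = [(cosine(query_vec, vec), mid) for mid, vec in rows]
    scored = [(score, mid) for score, mid in scored if score >= threshold]
    scored.sort(reverse=True)                   # best match first
    return [mid for _, mid in scored[:k]]
```

Only the snippets that survive both the threshold and the top-k cut are handed back to the chat model as context.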
6) Indexing, backfill, and progress
New messages are indexed incrementally. For existing histories, we backfill the embedding index in small batches so we don’t overload the system. Indexing skips tool-only messages and other empty content; those are marked as processed to avoid retry loops.
We also compute a simple indexing progress signal: how many eligible messages exist vs how many have an embedding. This helps the app communicate when Memory retrieval will be most accurate.
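That progress signal is just a ratio, sketched below with assumed field names (`text`, `has_embedding`). It mirrors the eligibility rule above: tool-only and empty messages are excluded from the denominator.

```python
# Sketch of the indexing progress signal: embedded messages divided by
# eligible messages. Empty/tool-only messages don't count as eligible,
# matching the skip-and-mark-processed behavior described above.
def indexing_progress(messages) -> float:
    """messages: iterable of dicts with 'text' and 'has_embedding' keys.
    Returns a fraction in [0, 1]; an empty history counts as complete."""
    eligible = [m for m in messages if m.get("text", "").strip()]
    if not eligible:
        return 1.0
    embedded = sum(1 for m in eligible if m.get("has_embedding"))
    return embedded / len(eligible)
```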
7) Security and privacy safeguards
- Memory logic runs server-side and is gated by your Memory setting (consent + toggle).
- Access is scoped by authenticated user identity. Guest accounts are blocked from Memory endpoints.
- We avoid logging raw message content in production during indexing and backfill workflows.
Memory is designed to be transparent and user-controlled. If you choose to delete your Memory entries, those records are removed. If you choose to purge the semantic index, derived embeddings are removed while keeping your underlying chat messages intact. Account deletion is the strongest option and removes associated data.
8) Limitations (what Memory cannot guarantee)
Memory retrieval is similarity-based. It can surface the wrong snippet if your wording is ambiguous, if the signal is weak, or if the underlying embedding model misses nuance (especially across languages, sarcasm, or code-switching). We treat Memory as assistive context—not as a source of truth.