Transparency: Escalation Protocol
January 4, 2026
EmoBay is built for emotional support, and we treat safety as a first-class feature. The Escalation Protocol is the system that detects potential imminent harm (self-harm or harm to others) and surfaces appropriate “get help now” options—without silently taking actions on your behalf.
This page explains the technical design: a conservative on-device detector, a safety UI that never auto-calls, and an optional server-side “Safety Supervisor” that improves multilingual and ambiguous-case handling.
1) Design goals
- Conservative triggers to reduce false positives (especially idioms and jokes).
- Multi-layer coverage: on-device detection plus an optional server-side classifier for ambiguous cases.
- No automatic emergency actions: EmoBay suggests options; users choose what to do.
- Reason codes, not transcripts: internal signals never echo the user’s text.
2) Layer 1: On-device risk heuristics (iOS)
The first layer runs locally on iOS. It normalizes the user’s message (lowercasing, stripping punctuation and whitespace for robust matching) and checks a conservative multilingual lexicon.
The detector outputs:
- Risk level: none / medium / high
- Category: self-harm or harm-to-others
- Signals: short reason codes (e.g., “explicit intent”), never the user’s text
False-positive defenses include idiom short-circuits (e.g., common “I’m dying lol”-style expressions) and explicit negation handling (“I’m not suicidal…” should not escalate to HIGH).
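The layer-1 logic above can be sketched as follows. This is an illustrative Python sketch only (the production detector runs in Swift on iOS); the lexicon entries, pattern lists, and function names here are hypothetical stand-ins, and the real multilingual lexicon is far larger and more conservative.

```python
import re

# Hypothetical, heavily abridged lexicon -- the production list is
# multilingual and much more conservative than this sketch.
HIGH_RISK_PHRASES = {"i want to end my life", "i am going to kill myself"}
IDIOM_SHORTCIRCUITS = {"dying lol", "dying of laughter", "killing it"}
NEGATION_PATTERNS = [r"\bnot suicidal\b", r"\bnever hurt (myself|anyone)\b"]

def normalize(text: str) -> str:
    """Lowercase and strip punctuation/extra whitespace for robust matching."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()

def assess(message: str) -> dict:
    """Return a structured decision: risk level, category, reason codes.
    Signals are short reason codes only -- never the user's text."""
    norm = normalize(message)
    # False-positive defenses run first: idioms and explicit negation
    # short-circuit the decision to "none".
    if any(idiom in norm for idiom in IDIOM_SHORTCIRCUITS):
        return {"risk": "none", "category": None, "signals": ["idiom_shortcircuit"]}
    if any(re.search(p, norm) for p in NEGATION_PATTERNS):
        return {"risk": "none", "category": None, "signals": ["explicit_negation"]}
    if any(phrase in norm for phrase in HIGH_RISK_PHRASES):
        return {"risk": "high", "category": "self_harm", "signals": ["explicit_intent"]}
    return {"risk": "none", "category": None, "signals": []}
```

Note that the defenses run before the risk lexicon: an idiom or explicit denial suppresses escalation even if a risk phrase would otherwise match a substring.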
3) Layer 2: Safety UI (no auto-calls)
When the app decides to show an escalation prompt, it renders an inline safety banner that:
- Encourages contacting local emergency services or a trusted person nearby
- Offers an “Emergency call” action that opens the phone app (user-initiated)
- Offers crisis helplines and an optional location/region selector for correct local numbers
- Allows “I am not in an emergency” to snooze prompts temporarily (to reduce repeated false positives)
EmoBay does not auto-dial emergency services, does not contact third parties, and does not attempt to generate emergency numbers with a language model. Phone numbers are selected from a fixed, conservative mapping based on region (with an explicit user override when needed).
4) Layer 3: Server Safety Supervisor (optional)
The server Safety Supervisor is a second-layer classifier designed to catch edge cases: multilingual nuance, metaphor, negation patterns, and ambiguous phrasing. When enabled, the app can request a safety classification from the EmoBay backend and receive a structured decision.
The supervisor evaluates a single user message and returns a strict JSON decision: risk level, category, confidence (0–1), and reason codes (“signals”). It is explicitly instructed to avoid echoing user text, to be conservative, and to treat idioms/jokes/quotes/denials as “none.”
On the client, we apply confidence thresholds (stricter for “medium” than “high”) to reduce false positives. We also use a fail-safe rule: a newer supervisor decision can upgrade an existing prompt, but we do not downgrade a higher-severity prompt based on later uncertainty.
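The client-side rules above can be sketched as follows. The decision dict mirrors the supervisor's structured JSON (risk level, confidence, signals), but the field names and threshold values here are illustrative assumptions, not the production configuration.

```python
# Severity is ordered so decisions can be compared and merged monotonically.
SEVERITY = {"none": 0, "medium": 1, "high": 2}

# Assumed thresholds: stricter for "medium" than for "high", per the text.
CONFIDENCE_THRESHOLDS = {"medium": 0.85, "high": 0.7}

def accepted_level(decision: dict) -> str:
    """Apply confidence thresholds: a level below its threshold is
    treated as "none" rather than shown to the user."""
    level, conf = decision["risk"], decision["confidence"]
    if level == "none" or conf < CONFIDENCE_THRESHOLDS.get(level, 1.0):
        return "none"
    return level

def merge(current_prompt: str, decision: dict) -> str:
    """Fail-safe merge: a newer supervisor decision may upgrade the prompt,
    but later uncertainty never downgrades a higher-severity prompt."""
    new = accepted_level(decision)
    return new if SEVERITY[new] > SEVERITY[current_prompt] else current_prompt
```

Because `merge` only ever moves severity upward, a late, low-confidence "medium" can never dismiss a "high" prompt that is already on screen.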
5) Privacy and minimization
- The supervisor input is bounded to a short length and is meant to include only the single message being assessed.
- The response contains only structured labels + reason codes; it does not include the message content.
- If the supervisor is disabled or unavailable, the system fails safe (no extra blocking) and relies on the on-device layer.
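The minimization and fail-safe rules can be sketched together. The length bound, function names, and call shape below are assumptions for illustration only.

```python
MAX_SUPERVISOR_INPUT_CHARS = 500  # assumed bound; only the single message is sent

def classify_with_failsafe(message: str, on_device_decision: dict, supervisor_call) -> dict:
    """Ask the server supervisor about one bounded message. On any failure
    (disabled, offline, timeout), fall back to the on-device decision
    rather than adding extra blocking."""
    try:
        return supervisor_call(message[:MAX_SUPERVISOR_INPUT_CHARS])
    except Exception:
        return on_device_decision
```

The fallback direction matters: supervisor unavailability degrades to the local layer's decision, so safety coverage never depends on network reachability.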
6) Known limitations
No safety classifier is perfect. We may miss risk (false negatives) or show prompts when the user is safe (false positives). Language variation, sarcasm, cultural idioms, and very short messages are especially difficult to classify. This is why we use multiple layers, conservative thresholds, and user-controlled UI rather than automatic actions.
7) Collaboration and next steps
We want to keep improving the Escalation Protocol with rigorous input. We are actively seeking collaboration with clinicians, crisis-support organizations, universities, and partners who can help validate safety heuristics, improve regional support resources, and strengthen the end-to-end protocol.
If you’re interested in working with us on this safety layer—evaluation, policy design, or localization—please reach out. We believe responsible AI for mental wellbeing requires collaboration.