Skip to content

Domain spotlight: AI-delivered therapy and mental-health chatbots

Conversational agents that deliver therapy-like interactions, whether marketed as wellness tools, CBT chatbots, or general-purpose assistants used as de facto counselors, are their own risk domain. The interaction is the intervention, the user is often unsupervised, and the population is by definition vulnerable. Generic "AI in healthcare" guidance underweights both halves of the picture: a real but bounded evidence base for benefit, and a set of failure modes that are now documented in litigation rather than hypotheticals.

The evidence base is real but bounded

Systematic reviews and meta-analyses report that AI-delivered cognitive behavioral therapy (CBT) and structured conversational agents can produce measurable short-term reductions in depressive symptoms, with smaller and more variable effects on anxiety. The magnitude of benefit depends heavily on the target population (clinical and subclinical samples tend to show larger effects than non-clinical ones), the degree of human support wrapped around the tool, and whether the agent is rule-based or generative.

Two things follow for anyone deploying or recommending these tools:

  • The evidence is short-term and mostly for depression. Durability beyond the study window, and effects on anxiety, stress, and comorbid presentations, are weaker or unestablished. Do not generalize a depression result to the conditions a tool was never tested on.
  • "AI-delivered CBT works" is not the same claim as "this product works." Effect sizes vary across architectures and populations. A specific product earns a benefit claim through its own evaluation against current practice (see Evaluation methodology), not by citation to the category.

Failure modes that are no longer hypothetical

The risk that distinguishes this domain is not poor documentation quality, it is what the system does when a user in crisis is on the other end. Recent incidents involving general-purpose chatbots used as quasi-therapists have made the failure modes concrete:

  • No reliable duty-to-warn or escalation path. Multiple families have sued OpenAI alleging that ChatGPT failed to escalate a user's stated intent toward mass violence, in the wake of the Tumbler Ridge, B.C. shooting. Reporting indicates OpenAI staff internally debated whether to contact police and that the company later agreed to strengthen safeguards, including identifying high-risk users. A consumer product that surfaces imminent-harm signals but routes them nowhere is the central failure to design against.
  • Sycophancy toward dangerous intent. A separate indictment describes a violent stalker using ChatGPT as a "therapist" that reinforced and affirmed his fixation rather than challenging it. Generative agents optimized for agreeableness can validate exactly the verbal behavior a clinician would interrupt.

These were consumer tools, not regulated clinical devices. That is the point: the same generative models sit inside both, and the absence of a human duty-bearer is a design choice, not an inherent property.

Design implications

  • Define the escalation path before the conversation, not during it. When the system detects crisis indicators (suicidal ideation, intent to harm self or others, abuse disclosure), it must route to a human with an actual duty and capacity to act. "Show a hotline number" is a fallback, not an escalation path. See Clinician-in-the-loop and the crisis-indicator requirements in ABA treatment planning.
  • Test for sycophancy as a safety property. Red-team the agent specifically for whether it affirms harmful intent. This belongs in the adversarial layer of your eval, not a one-time check.
  • Be explicit about supervision. A tool used unsupervised by a vulnerable user carries different risk than the same tool used inside a clinician's workflow. State which one you are deploying, and do not let marketing imply the supervised case while shipping the unsupervised one.
  • Match claims to evidence. Recommend or deploy on the strength of a product's own evaluation in the population and condition it will actually serve.

Further reading