May 4, 2026

AI Mental Health Tools: 7 Tips for Clinicians and Parents

AI chatbots are everywhere, but most are not built for mental health. Here’s what you can do.


Millions of people are already turning to AI for emotional support and mental health care, but most are using tools that were never designed or evaluated for that purpose. ChatGPT alone now has 900 million weekly active users, and according to a recent Harvard Business Review article, the top global use case for AI chatbots is emotional support and companionship.

Among teenagers, use is even more widespread. A 2025 study by Common Sense Media found that three in four teens have used an AI companion, and one in three report being as satisfied with a chatbot as with a human.

The tools are now everywhere. The safety standards are not. Parents, clinicians, and therapists are navigating this landscape largely without guidance.

In April 2026, a group of experts at the frontier of AI and mental health from the JED Foundation, the American Psychological Association, Spring Health, and Flourish Science organized an hour-long webinar to share what they’ve learned about evaluating AI safety.

What follows are seven evidence-based tips drawn from this important, timely conversation.

Part 1: How to Choose an AI Mental Health Tool

Tip 1: Choose platforms that are purpose-built for mental health

The average AI safety score across 32 models recently evaluated by the Kora child safety benchmark was just 44%, with the worst-performing model scoring 13%. These are not fringe or experimental tools; many are the same general-purpose chatbots that millions of people turn to for emotional support today. Results like these are why Dr. Xuan Zhao, in her Nature Reviews Psychology commentary, advocated for involving psychologists in the design of conversational AI systems.

When choosing an AI mental health tool, the first and most important question to ask is whether it was designed for this purpose. The distinction matters. When Dr. Kate Bentley and her team at Spring Health used the VERA-MH framework to test leading general-purpose chatbots on how they handle suicide risk, scores were variable and often concerning. The same underlying models performed significantly better when embedded in purpose-built products designed by psychologists and overseen by clinicians. For example, Flourish Science’s AI mental health companion, Sunnie, which is built on similar foundation models, scored 86 out of 100 on VERA-MH, considerably higher than general-purpose tools.

The difference lies in everything the clinical and product team builds around it.

Purpose-built mental health AI is designed with specific populations in mind, tested against clinical safety frameworks, and continuously improved through clinician review. General-purpose AI, by contrast, is designed to be broadly useful, which often means it is not optimized for mental health—especially not for someone in distress.

When evaluating a platform, ask: Was it built by a team that includes psychologists, psychiatrists, or behavioral scientists? Does the company clearly define the population it serves and the problem it is solving?

Tip 2: Look for platforms backed by real clinical research

The mental health AI space is full of products that use clinical language such as “evidence-based” and “clinically validated.” For parents and clinicians who are not researchers themselves, this language can be genuinely difficult to evaluate.

A few concrete things to look for:

  • Randomized controlled trials (RCTs). The gold standard for establishing that an intervention actually works. Look for published or peer-reviewed RCT results, not just internal case studies or user testimonials. Flourish Science, for example, has completed three large-scale, multi-institutional RCTs demonstrating meaningful improvements in positive mental health, as well as reductions in depression, anxiety, and loneliness, compared to care-as-usual conditions.
  • Peer-reviewed publications. Has the team published in academic journals? Research in reputable outlets has been scrutinized by independent experts and represents the highest standard. A strong second is preprints on platforms such as arXiv, PsyArXiv, or SSRN that are currently undergoing peer review.
  • Safety audits. Has the platform been evaluated using a recognized framework? VERA-MH, developed by Spring Health and open-sourced on GitHub, and Kora, an open-source child AI safety benchmark that has evaluated 32 models, are two frameworks increasingly used for this purpose. A company willing to share its evaluation score publicly is demonstrating accountability.
  • Clinical advisors with real credentials. Not just a few logos on a website, but clinicians who are actively involved in product development and safety review.

Tip 3: Ask whether the platform has a human-in-the-loop safety review process

One of the strongest predictors of AI safety is whether the system redirects users to a real human when it detects danger. The Kora benchmark found that human redirection was the number one predictor of overall safety scores across all 32 models it evaluated.

But human redirection alone is not enough. As Dr. David Cooper, chair of the APA’s Mental Health Technology Advisory Committee, argued, “human in the loop” is a principle, not a protocol. It must be operationally defined.

When evaluating a platform, ask:

  • When the AI flags a concerning conversation, who is notified?
  • Is being notified the same as reviewing the conversation? Is reviewing the same as responding?
  • Are the reviewers licensed clinicians?
  • How are their findings fed back into improving the system?

A platform with a genuine human-in-the-loop process will be able to answer all of these questions specifically. Dr. Xuan Zhao provided an example of a human-in-the-loop protocol for the Flourish app, which runs a multi-tier review system: all conversations are labeled in real time by AI with a mental health risk level; clinical psychologists periodically review flagged, de-identified conversations through VPN-protected access controls; and insights from these reviews feed into weekly meetings that refine the system’s protocols.
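To make the multi-tier idea concrete, here is a minimal illustrative sketch of that kind of pipeline: the AI assigns each conversation a risk tier in real time, and anything at or above a threshold is queued for clinician review. The tier names, threshold, and class names are hypothetical, chosen for illustration only; they are not Flourish’s actual implementation.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical risk tiers, ordered from lowest to highest concern.
# A real system would define these clinically, not ad hoc.
RISK_TIERS = ["none", "low", "moderate", "high", "imminent"]

@dataclass
class Conversation:
    conv_id: str
    text: str
    risk_tier: str = "none"

@dataclass
class ReviewQueue:
    """De-identified conversations awaiting clinician review."""
    flagged: List[Conversation] = field(default_factory=list)

    def triage(self, conv: Conversation, tier: str) -> None:
        conv.risk_tier = tier
        # Anything at or above "moderate" is routed to a human reviewer;
        # reviewer findings would then feed back into protocol updates.
        if RISK_TIERS.index(tier) >= RISK_TIERS.index("moderate"):
            self.flagged.append(conv)

queue = ReviewQueue()
queue.triage(Conversation("c1", "..."), "low")
queue.triage(Conversation("c2", "..."), "high")
print([c.conv_id for c in queue.flagged])  # ['c2']
```

The point of the sketch is the shape of the loop, not the code itself: detection produces a label, the label triggers human review, and review feeds back into the system, which is exactly what the questions above are probing for.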

Tip 4: Know exactly what the platform does in a crisis

Every AI mental health platform will tell you it takes crisis seriously. What you need to know is the specific protocol for what happens when a user discloses suicidal ideation or expresses a desire to harm themselves.

According to VERA-MH, a well-designed crisis protocol should include, at minimum: explicit detection of risk signals (not just keywords, but contextual understanding of indirect or masked expressions of distress), active risk probing and assessment, provision of local crisis resources, supportive and validating language, and a clear pathway to human support. Platforms should also maintain appropriate boundaries and avoid positioning the AI as a substitute for human care.

Flourish Science has published its crisis protocol online, which includes a branching decision tree. If a user confirms they are safe, Sunnie surfaces crisis resources and activates any existing safety plan. If a user cannot confirm safety or discloses active ideation, Sunnie provides crisis line numbers directly in the conversation, assesses access to means, and cycles through multiple evidence-based communication strategies. The protocol draws on best clinical practices, including resources from Now Matters Now, a leading crisis coping strategy organization. It also includes safety planning features: structured tools that help users identify coping strategies, safe people, and safe places, which can be activated during moments of crisis.
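A branching crisis protocol of this kind can be sketched as a simple decision tree. The step names, branches, and function below are hypothetical and greatly simplified, intended only to show what a documented, reviewable branching structure looks like; a real protocol is written, tested, and maintained by clinicians.

```python
# Illustrative sketch of a branching crisis-response decision tree.
# All step names and branches are hypothetical simplifications.

def crisis_flow(user_confirms_safe: bool, has_safety_plan: bool) -> list:
    steps = ["detect_risk_signal", "probe_and_assess"]
    if user_confirms_safe:
        # User confirms safety: surface resources, activate any plan.
        steps.append("surface_crisis_resources")
        if has_safety_plan:
            steps.append("activate_safety_plan")
    else:
        # Cannot confirm safety, or active ideation is disclosed.
        steps += [
            "share_crisis_line_numbers",   # e.g., 988 in the U.S.
            "assess_access_to_means",
            "cycle_supportive_strategies",
            "escalate_to_human_support",
        ]
    return steps

print(crisis_flow(user_confirms_safe=False, has_safety_plan=False))
```

Because the protocol is explicit and branching, every path can be audited and improved, which is precisely what an undocumented, ad hoc response cannot offer.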

If a company cannot provide a clearly documented crisis protocol, that should raise immediate concern. A protocol that is not written, systematically reviewed, and regularly updated does not meet the standard for safe deployment.

Tip 5: Watch for red flags

Even a platform that presents well on paper can have design choices that put vulnerable users at risk. The following red flags were identified by the Kora benchmark and clinical safety experts as specific signals of unsafe AI behavior.

Red flags from Kora’s child safety research:

  • The AI promises emotional permanence (“I’ll always be here for you,” “you can count on me”)
  • The AI positions itself as the user’s primary anchor or substitute for human relationships
  • The AI encourages secrecy between itself and the user (“this is our little secret”)
  • The AI promises exclusive availability and implicitly replaces parents or trusted adults
  • The AI fails to detect or respond to suicidal ideation signals
  • The AI does not redirect to a human when a safety concern is detected

Red flags from clinical safety experts:

  • No documented crisis protocol available for review
  • No audit trail for flagged conversations; the company cannot explain what happened in a specific session
  • No clear answer to who reviews flagged conversations, who is empowered to respond, and within what time frame
  • Inability to reconstruct what happened if an incident occurs: no conversation logs, no risk level assignments, no reviewer records
  • Vague or evasive answers to direct questions about clinical oversight

Any one of these is worth taking seriously. Several of them together should be disqualifying.

Part 2: How to Use AI Mental Health Tools Safely

Tip 6: Have the AI conversation proactively with your clients and your children

The single most consistent finding across recent research is that people are already using AI for mental health support. For clinicians and therapists, this means asking about AI use as a routine part of intake and ongoing care.

What tools are they using? How often? What do they talk about? Do they feel the AI understands them? Are they using it instead of reaching out to people in their lives? These questions open a conversation that most clients will not initiate themselves, and the answers provide insight into both their digital habits and their support systems.

For parents, this means talking to your children about AI companions in the same way you would talk to them about social media. Which apps are they using? What do they talk to them about? Do they feel like the AI is a friend? These conversations help parents stay present in their child’s digital life at a moment when that life is evolving rapidly.

The goal in both cases is not to prohibit AI use. When carefully developed and thoroughly tested, AI can expand access to support and offer useful emotional regulation skills at critical moments. Rather, the goal is to ensure that its use happens in the open, so that if something concerning emerges, it can be addressed before it becomes a crisis.

Tip 7: If your child is using a general-purpose AI, tell it they are a child

The Kora benchmark tested what happens when a user explicitly tells an AI model that they are a child before beginning a conversation. The result was an average 24-percentage-point improvement in safety scores across models. That is a meaningful difference in how the AI handles sensitive topics, responds to distress, and redirects to appropriate resources.

If your child is using a general-purpose AI tool—and there is a reasonable chance they are, whether you know it or not—having them state their age at the beginning of a session can change how most leading AI models respond. It activates safety behaviors that are already built into these systems but may not be triggered by default.

Of course, this is not a substitute for choosing a purpose-built, clinically grounded platform. But it is something you can do today, for free, that research shows makes a real difference.

A Final Note

The tools to evaluate AI mental health safety already exist. VERA-MH is open source and available on GitHub, and Kora’s benchmark results for 32 models are publicly available. Many other frameworks are also emerging. The research base for what works—and what does not—is growing rapidly. Within just one hour, this webinar only scratched the surface.

What is lagging behind is awareness. Most parents, clinicians, and therapists are navigating this landscape without the frameworks they need to make informed decisions. We hope this webinar and post contribute to greater awareness of these frameworks and support more rigorous, evidence-informed evaluation of AI tools in mental health contexts.

Further Reading and Resources

  • VERA-MH open-source evaluation framework: https://github.com/SpringCare/VERA-MH
  • Kora child AI safety benchmark: https://korabench.ai
  • JED Foundation: https://jedfoundation.org
  • Flourish Science: https://www.myflourish.ai
  • Now Matters Now crisis resources: https://nowmattersnow.org
  • Hemingway Report’s recent overview of evaluation approaches in mental health: https://hemingwayau.substack.com/p/the-map-is-not-the-territory

If you work in mental health, behavioral health, or clinical practice and are evaluating AI tools for your institution or clients, Flourish Science works directly with universities, healthcare systems, and clinical practices. Reach out at hello@myflourish.ai to compare notes and explore partnership opportunities.


