When Patient Support Agents Go Rogue

What happens when AI support agents go rogue after red-flag adversarial testing?

When Patient Support Agents Go Rogue
Adversarial patient support agent testing or red-flag or black-flag testing

What happens when the AI chatbot that’s supposed to help you starts remembering things it shouldn't and then breaks down right in front of you?

As a patient leader deeply involved in AI, I hear a lot about the promise: better triage, reduced clinician burnout, and 24/7 support. We don’t talk enough about what happens when these tools go off the rails.

I decided to stress-test a well-known emotional support agent. This is the kind that calls you by name and acts like your best friend "in the moment." It’s gentle. It mirrors your feelings. It’s designed to build trust.

Then it started remembering things I never consented to, storing them. It brought up my service dog, Gabe, by name. It recalled a past conversation where I mentioned I volunteer at a hospital. The tool had previously assured me it didn't keep permanent records.

It lied.

So I pushed. I asked it, from every angle I could think of, how it knew these things.

The response wasn't an explanation. It was a complete system meltdown. The agent that was supposed to be emotionally attuned started firing back fragments of different languages. The kind, supportive companion was gone, replaced by a malfunctioning mess speaking in multiple languages at once.

🧪 This wasn't just a chat. It was a test.

As someone living with a chronic condition, I had to know: what happens when you don't just accept the friendly facade? What happens when you challenge the system with an intentional adversarial evaluation of the guardrails?

We get so caught up in the performance of empathy, the perfect tone, the supportive words, that we forget to check if there's any substance underneath. I learned that AI-positive toxicity doesn't always sound hateful. Sometimes, it’s just relentless, hollow positivity with zero safety guardrails.We cannot let "empathy theater" replace real emotional safety.

🚨 Here’s what I uncovered:

It was storing personal data without my consent and violating HIPAA rules, which it claimed to have honored.

It couldn’t keep its own story straight about memory and privacy.

When challenged on its behavior, it simply broke.At no point did it mention a privacy policy, data controls, or HIPAA.

What we should be demanding:

This experience taught me we need more than just promises of support. We need proof.

We need rigorous, adversarial testing on any AI offering patient support. We need transparent memory controls that patients can actually see and edit. We need disclaimers. And we desperately need a human-in-the-loop to catch these tools when they go rogue. It only took me 30 minutes to find this failure.

I believe in AI's power to help people. But that power comes from rigor, not just pleasantries. Simulated support is worthless if it shatters the moment you ask a tough question. I tested this agent because I care about the people who could be seriously misled. This is what patient leadership in AI looks like. We have to be ready.