What We Learned from Three Clinical AI Pilots (The Good, the Bad, and the Unexpected)
Last year I worked with three different health services on clinical AI pilots. All three completed. One succeeded and is now in production rollout. One was discontinued. One is still being evaluated, and honestly, I’m not sure which way it will go.
Here’s what we learned.
Pilot 1: ED Triage AI — The Success
A metropolitan public hospital with a chronically overcrowded emergency department. The goal was simple: use AI to improve triage accuracy and identify patients who needed urgent attention faster.
What we did right:
From the start, we framed this as a nursing-led initiative, not an IT project. The Director of Nursing was the executive sponsor. The clinical lead was a senior triage nurse. IT played a supporting role.
This mattered because triage nurses were the users. Getting their buy-in wasn’t an afterthought—it was the strategy.
We ran the pilot in shadow mode for six weeks. The AI made triage recommendations, but nurses didn’t see them. We collected data on how often AI recommendations matched nurse assessments.
The results were encouraging: 87% agreement overall, and in roughly 60% of the disagreements the AI had identified a high-acuity patient whom the nurse had triaged as less urgent.
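For readers curious about the mechanics, here is a minimal sketch of how a shadow-mode comparison like this can be summarised. It is illustrative only: the field names, the 1-to-5 urgency scale, and the sample data are my own assumptions, not the pilot's actual schema or code.

```python
# Illustrative shadow-mode summary (invented fields and data, not the pilot's system).
from dataclasses import dataclass

@dataclass
class TriageRecord:
    patient_id: str
    nurse_category: int  # assumed scale: 1 = most urgent ... 5 = least urgent
    ai_category: int     # same scale; recorded in shadow mode, not shown to nurses

def summarise_shadow_mode(records: list[TriageRecord]) -> dict:
    """Agreement rate, plus how the disagreements split by direction."""
    agree = [r for r in records if r.ai_category == r.nurse_category]
    disagree = [r for r in records if r.ai_category != r.nurse_category]
    # Disagreements where the AI assigned a *more* urgent category than the nurse
    ai_more_urgent = [r for r in disagree if r.ai_category < r.nurse_category]
    return {
        "n": len(records),
        "agreement_rate": len(agree) / len(records),
        "disagreements": len(disagree),
        "ai_more_urgent_share": (
            len(ai_more_urgent) / len(disagree) if disagree else 0.0
        ),
    }

# Example with made-up records:
sample = [
    TriageRecord("a1", nurse_category=3, ai_category=3),
    TriageRecord("a2", nurse_category=4, ai_category=2),
    TriageRecord("a3", nurse_category=2, ai_category=2),
]
print(summarise_shadow_mode(sample))
```

The interesting number is not the headline agreement rate but the direction of the disagreements: a model that disagrees by escalating patients is a very different proposition from one that disagrees by downgrading them.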
What we did wrong:
We underestimated the workflow integration challenges. The AI system displayed results on a separate screen. That doesn’t sound like a big deal, but in a busy ED, asking nurses to look at an extra screen was asking too much. Usage dropped significantly in the first two weeks of live operation.
We had to go back and work with the vendor on integration into the existing triage documentation system. That took eight weeks and cost more than we’d budgeted.
The outcome:
After the integration fix, usage climbed. We’re now seeing consistent improvements in early identification of sepsis and cardiac events. The hospital is rolling out to its second ED, with plans for system-wide deployment.
The key success factor: clinical ownership from day one.
Pilot 2: Pathology AI — The Failure
A private pathology provider wanted to use AI for initial screening of histopathology slides. The goal was to increase pathologist productivity by flagging slides that clearly showed no abnormality.
What went wrong:
The fundamental problem was cultural, not technical. Pathologists saw the AI as a threat to their professional judgment. Some worried about liability if they missed something the AI had flagged. Others felt their expertise was being devalued.
We did stakeholder engagement. We had workshops. We emphasised that AI was a tool, not a replacement. It didn’t matter. Resistance was deep.
When we launched the pilot, pathologists found workarounds. They’d mark slides as reviewed by AI but then ignore the AI assessment completely. Usage metrics looked fine; actual adoption was near zero.
The unexpected lesson:
The technical performance of the AI was actually good. In validation testing, it identified low-yield slides with high accuracy. The technology worked.
But technology that nobody uses is worthless.
I should have pushed harder for a smaller initial group of pathologists who were genuinely interested. Instead, we tried to pilot across a whole department, and the sceptics set the tone.
The outcome:
The pilot was discontinued. The pathology provider hasn’t abandoned AI entirely, but they’re taking a much slower approach—starting with individual pathologists who actively want to experiment.
The key failure factor: underestimating professional culture change.
Pilot 3: Medication Safety AI — The Uncertain One
A regional health service implemented an AI system to detect potential medication errors and adverse drug interactions that might be missed by standard pharmacy checks.
What made this complicated:
The AI caught real problems. Within the first month, it flagged three significant potential interactions that had passed through standard processes.
But it also generated a lot of noise. The false positive rate was higher than the vendor had suggested. Pharmacists were spending time reviewing AI alerts that turned out to be clinically insignificant.
The trade-off we couldn’t resolve:
High sensitivity (catching more real problems) comes with more false positives (more noise for pharmacists to deal with). We could tune the alert threshold, but tuning only moves you along the curve; it doesn't get you off it (see the sketch after the two lists below).
At the current threshold:
- We’re catching real issues
- Pharmacists are frustrated with alert fatigue
- Net effect on safety is probably positive, but we can’t prove it statistically yet
At a less sensitive threshold:
- We’d miss some real issues (unacceptable)
- Pharmacists would be happier
- The intervention would be harder to justify
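To make the trade-off concrete, here is a toy sketch of what threshold tuning looks like. Nothing here reflects the vendor's actual system or our data; the scores, the numbers, and the function names are invented purely to show why lowering the threshold buys sensitivity at the cost of alert volume.

```python
# Toy model of alert-threshold tuning (illustrative only, not the vendor's system):
# each candidate interaction gets a risk score, and an alert fires when the
# score exceeds a threshold. Lowering the threshold catches more true problems
# but also fires more clinically insignificant alerts.
import random

def evaluate_threshold(scored_cases, threshold):
    """scored_cases: list of (risk_score, is_true_problem) pairs."""
    alerts = [(score, truth) for score, truth in scored_cases if score >= threshold]
    true_problems = [c for c in scored_cases if c[1]]
    caught = [c for c in alerts if c[1]]
    false_alerts = [c for c in alerts if not c[1]]
    sensitivity = len(caught) / len(true_problems) if true_problems else 0.0
    return {
        "threshold": threshold,
        "alerts_fired": len(alerts),
        "sensitivity": sensitivity,
        "false_alerts": len(false_alerts),
    }

# Made-up scores: a handful of real problems mixed into mostly benign cases.
random.seed(0)
cases = [(random.uniform(0.6, 1.0), True) for _ in range(5)] + \
        [(random.uniform(0.0, 0.9), False) for _ in range(200)]

for t in (0.5, 0.7, 0.9):
    print(evaluate_threshold(cases, t))
```

Every threshold we tried sat somewhere on this curve; none of them made the tension go away.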
Where we are now:
Six months in, the governance committee is divided. Clinical pharmacy leadership wants to continue, arguing that the three real catches justify the noise; on the other side, pharmacy staff satisfaction has dropped and there is growing concern about the workload impact.
I genuinely don’t know what the right answer is. We’re collecting more data to try to quantify the safety benefit more precisely.
The lesson (so far):
Some AI implementations don’t have clean success/failure outcomes. You end up managing trade-offs indefinitely.
What These Pilots Have in Common
Despite different outcomes, I noticed common patterns:
Clinical ownership determines adoption. When clinicians drove the initiative, it worked. When IT or executive sponsors drove it, clinical resistance was harder to overcome.
Workflow integration is harder than anyone expects. Budget twice what the vendor says for integration. Then add contingency.
Performance in controlled testing doesn’t predict real-world performance. Your clinical environment, your data, your workflows—they’re different from the vendor’s test environment.
Change management isn’t optional. It’s actually the majority of the work, not a supplementary activity.
Governance structures need to exist before go-live. Building governance after the fact is too late.
Advice for Your Next Pilot
Based on these experiences, here’s what I’d recommend:
Start smaller than you think. One unit, one department, one use case. Prove it works there before expanding.
Find clinical champions who actually want this. Not clinicians who’ve been volunteered by their manager. Clinicians who are genuinely curious about AI.
Define success criteria before you start. What metrics matter? What level of performance would trigger discontinuation?
Plan for workflow integration from day one. Don’t assume the vendor’s standard integration will work.
Budget for the unexpected. Our pilots all cost more than estimated. Plan for that.
Accept uncertainty. Not every pilot has a clear outcome. You might end up managing trade-offs rather than declaring victory or defeat.
AI pilots are learning experiences. The goal isn’t just to evaluate a specific technology—it’s to build organisational capability for AI adoption more broadly.
Even the failed pathology pilot taught us something valuable about how to approach clinical AI implementation. That knowledge will inform future initiatives.
Dr. Rebecca Liu is a health informatics specialist and former Chief Clinical Information Officer. She advises healthcare organisations on clinical AI strategy and implementation.