Healthcare AI is becoming accurate enough to be useful yet imperfect enough that physicians must still verify the output. Yet, as Cory Doctorow has explained, “the story of AI being managed by a ‘human in the loop’ is a fantasy because humans are neurologically incapable of maintaining vigilance in watching for rare errors.” For example, TSA agents are great at detecting the water bottles travelers commonly leave in their bags. But so-called "Red Teams" of Homeland Security agents posing as passengers get weapons past TSA agents 95% of the time! Like all humans, physicians struggle to maintain attention without actively engaging. And we will struggle even more as AI becomes more reliable, less novel, and recedes into the background.

Eli Ben-Joseph, the thoughtful CEO of Regard, whose widely used AI tools help physicians document and surface diagnoses, explained to Politico that “sometimes when our users got used to our product, they would start just kind of blindly trusting it.”

In a new JAMA editorial [doi:10.1001/jama.2024.3620], UCSF’s Bob Wachter and colleagues argue that “the path forward rests on designing and deploying AI in ways that enhance human vigilance.” They outline five options for promoting vigilance:

1. Using visual cues to highlight the degree of uncertainty (e.g., highlighting recommendations that are more likely to be erroneous).
2. Tracking physicians to see whether they are remaining vigilant (e.g., someone who accepts 100% of AI recommendations is not paying attention; see the sketch after this post).
3. Reducing expectations that AI will boost productivity.
4. Introducing “deliberate shocks” to see if physicians are paying attention (analogous to the TSA example above, in which “red teams” randomly place fake firearms into carry-on bags).
5. Shifting the paradigm so AI watches over clinicians, rather than vice versa (analogous to how spellcheckers only highlight potentially misspelled words).

Each of these approaches must be evaluated in the real world. None will be perfect. At the same time, we must admit that we already exhibit automation bias without AI. For example, we teaching physicians (myself included) rarely read and edit our residents’ and fellows’ notes carefully before signing off on them.

The point is that, like all technology, the various forms of healthcare AI will have benefits and drawbacks (like automation bias). If we do not recognize and work to mitigate automation bias, physicians and other healthcare workers ultimately risk becoming a bunch of “OK-button-mashing automatons.”

#healthcareai #automationbias #healthcareonlinkedin
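As a rough illustration of option 2, here is a minimal sketch in Python, assuming a hypothetical `Review` record and an arbitrary acceptance threshold; real vigilance monitoring would need case-mix adjustment and far more clinical context:

```python
from dataclasses import dataclass

@dataclass
class Review:
    clinician_id: str
    ai_recommendation_accepted: bool  # True if signed off with no edits

def flag_low_vigilance(reviews: list[Review],
                       min_reviews: int = 50,
                       acceptance_threshold: float = 0.98) -> set[str]:
    """Flag clinicians whose AI acceptance rate is suspiciously close to 100%,
    a possible signal of automation bias rather than genuine agreement."""
    counts: dict[str, list[int]] = {}
    for r in reviews:
        accepted_total = counts.setdefault(r.clinician_id, [0, 0])
        accepted_total[0] += int(r.ai_recommendation_accepted)
        accepted_total[1] += 1
    return {
        cid for cid, (accepted, total) in counts.items()
        if total >= min_reviews and accepted / total >= acceptance_threshold
    }
```

The same acceptance data could also support the “deliberate shock” idea from the editorial: clinicians who sign off on planted, deliberately flawed recommendations would be the clearest sign that vigilance has lapsed.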
Addressing Reliability Challenges in Medical AI
Summary
Addressing reliability challenges in medical AI means confronting the issues that arise from the probabilistic nature of AI systems and their impact on clinical decision-making, including automation bias and imperfect accuracy. Ensuring patient safety requires balancing technological capability with human oversight and ethical governance.
- Highlight uncertainty clearly: Use visual indicators or alerts to emphasize areas where the AI's recommendations may be uncertain, prompting users to stay engaged and verify outputs (see the sketch after this list).
- Establish strong governance: Create clear policies and frameworks for clinical oversight, ensuring AI tools are used responsibly and align with patient safety and ethical standards.
- Focus on reliability: Adopt data-driven strategies to analyze and address underlying system limitations, ensuring AI tools deliver consistent and accurate results.
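One way the first bullet might surface in a user interface, as a minimal sketch with made-up band names and thresholds; a real system would calibrate these against observed error rates:

```python
def uncertainty_band(confidence: float) -> str:
    """Map a model confidence score in [0, 1] to a display band for the UI."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.90:
        return "routine"    # rendered normally
    if confidence >= 0.70:
        return "review"     # visually highlighted; clinician asked to verify
    return "uncertain"      # prominently flagged; requires explicit sign-off
```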
OpenAI's GPT-5 launch positioned healthcare as "one of the preeminent uses of ChatGPT," emphasizing its complex medical reasoning capabilities. But clinical diagnosis transcends pattern recognition—it requires knowing what questions to ask and reasoning through the answers. AI models excel at responding to prompts but struggle with the meta-cognitive skills clinicians use daily: spotting red flags, recognizing missing context, and distinguishing urgent intervention from watchful waiting. The challenge isn't technical accuracy—it's clinical appropriateness, diagnostic reasoning, and understanding when to act.

What concerns me about OpenAI's positioning is its emphasis on broad accessibility over clinical governance. "Expert-level intelligence in everyone's hands" could be profoundly democratizing, but we must ensure these tools reflect true healthcare expertise—the judgment, ethics, and accountability that algorithms cannot provide.

Healthcare leaders must act now:

- Audit AI governance frameworks: Every patient-facing AI tool needs explicit clinical oversight, not just technical validation. This includes GPT-5 deployments in patient portals or telehealth platforms.
- Define institutional policies: Establish clear guidelines for staff and patient use of generative AI, covering decision support and education, with escalation protocols when AI outputs conflict with clinician judgment (see the sketch after this post).
- Understand patient behavior: Patients will increasingly use AI to understand health conditions and prepare for visits. We should embrace this trend while studying its impact—does it improve clinical conversations and access, or delay care-seeking and introduce inappropriate care?

AI's healthcare potential is enormous, but patient safety remains our north star. If tech companies won't embed transparent clinical governance, healthcare leaders must fill that gap. We cannot let the race to deploy generative AI erode the clinical rigor that protects patients.

What clinical governance frameworks are you implementing for generative AI in your health system? https://lnkd.in/eBRdvqCH

#HealthcareAI #AIGovernance #PatientSafety #ClinicalDecisionMaking #GPT5 #DigitalHealth #HealthTech #MedicalAI #HealthcareLeadership #AIEthics #HealthSystemStrategy #ClinicalOversight
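A minimal sketch of what an escalation check inside such a policy might look like, using a hypothetical audit-record schema and made-up action labels; the actual triggers would come from each institution's own governance framework:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AIDecisionRecord:
    """Hypothetical audit entry for one AI-assisted recommendation."""
    patient_id: str
    model_version: str
    ai_output: str
    clinician_action: str              # "accepted", "modified", or "rejected"
    clinician_rationale: str = ""
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def needs_escalation(record: AIDecisionRecord) -> bool:
    """Escalate to governance review when AI output and clinician judgment conflict,
    or when an override lacks a documented rationale."""
    if record.clinician_action == "rejected":
        return True
    return record.clinician_action == "modified" and not record.clinician_rationale
```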
Reliability Engineering > Software Engineering

Building AI software that works 70% of the time? Anyone with access to an LLM can do that today. But pushing from 90% to 95%? From 95% to 97%? 97% to 98%? That final stretch of accuracy for AI agents represents a monumental engineering task, and most teams aren't prepared for it.

We've entered an era of non-deterministic systems. Traditional software was binary -- it either worked or it didn't. AI systems generate outputs probabilistically, introducing a fundamental shift. Software traditionally runs at 100% precision, but AI will always be wrong 𝘴𝘰𝘮𝘦𝘵𝘪𝘮𝘦𝘴. Even when an AI agent outperforms people at certain tasks, users still expect it to behave like deterministic software -- perfectly. This fundamental mismatch between AI's probabilistic nature and user expectations creates an entirely new engineering & product challenge.

Most teams stuck at lower accuracy levels are playing whack-a-mole instead of addressing core architectural issues. Each incremental improvement requires more sophisticated approaches. Breaking through often requires completely rethinking how the system works.

The required mindset shift is profound. Teams must embrace tight, data-driven iteration loops with comprehensive instrumentation. You need exhaustive logging of every input, output, and system state. Full audit trails become non-negotiable. Without this level of visibility and data collection, you're flying blind. It's not about features but how well they perform.

Reliability used to be QA's job, something tacked on at the end. Now, with AI systems, it's the most critical engineering challenge. It requires dedicated teams with specialized skills in prompt engineering, evaluation design, and probabilistic systems. Reliability isn't just about uptime anymore but about consistent, dependable outputs across an infinite range of inputs.

#AI #ReliabilityEngineering #HealthTechAI #HealthcareAI
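A minimal sketch of the exhaustive logging the post calls for, assuming a generic `model_fn` callable and Python's standard library; a production audit trail would also record model version, system state, and downstream evaluation results:

```python
import json
import logging
import time
import uuid

logger = logging.getLogger("agent.audit")

def call_with_audit(model_fn, prompt: str, **params):
    """Wrap a non-deterministic model call so every input, parameter set,
    output, error, and latency is captured in a structured audit record."""
    trace_id = str(uuid.uuid4())
    start = time.monotonic()
    output, error = None, None
    try:
        output = model_fn(prompt, **params)
        return output
    except Exception as exc:
        error = repr(exc)
        raise
    finally:
        logger.info(json.dumps({
            "trace_id": trace_id,
            "prompt": prompt,
            "params": params,
            "output": output,
            "error": error,
            "latency_ms": round((time.monotonic() - start) * 1000, 1),
        }, default=str))
```

Structured records like these are what make the data-driven iteration loops possible: failures can be replayed, grouped, and turned into evaluation sets, which is where the 95%-to-98% work actually happens.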