Today, Radiology published our latest study on breast cancer. This work, led by Felipe Oviedo Perhavec from Microsoft's AI for Good Lab and Savannah Partridge (UW/Fred Hutch) in collaboration with researchers from Fred Hutch, University of Washington, University of Kaiserslautern-Landau, and the Technical University of Berlin, explores how AI can improve the accuracy and trustworthiness of breast cancer screening.

We focused on a key challenge: MRI is an incredibly sensitive screening tool, especially for high-risk women, but it generates far too many false positives, leading to anxiety, unnecessary procedures, and higher costs.

Our model, FCDD, takes a different approach. Rather than trying to learn what cancer looks like, it learns what normal looks like and flags what doesn't. In a dataset of over 9,700 breast MRI exams, including real-world screening scenarios, our model:
- Doubled the positive predictive value vs. traditional models
- Reduced false positives by 25%
- Matched radiologists' annotations with 92% accuracy
- Generalized well across multiple institutions without retraining

What's more, the model produces visual heatmaps that help radiologists see and understand why something was flagged, supporting trust, transparency, and adoption. We've made the code and methodology open to the research community. You can read the full paper in Radiology: https://lnkd.in/gc82kXPN

AI won't replace radiologists, but it can sharpen their tools, reduce false alarms, and help save lives.
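To make the "learn what normal looks like" idea concrete, here is a minimal PyTorch sketch of an FCDD-style one-class detector, assuming a toy backbone: the network is trained so normal inputs produce near-zero scores, known anomalies are pushed away, and the per-pixel score map doubles as the explanatory heatmap. The architecture, shapes, and training details are illustrative assumptions, not the study's released model.

```python
# Minimal sketch of FCDD-style one-class anomaly detection (illustrative,
# not the authors' released code). Normal exams are trained toward zero
# scores; the per-pixel score map is the heatmap shown to radiologists.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCDD(nn.Module):
    """Toy fully convolutional backbone producing a 1-channel score map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
        )

    def forward(self, x):
        a = self.net(x)
        # Pseudo-Huber norm: near zero for "normal" features, grows for outliers.
        return torch.sqrt(a ** 2 + 1.0) - 1.0

def fcdd_loss(scores, labels):
    """labels: 0 = normal exam, 1 = known-abnormal exam."""
    per_image = scores.flatten(1).mean(dim=1)                      # mean pixel score
    normal_term = per_image                                        # push normals to 0
    anomal_term = -torch.log(1.0 - torch.exp(-per_image) + 1e-9)   # push anomalies up
    return torch.where(labels.bool(), anomal_term, normal_term).mean()

# Toy usage: random tensors stand in for MRI slices.
model = TinyFCDD()
x = torch.randn(4, 1, 64, 64)
y = torch.tensor([0, 0, 1, 0])
loss = fcdd_loss(model(x), y)
# The heatmap for review is just the (upsampled) score map itself.
heatmap = F.interpolate(model(x), size=(64, 64), mode="bilinear", align_corners=False)
```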
Improving AI Diagnostic Accuracy
Summary
Advancements in artificial intelligence (AI) are revolutionizing diagnostic accuracy in healthcare by reducing errors, supporting clinicians, and improving patient outcomes. From enhancing breast cancer screenings to streamlining electronic health record analysis and creating collaborative AI-assisted diagnostic tools, these innovations are paving the way for a more precise and efficient healthcare system.
- Focus on understanding normal patterns: Use AI systems designed to recognize "normal" data and flag deviations, reducing false positives and building trust with clear visualizations for healthcare professionals.
- Integrate structured data: Combine structured data like lab results with unstructured information to improve the precision and breadth of AI-assisted diagnostic systems.
- Design for collaboration: Develop AI tools with transparency and trust at their core, enabling seamless human-AI teamwork and empowering clinicians to make more informed decisions efficiently.
Stanford researchers just made clinical information retrieval 70% faster and more accurate.

Electronic health records (EHRs) hold critical patient insights, yet extracting key information from unstructured clinical notes remains a major challenge for AI. Large language models (LLMs) with retrieval-augmented generation (RAG) have helped, but they're still slow and inefficient. CLEAR (CLinical Entity Augmented Retrieval) is a breakthrough AI pipeline that retrieves clinical information based on relevant medical entities rather than embeddings.

1. Reduced inference time by 72%, cutting processing from 20.08s to 4.95s per note.
2. Achieved higher accuracy, with an F1 score of 0.90, outperforming traditional RAG and full-note approaches.
3. Reduced input token usage by 70%, making AI-based EHR retrieval significantly more efficient.
4. Improved precision in extracting 18 clinical variables, including substance use, mental health conditions, and radiology findings.
5. Enabled automated weak labeling for fine-tuning smaller AI models, achieving near-GPT-4 performance.

CLEAR employs a multi-stage design combining zero-shot NER (via Flan-T5-XXL), entity augmentation (using UMLS ontologies and GPT-4), and domain-specific filtering (with Bio+Clinical BERT embeddings). This modular architecture enables the system to extract 18 clinical variables with high accuracy. The hybrid approach leverages both domain-agnostic and domain-specific models, demonstrating that combining different model types can yield performance and efficiency gains over monolithic architectures. This is huge for practical integration into EHRs.

Here's the awesome work: https://lnkd.in/gV7M942h

Congrats to Ivan Lopez, Akshay Swaminathan, Robert Gallo, Nigam Shah, Jonathan H. Chen, and co! I post my takes on the latest developments in health AI – connect with me to stay updated! Also, check out my health AI blog here: https://lnkd.in/g3nrQFxW
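A hedged toy sketch of the "entities rather than embeddings" idea (not the authors' pipeline): the target clinical entity is expanded with synonyms, here a hard-coded table standing in for the UMLS/GPT-4 augmentation step, and only sentences mentioning those terms are retrieved for the LLM, which is how this style of retrieval cuts input tokens relative to full-note prompting.

```python
# Toy entity-augmented retrieval in the spirit of CLEAR (illustrative only).
import re

def expand_entity(entity: str) -> set[str]:
    # Stand-in for UMLS/GPT-4 entity augmentation; a real system would pull
    # synonyms and related terms from an ontology, not a hard-coded dict.
    synonyms = {
        "alcohol use": {"alcohol use", "etoh", "drinks per week", "alcohol abuse"},
    }
    return synonyms.get(entity.lower(), {entity.lower()})

def retrieve_sentences(note: str, entity: str) -> list[str]:
    """Keep only sentences that mention the entity or one of its expansions."""
    terms = expand_entity(entity)
    sentences = re.split(r"(?<=[.!?])\s+", note)
    return [s for s in sentences if any(t in s.lower() for t in terms)]

note = "Patient denies tobacco. Reports 3-4 drinks per week. ETOH discussed."
print(retrieve_sentences(note, "alcohol use"))
# Only entity-relevant sentences go to the LLM, shrinking the input context.
```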
A must-read study in JAMA Network Open just compared a traditional diagnostic decision support system (DDSS), DXplain, with two large language models, ChatGPT-4 (LLM1) and Gemini 1.5 (LLM2), using 36 unpublished complex clinical cases.

Key findings:
- When lab data was excluded, the DDSS outperformed both LLMs, listing the correct diagnosis in 56% of cases vs. 42% (LLM1) and 39% (LLM2).
- When lab data was included, performance improved for all: DDSS (72%), LLM1 (64%), LLM2 (58%).
- Importantly, each system captured diagnoses that the others missed, indicating potential synergy between expert systems and LLMs.

While the DDSS still leads, the rapid improvement in #LLMs cannot be ignored. The study presents a compelling case for hybrid approaches: combining deterministic rule-based systems with the linguistic and contextual fluency of LLMs, while also incorporating structured data with standardized coding such as LOINC and SNOMED CT. The inclusion of structured data significantly enhanced diagnostic accuracy across the board, validating the notion that structured and unstructured data must collaborate, not compete, to deliver better #CDS outcomes.

#HealthcareonLinkedin #Datascience #ClinicalInformatics #HealthIT #AI #GenAI #ClinicalDecisionSupport
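As a concrete (entirely invented) illustration of the structured-plus-unstructured point: LOINC-coded lab results can be serialized alongside the narrative vignette before querying an LLM, the pattern the study's lab-data results argue for. Values and vignette below are made up for illustration.

```python
# Toy example: pair structured, LOINC-coded labs with an unstructured
# vignette in one prompt. Data is fabricated for illustration.
labs = [
    {"loinc": "718-7", "name": "Hemoglobin", "value": 9.1, "unit": "g/dL"},
    {"loinc": "3016-3", "name": "TSH", "value": 8.2, "unit": "mIU/L"},
]
vignette = "56-year-old with fatigue, weight gain, and cold intolerance."

lab_block = "\n".join(
    f"- {l['name']} (LOINC {l['loinc']}): {l['value']} {l['unit']}" for l in labs
)
prompt = (
    "Clinical vignette:\n" + vignette + "\n\n"
    "Structured laboratory results:\n" + lab_block + "\n\n"
    "List a ranked differential diagnosis."
)
print(prompt)
```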
Superhuman AI agents will undoubtedly transform healthcare, creating entirely new workflows and models of care delivery. In our latest paper from Google DeepMind, Google Research, and Google for Health, "Towards physician-centered oversight of conversational diagnostic AI," we explore how to build this future responsibly. Our approach was motivated by two key ideas in AI safety:

1. AI architecture constraints for safety: Inspired by concepts like 'Constitutional AI,' we believe systems must be built with non-negotiable rules and contracts (disclaimers aren't enough). We implemented this using a multi-agent design where a dedicated 'guardrail agent' enforces strict constraints on our AMIE AI diagnostic dialogue agent, ensuring it cannot provide unvetted medical advice and enabling appropriate human physician oversight.

2. AI system design for trust and collaboration: For optimal human-AI collaboration, it's not enough for an AI's final output to be correct or superhuman; its entire process must be transparent, traceable, and trustworthy. We implemented this by designing the AI system to generate structured SOAP notes and predictive insights, such as diagnoses and onward care plans, within a 'Clinician Cockpit' interface optimized for human-AI interaction.

In a comprehensive, randomized OSCE study with validated patient actors, these principles and design show great promise:

1. 📈 Doctors' time saved for what truly matters: Our study points to a future of greater efficiency, giving valuable time back to doctors. The AI system first handled comprehensive history taking with the patient. Then, after the conversation, it synthesized that information to generate a highly accurate draft SOAP note with diagnosis (81.7% top-1 diagnostic accuracy 🎯, a >15% absolute improvement over human clinicians) for the doctor's review. This high-quality draft meant the doctor oversight step took around 40% less time ⏱️ than a full consultation performed by a PCP in a comparable prior study.

2. 🧑⚕️🤝 A framework built on trust: The focus on alignment resulted in a system preferred by everyone. The architecture guardrails proved highly reliable, with the composite system deferring medical advice >90% of the time. Overseeing physicians reported a better experience with the AI ✅ compared to the human control groups, and (actor) patients strongly preferred interacting with AMIE ⭐, citing its empathy and thoroughness.

While this study is an early step, we hope its findings help advance the conversation on building AI that is not only superhuman in capabilities but also deeply aligned with the values of the practice of medicine.

Paper - https://lnkd.in/gTZNwGRx

Huge congrats to David Stutz, Elahe Vedadi, David Barrett, Natalie Harris, Ellery Wulczyn, Alan Karthikesalingam MD PhD, Adam Rodman, Roma Ruparel MPH, Shashir Reddy, Mike Schäkermann, Ryutaro Tanno, Nenad Tomašev, S. Sara Mahdavi, Kavita Kulkarni, and Dylan Slack for driving this with all our amazing co-authors.
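A schematic sketch of the guardrail pattern described above, assuming a simple keyword check as a stand-in for the paper's dedicated guardrail agent (which is itself a model enforcing explicit constraints, not string matching): every candidate reply is screened, and anything resembling individualized medical advice is deferred to the overseeing physician rather than sent to the patient.

```python
# Illustrative guardrail-agent pattern (not AMIE's implementation): a checker
# gets the final say on every reply drafted by the diagnostic agent.
from dataclasses import dataclass

# Hypothetical markers of individualized medical advice.
ADVICE_MARKERS = ("you should take", "start taking", "increase your dose",
                  "stop your medication", "i recommend taking")

@dataclass
class Reply:
    text: str
    deferred: bool  # True -> routed to physician review, not the patient

def guardrail(candidate: str) -> Reply:
    """Screen a drafted reply; defer anything that reads as medical advice."""
    if any(m in candidate.lower() for m in ADVICE_MARKERS):
        return Reply("A clinician will review your case and follow up "
                     "with recommendations.", deferred=True)
    return Reply(candidate, deferred=False)

def dialogue_turn(history: list[str], draft_fn) -> Reply:
    draft = draft_fn(history)   # diagnostic agent drafts a reply
    return guardrail(draft)     # guardrail agent enforces the constraint

# Toy usage: the drafted advice is held for physician sign-off.
reply = dialogue_turn(["I've had chest pain for two days."],
                      lambda h: "You should take aspirin immediately.")
print(reply)  # Reply(text='A clinician will review...', deferred=True)
```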
AI Outperforms Physicians in Diagnostic Reasoning: Does This Apply to Psychiatry?

JAMA Network Open recently published a study (https://lnkd.in/e2TGtzVi) showing LLMs significantly outperformed physicians on diagnostic reasoning tasks, even when physicians had access to the same AI! I wondered: would these results extend to psychiatric diagnosis?

🧠 I created a psychiatric case vignette to explore this:
🔹 28yo female with worsening mood, fatigue, and racing thoughts for 3 months following a work promotion
🔹 Sleep disturbance with early awakening and a recent panic attack episode
🔹 Family history of bipolar disorder and alcohol use disorder
🔹 Mental status showing psychomotor agitation and circumstantial thought process

🔬 I tested this case with AI and found it provided remarkably nuanced diagnostic reasoning:
✅ Comprehensive differential including MDD with anxious features, GAD, Adjustment Disorder, and rule-out Bipolar II given the family history
✅ Thoughtful analysis of supporting vs. contradicting evidence for each diagnosis
✅ Clear identification of the most likely diagnosis with appropriate caveats
✅ Suggested specific diagnostic measures: standardized assessments (PHQ-9, GAD-7, MDQ), sleep study, comprehensive substance evaluation, thyroid panel

🔮 My prediction: current GPT models are even more sophisticated than the GPT-4 version used in the JAMA study. I believe that if tested on psychiatric cases with similar methodology, LLMs would outperform human mental health clinicians in diagnostic accuracy and reasoning quality.

⚕️ Clinical implications:
🧩 AI could become invaluable "diagnostic assistants" in psychiatric settings
🧩 Might standardize diagnostic quality across different training levels
🧩 Could help address psychiatrist shortages, especially in underserved areas

What's your take? Are psychiatric diagnoses uniquely resistant to AI capabilities, or are we next?

#PsychiatryAI #DiagnosticInnovation #MentalHealthTech