FDA-approved ≠ clinically validated. A new study shows most AI-enabled medical devices may not generalize to real-world patient populations.

🚨 This large-scale analysis of 903 FDA-cleared AI medical devices reveals major gaps in clinical evaluation, study design, and demographic performance reporting. As AI tools flood the market, this study raises critical concerns about their safety, transparency, and real-world generalizability.

Key Takeaways
- Only 56% of devices reported any clinical performance data; 24% stated no such study was done.
- Just 2.4% of studies were randomized; most were retrospective or unspecified.
- Core metrics like sensitivity and specificity were missing in over 60% of cases.
- Subgroup reporting was rare: only 29% by sex and 23% by age.
- 4.8% of devices were recalled, most cleared via the 510(k) pathway with limited evaluation.

Regulatory clearance doesn't ensure generalizability. Without rigorous, diverse, and transparent evaluation, AI tools risk widening health disparities rather than closing them.
______________________________________________________
#ai #healthcare #policy #health #medicine #medicaldevice
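To make the subgroup-reporting gap concrete, here is a minimal sketch (toy data and my own code, not taken from the study) of what per-subgroup sensitivity and specificity reporting looks like for a hypothetical device:

```python
# Minimal sketch of subgroup performance reporting (hypothetical data, not from the study):
# compute sensitivity and specificity separately for each demographic subgroup,
# the kind of breakdown the post notes is rarely included in device submissions.
from collections import defaultdict

# (true_label, predicted_label, sex) for a toy set of cases; illustrative only.
cases = [
    (1, 1, "F"), (1, 0, "F"), (0, 0, "F"), (0, 1, "F"),
    (1, 1, "M"), (1, 1, "M"), (0, 0, "M"), (0, 0, "M"),
]

counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
for truth, pred, sex in cases:
    if truth == 1:
        counts[sex]["tp" if pred == 1 else "fn"] += 1
    else:
        counts[sex]["tn" if pred == 0 else "fp"] += 1

for sex, c in counts.items():
    sensitivity = c["tp"] / (c["tp"] + c["fn"])
    specificity = c["tn"] / (c["tn"] + c["fp"])
    print(f"{sex}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

If a device's sensitivity differs materially between such subgroups, an aggregate accuracy figure hides exactly the disparity the post warns about.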
Limitations of AI in Medical Applications
Explore top LinkedIn content from expert professionals.
-
Dale Tetzloff suffered a stroke in October 2022. His doctor determined he needed at least 100 days of post-acute care for proper recovery. After just 40 days, the insurance company's AI system decided he was "ready for discharge."

This wasn't a one-off mistake. It's now part of a class action lawsuit against one of America's largest insurers. Patients with dementia or early cognitive decline are especially vulnerable to these AI decisions.

1/ AI only sees documented conditions
↳ Most cognitive impairment isn't in medical records
↳ 60% of dementia cases go undiagnosed
↳ Cognitive status is rarely assessed during routine visits
↳ AI makes decisions on severely incomplete pictures

2/ Injuries "unmask" cognitive decline
↳ Hospital environments amplify cognitive vulnerabilities
↳ Disrupted routines trigger cascading confusion
↳ Pain medications further compromise cognition
↳ What looks like "difficult behavior" is often unrecognized dementia

3/ Recovery trajectories differ dramatically
↳ Cognitively impaired patients need 2-3x longer rehabilitation
↳ Standard protocols assume intact executive function
↳ Medication adherence requires cognitive abilities many lack
↳ Home safety assessments miss cognitive context entirely

4/ The human cost of algorithmic decisions
↳ AI systems can't detect these cognitive complexities
↳ Rehabilitation potential is severely underestimated
↳ Premature discharge creates revolving-door readmissions
↳ The algorithm doesn't see the person, only the parameters

5/ Creating cognitively aware AI requires (a minimal sketch follows this post):
↳ Routine cognitive assessment in primary care
↳ Documentation standards for cognitive status
↳ Algorithms that flag potential cognitive concerns
↳ Human override mechanisms for complex cases

The promise of AI in healthcare is real. But we're feeding these systems partial truths. An algorithm trained on incomplete data will make confidently incomplete decisions. The result is someone's life. Until we document cognitive health as rigorously as physical health, AI will remain dangerously blind to dementia.

—-----------------------------
⁉️ Have you seen cases where standard protocols missed cognitive complications?
♻️ Share if you believe healthcare AI needs a cognitive assessment component.
👉 Follow me (Reza Hosseini Ghomi, MD, MSE) for more insights at the intersection of neuroscience, technology, and care delivery.
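A minimal sketch of the last two requirements above, flagging potential cognitive concerns and forcing human review of an algorithmic discharge decision. The record fields, thresholds, and rules are hypothetical, not any insurer's or EHR's actual schema:

```python
# Hypothetical illustration of "flag cognitive concerns + human override" safeguards.
# Field names, the MoCA cutoff, and the rules are assumptions for this sketch only.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PatientRecord:
    age: int
    diagnoses: list = field(default_factory=list)
    cognitive_assessment_documented: bool = False
    moca_score: Optional[int] = None          # Montreal Cognitive Assessment, if done
    delirium_flag_during_admission: bool = False

def requires_human_review(rec: PatientRecord, ai_recommends_discharge: bool) -> bool:
    """Return True if an AI discharge recommendation must go to a clinician."""
    if not ai_recommends_discharge:
        return False
    # Undocumented cognition is treated as unknown risk, not as "no impairment".
    if not rec.cognitive_assessment_documented:
        return True
    if rec.moca_score is not None and rec.moca_score < 26:  # common screening cutoff
        return True
    if rec.delirium_flag_during_admission:
        return True
    if rec.age >= 80 and any("stroke" in d.lower() for d in rec.diagnoses):
        return True
    return False

patient = PatientRecord(age=76, diagnoses=["ischemic stroke"])
print(requires_human_review(patient, ai_recommends_discharge=True))  # True: no cognitive data
```

The design choice worth noting is the second rule: missing documentation routes the case to a human instead of being silently interpreted as "cognitively intact", which is the failure mode the post describes.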
-
Digital pathology AI models show promise for improving lung cancer diagnosis, but robust external validation remains a critical bottleneck for clinical adoption.

Methods: Researchers conducted a systematic scoping review of external validation studies for AI pathology models in lung cancer diagnosis from 2010-2024. They searched medical and engineering databases, identified 22 studies that met inclusion criteria, and assessed methodological quality using a modified QUADAS tool.

Results: Key findings revealed significant validation gaps:
- Only ~10% of lung cancer AI pathology papers included external validation
- 16 models performed subtyping tasks (LUAD vs LUSC), with AUC values ranging from 0.746 to 0.999
- 86% of studies showed high risk of bias in participant selection/study design
- Most studies used small, retrospective datasets from single centers
- Only 18% reported clinically meaningful metrics like sensitivity/specificity

Conclusions: While AI models demonstrate strong performance on subtyping tasks, methodological concerns limit real-world applicability. The review highlights critical needs for:
- Larger, multi-center prospective studies
- Better demographic diversity in validation datasets
- Standardized reporting of clinically relevant metrics
- Moving beyond retrospective case-control designs

For AI pathology to achieve clinical impact, the field must prioritize rigorous external validation that reflects real-world clinical conditions and diverse patient populations.

Paper and research by Soumya Arun, @Mariia Grosheva, Mark Kosenko, @Jan Lukas Robertus, Oleg Blyuss, Judith Offman and the larger team. See the comments for the link to the full paper.
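For readers less familiar with what external validation means in practice, here is a hedged sketch (synthetic cohorts and my own code, not the review's analysis) of reporting AUC separately on an internal test split and on an external cohort whose data distribution has shifted:

```python
# Synthetic illustration of internal vs external validation. AUC is computed
# via the Mann-Whitney rank statistic; the cohorts are simulated, with weaker
# class separation in the "external" cohort to mimic a site/scanner/population shift.
import numpy as np
from scipy.stats import rankdata

def auc(scores, labels):
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    ranks = rankdata(scores)
    n_pos, n_neg = labels.sum(), (labels == 0).sum()
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(1)

# Internal cohort: scores separate the classes well (model fit to similar data).
internal_labels = rng.integers(0, 2, 300)
internal_scores = internal_labels * 1.5 + rng.normal(size=300)

# External cohort: weaker separation, standing in for real-world distribution shift.
external_labels = rng.integers(0, 2, 300)
external_scores = external_labels * 0.5 + rng.normal(size=300)

print(f"Internal AUC: {auc(internal_scores, internal_labels):.3f}")
print(f"External AUC: {auc(external_scores, external_labels):.3f}")
```

Reporting the two numbers side by side, rather than only the internal one, is exactly the habit the review argues is missing from ~90% of published models.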
-
Is AI Easing Clinician Workloads—or Adding More?

Healthcare is rapidly embracing AI and Large Language Models (LLMs), hoping to reduce clinician workload. But early adoption reveals a more complicated reality: verifying AI outputs, dealing with errors, and struggling with workflow integration can actually increase clinicians' cognitive load.

Here are four key considerations:

1. Verification Overload - LLMs might produce coherent summaries, but "coherent" doesn't always mean correct. Manually double-checking AI-generated notes or recommendations becomes an extra task on an already packed schedule.

2. Trust Erosion - Even a single AI-driven mistake—like the wrong dosage—can compromise patient safety. Errors that go unnoticed fracture clinicians' trust and force them to re-verify every recommendation, negating AI's efficiency.

3. Burnout Concerns - AI is often touted as a remedy for burnout. Yet if it's poorly integrated or frequently incorrect, clinicians end up verifying and correcting even more, adding mental strain instead of relieving it.

4. Workflow Hurdles - LLMs excel in flexible, open-ended tasks, but healthcare requires precision, consistency, and structured data. This mismatch can lead to patchwork solutions and unpredictable performance.

Moving Forward
- Tailored AI: Healthcare-specific designs that reduce "prompt engineering" and improve accuracy.
- Transparent Validation: Clinicians need to understand how AI arrives at its conclusions.
- Human-AI Collaboration: AI should empower, not replace, clinicians by streamlining verification.
- Continuous Oversight: Monitoring, updates, and ongoing training are crucial for safe, effective adoption.

If implemented thoughtfully, LLMs can move from novelty to genuine clinical asset. But we have to address these limitations head-on to ensure AI truly lightens the load.

Want a deeper dive? Check out the full article, where we explore each of these points in more detail—and share how we can build AI solutions that earn clinicians' trust instead of eroding it.
-
ChatGPT Pro (o1 Reasoning): A Doctor's Inside Look at Medical AI's Potential and Pitfalls

As an ER physician and AI futurist, I recently tested the new ChatGPT Pro (o1 reasoning model)—OpenAI's $200/month premium subscription—in real-world medical diagnostics. The outcomes? A blend of potential breakthroughs and pressing limitations that every healthcare professional should know. (This is just me testing it out, with a very small sample size.)

Testing ChatGPT Pro in Medical Diagnostics

EKG Analysis:
- Missed 7 Wolff-Parkinson-White (WPW) cases
- Failed to identify 2 STEMI cases
- Overlooked a Right Bundle Branch Block (RBBB) with ST Elevation (STE)
This highlights a gap in recognizing subtle yet life-threatening cardiac patterns.

CT Imaging:
- Correctly identified acute appendicitis in 2 CT images
Shows promise in structured imaging tasks like radiology.

X-Ray Interpretation:
- Missed a clear pneumothorax on a chest X-ray
- Misidentified a Galeazzi fracture as a Colles fracture
Reveals challenges in differentiating subtle imaging variations.

Dermatology Case:
- Correctly identified Stevens-Johnson Syndrome (SJS) from 2 skin images
Demonstrates potential in recognizing certain complex dermatological conditions.

Key Takeaways for Medical Professionals

Strengths: Excels with structured data (CT scans) and some visual diagnoses (SJS), indicating future utility in specific specialties.

Weaknesses: Struggles with EKG interpretation, subtle fracture patterns, and detecting critical pathologies like pneumothorax. This complexity mirrors the clinical challenges human experts face, reminding us that high-stakes decisions require rigorous validation.

Where AI Fits in Medicine Today
- Non-Critical Imaging & Dermatology Support: AI can assist with routine interpretations, freeing specialists up for complex cases.
- Research & Literature Reviews: Summarizing complex topics and supporting data interpretation are strengths.
- Patient Education: Translating medical jargon into patient-friendly language remains a reliable AI skill.

Bottom Line

While ChatGPT Pro's advancements deserve attention, these tests confirm it's not yet ready for frontline diagnostic decisions. As we push the boundaries of medical AI, let's prioritize rigorous validation, accuracy, and—most importantly—patient safety. My favorite phrase is HUMAN + AI > the best AI or the best human. See my TEDx talk.

#OpenAI #ChatGPT #AI #ArtificialIntelligence #AITools #MachineLearning #AIResearch #AIMedicine #MedicalAI #RadiologyAI #HealthTech #TechInnovation
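A hedged aside on method: one way to keep informal bedside tests like these honest is to log every case and tally results per modality, so hit rates and misses are explicit rather than anecdotal. The entries below are hypothetical placeholders in the spirit of the post, not the author's actual data:

```python
# Hypothetical test log: tally correct answers per imaging/diagnostic modality.
from collections import defaultdict

# (modality, ground_truth, model_answer); illustrative entries only.
test_log = [
    ("EKG", "WPW", "normal sinus rhythm"),
    ("EKG", "STEMI", "nonspecific ST changes"),
    ("CT", "acute appendicitis", "acute appendicitis"),
    ("X-ray", "pneumothorax", "no acute findings"),
    ("X-ray", "Galeazzi fracture", "Colles fracture"),
    ("Dermatology", "Stevens-Johnson Syndrome", "Stevens-Johnson Syndrome"),
]

tally = defaultdict(lambda: [0, 0])  # modality -> [correct, total]
for modality, truth, answer in test_log:
    tally[modality][0] += int(answer.strip().lower() == truth.strip().lower())
    tally[modality][1] += 1

for modality, (correct, total) in tally.items():
    print(f"{modality}: {correct}/{total} correct")
```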
-
Great Opportunity. Great Questions.

I had the fortune of speaking to clinicians involved in perinatal medicine and neonatal critical care medicine at Brigham and Women's this week. I spoke on the topic of #AI in neonatal critical care medicine and pediatrics.

A great question asked by an attendee, paraphrased: "If AI is so powerful, where is the GAP, as it is not yet being widely used in healthcare, especially in clinical care?"

This is a question with a lot of answers. Here are a few:

1. Most AI tools have not been evaluated for the neonatal ICU (NICU). An exception is the HeRO monitor, an ML tool that identifies changes in a patient's heart rate variability over time that can be used to predict the probability of significant morbidities such as sepsis and NEC (a generic sketch of this idea follows this post). AI tools, from #LLMs for gen AI to #ML for prediction to #DL for image-based recognition, simply need more structured evaluation before they can be widely applied in medicine.

2. There are not enough non-data scientists and non-AI-focused clinicians, educators, patients, caregivers, administrators, ethicists, staff, and IT involved in the evaluation of AI tools in healthcare.

3. Healthcare data are messy. It is estimated that 30% of healthcare data are missing, mislabeled, or incorrect. Of the remainder, the vast majority is unstructured data in the form of text, images, and so on, which is not easy to work with.

4. Bias is built into a lot of healthcare data. I use the example of EKG interpretation, which is not as accurate or effective for some racial/ethnic groups. Bias detection and model correction will be an issue.

5. Who is accountable when an error occurs with the use of AI in healthcare (because it will occur)? The company that created it? The hospital or system that purchased it? The clinician that used it? The patient or caregiver who used it of their own accord? There needs to be defined accountability; it cannot be the wild, wild west.

6. Explainability: Clinicians will not use something in patient care that they do not understand or cannot fact-check. Unfortunately, even those who develop these tools cannot always explain why and how they generate an output.

There are many more. AI is here. Those involved in healthcare, who understand healthcare systems, processes, and outputs, need to be involved from the very beginning of model ideation, creation, deployment, evaluation, maintenance, and error correction if AI is to be a trusted, ethical, effective, safe, and unbiased tool in healthcare.

#UsingWhatWeHaveBetter
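On point 1: the HeRO index itself is proprietary, so as a purely generic illustration, here is how simple heart-rate-variability summaries (SDNN and RMSSD) can be computed from R-R intervals. Suppressed variability of this kind is the broad class of signal such monitors build on; the numbers below are simulated, not neonatal data:

```python
# Generic HRV illustration only; not the HeRO monitor's actual (proprietary) algorithm.
import numpy as np

def hrv_summary(rr_intervals_ms):
    rr = np.asarray(rr_intervals_ms, dtype=float)
    sdnn = rr.std(ddof=1)                        # overall variability
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))   # beat-to-beat variability
    return sdnn, rmssd

rng = np.random.default_rng(42)
healthy_rr = 400 + rng.normal(0, 20, size=600)   # ~150 bpm with normal variability
reduced_rr = 400 + rng.normal(0, 5, size=600)    # same rate, suppressed variability

for name, rr in [("normal variability", healthy_rr), ("reduced variability", reduced_rr)]:
    sdnn, rmssd = hrv_summary(rr)
    print(f"{name}: SDNN={sdnn:.1f} ms, RMSSD={rmssd:.1f} ms")
```

The same mean heart rate can hide very different variability, which is why a rate-only alarm misses what an HRV-based tool can flag.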
-
The data, as it exists now in EHRs, cannot be used as-is to develop algorithms. Data systems are not designed to capture data that can be used for clinical research or to build artificial intelligence. They are optimized for administrative tasks first and clinical care second, but not for learning health systems.

The data we use to train algorithms, the data from EHRs, is unsatisfactory because the different actors who "pass" the data from the point of capture, to aggregation, to harmonization, to curation, and to model development are all "incidental" to the life cycle of AI. Their roles were not designed to build AI. These actors are oblivious to the social patterning of the data generation process, including but not limited to measurement bias of the medical devices that we use to capture the data.

The data collectors (the EHR vendors), the data processors (the FHIR and OMOP community), and the data curators and model developers (clinicians and the ML community) are mostly oblivious to data issues that have a profound effect on prediction, classification, and optimization.

https://lnkd.in/e9dJMtEs
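As a hedged sketch of where taking these issues seriously might start (hypothetical columns, not any vendor's or OMOP's actual schema), a minimal audit of missingness, implausible values, and unstructured fields before anyone trains a model on an EHR extract:

```python
# Toy EHR extract with assumed column names; quantify basic data-quality problems
# (missing values, physiologically implausible readings, free-text columns) up front.
import pandas as pd
import numpy as np

ehr = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "heart_rate": [72, np.nan, 310, 88, np.nan],   # 310 bpm is physiologically implausible
    "sbp": [120, 135, np.nan, 95, 140],
    "discharge_note": ["stable, f/u 2w", None, "pt AMA", "see scanned PDF", None],
})

missing_rate = ehr.drop(columns="patient_id").isna().mean()
implausible_hr = ((ehr["heart_rate"] < 20) | (ehr["heart_rate"] > 250)).mean()
unstructured_share = ehr.dtypes.eq(object).mean()  # crude proxy for free-text columns

print("Missing rate per column:\n", missing_rate, sep="")
print(f"Implausible heart-rate values: {implausible_hr:.0%}")
print(f"Share of columns that are unstructured text: {unstructured_share:.0%}")
```

None of this fixes the social patterning the post describes, but making the defect rates visible is a precondition for anyone downstream even noticing them.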
-
Research paper incoming. We just tested AI on something doctors do every day. The results? Concerning.

We fed AI models every FDA-approved medication from 2024 and asked basic safety questions:
→ Safe for kids?
→ Safe for elderly?
→ Safe for pregnant women?

Simple, single-condition questions. Accuracy: 60-65%.

Then we added one more condition: "29-year-old pregnant woman with kidney stones."

Accuracy dropped to 30%. Let that sink in.

AI can somewhat handle straightforward cases. But the moment you introduce real-world complexity - the kind every doctor faces daily - performance collapses.

This isn't just about better training data. It's about understanding that healthcare AI isn't ready for the nuanced, multi-condition reality of actual patients.

The gap between AI demos and clinical reality is wider than we thought.

What does this mean for healthcare AI deployment? Everything.
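A hedged sketch of the evaluation pattern described above, with a stub standing in for the model and hypothetical questions rather than the authors' benchmark; the point is only how accuracy can be grouped by the number of simultaneous patient conditions:

```python
# Illustrative harness: score model answers, then report accuracy per number of
# stacked conditions. ask_model, the questions, and expected answers are assumptions.
from collections import defaultdict

def ask_model(question: str) -> str:
    """Stand-in for a call to whatever LLM is being evaluated."""
    return "unsafe"  # placeholder answer

# (question, number of simultaneous conditions, expected answer)
benchmark = [
    ("Is drug X safe in children?", 1, "unsafe"),
    ("Is drug X safe in pregnancy?", 1, "unsafe"),
    ("Is drug X safe for a 29-year-old pregnant woman with kidney stones?", 2, "unsafe"),
    ("Is drug Y safe for an elderly patient with CKD on anticoagulation?", 3, "safe"),
]

results = defaultdict(lambda: [0, 0])  # n_conditions -> [correct, total]
for question, n_conditions, expected in benchmark:
    answer = ask_model(question).strip().lower()
    results[n_conditions][0] += int(answer == expected)
    results[n_conditions][1] += 1

for n_conditions in sorted(results):
    correct, total = results[n_conditions]
    print(f"{n_conditions} condition(s): accuracy {correct}/{total}")
```

Stratifying accuracy this way is what exposes the drop from the 60-65% single-condition regime to the 30% multi-condition regime the post reports.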
-
🚨 AI in Healthcare: A Regulatory Wake-Up Call? 🚨

Large language models (LLMs) like GPT-4 and Llama-3 are showing incredible promise in clinical decision support. But here's the catch: they're not regulated as medical devices, and yet they're already generating recommendations that look a lot like regulated medical guidance.

A recent study found that even when prompted to avoid device-like recommendations, these AI models often provided clinical decision support in ways that could meet FDA criteria for a medical device. In some cases, their responses aligned with established medical standards—but in others, they ventured into high-risk territory, making treatment recommendations that should only come from trained professionals.

This raises a big question: Should AI-driven clinical decision support be regulated? And if so, how do we balance innovation with patient safety? Right now, there's no clear framework for LLMs used by non-clinicians in critical situations.

🔹 What does this mean for healthcare professionals? AI is advancing fast, and while it can be a powerful tool, it's crucial to recognize its limitations.
🔹 For regulators? There's an urgent need to define new oversight models that account for generative AI's unique capabilities.
🔹 For AI developers? Transparency, accuracy, and adherence to safety standards will be key to building trust in medical AI applications.

As AI continues to evolve, we're entering uncharted territory. The conversation about regulation isn't just theoretical—it's becoming a necessity.

What do you think? Should AI in clinical decision support be regulated like a medical device? Let's discuss. 👇
-
What if AI in Healthcare Is Built on Bad Science?

AI is moving fast, but are we p-hacking our way into the future? Many AI models in medicine are built on research practices that have long plagued scientific integrity.

Here's the reality:
• 12–54% of health studies show p-values suspiciously clustered around 0.05—suggesting data manipulation.
• 40% false discovery rates in psychology and biomedicine due to selective reporting.
• 50% of clinical trials fail to fully report outcomes, skewing guidelines and real-world care.

If the data that informs AI models is flawed, the tools we build will inherit those flaws, replicating and scaling them across entire health systems.

Why This Happens
The rush to be first, whether in research or commercialization, creates shortcuts that compromise rigor:
• Publication pressure – High-impact journals prioritize novelty over replication. If a study presents a "breakthrough," it's more likely to be published, even if the methods are weak.
• Algorithmic p-hacking – Training AI on datasets with many variables but failing to correct for multiple comparisons inflates false positives, making models seem more effective than they really are (a small simulation follows this post).
• Data bias blind spots – Many AI models are trained on datasets that do not represent the full spectrum of patients, leading to biased and unreliable predictions, particularly for underrepresented populations.

The result? AI tools that look promising in development but fail when applied in real-world clinical settings.

What to Look for When Evaluating AI Tools
To separate meaningful AI innovation from hype, ask these questions:
1. Was the model validated in real-world settings or just on a curated dataset?
2. Did the study pre-register its hypotheses, training data, and evaluation metrics? (Or were metrics changed after results were analyzed?)
3. Was bias assessed beyond accuracy? Who was missing from the training data, and how does that impact model performance across different populations?

The Bottom Line
AI will transform healthcare - but speed is not the same as rigor. If we don't scrutinize how these models are built, we risk deploying tools that mislead clinicians, widen health disparities, and ultimately harm patients.

We should be demanding transparency, reproducibility, and accountability. AI in healthcare should be built on truth, not statistical manipulation.

#AI #HealthcareAI #ResearchIntegrity #DataEthics #AIinhealthcare
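As a small illustration of the algorithmic p-hacking point (my own simulation, not data from any cited study): screening many pure-noise features against an outcome yields plenty of "significant" predictors at p < 0.05 by chance alone, and a Benjamini-Hochberg correction removes most or all of them.

```python
# Simulate the multiple-comparisons problem: 200 noise features, one unrelated outcome.
# Uncorrected screening at p < 0.05 produces spurious hits; Benjamini-Hochberg does not.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_patients, n_features = 500, 200

X = rng.normal(size=(n_patients, n_features))   # candidate predictors (pure noise)
y = rng.normal(size=n_patients)                 # outcome, unrelated to X by construction

# Univariate screening: one p-value per feature.
p_values = np.array([stats.pearsonr(X[:, j], y)[1] for j in range(n_features)])

naive_hits = np.sum(p_values < 0.05)

# Benjamini-Hochberg: reject the k smallest p-values, where k is the largest
# index with p_(k) <= (k/m) * alpha.
alpha = 0.05
ranked = np.sort(p_values)
thresholds = alpha * np.arange(1, n_features + 1) / n_features
passing = np.nonzero(ranked <= thresholds)[0]
bh_hits = passing[-1] + 1 if passing.size else 0

print(f"'Significant' features without correction: {naive_hits} of {n_features}")
print(f"Significant features after Benjamini-Hochberg: {bh_hits}")
```

Around 5% of the noise features clear the uncorrected threshold, which is exactly the kind of inflated signal that makes a model look better in a paper than it will ever look in a clinic.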