The saying "more data beats clever algorithms" does not always hold. In new research from Amazon, we show that generative AI can turn this apparent truism on its head.

Anomaly detection and localization identifies and pinpoints irregularities in datasets or images, and it is a cornerstone of quality and safety in sectors such as manufacturing and healthcare. Finding anomalies quickly, reliably, and at scale matters, so automation is key. The challenge is that anomalies are, by definition, rare, which makes it hard to gather enough data to train a model to find them automatically.

Amazon has developed a new method that significantly enhances anomaly detection and localization in images. It not only addresses the challenges of data scarcity and diversity but also sets a new benchmark for using generative AI to augment datasets. Here's how it works:

1️⃣ Data collection: The process starts by gathering existing images of products to serve as a base for learning.
2️⃣ Image generation: Using diffusion models, the AI creates new images that include potential defects or variations not present in the original dataset.
3️⃣ Training: The model is trained on both the original and generated images, learning to distinguish "normal" images from anomalous ones.
4️⃣ Anomaly detection: Once trained, the model can analyze new images, detecting and localizing anomalies with enhanced accuracy thanks to the diverse examples it learned from.

The results are encouraging and show that large quantities of data can matter less than high-quality, diverse data when building autonomous systems. Nice work from the Amazon Science team. The full paper is linked below.

#genai #ai #amazon
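The post doesn't include code, but the augmentation idea in steps 2 and 3 can be illustrated with off-the-shelf tools. Below is a minimal sketch, assuming a diffusers img2img pipeline; the checkpoint ID, prompt, strength value, and the `train_detector` helper are illustrative placeholders, not Amazon's actual method.

```python
# Sketch only: diffusion-based defect augmentation in the spirit of steps 2-3.
# Checkpoint ID, prompt, and train_detector are placeholders, not Amazon's method.
import torch
from pathlib import Path
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Any img2img-capable diffusion checkpoint works here (placeholder ID).
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def synthesize_defects(image_dir: str, prompt: str, strength: float = 0.35):
    """Step 2: perturb normal product images toward a described defect."""
    for path in Path(image_dir).glob("*.png"):
        normal = Image.open(path).convert("RGB").resize((512, 512))
        defect = pipe(prompt=prompt, image=normal, strength=strength).images[0]
        yield normal, defect

# Step 3 (hypothetical helper): train on the mix of real and generated images.
# pairs = list(synthesize_defects("data/normal", "product photo with a surface scratch"))
# train_detector(real=[n for n, _ in pairs], synthetic=[d for _, d in pairs])
```

A low `strength` keeps the generated image close to the real product while still injecting the defect variation the detector never saw in the original data.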
AI Techniques For Image Recognition
Explore top LinkedIn content from expert professionals.
-
I finally had the chance to explore a new document extraction technique introduced in a paper last September. Bonus: the code and model are free to use (Apache 2.0).

This new approach, called General OCR Theory (GOT-OCR2.0), proposes a unified end-to-end model that handles tasks traditional OCR systems struggle with. Unlike legacy OCR, which relies on complex multi-modular pipelines, GOT uses a simple encoder-decoder architecture with only 580M parameters that outperforms models 10-100× larger.

Paper highlights:
(1) Unified architecture - a high-compression encoder paired with a long-context decoder that handles everything from scene text to complex formulas
(2) Stunning performance - delivers nearly perfect text accuracy on documents, surpassing Qwen-VL-Max (>72B) and other leading models
(3) Versatility beyond text - processes math formulas, molecular structures, and even geometric shapes
(4) Interactive capabilities - supports region-level recognition guided by coordinates or colors

I just tried it out and was blown away by how it handles complex documents with mixed content types. The ability to convert math formulas from arXiv PDFs to Mathpix format is alone worth exploring this model for. What strikes me most about GOT is how it challenges the notion that only billion-parameter LLMs can tackle complex visual tasks.

Paper + code + model can be found in their GitHub repo: https://lnkd.in/dbHzUUYx

— Join thousands of world-class researchers and engineers from Google, Stanford, OpenAI, and Meta staying ahead on AI http://aitidbits.ai
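For anyone who wants to reproduce the quick test above, here is a minimal sketch based on my reading of the project's Hugging Face instructions; the checkpoint name, the `chat()` method, and the `ocr_type` values are assumptions to verify against the linked repo.

```python
# Sketch of running GOT-OCR2.0 locally; verify names and signatures against the repo README.
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "ucaslcl/GOT-OCR2_0"  # assumed Hugging Face checkpoint name
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID, trust_remote_code=True, use_safetensors=True, low_cpu_mem_usage=True
).eval().cuda()

# Plain text extraction from a document image
text = model.chat(tokenizer, "page.png", ocr_type="ocr")

# Formatted output (e.g. Mathpix-style markup for formulas on an arXiv page)
formatted = model.chat(tokenizer, "arxiv_page.png", ocr_type="format")
print(text, formatted, sep="\n\n")
```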
-
This. Adobe is knowingly facilitating the laundering of data and the subsequent sale of generative images relying on living, non-consenting artists', photographers', and creators' names as prompts.

Adobe allows Stock contributors to upload these images to its Stock service, despite the fact that these images "go against their generative AI content policy," and the Stock contributors (the prompters, not the artists whose names are being co-opted without consent) then make money on their sale. Further, the images uploaded to Adobe Stock serve as training inputs for Adobe Firefly, their so-called ethical poster child for generative AI image-making.

They then put the burden of maintenance and policing on end-users. Marc Simonetti reported images using his name, but how often will he have to search Adobe Stock for his own name, for his handles, for fuzzy matches (non-exact spellings, deliberate typos, etc., to bypass exact-match filters)? Especially when there is no consequence for the Stock contributors deliberately uploading MidJourney outputs that violate Adobe's policy? How many artists don't have the time, the Adobe Stock subscription, or the psychological energy to police an enterprise-grade platform because the enterprise behind it knows it can wait them out?

Adobe has more than enough tools at its disposal to catch and prevent the uploading of generated images with questionable data provenance, or to prevent the uploading of images with non-consenting artists' names or styles (haveibeentrained already has a list of 1.4 billion image-based opt-outs by creators). So, Adobe, why aren't you?

#ai #generativeai #genai #firefly #adobe #adobestock #datatheft #datalaundering #datadignity #consent #copyright #ethics
-
A teacher's use of AI to generate pictures of her students in the future to motivate them captures the potential of AI for good, showing students visually how they can achieve their dreams. This imaginative use of technology not only engages students but also sparks a conversation about self-potential and future possibilities.

However, this innovative method also brings up significant ethical questions regarding the use of AI in handling personal data, particularly images. As wonderful as it is to see AI used creatively in education, it raises concerns about privacy, consent, and the potential misuse of AI-generated images.

𝐊𝐞𝐲 𝐈𝐬𝐬𝐮𝐞𝐬 𝐭𝐨 𝐂𝐨𝐧𝐬𝐢𝐝𝐞𝐫
>> Consent and Privacy: It's crucial that the individuals whose images are being used (or their guardians, in the case of minors) have given informed consent, understanding exactly how their images will be used and manipulated.
>> Data Security: Ensuring that the data used by AI, especially sensitive personal data, is secured against unauthorized access and misuse is paramount.
>> Ethical Use: There should be clear guidelines and purposes for which AI can use personal data, avoiding scenarios where AI-generated images could be used for purposes not originally intended or agreed upon.

𝐑𝐞𝐬𝐩𝐨𝐧𝐬𝐢𝐛𝐢𝐥𝐢𝐭𝐲 𝐚𝐧𝐝 𝐑𝐞𝐠𝐮𝐥𝐚𝐭𝐢𝐨𝐧
>> Creators and Users of AI: Developers and users of AI technologies must adhere to ethical standards, ensuring that their creations respect privacy and are used responsibly.
>> Legal Frameworks: Stronger legal frameworks may be necessary to govern the use of AI with personal data, specifying who is responsible and what actions can be taken if misuse occurs.

As we continue to innovate and integrate AI into various aspects of life, including education, it's vital to balance the benefits with a strong commitment to ethical practices and respect for individual rights.

🤔 What are your thoughts on the use of AI to inspire students? How should we address the ethical considerations that come with such technology?

#innovation #technology #future #management #startups
-
🧭Governing AI Ethics with ISO42001🧭

Many organizations treat AI ethics as a branding exercise, a list of principles with no operational enforcement. As Reid Blackman, Ph.D. argues in "Ethical Machines", without governance structures, ethical commitments are empty promises. For those who prefer to create something different, #ISO42001 provides a practical framework to ensure AI ethics is embedded in real-world decision-making.

➡️Building Ethical AI with ISO42001

1. Define AI Ethics as a Business Priority
ISO42001 requires organizations to formalize AI governance (Clause 5.2). This means:
🔸Establishing an AI policy linked to business strategy and compliance.
🔸Assigning clear leadership roles for AI oversight (Clause A.3.2).
🔸Aligning AI governance with existing security and risk frameworks (Clause A.2.3).
👉Without defined governance structures, AI ethics remains a concept, not a practice.

2. Conduct AI Risk & Impact Assessments
Ethical failures often stem from hidden risks: bias in training data, misaligned incentives, unintended consequences. ISO42001 mandates:
🔸AI Risk Assessments (#ISO23894, Clause 6.1.2): Identifying bias, drift, and security vulnerabilities.
🔸AI Impact Assessments (#ISO42005, Clause 6.1.4): Evaluating AI's societal impact before deployment.
👉Ignoring these assessments leaves your organization reacting to ethical failures instead of preventing them.

3. Integrate Ethics Throughout the AI Lifecycle
ISO42001 embeds ethics at every stage of AI development:
🔸Design: Define fairness, security, and explainability objectives (Clause A.6.1.2).
🔸Development: Apply bias mitigation and explainability tools (Clause A.7.4).
🔸Deployment: Establish oversight, audit trails, and human intervention mechanisms (Clause A.9.2).
👉Ethical AI is not a last-minute check; it must be integrated and operationalized from the start.

4. Enforce AI Accountability & Human Oversight
AI failures occur when accountability is unclear. ISO42001 requires:
🔸Defined responsibility for AI decisions (Clause A.9.2).
🔸Incident response plans for AI failures (Clause A.10.4).
🔸Audit trails to ensure AI transparency (Clause A.5.5).
👉Your governance must answer: Who monitors bias? Who approves AI decisions? Without clear accountability, ethical risks will become systemic failures.

5. Continuously Audit & Improve AI Ethics Governance
AI risks evolve. Static governance models fail. ISO42001 mandates:
🔸Internal AI audits to evaluate compliance (Clause 9.2).
🔸Management reviews to refine governance practices (Clause 10.1).
👉AI ethics isn't a magic bullet, but a continuous process of risk assessment, policy updates, and oversight.

➡️ AI Ethics Requires Real Governance
AI ethics only works if it's enforceable. Use ISO42001 to:
✅Turn ethical principles into actionable governance.
✅Proactively assess AI risks instead of reacting to failures.
✅Ensure AI decisions are explainable, accountable, and human-centered.
-
Responsible data development is at the core of Responsible AI (RAI). If a training dataset was created poorly (under-represented, skewed data), it will lead to a biased model. In AI development, using real data has privacy, ethical, and IP implications, to name a few. On the other hand, using synthetic (AI-generated) data is not a panacea (as much as it's been hailed as one). It leads to other kinds of downstream issues that need to be taken into account.

This paper explores two key risks of using synthetic data in AI model development:
1. Diversity-washing (synthetic data can give the appearance of diversity)
2. Consent circumvention (consent stops being a "procedural hook" that limits downstream harms from AI model use, and this - along with data source obfuscation - complicates enforcement)

The paper focuses on facial recognition technology (FRT), highlighting the risks of using synthetic data and the trade-offs between utility, fidelity, and privacy. Developing participatory governance models, along with data lineage and transparency, is crucial to mitigating these risks.
-
MedSAM2 just brought "segment anything" to 3D medical images and videos.

Generalist segmentation models like SAM2 have shown promise in natural images but struggle with medical data. 𝗠𝗲𝗱𝗦𝗔𝗠𝟮 bridges that gap as a 𝗽𝗿𝗼𝗺𝗽𝘁𝗮𝗯𝗹𝗲 𝘀𝗲𝗴𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻 𝗳𝗼𝘂𝗻𝗱𝗮𝘁𝗶𝗼𝗻 𝗺𝗼𝗱𝗲𝗹 𝘁𝘂𝗻𝗲𝗱 𝗳𝗼𝗿 𝗯𝗼𝘁𝗵 𝟯𝗗 𝗺𝗲𝗱𝗶𝗰𝗮𝗹 𝘀𝗰𝗮𝗻𝘀 𝗮𝗻𝗱 𝘁𝗲𝗺𝗽𝗼𝗿𝗮𝗹 𝘃𝗶𝗱𝗲𝗼 𝗳𝗿𝗮𝗺𝗲𝘀.

1. Fine-tuned on over 455,000 3D image-mask pairs and 76,000 video frames.
2. Achieved Dice scores of 88.84% on CT organs, 88.37% on MRI lesions, and 87.22% on PET lesions, leading across all tasks in the 3D benchmark.
3. Outperformed all models on ultrasound and endoscopy videos, with up to 96.13% accuracy on left-ventricle segmentation and 92.22% on hard polyps.
4. Cut 3D lesion annotation time by 86% for CT (from 526s to 74s per lesion) and 87% for liver MRI (from 520s to 65s per lesion) through a human-in-the-loop pipeline.

A couple of thoughts:
• The use of memory attention mechanisms to maintain spatial and temporal consistency across slices/frames is a cool architectural choice.
• The 𝗵𝘂𝗺𝗮𝗻-𝗶𝗻-𝘁𝗵𝗲-𝗹𝗼𝗼𝗽 𝗮𝗻𝗻𝗼𝘁𝗮𝘁𝗶𝗼𝗻 𝗽𝗶𝗽𝗲𝗹𝗶𝗻𝗲 𝗵𝗮𝘀 𝗶𝗺𝗺𝗲𝗻𝘀𝗲 𝗽𝗿𝗮𝗰𝘁𝗶𝗰𝗮𝗹 𝘃𝗮𝗹𝘂𝗲 𝗮𝗻𝗱 𝘀𝗵𝗼𝘂𝗹𝗱 𝗯𝗲 𝗶𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁𝗲𝗱 𝗺𝗼𝗿𝗲.
• I love that the tool is available in 3D Slicer, Jupyter, Colab, and Gradio to translate research into usable tools. The barrier to using models NEEDS to be lower in medical AI research in general, for faster adoption and iterative loops of improvement.

Here's the awesome work: https://lnkd.in/g7jy5W7G
Congrats to Jun Ma, Zongxin Yang, Sumin Kim, Beatrice Bihui Chen, Bo Wang, and co!

I post my takes on the latest developments in health AI – 𝗰𝗼𝗻𝗻𝗲𝗰𝘁 𝘄𝗶𝘁𝗵 𝗺𝗲 𝘁𝗼 𝘀𝘁𝗮𝘆 𝘂𝗽𝗱𝗮𝘁𝗲𝗱! Also, check out my health AI blog here: https://lnkd.in/g3nrQFxW
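MedSAM2's promptable workflow resembles the upstream SAM2 video API it builds on, so a rough sketch of the idea looks like this: treat an ordered stack of CT slices as video frames, prompt one slice with a box, and let memory attention propagate the mask. The config/checkpoint paths are placeholders and MedSAM2's own entry points may differ, so treat this as illustrative only.

```python
# Illustrative only: SAM2-style prompt propagation over 3D slices treated as frames.
# MedSAM2's actual entry points and checkpoints may differ; see the linked repo.
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor(
    "configs/sam2.1/sam2.1_hiera_t.yaml",   # placeholder config
    "checkpoints/medsam2_latest.pt",        # placeholder checkpoint
)

with torch.inference_mode():
    # Directory of ordered slice images (e.g., axial CT slices exported as JPEGs)
    state = predictor.init_state(video_path="ct_case_001_slices/")

    # One box prompt on a middle slice marks the lesion to segment
    predictor.add_new_points_or_box(state, frame_idx=40, obj_id=1, box=[120, 90, 200, 170])

    # Memory attention propagates the mask slice by slice from the prompt frame
    masks_3d = {}
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks_3d[frame_idx] = (mask_logits[0] > 0.0).cpu().numpy()
```

In the human-in-the-loop setting, a radiologist would review the propagated masks and refine only the slices where the prediction drifts, which is where the reported 86-87% annotation time savings come from.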
-
𝗡𝗲𝘄 𝗿𝗲𝘀𝗲𝗮𝗿𝗰𝗵 𝗽𝘂𝗯𝗹𝗶𝘀𝗵𝗲𝗱! Medical imaging is packed with hidden clinical biomarkers, but privacy hurdles and data scarcity often keep this treasure trove locked away from AI innovation. Frustrating, right?

That's exactly what inspired me and Abdullah Hosseini to ask: Can we generate synthetic medical images that not only look real, but also preserve the critical biomarkers clinicians rely on?

So, we dove in. Using cutting-edge diffusion models fused with Swin-transformer networks, we generated synthetic images across three modalities—radiology (chest X-rays), ophthalmology (OCT), and histopathology (breast cancer slides).

The big question: 𝗗𝗼 𝘁𝗵𝗲𝘀𝗲 𝘀𝘆𝗻𝘁𝗵𝗲𝘁𝗶𝗰 𝗶𝗺𝗮𝗴𝗲𝘀 𝗸𝗲𝗲𝗽 𝘁𝗵𝗲 𝘀𝘂𝗯𝘁𝗹𝗲, 𝗱𝗶𝘀𝗲𝗮𝘀𝗲-𝗱𝗲𝗳𝗶𝗻𝗶𝗻𝗴 𝗳𝗲𝗮𝘁𝘂𝗿𝗲𝘀 𝗶𝗻𝘁𝗮𝗰𝘁?

• Our diffusion models faithfully preserved key biomarkers—like lung markings in X-rays and retinal abnormalities in OCT—across all datasets.
• Classifiers trained only on synthetic data performed nearly as well as those trained on real images, with F1 and AUC scores hitting 0.8–0.99.
• No statistically significant difference in diagnostic performance—meaning synthetic data could stand in for real data in many AI tasks, while protecting patient privacy.

This work shows synthetic data isn't just a lookalike—it's a powerful, privacy-preserving tool for research, clinical AI, and education. Imagine sharing and scaling medical data without the headaches of privacy risk or limited access!

Read the full paper: https://lnkd.in/eW6TM9H2
Get the code & datasets: https://lnkd.in/ek4wSkg3

#AI #Innovation #SyntheticData #DiffusionModels #MedicalImaging #HealthcareInnovation #DigitalHealth #Frontiers #WeillCornell #HealthTech #HealthcareAI #PrivacyPreservingAI #GenerativeAI #Biomarkers #MachineLearning #Qatar #MENA #MiddleEast #NorthAfrica #MENAIRegion #MENAInnovation #UAE #UnitedArabEmirates #SaudiArabia #KSA #Egypt

AI Innovation Lab Weill Cornell Medicine Weill Cornell Medicine - Qatar Cornell Tech Cornell University
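The evaluation idea behind the second bullet, training a classifier only on synthetic images and scoring it on held-out real images, can be sketched in a few lines. This is a generic sketch with scikit-learn on placeholder feature arrays, not the authors' released code; in practice the arrays would be embeddings of the synthetic training set and the real test set.

```python
# Sketch of the "train on synthetic, test on real" comparison; not the authors' code.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score

rng = np.random.default_rng(0)

# Placeholders: stand-ins for embeddings of diffusion-generated training images
# and of held-out real patient images (e.g., from a frozen CNN backbone).
X_synth, y_synth = rng.normal(size=(800, 256)), rng.integers(0, 2, 800)
X_real, y_real = rng.normal(size=(200, 256)), rng.integers(0, 2, 200)

clf = LogisticRegression(max_iter=1000).fit(X_synth, y_synth)   # trained only on synthetic data
probs = clf.predict_proba(X_real)[:, 1]                          # evaluated only on real data

print("F1 :", f1_score(y_real, (probs >= 0.5).astype(int)))
print("AUC:", roc_auc_score(y_real, probs))
```

If the synthetic images preserve the disease-defining features, the F1/AUC gap between this classifier and one trained on real images should be small, which is what the paper reports.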
-
𝐓𝐡𝐢𝐧𝐤 Ultralytics 𝐘𝐎𝐋𝐎𝐯11 𝐢𝐬 𝐠𝐫𝐞𝐚𝐭 𝐨𝐮𝐭 𝐨𝐟 𝐭𝐡𝐞 𝐛𝐨𝐱? 𝐖𝐚𝐢𝐭 𝐮𝐧𝐭𝐢𝐥 𝐲𝐨𝐮 𝐡𝐞𝐚𝐫 𝐡𝐨𝐰 NVIDIA 𝐓𝐀𝐎 𝐓𝐨𝐨𝐥𝐤𝐢𝐭 𝐢𝐧𝐬𝐩𝐢𝐫𝐞𝐝 𝐦𝐞 𝐭𝐨 𝐩𝐮𝐬𝐡 𝐢𝐭𝐬 𝐥𝐢𝐦𝐢𝐭𝐬 𝐰𝐢𝐭𝐡 𝐚𝐮𝐠𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧 𝐚𝐧𝐝 𝐭𝐮𝐧𝐢𝐧𝐠!

Excited to share my journey working with YOLOv11 for object detection! Here's what I've been up to:

1) 𝐄𝐧𝐡𝐚𝐧𝐜𝐞𝐝 𝐀𝐮𝐠𝐦𝐞𝐧𝐭𝐚𝐭𝐢𝐨𝐧: Inspired by my experience with the NVIDIA TAO Toolkit, I explored how to layer additional custom augmentations after leveraging Roboflow. This approach helped diversify the training data, making the model more robust and adaptable.

2) 𝐓𝐮𝐧𝐢𝐧𝐠 𝐇𝐲𝐩𝐞𝐫𝐩𝐚𝐫𝐚𝐦𝐞𝐭𝐞𝐫𝐬: Drawing from best practices in the TAO Toolkit, I focused on hyperparameter optimization to fine-tune YOLOv11. Adjusting learning rates, experimenting with momentum, and exploring weight decay provided key insights and noticeable performance improvements.

3) 𝐌𝐨𝐝𝐞𝐥 𝐓𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝐢𝐧 Google 𝐂𝐨𝐥𝐚𝐛: Using the YOLOv11 framework, I set up a training pipeline directly in Google Colab. With custom hyperparameters such as learning rate, momentum, and weight decay, I fine-tuned the model for optimal performance (see the sketch below).

𝐓𝐚𝐤𝐞𝐚𝐰𝐚𝐲𝐬: The TAO Toolkit's approach to model training and augmentation inspired these strategies and reinforced the importance of a well-prepared pipeline. Combining tools and methodologies accelerates innovation and enhances results.

𝐍𝐞𝐱𝐭 𝐬𝐭𝐞𝐩𝐬: Continue refining the model and testing its real-world applications.

Have you used the NVIDIA TAO Toolkit or experimented with advanced augmentation techniques and hyperparameter tuning?

♻️ Repost to your LinkedIn followers and follow Timothy Goebel for more actionable insights on AI and innovation.

#YOLOv11 #ComputerVision #ObjectDetection #MachineLearning #AI #DeepLearning #NVIDIA #TAOToolkit #DataAugmentation #HyperparameterTuning
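For readers who want to try the same kind of tuning, here is a minimal sketch using the Ultralytics Python API in a Colab cell; the dataset path and the specific hyperparameter values are placeholders, not the settings from this experiment.

```python
# Sketch of fine-tuning YOLOv11 with custom hyperparameters via the Ultralytics API.
# Dataset path and hyperparameter values are placeholders, not the author's settings.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")        # pretrained nano checkpoint

model.train(
    data="my_dataset.yaml",        # Roboflow-exported dataset config (placeholder)
    epochs=100,
    imgsz=640,
    lr0=0.005,                     # initial learning rate
    momentum=0.9,
    weight_decay=0.0005,
    hsv_h=0.02, fliplr=0.5, mosaic=1.0,  # augmentation knobs layered on top
)

metrics = model.val()              # evaluate on the validation split
print(metrics.box.map50)           # mAP@0.5
```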
-
Processing whole slide images typically requires analyzing 18,000+ tiles and hours of computation. But what if AI could work like a pathologist?

The computational bottleneck: Current AI approaches face a fundamental inefficiency. Whole slide images are massive gigapixel files divided into thousands of tiles for analysis. Most systems process every tile regardless of diagnostic relevance, averaging 18,000 tiles per slide. This brute-force approach demands enormous resources and creates clinical adoption barriers.

Experienced pathologists don't examine every millimeter uniformly. They strategically focus on diagnostically informative regions while quickly scanning normal tissue or artifacts. Peter Neidlinger et al. developed EAGLE (Efficient Approach for Guided Local Examination), mimicking this selective strategy. The system combines two foundation models: CHIEF for identifying regions meriting detailed analysis, and Virchow2 for extracting features from the selected areas.

Key metrics:
- Speed: Processed slides in 2.27 seconds, reducing computation time by 99%
- Accuracy: Outperformed state-of-the-art models across 31 tasks spanning four cancer types
- Interpretability: Allows pathologists to validate which tiles informed decisions

The authors note that "careful tile selection, slide-level encoding, and optimal magnification are pivotal for high accuracy, and combining a lightweight tile encoder for global scanning with a stronger encoder on selected regions confers marked advantage."

Practical implications: This efficiency addresses multiple adoption barriers. Reduced computational requirements eliminate dependence on high-performance infrastructure, democratizing access for smaller institutions. The speed enables real-time workflows that integrate into existing diagnostic routines rather than separate batch processing. Most importantly, the selective approach provides interpretability: pathologists can examine the specific tissue regions influencing the AI's analysis, supporting validation and trust-building.

Broader context: EAGLE represents a shift from computational brute force toward intelligent efficiency in medical AI. Rather than scaling up hardware requirements, it scales down computational demands while improving performance. This illustrates how domain expertise can inform more effective AI architectures than purely data-driven approaches.

How might similar efficiency-focused approaches change AI implementation in your field?

paper: https://lnkd.in/eR_Hj7ip
code: https://lnkd.in/eX8wEfy6

#DigitalPathology #MedicalAI #ComputationalPathology #MachineLearning #ClinicalAI #FoundationModels
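The two-stage idea, a cheap global scan that ranks tiles followed by an expensive encoder run only on the best ones, can be written down generically. Below is a conceptual sketch with toy stand-in encoders for CHIEF and Virchow2; it is not the released EAGLE code, just an illustration of the selection pattern.

```python
# Conceptual sketch of guided local examination; not the released EAGLE code.
# CheapEncoder / StrongEncoder are toy stand-ins for a lightweight scanning model
# (CHIEF-like) and a heavy feature extractor (Virchow2-like).
import numpy as np

class CheapEncoder:
    def score(self, tile: np.ndarray) -> float:
        """Cheap per-tile informativeness score (toy heuristic here)."""
        return float(tile.std())

class StrongEncoder:
    def embed(self, tile: np.ndarray) -> np.ndarray:
        """Expensive high-dimensional tile embedding (toy projection here)."""
        return tile.flatten()[:256]

def slide_embedding(tiles, cheap, strong, top_k=25):
    scores = np.array([cheap.score(t) for t in tiles])        # global scan of every tile
    keep = np.argsort(scores)[::-1][:top_k]                   # most informative tiles only
    feats = np.stack([strong.embed(tiles[i]) for i in keep])  # heavy model runs on top_k tiles
    return feats.mean(axis=0), keep                           # slide vector + indices for review

# Toy usage with small numbers: only top_k tiles ever reach the strong encoder.
tiles = [np.random.rand(32, 32) for _ in range(2_000)]
vec, selected = slide_embedding(tiles, CheapEncoder(), StrongEncoder(), top_k=25)
print(vec.shape, selected[:5])
```

Returning the selected tile indices alongside the slide embedding is what enables the interpretability the post highlights: a pathologist can pull up exactly those regions and check whether the model looked where it should have.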