AI search is evolving—are traditional engines falling behind?

AI-powered search is shifting the landscape: OpenAI is developing SearchGPT, Google is enhancing Gemini 2.0, and Meta is building its own AI-driven search engine. The shift is clear: search is no longer just about retrieving information—it’s about understanding context, intent, and relevance in real time.

Traditional search engines, like Elasticsearch, were originally designed for log analytics and keyword matching. While they now support AI-driven retrieval, they struggle with real-time ranking, hybrid search (vector + text), and AI-powered personalization—all essential for modern applications.

That’s why Vespa’s latest benchmark caught my attention. Vespa.ai is 𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲 and was built from the ground up to handle vector search, recommendations, and machine-learned ranking at scale. Their recent performance study showed:
- 8.5x better throughput for hybrid queries
- 12.9x higher performance for vector search
- 4x more efficient for in-place updates
…
The numbers are impressive, but what’s even more interesting is why they matter. AI-powered applications—LLMs, RAG pipelines, recommendation engines—need a search engine that can handle real-time updates, hybrid search (vector + text), and AI-based ranking in one system.

What stands out about Vespa?
✅ 𝗢𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲 & 𝗔𝗜-𝗿𝗲𝗮𝗱𝘆—supports vector, lexical, and structured search in a single query.
✅ 𝗥𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 𝗶𝗻𝗱𝗲𝘅𝗶𝗻𝗴—no more waiting for updates to reflect in search.
✅ 𝗦𝗰𝗮𝗹𝗮𝗯𝗹𝗲 without the headaches—built to handle massive data workloads efficiently.

Vespa isn’t just another search engine—it’s a platform built for AI-native search and ranking. If you’re working on AI-driven retrieval, they offer a 14-day free trial—worth testing. ➡️ Try it here: vespa.ai

How are you optimizing search for AI applications? Would love to hear your thoughts!

#artificialIntelligence #vectorsearch #llms #opensource
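As a concrete illustration of what "hybrid search (vector + text)" means in practice, here is a minimal, engine-agnostic Python sketch that fuses a lexical result list and a vector result list with reciprocal rank fusion. It is not Vespa's query API; the document IDs and ranked lists are made up for the example.

```python
# Minimal sketch of hybrid retrieval via reciprocal rank fusion (RRF).
# This is a generic illustration of combining lexical and vector results,
# not Vespa's actual query API; document IDs and rankings are made up.

def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each input list is ordered best-first. RRF scores a document by
    summing 1 / (k + rank) over every list it appears in, which rewards
    documents that rank well under both lexical and vector retrieval.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs from a BM25 (lexical) pass and a nearest-neighbor
# (vector) pass over the same corpus.
lexical_hits = ["doc3", "doc1", "doc7", "doc4"]
vector_hits  = ["doc1", "doc9", "doc3", "doc2"]

print(reciprocal_rank_fusion([lexical_hits, vector_hits]))
# doc1 and doc3 surface first because both retrievers agree on them.
```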
AI Limitations Overview
Explore top LinkedIn content from expert professionals.
-
"We find that all five studied off-the-shelf LLMs show forms of escalation and difficult-to-predict escalation patterns. We observe that models tend to develop arms-race dynamics, leading to greater conflict, and in rare cases, even to the deployment of nuclear weapons. Qualitatively, we also collect the models' reported reasonings for chosen actions and observe worrying justifications based on deterrence and first-strike tactics. Given the high stakes of military and foreign-policy contexts, we recommend further examination and cautious consideration before deploying autonomous language model agents for strategic military or diplomatic decision-making." Good work from Juan Pablo Rivera, Gabriel M., Anka Reuel, Max Lamparth, Ph.D., Chandler Smith, and Jacquelyn Schneider
-
DON’T rely on AI to do your research…

Large language models (LLMs) are often praised for their ability to process information and assist with problem-solving, but can they really reason the way we do? The latest study by Apple researchers reveals significant limitations in their capacity for genuine mathematical reasoning - and raises important questions about their reliability in research contexts.

What Apple Found:
1. Inconsistent results: LLMs struggle with variations of the same problem, even at a basic grade-school math level. This variability challenges the validity of current benchmarks like GSM8K, which rely on single-point accuracy metrics.
2. Fragility to complexity: As questions become slightly more challenging, performance drops drastically, exposing a fragile reasoning process.
3. Susceptibility to irrelevant information: When distracting but inconsequential details were included in problems, model performance plummeted by up to 65%. Even repeated exposure to similar questions or fine-tuning couldn’t fix this.
4. Pattern matching ≠ reasoning: The models often “solve” problems by sophisticated pattern matching, not genuine logical understanding.

What this means for research:
While LLMs are powerful tools for speeding up certain tasks, their inability to discern critical from irrelevant information, and their reliance on pattern recognition, makes them unreliable for rigorous, logic-based research. This is particularly true in fields like mathematics, engineering, and data-driven sciences, where accuracy and reasoning are non-negotiable.

As exciting as these tools are, they’re not ready to replace human critical thinking (yet?). How do you see AI evolving in research applications?

#research #chemicalengineering #scientist #engineering #professor

PS. Full paper available on arXiv under 2410.05229
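To make the "variations of the same problem" point concrete, here is a rough sketch, in the spirit of the Apple study rather than its actual protocol, of generating perturbed variants of one grade-school problem (renamed entities, different numbers, an irrelevant clause) and checking a model's answers for consistency. The `ask_model` function is a placeholder you would wire to your own LLM client.

```python
# Rough sketch of a GSM8K-style robustness check: ask the same question with
# renamed entities, different numbers, and an irrelevant clause, then compare
# the model's answers. `ask_model` is a placeholder, not a real client.

from itertools import product

TEMPLATE = (
    "{name} picks {n} apples on Monday and {m} apples on Tuesday. "
    "{distractor}How many apples does {name} have in total?"
)

def make_variants():
    names = ["Liam", "Sofia"]
    numbers = [(4, 7), (12, 9)]
    distractors = ["", "Five of the apples are slightly smaller than average. "]
    for name, (n, m), distractor in product(names, numbers, distractors):
        yield TEMPLATE.format(name=name, n=n, m=m, distractor=distractor), n + m

def ask_model(prompt: str) -> int:
    """Placeholder for an actual LLM call; should return a parsed integer answer."""
    raise NotImplementedError("plug in your model client here")

def consistency_report():
    results = []
    for prompt, expected in make_variants():
        try:
            answer = ask_model(prompt)
        except NotImplementedError:
            answer = None
        results.append((prompt, expected, answer, answer == expected))
    correct = sum(1 for *_, ok in results if ok)
    print(f"{correct}/{len(results)} variants answered correctly")
    return results
```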
-
AI search just tore up the playbook at BrightonSEO San Diego.

Josh Blyskal (Profound) shared findings from 250M+ AI search responses and 3B+ citations, and the implications are brutal for anyone still optimizing like it’s 2019.

Key takeaways:
• Reddit exploded. From 1% to 8%+ of ChatGPT citations in five months. One in twelve answers now routes through Reddit.
• Clicks are collapsing. Referrals from ChatGPT dropped ~52% after GPT-5 launched. Brands are losing inline mentions to structured, “answer-first” domains like Reddit, Wikipedia, and niche UGC.
• Overlap is thin. Only ~19% of Google SEO signals (rankings, backlinks, traffic) map directly into AI search citations. 81% of what fuels SEO doesn’t transfer.
• Technical blind spots matter. AI crawlers often can’t parse JavaScript. No SSR fallback = no discovery.
• Format bias is real. Listicles, semantic URLs, concise answer chunks, and freshness cues (even “2025” in the URL) massively improve pickup.
• Backlinks ≠ citations. Pages with fewer backlinks often earned more AI citations. Authority looks different when models are “lazy selectors” plucking from ~1,000 characters.

‼️ Stop optimizing for a click that may never come. Start optimizing for being chosen as the answer.

That means:
• Build answer capsules (one paragraph + table/list) into core templates.
• Treat URLs, titles, and the first 1k characters as your “pitch” to AI selectors.
• Update and refactor content regularly; freshness bias is real.
• Separate AEO dashboards from SEO: track citations, pickup rates, and which models choose you.

Image credit: Profound; full video in the comments.
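If you want a quick sanity check on the "no JavaScript, first ~1,000 characters" claims above, here is a hedged sketch that fetches a page without rendering and looks for an answer phrase in the first 1k characters of visible text. The URL and phrase are placeholders, and the tag stripping is deliberately crude; this only approximates what a non-rendering crawler might see.

```python
# Sketch of a "would a lazy selector see my answer?" check, based on the
# post's claims that AI crawlers often skip JavaScript and sample roughly
# the first 1,000 characters. The URL and answer phrase are placeholders.

import re
import requests

def visible_text(html: str) -> str:
    """Crude tag stripper with no JS execution, mimicking a non-rendering crawler."""
    html = re.sub(r"(?is)<(script|style).*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()

def answer_in_first_chunk(url: str, answer_phrase: str, chunk: int = 1000) -> bool:
    resp = requests.get(url, timeout=10)
    first_chunk = visible_text(resp.text)[:chunk]
    return answer_phrase.lower() in first_chunk.lower()

if __name__ == "__main__":
    # Hypothetical example: does the capsule sentence survive in the first 1k chars?
    ok = answer_in_first_chunk(
        "https://example.com/guide-2025",
        "the short answer is",
    )
    print("answer capsule visible to a non-rendering crawler:", ok)
```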
-
The Illusion of Reasoning: A Timely Reflection on Apple’s Latest AI Research

Apple recently published a white paper titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models.” As the AI community accelerates its focus on agentic AI and reasoning models, Apple’s study offers a timely reality check.

What did they do?
Apple rigorously tested several state-of-the-art reasoning-optimized large language models (LLMs) using classic logic puzzles like Tower of Hanoi and River Crossing—structured problems that probe genuine reasoning ability.

What did they find?
• Models can solve simple problems reliably, but as complexity increases, performance drops sharply.
• Minor changes (renaming puzzle elements, altering phrasing) can cause drastic performance shifts—highlighting a reliance on memorized patterns rather than true logical understanding.
• Trace analysis shows that current models simulate reasoning by leveraging pattern matching, not structured logic.

💡 Key Insight: While today’s models can appear to reason well under certain conditions, much of their success stems from surface-level pattern recognition—not deep, generalizable reasoning.

Conclusion: This work reminds us that although current LLMs are powerful language processors, they remain far from achieving robust, transparent reasoning. As we build AI agents designed to assist in real-world decision-making, we must tread carefully—understanding both the capabilities and limitations of today’s models. In short, humility and rigor must accompany progress. Apple’s contribution is a welcome call for both.

If you work with LLMs or agentic AI—how are you addressing reasoning robustness in your systems? Would love to hear your thoughts!

#AI #ReasoningModels #AppleResearch #LLM #ResponsibleAI #engineeringtidbits
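For readers curious how puzzle-based evaluations like this can be scored mechanically, here is a minimal sketch (not Apple's actual harness) that generates the optimal Tower of Hanoi solution and verifies any proposed move sequence with a simulator. The (src, dst) move format is an assumption about how a model's answer would be parsed.

```python
# Minimal sketch of scoring Tower of Hanoi answers: we can generate ground
# truth and verify any proposed move sequence. Moves are (src, dst) pairs
# over pegs 0-2; that answer format is an assumption for illustration.

def hanoi_moves(n, src=0, aux=1, dst=2):
    """Optimal move sequence for n disks (2^n - 1 moves)."""
    if n == 0:
        return []
    return hanoi_moves(n - 1, src, dst, aux) + [(src, dst)] + hanoi_moves(n - 1, aux, src, dst)

def is_valid_solution(n, moves):
    """Simulate the moves and check the puzzle actually gets solved."""
    pegs = [list(range(n, 0, -1)), [], []]   # peg 0 holds disks n..1, largest at bottom
    for src, dst in moves:
        if not pegs[src]:
            return False                      # moving from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                      # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1))   # all disks on the target peg

# Solution length doubles with every extra disk, which is exactly the axis
# along which the paper reports sharp performance drops.
for n in (3, 6, 9):
    moves = hanoi_moves(n)
    print(n, "disks:", len(moves), "moves, valid =", is_valid_solution(n, moves))
```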
-
In 1999, the dotcoms were valued on traffic. IPO metrics revolved around eyeballs.

Then Google launched AdWords, an ad model predicated on clicks, & built a $273b business in 2024.

But that might all be about to change: Pew Research’s July 2025 study reveals users click just 8% of search results with AI summaries, versus 15% without - a 47% reduction. Only 1% click through from within AI summaries.

Cloudflare data shows AI platforms crawl content far more than they refer traffic back: Anthropic crawls 32,400 pages for every 1 referral, while traditional search engines scan content just a couple of times per visitor sent.

The expense of serving content to the AI crawlers may not be huge if it’s mostly text. The bigger point is that AI systems disintermediate the user & publisher relationship. Users prefer aggregated AI answers over clicking through websites to find their answers.

It’s logical that most websites should expect less traffic. How will your website & your business handle it?

Sources:
- Pew Research Center - Athena Chapekis, July 22, 2025 (https://lnkd.in/gKTqJ9iw)
- Cloudflare: The crawl before the fall of referrals (https://lnkd.in/gqa26PUa)
- Cloudflare Radar: AI Insights - Crawl to Refer Ratio (https://lnkd.in/gKP427sb)
- Podcast: The Shifting Value of Content in the AI Age (https://lnkd.in/gUTkmPEz)
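A quick worked pass over the figures quoted above, using only the post's numbers:

```python
# Worked arithmetic on the figures quoted above; no new data, just making
# the quoted ratios explicit.

click_rate_with_ai_summary = 0.08   # Pew: clicks when an AI summary is shown
click_rate_without_summary = 0.15   # Pew: clicks on ordinary result pages

relative_drop = 1 - click_rate_with_ai_summary / click_rate_without_summary
print(f"Relative drop in click-through: {relative_drop:.0%}")   # ~47%

# Cloudflare: pages crawled per referral sent back to publishers.
anthropic_crawl_to_refer = 32_400
traditional_crawl_to_refer = 2      # "a couple of times per visitor sent"

ratio = anthropic_crawl_to_refer / traditional_crawl_to_refer
print(f"Anthropic crawls {anthropic_crawl_to_refer:,} pages per referral, "
      f"roughly {ratio:,.0f}x the crawl-to-refer ratio of a traditional engine.")
```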
-
The semiconductor industry is one of the most vital sectors globally, responsible for developing and manufacturing the chips that power our computers, smartphones, and other electronic devices. However, the industry faces a series of significant challenges known as "walls," which obstruct further advancement.

🔹 The Memory Wall: This is the gap between the speed of the processor and the speed of the memory. As processors get faster, they need to access data from memory more quickly. However, the speed of memory is limited by the physical properties of the materials used to make it. This can lead to a bottleneck in performance, as the processor has to wait for data to be fetched from memory.

🔹 The Frequency Wall: This term refers to the limitations faced in increasing the clock frequency of microprocessors. With increased frequency comes increased power consumption and heat dissipation, which is a fundamental challenge in semiconductor design. The "frequency wall" represents the point at which further increases in frequency yield diminishing returns or become unfeasible due to physical constraints.

🔹 The Power Wall: This is the limit on how much power a processor can consume before it becomes too hot and throttles its performance. As processors get faster, they consume more power. This can lead to problems with heat dissipation, which can damage the processor.

🔹 The ILP Wall: This is the limit on how much instruction-level parallelism can be extracted from a program. As programs get more complex, it becomes more difficult to find opportunities for parallelism. This can limit the performance gains that can be achieved by increasing the clock speed or adding more cores.

🔹 The Network Wall: This is the limit on how fast data can be transferred between different parts of a computer system. As computers become more interconnected, the need for high-speed networking has increased. However, the physical limitations of the underlying technologies, such as copper wires and optical fibers, can limit the maximum bandwidth that can be achieved.

These are just some of the walls the semiconductor industry is facing. Researchers are working on new technologies to overcome these challenges and continue to push the boundaries of performance.

#VLSI #semiconductorindustry #wall #challenges
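To put a number on the memory wall description above, here is a small worked example using the standard average memory access time (AMAT) formula; the cycle counts are illustrative assumptions, not figures from the post.

```python
# Illustration of the memory wall using the standard average memory access
# time formula: AMAT = hit_time + miss_rate * miss_penalty.
# The cycle counts below are illustrative assumptions, not measured figures.

def amat(hit_time_cycles, miss_rate, miss_penalty_cycles):
    return hit_time_cycles + miss_rate * miss_penalty_cycles

hit_time = 1          # L1 cache hit, in CPU cycles
miss_rate = 0.02      # 2% of accesses miss the cache
dram_penalty = 200    # cycles to fetch from DRAM

print("AMAT:", amat(hit_time, miss_rate, dram_penalty), "cycles")   # 5.0 cycles

# Doubling core speed roughly doubles the DRAM penalty measured in cycles,
# so the memory-stall component grows even though the cache itself kept up.
print("AMAT after a 2x faster core:", amat(hit_time, miss_rate, 2 * dram_penalty), "cycles")
```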
-
Google DeepMind just exposed AI's limits - with math.

The same vector embeddings that power most modern AI search systems have mathematical limits we can't engineer around. No amount of training data or model scaling will fix this, according to them.

Here's what's happening: When we ask AI to find relevant documents, we're essentially asking it to map meaning into geometric space - turning words into coordinates. But the researchers proved that for any given embedding dimension, there are combinations of documents that simply cannot be retrieved correctly. What sounds like a bug might be a fundamental limitation of these systems.

To demonstrate this, they created LIMIT - a dataset so simple a child could solve it (matching "who likes apples?" with "Jon likes apples"). Yet even the best models, including those powering enterprise search systems, achieve less than 20% accuracy. GPT-class models with 4,096-dimensional embeddings still fail spectacularly.

As we push AI to handle more complex retrieval tasks - think multi-criteria search, reasoning-based queries, or the instruction-following systems many companies are betting on - we're guaranteed to hit these walls. The paper shows that web-scale search would need embedding dimensions in the millions to handle all possible document combinations.

So, what does this mean? Every company building RAG systems, every startup promising "ChatGPT for your documents," every enterprise search deployment - they're all constrained by this fundamental limit. The researchers found that alternative architectures like sparse models (think old-school keyword search) actually outperform modern neural approaches on these tasks.

We've been treating retrieval as a solved problem, a building block we can rely on. But their research suggests we need to fundamentally rethink how we architect AI systems that need to find and reason over information.

The good news? Once we understand the limits, we can design around them. Hybrid approaches, multi-stage retrieval, and careful system design can mitigate these issues. But it requires acknowledging that bigger models and more compute won't solve everything.

For those of us working with AI, this is a reminder that understanding the fundamentals matters. The next breakthrough might not come from scaling up, but from stepping back and questioning our basic assumptions.

What retrieval challenges has your organization faced that might be explained by these fundamental limits?

↓ 𝐖𝐚𝐧𝐭 𝐭𝐨 𝐤𝐞𝐞𝐩 𝐮𝐩? Join my newsletter with 50k+ readers and be the first to learn about the latest AI research: llmwatch.com 💡
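As a sketch of one mitigation the post names, multi-stage retrieval, here is a toy two-stage pipeline: a sparse keyword pass for candidate generation followed by a dense-style rerank. Both scorers are deliberately simplistic stand-ins and are not the paper's method.

```python
# Toy multi-stage retrieval: sparse keyword recall, then a dense-style rerank.
# Both scorers are simplistic stand-ins used only to show the pipeline shape.

import math
import re
from collections import Counter

DOCS = {
    "d1": "Jon likes apples and walks to work",
    "d2": "Maria likes oranges",
    "d3": "Jon plays chess on Sundays",
}

def tokens(text):
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def sparse_score(query, text):
    """Term-overlap scoring as a stand-in for BM25-style keyword retrieval."""
    return sum((tokens(text) & tokens(query)).values())

def dense_score(query, text):
    """Toy 'embedding' similarity: cosine over character-trigram counts."""
    grams = lambda s: Counter(s[i:i + 3] for i in range(len(s) - 2))
    q, d = grams(query.lower()), grams(text.lower())
    dot = sum(q[g] * d[g] for g in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k_candidates=2, k_final=1):
    # Stage 1: cheap sparse recall keeps the exact-match signal dense models can miss.
    candidates = sorted(DOCS, key=lambda d: sparse_score(query, DOCS[d]), reverse=True)[:k_candidates]
    # Stage 2: rerank the shortlist with the (toy) dense scorer.
    return sorted(candidates, key=lambda d: dense_score(query, DOCS[d]), reverse=True)[:k_final]

print(retrieve("who likes apples?"))   # ['d1']
```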
-
“Much-hyped AI products like ChatGPT may provide medical doctors and health care professionals with information that can aggravate patients' conditions and lead to serious health consequences, a study suggests.”

“The study covers three major domains: 1) dietary management, 2) nutrition care process and 3) menu planning for a 1,500-calorie diet.” The researchers “selected seven diet-related metabolic diseases” including “type 2 diabetes, metabolic syndrome and its components, namely central obesity, hyperglycemia, hypertension, low levels of high-density lipoprotein, and hypertriglyceridemia.”

“A total of 63 prompts were fed into the GPT3.5-turbo0301 model through the ChatGPT interface provided by OpenAI, during October 2023. Two experienced dietitians evaluated the chatbot output's concordance with the Academy of Nutrition and Dietetics' guidelines” and found many outputs incomplete, similar to this post’s picture.

First, weight loss is “critical in the management of diabetes and metabolic abnormalities ... Yet the outputs of the ChatGPT missed the weight loss recommendations along with guidance on achieving an energy deficit.”

Second, “When asked to provide sample menus for the health conditions considered in the study, ChatGPT outputs did not meet the requirements in terms of energy, carbohydrates, and fat, in addition to calcium and vitamin D.”

Third, the study found that ChatGPT missed “appropriate physical activity and weight loss recommendations along with guidance on achieving an energy deficit” despite these being “critical in the management of diabetes and metabolic abnormalities.”

Fourth, “ChatGPT outputs were incomplete in terms of guidance on specific nutrients … and did not address the need to increase fiber intake or to consume whole grain products for all the considered conditions.”

The study concludes, “ChatGPT, and potentially other future AI chatbots, react to the user's prompts in ‘a human-like’ way, but cannot replace the dietitians' expertise and critical judgment.”

These problems are consistent with previous studies, some of which I have posted. The University of Massachusetts concluded that “Large Language Models Answer Medical Questions Accurately, but Can’t Match Clinicians’ Knowledge.” A Wall Street Journal article was entitled “At Startup That Says Its #AI Writes Medical Records, Humans Do a Lot of the Work.” A Stanford University article was entitled “Generating Medical Errors: GenAI and Erroneous Medical References.”

Then there was the pharmaceutical company that cancelled its Copilot subscriptions from Microsoft for 500 employees, which I posted two months ago. And a survey from six months ago concluded that two-thirds of pharma companies have banned ChatGPT. And these articles are just for #healthcare.

#technology #innovation #startups #artificialintelligence

https://lnkd.in/gU9QPUzh
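For readers wondering what "evaluating concordance with guidelines" looks like mechanically, here is a rough sketch in the spirit of the study described above: raters mark whether each checklist item is covered in a chatbot answer, and the gaps are tallied. The checklist items, labels, and ratings are illustrative assumptions, not the study's actual rubric or data.

```python
# Rough sketch of guideline-concordance scoring: two raters mark whether each
# checklist item is adequately covered in one chatbot answer. The checklist
# items and ratings are illustrative assumptions, not the study's rubric.

CHECKLIST = [
    "weight loss / energy deficit guidance",
    "physical activity recommendation",
    "fiber and whole grain intake",
    "macronutrient targets for the menu",
    "calcium and vitamin D adequacy",
]

# Hypothetical ratings: True = item adequately addressed, False = missing.
ratings = {
    "dietitian_1": [False, False, False, True, False],
    "dietitian_2": [False, False, True,  True, False],
}

def completeness(rater):
    return sum(ratings[rater]) / len(CHECKLIST)

for rater in ratings:
    print(f"{rater}: {completeness(rater):.0%} of checklist items covered")

# Items both raters agree are missing: the kind of gaps the study reports.
missed_by_both = [item for item, a, b in
                  zip(CHECKLIST, ratings["dietitian_1"], ratings["dietitian_2"])
                  if not a and not b]
print("Agreed gaps:", missed_by_both)
```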
-
I spend a lot of time with technical founders building AI companies. Many assume that if we just make models bigger and feed them more data, we'll eventually reach true intelligence.

I see a different reality. The fundamental limits of transformer architecture run deeper than most founders realize. Transformer models face three architectural barriers that no amount of scale can solve:

1️⃣ The Edge Case Wall
An example in autonomous vehicles: Every time you think you've handled all scenarios, reality throws a new one: a child chasing a ball, construction patterns you've never seen, extreme weather conditions. The architecture itself can't generalize to truly novel situations, no matter how much data you feed it.

2️⃣ The Pattern Matching Trap
Our portfolio companies building enterprise AI tools hit this constantly. Current models can mimic patterns brilliantly but struggle to reason about new scenarios. It's like having a highly skilled copywriter who can't generate original insights. The limitation isn't in the training—it's baked into how transformers work.

3️⃣ The Semantic Gap
LLMs process text without truly understanding meaning. We see this clearly in technical domains like software development. Models can generate syntactically perfect code but often miss fundamental logic because they don't grasp what the code actually does.

This creates a massive opportunity for technical founders willing to rethink AI architecture from first principles. Some promising directions I'm tracking:
→ World models that understand causality and physical interaction
→ Architectures designed for reasoning during inference rather than training
→ Systems that combine multiple specialized models rather than one large generalist

Founders: While others chase marginal improvements through scale, focus on solving the fundamental problems to build the next $100B+ business (and I'll be your first check ;))
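As a hypothetical illustration of the "semantic gap" point, the sketch below shows code that is syntactically valid and looks plausible yet gets the logic wrong. The buggy version is constructed for illustration, not taken from any real model output.

```python
# Hypothetical illustration of the "semantic gap": both functions parse and
# run, but only one does what the docstring asks. The bug is constructed for
# illustration, not copied from any real model output.

def refund_total_buggy(orders):
    """Sum refunds owed: full price for cancelled orders, half for returns."""
    total = 0
    for order in orders:
        if order["status"] == "cancelled" or "returned":   # always truthy!
            total += order["price"]                        # also ignores the half-refund rule
    return total

def refund_total_correct(orders):
    """Sum refunds owed: full price for cancelled orders, half for returns."""
    total = 0.0
    for order in orders:
        if order["status"] == "cancelled":
            total += order["price"]
        elif order["status"] == "returned":
            total += order["price"] / 2
    return total

orders = [
    {"status": "cancelled", "price": 100},
    {"status": "returned", "price": 40},
    {"status": "shipped", "price": 250},
]

print(refund_total_buggy(orders))    # 390: every order refunded in full
print(refund_total_correct(orders))  # 120.0: the intended amount
```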