AI search is evolving—are traditional engines falling behind?

AI-powered search is shifting the landscape—OpenAI is developing SearchGPT, Google is enhancing Gemini 2.0, and Meta is building its own AI-driven search engine. The shift is clear: search is no longer just about retrieving information—it’s about understanding context, intent, and relevance in real time.

Traditional search engines, like Elasticsearch, were originally designed for log analytics and keyword matching. While they now support AI-driven retrieval, they struggle with real-time ranking, hybrid search (vector + text), and AI-powered personalization—all essential for modern applications.

That’s why Vespa’s latest benchmark caught my attention. Vespa.ai is 𝗼𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲 and was built from the ground up to handle vector search, recommendations, and machine-learned ranking at scale. Their recent performance study showed:

- 8.5x better throughput for hybrid queries
- 12.9x higher performance for vector search
- 4x more efficient for in-place updates
…

The numbers are impressive, but what’s even more interesting is why they matter. AI-powered applications—LLMs, RAG pipelines, recommendation engines—need a search engine that can handle real-time updates, hybrid search (vector + text), and AI-based ranking in one system.

What stands out about Vespa?
✅ 𝗢𝗽𝗲𝗻-𝘀𝗼𝘂𝗿𝗰𝗲 & 𝗔𝗜-𝗿𝗲𝗮𝗱𝘆—supports vector, lexical, and structured search in a single query.
✅ 𝗥𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 𝗶𝗻𝗱𝗲𝘅𝗶𝗻𝗴—no more waiting for updates to be reflected in search results.
✅ 𝗦𝗰𝗮𝗹𝗮𝗯𝗹𝗲 without the headaches—built to handle massive data workloads efficiently.

Vespa isn’t just another search engine—it’s a platform built for AI-native search and ranking. If you’re working on AI-driven retrieval, they offer a 14-day free trial—worth testing. ➡️ Try it here: vespa.ai

How are you optimizing search for AI applications? Would love to hear your thoughts!

#artificialIntelligence #vectorsearch #llms #opensource
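To make the "single query" claim concrete, here is a minimal sketch of a hybrid (vector + text) query against Vespa's HTTP search API. Everything app-specific is an assumption for illustration: the local endpoint, a schema with an `embedding` tensor field, and a rank profile named `hybrid` that combines bm25 with vector closeness.

```python
import requests  # pip install requests

# Assumed setup (not from the post): a Vespa app whose schema has a
# bm25-indexed text field, a dense `embedding` tensor field with an HNSW
# index, and a rank profile `hybrid` mixing bm25 with closeness(embedding).
VESPA_ENDPOINT = "http://localhost:8080/search/"

def hybrid_search(query_text, query_vector, hits=10):
    """One request that combines lexical matching and nearest-neighbor search."""
    body = {
        # YQL: lexical match OR approximate nearest-neighbor over the embedding
        "yql": ("select * from sources * where userQuery() "
                "or ({targetHits:100}nearestNeighbor(embedding, q))"),
        "query": query_text,             # feeds userQuery()
        "input.query(q)": query_vector,  # feeds nearestNeighbor()
        "ranking.profile": "hybrid",     # the assumed rank profile
        "hits": hits,
    }
    response = requests.post(VESPA_ENDPOINT, json=body, timeout=10)
    response.raise_for_status()
    return response.json()["root"].get("children", [])

# The query vector must come from the same embedding model used at feed time;
# 384 dimensions here is just an illustrative size.
# results = hybrid_search("open source vector search", [0.12] * 384)
```

Both retrieval strategies and the ranking that fuses them run inside one engine, which is the property the post contrasts with bolting a separate vector database onto a lexical search engine.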
AI Workloads and Search Engine Limitations
Summary
AI workloads and search engine limitations refer to the challenges modern artificial intelligence systems face when working with current search engines, especially as they attempt to process huge amounts of unstructured data and deliver highly relevant, context-aware answers. Traditional search engines were designed for simpler keyword searches, but AI applications require more advanced capabilities that these older systems struggle to provide.
- Rethink content strategy: Create concise, answer-focused formats and regularly update your content to increase the chance AI models will select your site as a source.
- Monitor crawlers closely: Pay attention to how AI-driven search tools are accessing your site and track referral patterns, since user clicks and traffic may decrease as AI answers become more direct.
- Evaluate your infrastructure: Assess whether your data platforms can truly support AI workloads, and consider hybrid or modern solutions to handle text-heavy, AI-powered tasks more smoothly.
-
AI search just tore up the playbook at BrightonSEO San Diego. Josh Blyskal (Profound) shared findings from 250M+ AI search responses and 3B+ citations, and the implications are brutal for anyone still optimizing like it’s 2019.

Key takeaways:
• Reddit exploded. From 1% to 8%+ of ChatGPT citations in five months. One in twelve answers now routes through Reddit.
• Clicks are collapsing. Referrals from ChatGPT dropped ~52% after GPT-5 launched. Brands are losing inline mentions to structured, “answer-first” domains like Reddit, Wikipedia, and niche UGC.
• Overlap is thin. Only ~19% of Google SEO signals (rankings, backlinks, traffic) map directly into AI search citations. 81% of what fuels SEO doesn’t transfer.
• Technical blind spots matter. AI crawlers often can’t parse JavaScript. No SSR fallback = no discovery (a quick check is sketched after this post).
• Format bias is real. Listicles, semantic URLs, concise answer chunks, and freshness cues (even “2025” in the URL) massively improve pickup.
• Backlinks ≠ citations. Pages with fewer backlinks often earned more AI citations. Authority looks different when models are “lazy selectors” plucking from ~1,000 characters.

‼️ Stop optimizing for a click that may never come. Start optimizing for being chosen as the answer. That means:
• Build answer capsules (one paragraph + table/list) into core templates.
• Treat URLs, titles, and the first 1k characters as your “pitch” to AI selectors.
• Update and refactor content regularly; freshness bias is real.
• Separate AEO dashboards from SEO: track citations, pickup rates, and which models choose you.

Image credit: Profound; full video in the comments.
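On the JavaScript blind spot called out above, a simple way to test exposure is to fetch a page the way a non-rendering crawler would and check whether the key content appears in the raw HTML. A minimal sketch; the bot user-agent string and the example URL are illustrative assumptions:

```python
import requests  # pip install requests

def content_visible_without_js(url: str, must_contain: str) -> bool:
    """Fetch raw HTML without executing JavaScript, as most AI crawlers do,
    and check whether the answer text is already present server-side."""
    headers = {"User-Agent": "GPTBot"}  # illustrative crawler UA (assumption)
    html = requests.get(url, headers=headers, timeout=10).text
    return must_contain.lower() in html.lower()

# If this returns False but the text is visible in a browser, the content is
# rendered client-side and likely needs SSR or prerendering to be discovered.
# print(content_visible_without_js("https://example.com/guide", "answer capsule"))
```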
-
Google DeepMind just exposed AI's limits - with Math.

The same vector embeddings that power most modern AI search systems have mathematical limits we can't engineer around. No amount of training data or model scaling will fix this, according to them.

Here's what's happening: when we ask AI to find relevant documents, we're essentially asking it to map meaning into geometric space - turning words into coordinates. But the researchers proved that for any given embedding dimension, there are combinations of documents that simply cannot be retrieved correctly. What sounds like a bug might be a fundamental limitation of these systems.

To demonstrate this, they created LIMIT - a dataset so simple a child could solve it (matching "who likes apples?" with "Jon likes apples"). Yet even the best models, including those powering enterprise search systems, achieve less than 20% accuracy. GPT-class models with 4,096-dimensional embeddings still fail spectacularly.

As we push AI to handle more complex retrieval tasks - think multi-criteria search, reasoning-based queries, or the instruction-following systems many companies are betting on - we're guaranteed to hit these walls. The paper shows that web-scale search would need embedding dimensions in the millions to handle all possible document combinations.

So, what does this mean? Every company building RAG systems, every startup promising "ChatGPT for your documents," every enterprise search deployment - they're all constrained by this fundamental limit. The researchers found that alternative architectures like sparse models (think old-school keyword search) actually outperform modern neural approaches on these tasks.

We've been treating retrieval as a solved problem, a building block we can rely on. But their research suggests we need to fundamentally rethink how we architect AI systems that need to find and reason over information.

The good news? Once we understand the limits, we can design around them. Hybrid approaches, multi-stage retrieval, and careful system design can mitigate these issues (one such pattern is sketched after this post). But it requires acknowledging that bigger models and more compute won't solve everything.

For those of us working with AI, this is a reminder that understanding the fundamentals matters. The next breakthrough might not come from scaling up, but from stepping back and questioning our basic assumptions.

What retrieval challenges has your organization faced that might be explained by these fundamental limits?

↓ 𝐖𝐚𝐧𝐭 𝐭𝐨 𝐤𝐞𝐞𝐩 𝐮𝐩? Join my newsletter with 50k+ readers and be the first to learn about the latest AI research: llmwatch.com 💡
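As one concrete instance of the hybrid mitigation mentioned above, here is a minimal sketch of reciprocal rank fusion (RRF), a standard way to merge a sparse (keyword) ranking with a dense (embedding) ranking. The retrievers themselves are assumed to exist; k = 60 is the constant from the original RRF paper (Cormack et al., 2009).

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked well by either retriever float to the top - the sparse
    side covers cases where dense embeddings hit their geometric limits.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical usage: one list from BM25, one from a vector index.
sparse_hits = ["doc_7", "doc_2", "doc_9"]  # keyword retriever
dense_hits = ["doc_2", "doc_5", "doc_7"]   # embedding retriever
print(reciprocal_rank_fusion([sparse_hits, dense_hits]))
# -> doc_2 and doc_7 lead, because both retrievers surface them.
```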
-
In 1999, the dotcoms were valued on traffic. IPO metrics revolved around eyeballs. Then Google launched AdWords, an ad model predicated on clicks, & built a $273b business in 2024. But that might all be about to change:

Pew Research’s July 2025 study reveals users click just 8% of search results with AI summaries, versus 15% without - a 47% reduction (worked out below). Only 1% click through from within AI summaries.

Cloudflare data shows AI platforms crawl content far more than they refer traffic back: Anthropic crawls 32,400 pages for every 1 referral, while traditional search engines scan content just a couple of times per visitor sent.

The expense of serving content to the AI crawlers may not be huge if it’s mostly text. The bigger point is that AI systems disintermediate the user & publisher relationship. Users prefer aggregated AI answers over clicking through websites to find their answers. It’s logical that most websites should expect less traffic.

How will your website & your business handle it?

Sources:
- Pew Research Center - Athena Chapekis, July 22, 2025 (https://lnkd.in/gKTqJ9iw)
- Cloudflare: The crawl before the fall of referrals (https://lnkd.in/gqa26PUa)
- Cloudflare Radar: AI Insights - Crawl to Refer Ratio (https://lnkd.in/gKP427sb)
- Podcast: The Shifting Value of Content in the AI Age (https://lnkd.in/gUTkmPEz)
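The headline figures follow directly from the cited numbers; a quick back-of-the-envelope check using only values from the post:

```python
# Pew, July 2025: click-through with vs. without AI summaries
ctr_with_ai, ctr_without_ai = 0.08, 0.15
reduction = (ctr_without_ai - ctr_with_ai) / ctr_without_ai
print(f"CTR reduction: {reduction:.0%}")  # -> 47%

# Cloudflare: crawl-to-refer ratio - pages crawled per visitor referred back
pages_crawled, referrals = 32_400, 1
print(f"Anthropic: {pages_crawled / referrals:,.0f} pages crawled per referral")
```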
-
The inconvenient truth about traditional data platforms: they weren’t built for the AI era.

Sure, #Spark and #Snowflake are fantastic for structured data. If your world is SQL, tables, and batch jobs, they’ve got you covered. But here’s the problem: modern data workloads aren’t just about structured data anymore. AI runs on text, but our infrastructure doesn’t. LLMs, embeddings, vector search—companies today are sitting on goldmines of unstructured text but lack the right infrastructure and tooling to turn that data into business value.

The reality?
❌ Spark wasn’t built for optimizing embedding workflows or LLM inference.
❌ Snowflake wasn’t designed for managing retrieval-augmented generation (RAG).
❌ Existing query engines struggle to make sense of massive, evolving text datasets.

Yet AI-driven products need all of this. So what do teams do? They build brittle, complex, homegrown pipelines just to duct-tape solutions together (a typical example is sketched after this post). DIY is the name of the game.

This isn’t sustainable. We need a new kind of data platform—one that natively understands and optimizes AI workloads from the ground up. Text is where the real value is—but it shouldn’t be where the friction is.

Curious—how are you tackling text and AI workloads in your data stack today? Are you feeling the friction?

#AI #DataInfrastructure #Serverless #LLMs #AIInfrastructure #UnstructuredData
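For a sense of what those duct-tape pipelines look like, here is a minimal, self-contained sketch of the typical chunk-embed-index loop. The embedding step is a random-vector placeholder standing in for a real model call, and all sizes are illustrative; real versions add batching, retries, and re-embedding jobs, which is exactly where the brittleness creeps in.

```python
import numpy as np  # pip install numpy

def chunk(text, size=500, overlap=50):
    """Naive fixed-size character chunking - step one of most DIY pipelines."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(texts):
    """Placeholder for a real embedding call (OpenAI, sentence-transformers...);
    unit-norm random vectors stand in so the sketch runs self-contained."""
    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(len(texts), 384))  # 384 dims: illustrative
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

class TinyVectorIndex:
    """In-memory stand-in for the vector store teams bolt on next."""
    def __init__(self, chunks, vectors):
        self.chunks, self.vectors = chunks, vectors

    def search(self, query_vec, k=3):
        scores = self.vectors @ query_vec  # cosine similarity (unit vectors)
        return [self.chunks[i] for i in np.argsort(scores)[::-1][:k]]

# Wiring it together - chunking, embedding, indexing, and refresh logic are
# all things Spark/Snowflake don't manage natively, hence the duct tape.
chunks = chunk("Some long unstructured document... " * 40)
index = TinyVectorIndex(chunks, embed(chunks))
top = index.search(embed(["what does the document say?"])[0])
```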