AI models are reasoning, creating, and evolving. The evidence is no longer theoretical; it's peer-reviewed, measurable, and, in some domains, superhuman. In the last 18 months, we've seen LLMs move far beyond next-token prediction. They're beginning to demonstrate real reasoning, hypothesis generation, long-horizon planning, and even scientific creativity. Here are six breakthroughs that redefine what these models can do:

1. Superhuman Clinical Reasoning (Nature Medicine, 2025): In a rigorous test across 12 specialties, GPT-4 scored 89% on the NEJM Knowledge+ medical reasoning exam, outperforming the average physician score of 74%. This wasn't just Q&A; it involved multi-hop reasoning, risk evaluation, and treatment planning. That's structured decision-making in high-stakes domains.

2. Creative Research Ideation (Zhou et al., 2024, arXiv:2412.10849): Across 10 fields from physics to economics, GPT-4 and Claude generated research questions rated more creative than human-generated ones in 53% of cases. This wasn't trivia; domain experts blindly compared ideas from AI and researchers, and in over half the cases the AI won.

3. Falsifiable Hypotheses from Raw Data (Nemati et al., 2024): GPT-4o was fed raw experimental tables from biology and materials science and asked to propose novel hypotheses. Experts judged 46% of them publishable, outperforming PhD students (29%) on the same task. That's not pattern matching; that's creative scientific reasoning from scratch.

4. Self-Evolving Agents (2024): LLM agents that reflect, revise memory, and re-prompt themselves improved their performance on coding benchmarks from 21% to 34% in just four self-corrective cycles, without retraining. This is meta-cognition in action: learning from failure, iterating, and adapting over time.

5. Long-Term Agent Memory (A-MEM, 2025): Agents equipped with dynamic long-term memory (inspired by Zettelkasten) achieved 2× higher success on complex web tasks, planning across multiple steps with context continuity.

6. Emergent Social Reasoning (AgentSociety, 2025): In a simulation of 1,000 LLM-driven agents, researchers observed emergent social behaviors: rumor spreading, collaborative planning, and even economic trade. No hardcoding; just distributed reasoning, goal propagation, and learning-by-interaction.

These findings span healthcare, science, software engineering, and multi-agent simulations. They reveal systems that generate, reason, and coordinate, not just predict. So when some argue that "AI is only simulating thought," we should ask: are the tests capturing how real reasoning happens? The Tower of Hanoi isn't where science, medicine, or innovation happens. The real test is:

1. Can a model make a novel discovery?
2. Can it self-correct across steps?
3. Can it outperform domain experts in structured judgment?

And increasingly, the answer is: yes. Let's not confuse symbolic puzzles with intelligence. Reasoning is already here, and it's evolving.
Future Directions for AI Reasoning
Explore top LinkedIn content from expert professionals.
Summary
The field of AI reasoning is evolving rapidly, moving beyond basic problem-solving and prediction to encompass advanced capabilities like creative thinking, long-term planning, and self-correction. Future directions for AI reasoning involve innovations in architecture, inference techniques, and scalable computation to develop systems that can reason, adapt, and perform tasks more intelligently and efficiently.
- Focus on reasoning capabilities: Explore approaches like chain-of-thought and self-refinement techniques to help models break down problems into steps, review outputs, and produce logical and reliable solutions (see the sketch after this list).
- Combine specialized systems: Leverage multiple smaller, focused AI models instead of relying on a single, large model to handle complex tasks and improve performance and scalability.
- Rethink AI architecture: Prioritize new designs that go beyond pattern recognition, emphasizing causality, inference-time reasoning, and domain-specific solutions to address the limitations of current transformer models.
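To make the first takeaway concrete, here is a minimal sketch of chain-of-thought prompting combined with one self-refinement pass. It is an illustration under assumptions, not any specific product's implementation: `call_llm` is a hypothetical stand-in for whatever chat-completion API you use, and the prompts are examples only.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call; swap in your provider's client."""
    return f"[model reply to: {prompt[:60]}...]"

def solve_with_cot_and_refinement(question: str) -> str:
    # Chain of thought: ask the model to reason step by step before answering.
    draft = call_llm(
        "Solve the following problem. Think step by step, then state the "
        f"final answer on its own line.\n\nProblem: {question}"
    )
    # Self-refinement, pass 1: have the model critique its own draft.
    critique = call_llm(
        "Review this solution for logical or arithmetic errors. "
        f"List any problems, or reply 'No issues'.\n\n{draft}"
    )
    if "no issues" in critique.lower():
        return draft
    # Self-refinement, pass 2: revise the draft using the critique.
    return call_llm(
        f"Revise the solution to fix the issues listed.\n\nSolution:\n{draft}\n\nIssues:\n{critique}"
    )

print(solve_with_cot_and_refinement(
    "A train travels 120 km in 90 minutes. What is its average speed in km/h?"
))
```

In practice the critique-and-revise step can loop more than once; self-refinement setups typically cap the number of passes to bound cost.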
-
I spend a lot of time with technical founders building AI companies. Many assume that if we just make models bigger and feed them more data, we'll eventually reach true intelligence. I see a different reality. The fundamental limits of transformer architecture run deeper than most founders realize. Transformer models face three architectural barriers that no amount of scale can solve:

1️⃣ The Edge Case Wall. An example in autonomous vehicles: every time you think you've handled all scenarios, reality throws a new one: a child chasing a ball, construction patterns you've never seen, extreme weather conditions. The architecture itself can't generalize to truly novel situations, no matter how much data you feed it.

2️⃣ The Pattern Matching Trap. Our portfolio companies building enterprise AI tools hit this constantly. Current models can mimic patterns brilliantly but struggle to reason about new scenarios. It's like having a highly skilled copywriter who can't generate original insights. The limitation isn't in the training; it's baked into how transformers work.

3️⃣ The Semantic Gap. LLMs process text without truly understanding meaning. We see this clearly in technical domains like software development. Models can generate syntactically perfect code but often miss fundamental logic because they don't grasp what the code actually does.

This creates a massive opportunity for technical founders willing to rethink AI architecture from first principles. Some promising directions I'm tracking:

→ World models that understand causality and physical interaction
→ Architectures designed for reasoning during inference rather than training
→ Systems that combine multiple specialized models rather than one large generalist

Founders: While others chase marginal improvements through scale, focus on solving the fundamental problems to build the next $100B+ business (and I'll be your first check ;))
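As a rough illustration of the last direction above (combining specialized models rather than one generalist), here is a hedged sketch of a router that dispatches each request to a domain-specific model. The specialist functions and the keyword heuristic are placeholders; a production system would use a learned classifier or a routing model.

```python
from typing import Callable, Dict

def code_model(prompt: str) -> str:
    return f"[code-specialist answer to: {prompt}]"

def math_model(prompt: str) -> str:
    return f"[math-specialist answer to: {prompt}]"

def general_model(prompt: str) -> str:
    return f"[generalist answer to: {prompt}]"

SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "code": code_model,
    "math": math_model,
}

def route(prompt: str) -> str:
    # A real system would use a learned router; keyword matching is a stand-in.
    lowered = prompt.lower()
    if any(k in lowered for k in ("function", "bug", "compile")):
        return SPECIALISTS["code"](prompt)
    if any(k in lowered for k in ("integral", "prove", "equation")):
        return SPECIALISTS["math"](prompt)
    return general_model(prompt)

print(route("Why does this function raise a TypeError?"))
```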
-
A lot has changed since my #LLM inference article last January; it's hard to believe a year has passed! The AI industry has pivoted from focusing solely on scaling model sizes to enhancing reasoning abilities during inference. This shift is driven by the recognition that simply increasing model parameters yields diminishing returns and that improving inference capabilities can lead to more efficient and intelligent AI systems.

OpenAI's o1 and Google's Gemini 2.0 are examples of models that employ #InferenceTimeCompute. Some techniques include best-of-N sampling, which generates multiple outputs and selects the best one; iterative refinement, which allows the model to improve its initial answers; and speculative decoding. Self-verification lets the model check its own output, while adaptive inference-time computation dynamically allocates extra #GPU resources for challenging prompts. These methods represent a significant step toward more reasoning-driven inference.

Another exciting trend is #AgenticWorkflows, where an AI agent, a software program running on an inference server, breaks the queried task into multiple small tasks without requiring complex user prompts (prompt engineering may reach end of life this year!). It then autonomously plans, executes, and monitors these tasks. In this process, it may run inference multiple times on the model while maintaining context across the runs.

#TestTimeTraining takes things further by adapting models on the fly. This technique fine-tunes the model for new inputs, enhancing its performance.

These advancements can complement each other. For example, an AI system may use an agentic workflow to break down a task, apply inference-time compute to generate high-quality outputs at each step, and employ test-time training to learn from unexpected challenges. The result? Systems that are faster, smarter, and more adaptable.

What does this mean for inference hardware and networking gear? Previously, most open-source models barely needed one GPU server, and inference was often done in front-end networks or by reusing the training networks. However, as the computational complexity of inference increases, more focus will be on building scale-up systems with hundreds of tightly interconnected GPUs or accelerators for inference flows. While Nvidia GPUs continue to dominate, other accelerators, especially from hyperscalers, will likely gain traction.

Networking remains a critical piece of the puzzle. Can #Ethernet, with enhancements like compressed headers, link retries, and reduced latencies, rise to meet the demands of these scale-up systems? Or will we see a fragmented ecosystem of switches for non-Nvidia scale-up systems? My bet is on Ethernet. Its ubiquity makes it a strong contender for the job...

Reflecting on the past year, it's clear that AI progress isn't just about making things bigger but smarter. The future looks more exciting as we rethink models, hardware, and networking. Here's to what 2025 will bring!
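To ground one of the inference-time compute techniques mentioned above, here is a minimal best-of-N sampling sketch. `generate` and `score` are hypothetical stand-ins for a sampled LLM completion and a verifier or reward model; real systems plug actual models into both.

```python
import random

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Placeholder for one sampled completion from an LLM."""
    return f"[candidate answer {random.randint(0, 999)} to: {prompt}]"

def score(prompt: str, answer: str) -> float:
    """Placeholder for a verifier / reward model score (higher is better)."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    # Draw N candidates, score each, and keep the highest-scoring one.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: score(prompt, ans))

print(best_of_n("What is the capital of Australia?"))
```

The same skeleton extends to adaptive inference-time computation by making `n` a function of how hard the prompt looks, for example how much the first few samples disagree with each other.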
-
This technical report from DeepSeek, a Chinese AI research organization focused on advancing reasoning capabilities in LLMs, is a technical masterpiece. The paper introduces DeepSeek-R1, a series of models designed to push the boundaries of reasoning through innovative reinforcement learning techniques. Here's a quick summary of the main points:

𝟭/ 𝗥𝗲𝗶𝗻𝗳𝗼𝗿𝗰𝗲𝗺𝗲𝗻𝘁 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗙𝗼𝗰𝘂𝘀: Introduced DeepSeek-R1-Zero, trained entirely via reinforcement learning (RL) without supervised fine-tuning, showcasing advanced reasoning behaviors but struggling with readability and language mixing.

𝟮/ 𝗖𝗼𝗹𝗱-𝗦𝘁𝗮𝗿𝘁 𝗘𝗻𝗵𝗮𝗻𝗰𝗲𝗺𝗲𝗻𝘁𝘀: Developed DeepSeek-R1 with a multi-stage training pipeline incorporating cold-start data and iterative RL, achieving performance comparable to OpenAI's o1-1217 on reasoning tasks.

𝟯/ 𝗗𝗶𝘀𝘁𝗶𝗹𝗹𝗮𝘁𝗶𝗼𝗻 𝗳𝗼𝗿 𝗦𝗺𝗮𝗹𝗹𝗲𝗿 𝗠𝗼𝗱𝗲𝗹𝘀: Demonstrated effective distillation of reasoning capabilities from larger models to smaller dense models, yielding high performance with reduced computational requirements.

𝟰/ 𝗕𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸 𝗔𝗰𝗵𝗶𝗲𝘃𝗲𝗺𝗲𝗻𝘁𝘀: Outperformed or matched state-of-the-art models on reasoning, mathematics, and coding benchmarks, with notable success in long-context and logic-intensive tasks.

𝟱/ 𝗙𝘂𝘁𝘂𝗿𝗲 𝗗𝗶𝗿𝗲𝗰𝘁𝗶𝗼𝗻𝘀: Plans include improving multi-language capabilities, addressing prompt sensitivity, and optimizing RL for software engineering and broader task generalization.

The models are open source under the MIT license, including DeepSeek-R1-Zero, DeepSeek-R1, and distilled variants. This openness aims to accelerate innovation and enable broader adoption of advanced reasoning models.

- Link to paper: https://lnkd.in/gJQ5bsJS
- GitHub link to the model: https://lnkd.in/gFWQRZrB
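As a rough sketch of the distillation idea in point 3, and not DeepSeek's actual pipeline (see the paper for that): collect reasoning traces from a strong teacher model and run supervised fine-tuning of a smaller student on them. `teacher_generate` and `finetune` below are placeholders for a generation API and whatever fine-tuning stack you use.

```python
from typing import List, Tuple

def teacher_generate(problem: str) -> str:
    """Placeholder: ask the large teacher model for a full reasoning trace."""
    return f"<think>step-by-step reasoning for: {problem}</think> final answer"

def finetune(student_checkpoint: str, pairs: List[Tuple[str, str]]) -> str:
    """Placeholder: run supervised fine-tuning and return the new checkpoint name."""
    print(f"Fine-tuning {student_checkpoint} on {len(pairs)} reasoning traces")
    return student_checkpoint + "-distilled"

problems = [
    "Prove that the sum of two even numbers is even.",
    "Write a function that reverses a linked list.",
]
# Build (prompt, target) pairs where the target includes the teacher's reasoning.
dataset = [(p, teacher_generate(p)) for p in problems]
print(finetune("small-dense-model", dataset))
```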
-
🏆 My curated list of the top Generative AI papers from January 2025 is now live on my repository! (A little late, but well 😅) I've compiled 45+ top papers with links and abstracts so you can catch up on the latest research in generative AI. This month's research marks a clear shift from last year's focus on the application layer; we're seeing a return to more model-level advancements. Here are the key patterns:

⛳ Advanced Reasoning & Self-Correction: LLM research is moving toward active reasoning and self-correction, with reinforcement learning and process supervision improving accuracy and generalization. The focus is shifting from just producing answers to reasoning through problems.

⛳ Multi-Modal & Agentic Systems: An expected trend; more work on integrating text, vision, and interactivity, along with a rise in domain-specific and multi-agent research.

⛳ Scalable Inference & Efficient Computation: New techniques in test-time computing and scaling inference efficiently. This trend ties closely to reasoning models, optimizing compute without simply making models bigger.

💡 Compared to Q4 last year, which was heavily focused on agent applications, the current shift is toward reasoning, self-correction, and efficient inference. I see this trend sticking around for a while, given that reasoning models have started this new wave of model-level optimization research. I'll be sharing a deeper analysis on Substack soon.

Link: https://lnkd.in/e229UbMa
-
Reasoning is at the core of human intelligence; it's how we solve problems, make decisions, and navigate complex challenges. For AI to be truly transformative, it must do the same. DeepSeek is built to push the boundaries of reinforcement learning (RL) in LLM training, reducing reliance on supervised fine-tuning while equipping smaller models with advanced reasoning through innovative distillation techniques. The result? More accessible, efficient, and scalable AI. Along the way, "aha moments" occur when the model recognizes and corrects its own errors.

Here's why this matters, especially in healthcare and real-time applications:

✅ Faster Inference & Real-Time AI – Compact models deliver low-latency responses, ideal for clinical decision support, surgery, diagnostics, and patient monitoring.

✅ Reduced Dataset Dependence – RL and chain-of-thought reasoning minimize the need for large fine-tuning datasets, a game-changer for data-sensitive fields like healthcare.

✅ Democratizing AI – Smaller models with enhanced reasoning broaden access to powerful AI tools, making high-performance AI more inclusive.

✅ Scalability & Accessibility – Models that can run locally or on edge devices lower inference costs, benefiting rural or resource-limited healthcare settings.

✅ Energy Efficiency & Sustainability – Lower compute requirements reduce energy consumption, making AI deployment more sustainable at scale.

By refining how AI learns and reasons, these technologies are paving the way for more efficient, scalable, and accessible AI solutions. The implications are huge, not just for healthcare but for any domain where real-time, cost-effective AI can make a difference.

#AI #Opensource #LLMs #HealthcareInnovation #EdgeAI #ReinforcementLearning
-
𝗧𝗟;𝗗𝗥: AI agents (aka agentic AI) are increasingly viewed as the future of AI and technology, with "reasoning abilities" being crucial to their success. So, understanding reasoning in AI is crucial for developing a successful agentic strategy.

𝗛𝘂𝗺𝗮𝗻 𝗥𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴: Before discussing AI, let's review human reasoning, which is a huge and complex topic. Reasoning is the mental process of drawing conclusions and making judgments based on evidence, logic, and prior knowledge. It's how we process information to understand relationships between ideas, solve problems, and reach well-justified conclusions. https://bit.ly/3UloGoP

When it comes to reasoning with AI, there are multiple approaches:

𝟭. 𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗲𝗱 𝗥𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 (𝗔𝗥): AR emerged in the 1950s alongside the birth of AI, and it attempts to provide assurance about what a system or program will do or will never do. This assurance is provided using mathematical, logic-based algorithmic verification methods to produce proofs of security or correctness for all possible behaviors. https://go.aws/4hlKomf from Amazon Web Services (AWS)

While there were early attempts at reasoning in deep learning, it's with the rise of LLMs that interest in reasoning ballooned!

𝟮. 𝗥𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 𝗮𝗯𝗶𝗹𝗶𝘁𝗶𝗲𝘀 𝗴𝗮𝗶𝗻𝗲𝗱 𝗱𝘂𝗿𝗶𝗻𝗴 𝗟𝗟𝗠 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴: LLMs perform what appears to be "reasoning" through pattern matching and statistical prediction based on their training data, rather than true logical inference. Techniques like chain-of-thought prompting have emerged as a 𝗽𝗮𝗿𝘁𝗶𝗰𝘂𝗹𝗮𝗿𝗹𝘆 𝗲𝗳𝗳𝗲𝗰𝘁𝗶𝘃𝗲 𝗺𝗲𝘁𝗵𝗼𝗱 𝘁𝗵𝗮𝘁 𝗮𝗹𝗹𝗼𝘄𝘀 𝗟𝗟𝗠𝘀 𝘁𝗼 𝗯𝗿𝗲𝗮𝗸 𝗱𝗼𝘄𝗻 𝗰𝗼𝗺𝗽𝗹𝗲𝘅 𝗽𝗿𝗼𝗯𝗹𝗲𝗺𝘀 𝗶𝗻𝘁𝗼 𝘀𝗶𝗺𝗽𝗹𝗲𝗿 𝘀𝘂𝗯𝘁𝗮𝘀𝗸𝘀, 𝘀𝗶𝗺𝗶𝗹𝗮𝗿 𝘁𝗼 𝗵𝘂𝗺𝗮𝗻 𝗿𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 𝗽𝗿𝗼𝗰𝗲𝘀𝘀𝗲𝘀. The ability to decompose tasks and generate intermediate reasoning steps has proven crucial for solving arithmetic, commonsense, and symbolic reasoning challenges, marking a significant advancement in AI. https://lnkd.in/eQ2gpi6C

𝟯. 𝗥𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴 𝗱𝘂𝗿𝗶𝗻𝗴 𝗟𝗟𝗠 𝗶𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲: With the announcement of OpenAI's o1, reasoning during LLM inference has emerged as a promising direction for enhancing performance. Recent research has shown that 𝗮𝗹𝗹𝗼𝗰𝗮𝘁𝗶𝗼𝗻 𝗼𝗳 𝗰𝗼𝗺𝗽𝘂𝘁𝗲 𝗿𝗲𝘀𝗼𝘂𝗿𝗰𝗲𝘀 𝗱𝘂𝗿𝗶𝗻𝗴 𝗶𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗰𝗮𝗻 𝗹𝗲𝗮𝗱 𝘁𝗼 𝘀𝘂𝗯𝘀𝘁𝗮𝗻𝘁𝗶𝗮𝗹 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗶𝗺𝗽𝗿𝗼𝘃𝗲𝗺𝗲𝗻𝘁𝘀 𝘄𝗵𝗲𝗻 𝗴𝗶𝘃𝗲𝗻 𝗮𝗽𝗽𝗿𝗼𝗽𝗿𝗶𝗮𝘁𝗲 𝗶𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲-𝘁𝗶𝗺𝗲 𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻. https://bit.ly/4eVzVMX

Each of the above techniques has its own pros and cons, but they are complementary and can be used together. Reasoning is a highly debated topic: https://bit.ly/4dZdULC (via the incredible Melanie Mitchell). This post is an introduction, but there is lots of great research on this topic: https://bit.ly/40czTM4. It's crucial to know the details and ensure we do not fall prey to agentic snake oil: https://bit.ly/48ht7Xd!
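As an illustration of point 3 above (spending more compute at inference time), here is a minimal self-consistency sketch: sample several chain-of-thought completions and take a majority vote over their final answers. `sample_chain_of_thought` is a hypothetical stand-in for a sampled LLM call; the toy answer distribution exists only so the snippet runs.

```python
from collections import Counter
from typing import Tuple
import random

def sample_chain_of_thought(question: str) -> Tuple[str, str]:
    """Placeholder: one sampled reasoning chain plus its final answer."""
    answer = random.choice(["42", "42", "41"])  # toy distribution of answers
    return (f"[reasoning about: {question}]", answer)

def self_consistency(question: str, n_samples: int = 10) -> str:
    # Sample several chains and return the most common final answer.
    answers = [sample_chain_of_thought(question)[1] for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 6 * 7?"))
```

The number of samples is the compute knob: easy prompts can get by with a few chains, while harder ones justify drawing more.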
-
Happy Friday! This week in #learnwithmz, let's talk about 𝐀𝐈 𝐑𝐞𝐚𝐬𝐨𝐧𝐢𝐧𝐠. Most of the focus in AI has been on scaling up and out: more data, longer context windows, bigger models. But in my opinion, one of the most exciting shifts is happening in a different direction: reasoning.

𝐖𝐡𝐚𝐭 𝐢𝐬 𝐀𝐈 𝐑𝐞𝐚𝐬𝐨𝐧𝐢𝐧𝐠? Reasoning allows a model to:
> Break problems into smaller steps
> Compare options and evaluate outcomes
> Combine facts logically
> Review and improve its own outputs

Language models are great with patterns, but they often struggle with logic, math, or planning. Reasoning techniques aim to make them smarter, not just bigger.

𝐊𝐞𝐲 𝐓𝐞𝐜𝐡𝐧𝐢𝐪𝐮𝐞𝐬
> Chain of Thought (CoT): The model thinks out loud, step by step. Example: "Let's solve this carefully, one step at a time."
> Tree of Thoughts (ToT): The model explores multiple possible answers in parallel, like different paths. Useful for puzzles, planning, and creative writing. Paper (https://lnkd.in/gbJhTS6q) | Code (https://lnkd.in/g9vdA4qm)
> Graph of Thoughts (GoT): The model builds and navigates a reasoning graph to compare and revise ideas. Paper (https://lnkd.in/gW2QcBZU) | Repo (https://lnkd.in/gC_QSFcQ)
> Self-Refinement: The model reviews and edits its own output to improve accuracy or quality. Works well for writing, code, and structured tasks.

𝐖𝐢𝐭𝐡 𝐨𝐫 𝐖𝐢𝐭𝐡𝐨𝐮𝐭 𝐀𝐠𝐞𝐧𝐭𝐬: 𝐖𝐡𝐚𝐭'𝐬 𝐭𝐡𝐞 𝐃𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐜𝐞? Reasoning workflows can be used in both static and dynamic AI systems.
> Without AI agents: Reasoning happens in a single prompt or series of prompts. You ask the model to "think step by step" or use a CoT or ToT workflow manually. This works well for individual tasks like solving a math problem, drafting content, or analyzing a dataset.
> With AI agents: Reasoning becomes part of an ongoing process. Agents use tools, memory, and feedback loops to plan and adapt over time. They might use reasoning to decide which action to take next, evaluate outcomes, or retry when they fail. Reasoning becomes part of autonomous behavior (see the minimal agent-loop sketch after this post).

A simple way to think about it: reasoning is the brain, agents are the body. You can use reasoning alone for smart responses or combine it with agents for end-to-end execution.

𝐔𝐬𝐞 𝐂𝐚𝐬𝐞𝐬
Writing tools that plan, draft, and edit content. Data agents that walk through logic and insights. Tutoring tools that teach by showing reasoning. Business agents that plan tasks and retry failures. Copilots that reason about which tool or API to use next.

Reasoning workflows are helping smaller models solve bigger problems. They make AI more reliable, interpretable, and useful. This is how we move from chatbots to actual collaborators.

#AI #AIReasoning #ChainOfThought #LLMEngineering #AIAgents #ArtificialIntelligence #learnwithmz
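To make the "with AI agents" case concrete, here is a minimal, hedged sketch of a plan-act-observe loop with a small tool registry and a running memory. `call_llm`, the tools, and the stop condition are all illustrative placeholders, not any specific framework's API.

```python
from typing import Callable, Dict, List

def call_llm(prompt: str) -> str:
    """Placeholder: ask the model what to do next, given the goal and memory."""
    return "search: latest AI reasoning benchmarks"

TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda q: f"[search results for '{q}']",
    # Restricted eval for simple arithmetic only; a real agent would use a safer tool.
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def run_agent(goal: str, max_steps: int = 5) -> List[str]:
    memory: List[str] = []
    for _ in range(max_steps):
        # Plan: the model picks the next action as "tool_name: tool_input".
        decision = call_llm(f"Goal: {goal}\nMemory: {memory}\nNext action?")
        tool_name, _, tool_input = decision.partition(": ")
        if tool_name not in TOOLS:  # model chose to finish (or produced no tool)
            memory.append(f"final: {decision}")
            break
        # Act and observe: run the tool and feed the result back as memory.
        observation = TOOLS[tool_name](tool_input)
        memory.append(f"{decision} -> {observation}")
    return memory

print(run_agent("Summarize recent progress in AI reasoning"))
```

The loop is deliberately tiny: real agent frameworks add retries, richer memory (summaries, vector stores), and explicit evaluation of whether the goal has been met before stopping.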