Google introduces Nested Learning (NL), a new paradigm for continuous learning. NL is a serious attempt to open a new axis in model design: continual, self-modifying systems with structured multi-timescale memory. On the roadmap to AGI, this is a meaningful step on the axis of “sustainable and self-improving learners.”
-
The problem: Machine learning models struggle to retain knowledge over time. Even large language models, with billions of parameters, lose context when fine-tuned or retrained. This is the same issue that prevents continual learning, the ability to learn new skills without erasing old ones. The solution: Google’s Hope model uses Nested Learning to structure memory across multiple timescales. Instead of treating architecture and training as separate, it unifies them into one system of layered optimization problems. Isn't this cool? https://lnkd.in/ete9UXiF
-
Last week, Google published a new paper on “Nested Learning” (NL). It aims to rip apart the old “one-task, one-model, forget yesterday when I learn today” mindset. Current LLM training techniques treat a machine-learning model as a monolithic beast, but NL views it as a hierarchy of nested optimisation problems. It’s more like how our brain functions, with different regions serving different functions (and learning at different speeds). In this view, each chunk of the model updates at its own rhythm, swirling in its own “context flow”, effectively mimicking the brain’s neuroplasticity so the model can learn new tasks without catastrophically forgetting the old ones (the “catastrophic forgetting” problem). They even built a proof-of-concept architecture, dubbed Hope, that reportedly outperforms standard transformers on long-context and continual-learning tasks. Why is this important? It is a step closer to a continuously learning model, and a tiny bit closer to the AGI goal of the US frontier models. https://lnkd.in/dSVVubda
-
Interesting work from Google Research: “Nested Learning” views a model as many small learners that update at different speeds. This supports continuous learning without forgetting past skills and makes long context easier to handle. Simple idea, strong impact for #AI builders. Think of it as nested optimization, each part of the network has its own context flow. Fast parts adapt quickly, slow parts keep knowledge stable. Architecture and optimizer are designed as one system, which opens clear paths for self improvement and better #memory across time. They built a proof of concept called #Hope, a self modifying recurrent model with a continuum memory system. On language modeling and common sense tests, Hope shows higher accuracy than strong recurrent baselines and a similar size transformer, and it handles long documents well. Authors: Ali Behrouz, Meisam Razaviyayn, Peilin Zhong, Vahab Mirrokni, Google Research, USA. Paper title: “Nested Learning: The Illusion of Deep Learning Architectures”, accepted at #NeurIPS2025. You can design memory as a spectrum, let the model refine its own update rules, and keep new learning from erasing what works. Where would you try this first, evolving knowledge bases, agent memory, very long documents?
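The fast-vs-slow “context flow” intuition above can be illustrated with a tiny sketch. This is plain Python with invented rates, not the paper’s Continuum Memory System — it only shows why components updating at different speeds resist overwriting:

```python
# Toy multi-timescale memory: two exponential moving averages reading the
# same input stream at different speeds. The "fast" memory tracks recent
# context; the "slow" memory keeps stable knowledge. Illustration only.

def update_memories(stream, fast_rate=0.5, slow_rate=0.01):
    fast = slow = 0.0
    for x in stream:
        fast += fast_rate * (x - fast)   # adapts within a few steps
        slow += slow_rate * (x - slow)   # drifts slowly, resists overwrite
    return fast, slow

# After a distribution shift (0 -> 10), the fast memory has almost fully
# adapted while the slow memory has barely moved.
fast, slow = update_memories([0.0] * 100 + [10.0] * 10)
```

Turning this intuition into actual parameter groups with their own optimization loops is what the Hope architecture formalizes.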
-
🚀 Google just introduced a new way to train reasoning-capable LLMs — and it beats both SFT & RL LLMs are great at speaking, but struggle with thinking. Traditional post-training hit a ceiling: ❌ SFT: Copies demos → overfits long chains, weak generalization ❌ RL with verifiable reward (RLVR): Only rewards final answer → fails when model never reaches it Google’s answer? Supervised Reinforcement Learning (SRL) — a hybrid training strategy that gives models structured guidance and reward shaping. 🧠 What’s new SRL trains models to: ✅ Break problems into actions ✅ Think in inner monologue first ✅ Get step-wise rewards based on action similarity — not just final answer In other words, the model learns how to reason, not just what to output. 📌 This fixes the failure mode where RL can't learn when there are zero correct rollouts. 📊 Results SRL significantly improves small LLMs on hard reasoning tasks: 🔥 Outperforms SFT and RL individually 🔥 Best performance when combined: SRL → RLVR 🔥 Large gains on math benchmarks & agentic software tasks (Up to 2× improvement in coding agents!) 💡 Why it matters This is huge for open-source reasoning models: ▫️Enables deep reasoning without huge RL scale ▫️Scales to math, coding, and agent workflows ▫️Moves us closer to deliberate, reflective AI systems Think: Chain-of-Thought + RL + curriculum learning → in one training method. This could become the standard recipe for training small, smart models. 📄 Paper: https://lnkd.in/gQQ6iH2P 🔁 Please re-share to help more AI engineers see this 👥 Follow me for the latest frontier-AI breakthroughs #AI #LLMs #ReinforcementLearning #DeepLearning #MachineLearning #AIAgents #ReasoningModels #GoogleAI #NLP #RLAI #OpenSourceAI #FutureOfAI #ChainOfThought #TechResearch #AIMath #SoftwareAgents #MLOps #AITraining
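The step-wise, similarity-based reward idea can be sketched roughly as follows. This is my own toy illustration — the paper’s exact similarity metric and action format are assumptions here:

```python
from difflib import SequenceMatcher

def stepwise_rewards(predicted_steps, expert_steps):
    """Dense per-step rewards: score each predicted action by its textual
    similarity to the expert action at the same step, instead of a single
    0/1 reward on the final answer. Illustrative sketch only."""
    return [
        SequenceMatcher(None, pred, ref).ratio()
        for pred, ref in zip(predicted_steps, expert_steps)
    ]

# The model gets credit for the correct first step even though the final
# step is wrong -- exactly the signal that final-answer-only RLVR misses
# when there are zero fully correct rollouts.
rewards = stepwise_rewards(
    ["factor the quadratic", "solve x = 3"],
    ["factor the quadratic", "solve x = 2"],
)
```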
-
🚀 Exploring Google DeepMind’s “Mixture of Recursions (MoR)”: A Step Toward Smarter, More Efficient LLMs As part of my Deep Learning class last week, we were tasked with reading, understanding, and presenting recent research papers in front of the entire class. The paper my team and I presented was DeepMind’s “Mixture of Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation.” What fascinated me was how this work tackles a core limitation in Transformers and LLMs, treating every token equally, even when some require deeper reasoning. MoR changes that, allowing each token to decide how deeply it should be processed. The result? Large language models that “think more about hard things” while saving computation and memory on simpler ones. 💡 Key highlights: - Adaptive token-level recursion for efficient computation. - Smart routing and parameter-sharing strategies that cut FLOPs and memory by ~25%. - Achieves 2× faster inference with comparable or better accuracy. Reading and understanding cutting-edge GenAI research like this is helping me build a deeper understanding of how LLMs are evolving from static architectures to adaptive, efficient reasoning systems. In this fast-moving AI era, staying curious and connected to innovation is key. 📄 Read the paper here: https://lnkd.in/dFF_ZxN8 #GenAI #LLMs #DeepLearning #AIResearch #GoogleDeepMind #Transformers #AdaptiveAI #MachineLearning #Innovation
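The token-level routing idea can be sketched in a few lines. The difficulty scores below are hypothetical inputs; in MoR a learned router assigns recursion depths, which this toy does not model:

```python
def route_tokens(tokens, difficulty, max_depth=3):
    """Toy router: map each token's difficulty score in [0, 1] to a
    recursion depth. Easy tokens exit after one pass through the shared
    block; hard tokens are recursed up to max_depth times, so compute is
    spent where the reasoning is harder."""
    return {
        tok: 1 + round(d * (max_depth - 1))
        for tok, d in zip(tokens, difficulty)
    }

# "integral" gets the full depth budget; function words exit early.
depths = route_tokens(["the", "integral", "of"], [0.1, 0.9, 0.0])
```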
-
🦝 For those curious about AI and eager to start learning, AI-For-Beginners by Microsoft is a free, open-source 12-week, 24-lesson curriculum that teaches the fundamentals of artificial intelligence through hands-on Jupyter notebooks, examples, and exercises. It covers key topics such as neural networks, computer vision, natural language processing, reinforcement learning, and AI ethics using frameworks like PyTorch and TensorFlow. Designed for beginners and educators alike, it focuses on foundational concepts rather than advanced mathematics or production-level AI. 💻 https://lnkd.in/emzeGg83 #AI #AILearning #SelfLearning #AIForBeginners #MicrosoftAI #MachineLearning #OpenSource
-
💡💡💡 Nested Learning is a new approach to machine learning that views models as a set of smaller, nested optimization problems, each with its own internal workflow. The goal is to mitigate, or even completely avoid, “catastrophic forgetting”, where learning new tasks sacrifices proficiency on old ones. #ContinualLearning
-
The authors present a large-scale empirical study—over 400,000 GPU-hours—of reinforcement learning (RL) applied to large language models (LLMs), with the goal of establishing a predictive scaling framework for RL training akin to what has existed for pre-training. They fit a sigmoidal compute-performance curve that characterizes how rewards increase with compute, and they probe how various design choices (loss aggregation, normalization, curriculum, off-policy algorithm) affect asymptotic performance (the performance ceiling) versus compute-efficiency (how fast you approach the ceiling). Their key findings: (1) Different recipes lead to different asymptotic ceilings—so not all RL methods scale to the same top performance. (2) Many “tuning” design choices mostly alter the efficiency (how quickly you climb) but not the ultimate ceiling. (3) They propose a “best-practice” recipe called ScaleRL, demonstrate it scales predictably up to ~100,000 GPU-hours, and show that with this framework one can extrapolate from smaller runs to anticipate large-compute performance. https://lnkd.in/gXtX8Jr9
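A sigmoidal compute-performance curve of this shape is easy to write down directly. The parameter values below are made up for illustration, not the paper’s fitted numbers:

```python
def sigmoid_scaling(compute, ceiling=0.9, midpoint=1e4, steepness=1.2):
    """Sigmoidal compute-performance curve: reward approaches `ceiling`
    (the asymptotic performance) as compute grows, while `midpoint` and
    `steepness` govern compute-efficiency (how fast you climb toward it).
    All parameter values here are illustrative assumptions."""
    return ceiling / (1.0 + (midpoint / compute) ** steepness)

# A recipe change can move the ceiling; most tuning only moves the climb.
r_small = sigmoid_scaling(1e3)   # early in the run
r_large = sigmoid_scaling(1e6)   # approaching, but never exceeding, the ceiling
```

Fitting such a curve on small runs and extrapolating it forward is what lets the authors anticipate large-compute performance from cheaper experiments.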
-
𝗚𝗼𝗼𝗴𝗹𝗲 𝗹𝗮𝘂𝗻𝗰𝗵𝗲𝘀 𝗡𝗲𝘀𝘁𝗲𝗱 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴, 𝗮 𝗻𝗲𝘄 𝗯𝗿𝗮𝗶𝗻-𝗶𝗻𝘀𝗽𝗶𝗿𝗲𝗱 𝗺𝗲𝘁𝗵𝗼𝗱 𝘁𝗼 𝗳𝗶𝘅 𝗔𝗜’𝘀 𝗰𝗮𝘁𝗮𝘀𝘁𝗿𝗼𝗽𝗵𝗶𝗰 𝗳𝗼𝗿𝗴𝗲𝘁𝘁𝗶𝗻𝗴
For years, deep learning has relied on a simple assumption: a model is one big optimization problem that learns from data all at once. But that simplicity has a price: catastrophic forgetting. Train a model on new data, and it often forgets what it already knew. Google Research wants to fix that with a new paradigm called Nested Learning, unveiled with a proof-of-concept architecture named Hope.
𝗧𝗵𝗲 𝗽𝗿𝗼𝗯𝗹𝗲𝗺: 𝗺𝗲𝗺𝗼𝗿𝘆 𝘁𝗵𝗮𝘁 𝗱𝗼𝗲𝘀𝗻’𝘁 𝗹𝗮𝘀𝘁
Machine learning models struggle to retain knowledge over time. Even large language models, with billions of parameters, lose context when fine-tuned or retrained. This is the same issue that prevents continual learning, the ability to learn new skills without erasing old ones.
𝗧𝗵𝗲 𝗶𝗻𝘀𝗶𝗴𝗵𝘁: 𝗺𝗼𝗱𝗲𝗹𝘀 𝗮𝗿𝗲 𝗺𝗼𝗿𝗲 𝗹𝗶𝗸𝗲 𝗯𝗿𝗮𝗶𝗻𝘀 𝘁𝗵𝗮𝗻 𝘄𝗲 𝘁𝗵𝗼𝘂𝗴𝗵𝘁
Nested Learning reimagines a model not as one giant learner but as a collection of smaller learners nested inside each other. Each one runs its own optimization loop and updates at its own speed, just like different brain regions processing information at distinct rhythms.
𝗧𝗵𝗲 𝗯𝗿𝗲𝗮𝗸𝘁𝗵𝗿𝗼𝘂𝗴𝗵: 𝘀𝗲𝗹𝗳-𝗺𝗼𝗱𝗶𝗳𝘆𝗶𝗻𝗴 𝗮𝗻𝗱 𝗰𝗼𝗻𝘁𝗲𝘅𝘁-𝗮𝘄𝗮𝗿𝗲 𝗹𝗲𝗮𝗿𝗻𝗶𝗻𝗴
Google’s Hope model uses Nested Learning to structure memory across multiple timescales. Instead of treating architecture and training as separate, it unifies them into one system of layered optimization problems. This means the model can decide which parts to update quickly and which to keep stable.
𝗞𝗲𝘆 𝗰𝗼𝗺𝗽𝗼𝗻𝗲𝗻𝘁𝘀
𝗗𝗲𝗲𝗽 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗲𝗿𝘀: Treats standard algorithms like Adam as memory systems that compress gradient updates.
𝗖𝗼𝗻𝘁𝗶𝗻𝘂𝘂𝗺 𝗠𝗲𝗺𝗼𝗿𝘆 𝗦𝘆𝘀𝘁𝗲𝗺 (𝗖𝗠𝗦): Extends Transformer memory into layers that update at different frequencies.
𝗦𝗲𝗹𝗳-𝗠𝗼𝗱𝗶𝗳𝘆𝗶𝗻𝗴 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲: The Hope model learns to adjust its own learning rules over time.
𝗥𝗲𝘀𝘂𝗹𝘁𝘀
Hope outperforms Titans, Samba, and Transformers on language modeling and reasoning tasks. It achieves lower perplexity, higher accuracy, and handles long-context reasoning better, solving “Needle-in-a-Haystack” tasks with more stability and precision.
𝗜𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻
We can simulate nested optimization today in PyTorch or JAX by assigning different update rates to model components or optimizers. This lets you fine-tune large models while preserving old knowledge, a practical first step toward continual learning. #Learn&Share
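One way to assign different update rates is to apply one component's gradient every step and accumulate the other's over a longer period. The sketch below is a stdlib-only toy with invented names and numbers — in PyTorch you would get the same effect by calling `optimizer.step()` for different parameter groups on different schedules:

```python
# Two scalar "weights" minimize the same loss 0.5*((fast + slow) - 5)^2,
# but the slow weight only applies its (averaged) gradient every `period`
# steps. The fast weight does most of the adapting; the slow weight
# changes little and so retains its prior state -- the nested-update
# intuition in miniature. Toy illustration, not the Hope architecture.

def train(steps=200, lr=0.1, period=10):
    fast, slow = 0.0, 0.0
    slow_grad_acc = 0.0
    history = []
    for t in range(1, steps + 1):
        grad = (fast + slow) - 5.0      # gradient w.r.t. both weights
        fast -= lr * grad               # fast weight: updated every step
        slow_grad_acc += grad
        if t % period == 0:             # slow weight: updated every `period` steps
            slow -= lr * slow_grad_acc / period
            slow_grad_acc = 0.0
        history.append(fast + slow)
    return fast, slow, history

fast, slow, history = train()
# The combined output converges to the target, with the fast weight
# carrying most of the adaptation.
```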
-
One from a few days ago - Google’s new AI and the Future of Learning document. https://lnkd.in/eAihFqkJ It doesn’t say much that we don’t know already, and of the major LLM developers, Google seem to be the ones making the most effort to create a model specifically focused on learning, with LearnLM. What I still don’t quite get is what they actually think future learning will look like. They write: “We have a desire not to replace instruction, but to help human curiosity reach new heights.” But then there’s a contradiction: much of what they describe sounds like AI doing a lot of the instructing. It’s the familiar tension between, on one hand, a push for personalised learning, and on the other, saving teacher time for “human activities.” The personalisation seems to mean adaptive learning paths, which ultimately points towards individual rather than social learning. If that’s not what they mean, well, they don’t really give a view on what that looks like in practice, unless I’m missing something. They do acknowledge the long history of these ideas and how we can learn from the past, but it’s still hard to shake the image of rows of students sitting in front of modern-day Skinner learning machines.