💡 Hey, we got Hope? Turns out, yes — and it comes from Google Research! But this “Hope” might just decrease hope for new grad engineers trying to keep up with how fast ML is evolving 😅

Google just dropped Nested Learning, a completely new paradigm for continual learning — where models don’t just learn new things, they keep old knowledge intact while evolving intelligently over time. Their prototype architecture, aptly named Hope, shows promising results in long-context reasoning and overcoming catastrophic forgetting — a problem that’s haunted ML models for years.

This approach introduces a continuum memory system (CMS) — modules that update at different rates, similar to human short-term and long-term memory. It’s a step closer to machines that learn like humans do — balancing stability and adaptability.

Now the big question — what happens to startups like Mem0, MemGPT, or others building memory-augmented frameworks for LLMs? If Hope scales well, it could absorb many of those memory innovation layers directly into the model architecture itself, rather than relying on external retrieval or RAG-style memory stores.

Exciting times — both hopeful and humbling.

Blog link: 🔗 https://lnkd.in/g__PJhJc
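To make the CMS intuition concrete, here’s a minimal sketch (my own illustration, not Google’s implementation) of memory modules that consolidate at different rates; the dimensions, periods, and learning rates are invented for the example.

```python
import numpy as np

class MemoryModule:
    """A memory bank that consolidates new inputs only every `period` steps."""
    def __init__(self, dim, period, lr):
        self.state = np.zeros(dim)   # the module's stored memory
        self.period = period         # how often this module updates (in steps)
        self.lr = lr                 # how strongly it moves toward new inputs
        self.buffer = []             # inputs seen since the last consolidation

    def observe(self, x, step):
        self.buffer.append(x)
        if step % self.period == 0:               # fast modules update often,
            avg = np.mean(self.buffer, axis=0)    # slow modules only rarely
            self.state += self.lr * (avg - self.state)
            self.buffer.clear()

# A "continuum" of modules from fast/short-term to slow/long-term.
cms = [
    MemoryModule(dim=8, period=1,   lr=0.9),    # short-term: rewritten almost every step
    MemoryModule(dim=8, period=16,  lr=0.3),    # mid-term
    MemoryModule(dim=8, period=256, lr=0.05),   # long-term: changes slowly, keeps old knowledge
]

for step in range(1, 1025):
    x = np.random.randn(8)          # stand-in for a token or feature embedding
    for m in cms:
        m.observe(x, step)
```

A read operation would combine all the module states, so recent context and stable long-term knowledge are both available to the model.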
-
Just saw Google DeepMind's new AlphaEvolve announcement and I'm genuinely floored. This isn't just another AI toy – they've built an evolutionary coding agent that's ACTUALLY delivering real business results:
• Recovered 0.7% of Google's GLOBAL compute resources (think about that scale for a second)
• Sped up Gemini's own training by 1% (when you're spending millions on training, that's massive)
• Found new solutions to 300-year-old math problems

What strikes me most? The practical applications. This isn't theoretical – it's already optimizing data centers, improving chip design, and enhancing AI training processes.

For founders building in the AI space: This is why infrastructure and optimization matter so much. A 1% improvement at scale can translate to millions saved.

Biggest lesson here for all entrepreneurs: Look for compounding efficiencies. What's the equivalent 0.7% resource recovery you could implement in your business today?

Thoughts? https://lnkd.in/e4fNU6iE
-
The problem: Machine learning models struggle to retain knowledge over time. Even large language models, with billions of parameters, lose context when fine-tuned or retrained. This is the same issue that prevents continual learning, the ability to learn new skills without erasing old ones.

The solution: Google’s Hope model uses Nested Learning to structure memory across multiple timescales. Instead of treating architecture and training as separate, it unifies them into one system of layered optimization problems.

Isn't this cool? https://lnkd.in/ete9UXiF
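One toy way to picture “layered optimization problems” (my own sketch under simple assumptions, not the paper’s algorithm): a fast set of weights adapts every batch, while a slower set only updates every K steps, so the two levels learn on different timescales.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression: predict y from x with the combined weights (w_slow + w_fast).
# w_fast is the inner, fast problem; w_slow is the outer, slow problem.
w_slow = np.zeros(4)
w_fast = np.zeros(4)
K = 10  # how often the slow level updates (assumed for illustration)

def grad(w_total, X, y):
    """Gradient of mean squared error with respect to the combined weights."""
    err = X @ w_total - y
    return X.T @ err / len(y)

slow_grad_accum = np.zeros(4)
for step in range(1, 501):
    X = rng.normal(size=(32, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=32)

    g = grad(w_slow + w_fast, X, y)
    w_fast -= 0.1 * g               # inner level: frequent, fast updates
    slow_grad_accum += g

    if step % K == 0:               # outer level: infrequent, slow updates
        w_slow -= 0.01 * (slow_grad_accum / K)
        slow_grad_accum[:] = 0.0
```

The slow weights end up holding the stable part of the solution while the fast weights absorb batch-to-batch variation, which is a rough flavor of the multi-timescale idea described above.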
-
When I worked at Google, I was lucky to collaborate with some of the brightest machine-learning (ML) engineers. They worked on feature engineering. By picking the factors that guide the ML model, their advances could generate tens to hundreds of millions of dollars in additional revenue.

Imagine an Excel spreadsheet with hundreds of columns of data. Add two columns, multiply two, divide by another, and subtract a fourth. Each of these is a feature. ML models used features to predict the best ad to show.

It started as a craft, reflecting the vibes of the era. Over time, we’ve mechanized this art into a machine called AutoML that massively accelerates the discovery of the right features.

Today, reinforcement learning (RL) is in the same place feature engineering was 15 years ago. What is RL? It’s a technique for teaching AI to accomplish goals. https://lnkd.in/dsh8hm3Q
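The spreadsheet analogy maps directly to code. A minimal pandas sketch with invented column names (nothing here is Google’s actual ad data or feature set):

```python
import pandas as pd

# Hypothetical ad-ranking table; every column name is made up for illustration.
df = pd.DataFrame({
    "clicks":      [120, 45, 300],
    "impressions": [10_000, 2_500, 40_000],
    "bid":         [0.8, 1.2, 0.5],
    "quality":     [0.9, 0.6, 0.7],
})

# Hand-crafted features, exactly like combining spreadsheet columns:
df["ctr"]            = df["clicks"] / df["impressions"]     # divide one column by another
df["expected_value"] = df["bid"] * df["ctr"]                # multiply two columns
df["engagement"]     = df["clicks"] + df["impressions"]     # add two columns
df["bid_gap"]        = df["bid"] - df["quality"]            # subtract one column from another
```

AutoML-style feature search automates exactly this kind of column combination, generating candidates and keeping the ones that improve the model.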
-
New paradigm in ML: “Nested Learning” takes centre stage

Google Research just unveiled a new ML framework called Nested Learning, which treats a model not as one monolithic optimisation but as a stack of smaller, interconnected optimisation problems. In simple terms: instead of training a giant model to learn everything in a single go, the model learns many sub-problems in parallel (or nested) — which helps avoid “forgetting” old tasks when new ones are added.

From my standpoint: this feels like the next wave after scaling up size — it’s about smarter model design and continual-learning readiness. For folks building or deploying LLMs, this may become a critical architecture shift.

How do you see Nested Learning impacting the way we build or maintain LLMs in production? Questions welcome! https://lnkd.in/dn2UJRTs
-
Google Machine Learning Foundational Courses

I’ve explored Google’s Foundational Machine Learning Courses, which offer a clear, structured path to understanding ML fundamentals:
• Introduction to Machine Learning – Learn when and how ML can solve real-world problems.
• Machine Learning Crash Course – Hands-on lessons with videos, visualizations, and coding exercises covering regression, classification, neural networks, and generalization.
• Problem Framing – How to define and structure problems effectively for ML solutions.
• Managing ML Projects – Best practices for planning, data collection, and deployment in ML initiatives.

These courses provide a strong foundation for both beginners and professionals who want to apply ML principles in practical projects. 🚀

#MachineLearning #GoogleDevelopers #AI #Learning https://lnkd.in/ds6R3P-C
-
The Vision Transformer (ViT) by Google changed the direction of computer vision completely. CNNs had been the backbone of deep learning for almost a decade, but ViT replaced convolutions with pure attention and still managed to outperform them. The only drawback was its hunger for data and compute. The best version, ViT-H/14, was trained on Google’s private JFT-300M dataset with around 300 million images, taking close to 2,500 TPUv3-core-days for pre-training. It was revolutionary, but it came with a massive price tag.

Then the Data Efficient Image Transformer (DeiT) by Facebook took the same transformer idea and made it practical. They introduced a special DISTIL token that learned from a pre-trained teacher network like ResNet-50. The DISTIL token’s output was matched with the teacher’s predictions, allowing the student transformer to learn faster and with far less data. For the first time, you could train a transformer on something like ImageNet-1k and still reach top-tier results.

After that, Vision-Language Models (VLMs) took things even further. They brought vision and language together in a single space. Models like CLIP learned to connect an image and its caption so that both lived in the same embedding space. Once that happened, everything changed - now models could recognize an image just by reading its description, retrieve images from text, or even answer questions about what they “see.”

This is the intuition behind the NanoVLM we will build - a compact yet complete vision-language model that shows you how alignment between image and text actually works.

Now imagine if someone could learn the core intuition behind all three of these ideas and actually build them from scratch. That’s exactly what this 3-in-1 workshop bundle gives you. It has three workshops, each around three hours long, where we’ll build the Vision Transformer, Data Efficient Image Transformer, and NanoVLM step by step. 9 hours of pure coding, explanation, and intuition - where you will learn how these models work and build them yourself.

You can join here: https://lnkd.in/dVV8DTtp
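The distillation-token mechanism described above can be sketched roughly like this. It is a simplified, hard-distillation-style loss written from the post’s description; the 50/50 weighting and the tensor shapes are assumptions, not the exact DeiT recipe.

```python
import torch
import torch.nn.functional as F

def deit_style_loss(class_logits, distil_logits, labels, teacher_logits, alpha=0.5):
    """Student's class token learns from labels; its distillation token imitates the teacher.

    class_logits  : (B, num_classes) from the student's class token
    distil_logits : (B, num_classes) from the student's distillation token
    teacher_logits: (B, num_classes) from a frozen, pre-trained teacher network
    """
    ce_labels = F.cross_entropy(class_logits, labels)
    teacher_hard = teacher_logits.argmax(dim=-1)          # teacher's predicted class
    ce_teacher = F.cross_entropy(distil_logits, teacher_hard)
    return (1 - alpha) * ce_labels + alpha * ce_teacher

# Toy usage with random tensors standing in for real model outputs.
B, C = 8, 1000
loss = deit_style_loss(torch.randn(B, C), torch.randn(B, C),
                       torch.randint(0, C, (B,)), torch.randn(B, C))
```

Matching the distillation token against the teacher’s predictions is what lets the student transformer reach strong accuracy on ImageNet-1k-scale data instead of needing something like JFT-300M.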
-
Google introduces Nested Learning (NL), a new paradigm for continual learning. NL is a serious attempt to open a new axis in model design: continual, self-modifying systems with structured multi-timescale memory. On the roadmap to AGI, this is a meaningful step on the axis of “sustainable and self-improving learners.”
-
Google Research is proposing a fresh take on continual learning: Nested Learning, treating a model not as a single training loop but as many interconnected optimization problems that update at different frequencies. The goal: reduce or even avoid catastrophic forgetting while unifying “architecture” and “optimizer” into one coherent system.

What’s interesting:
• Architecture = optimization levels. Components learn on different time scales (“context flows”), reframing design as stacked levels rather than a fixed network + separate trainer.
• Continuum Memory System (CMS). Memory isn’t just short vs. long term; it’s a spectrum of modules, each updating at its own rate, promising more stable, scalable retention.
• “Hope” prototype. A self-modifying recurrent architecture that shows lower perplexity, higher accuracy, and stronger long-context recall (needle-in-a-haystack) than strong baselines, including standard Transformers.

Why it matters: If this paradigm holds up, we could see cheaper incremental updates, better on-device adaptation, and safer, continuously improving systems, without retraining wiping out prior knowledge.

Read the post: https://lnkd.in/ebiwa3fv

#MachineLearning #ContinualLearning #AIResearch #LLMs #NeurIPS2025
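To make “components learn on different time scales” tangible, here is a minimal PyTorch sketch under my own assumptions (this is not the Hope architecture): two parameter groups with separate optimizers, one stepping every iteration, the other only occasionally.

```python
import torch
from torch import nn

# Toy split: a "fast" adapter that tracks recent context and a "slow" backbone
# meant to hold long-term knowledge. The split and the update interval below
# are invented purely to illustrate multi-timescale updates.
backbone = nn.Linear(16, 16)
adapter = nn.Linear(16, 16)

fast_opt = torch.optim.SGD(adapter.parameters(), lr=1e-2)
slow_opt = torch.optim.SGD(backbone.parameters(), lr=1e-4)
SLOW_EVERY = 50   # the slow component consolidates far less often

for step in range(1, 1001):
    x = torch.randn(32, 16)
    target = torch.randn(32, 16)
    loss = ((adapter(backbone(x)) - target) ** 2).mean()
    loss.backward()                 # gradients flow into both components

    fast_opt.step()                 # fast level: applied every single step
    fast_opt.zero_grad()

    if step % SLOW_EVERY == 0:      # slow level: gradients accumulate across many
        slow_opt.step()             # steps and are applied rarely, so long-term
        slow_opt.zero_grad()        # weights drift slowly and forget less
```

Something CMS-like would generalize this from two rates to a whole spectrum of modules, each with its own update frequency.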
-
Three months ago, I started building E8-Kaleidescope-AI Memory. It took four weeks to get from conception to version M16. Now at M25.1 after eight weeks of dedicated coding, it’s become something unexpected: a working prototype of the “agentic” and “self-theorizing” memory systems that Google and other major labs are actively researching.

The parallels are striking:

**Self-Theorizing**: E8 generates and rates its own hypotheses about novelty and emergence. Google’s “AI co-scientist” does the same thing, using automated feedback to iteratively generate and refine hypotheses in a self-improving cycle.

**Introspection**: E8 reasons about its own internal code and structure. Google’s Gemini models use internal thinking processes and can provide thought summaries about their own reasoning.

**Self-Modifying Architecture**: E8 refines its own structure as it learns. Google’s November 2025 “Nested Learning” project introduced “Hope,” a self-modifying architecture that optimizes its own memory through self-referential processes.

**Emergence & Complex Systems**: E8 is built on principles of emergence and phase transitions. Google researchers are now explicitly analyzing AI capabilities through this same lens of complex systems science.

I’m not claiming equivalence with Google’s resources. But I built a functional prototype in eight weeks that explores the same frontier concepts being pursued by billion-dollar research teams.

That’s what makes this moment remarkable: modern LLMs have reached the point where a clear vision and deep understanding can prototype cutting-edge ideas that previously required entire specialized teams. The tools have caught up to the imagination.

#ArtificialIntelligence #AI #MachineLearning #Innovation #Technology
-
Google just did something wild for small AI models. Here’s the deal 👇

Training small LLMs (like 7B ones) on hard reasoning tasks has always been painful. You’ve got two choices:
1️⃣ Supervised Fine-Tuning (SFT): basically teaches the model to copy examples. It learns to mimic, not think.
2️⃣ Reinforcement Learning (RL): lets the model explore and learn from trial and error — but if it never stumbles on a good answer, there’s no reward signal to learn from, so it just keeps failing.

Both hit walls when the task needs multi-step reasoning — like solving math or coding problems where one wrong move ruins the whole thing.

So Google researchers tried something new: Supervised Reinforcement Learning (SRL). Instead of treating each task as one big answer, they broke it down into smaller steps — like teaching the model how to think, not just what to say.

Here’s what makes it special:
✅ They reward the model not just for getting the final answer right, but for making good decisions along the way.
✅ Even if it fails, it still learns from each step — kinda like a kid figuring out math by understanding why each move matters.

Results? 🔥
+3% better on tough math tests (using just 1,000 examples).
+3.7% when combined with RLVR.
And it even crushed software engineering tasks — 74% better than standard fine-tuning methods on SWE-Bench.

That’s huge. Because it means small models can learn big ideas — with less data, less compute, and more brainpower.
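The step-wise reward idea can be sketched in a few lines. This is a toy illustration of rewarding intermediate steps rather than only the final answer; the string-similarity scoring and the example steps are placeholders, not how SRL actually scores actions against expert trajectories.

```python
from difflib import SequenceMatcher

def step_similarity(generated: str, reference: str) -> float:
    # Placeholder: crude textual similarity between a generated step and the expert's step.
    return SequenceMatcher(None, generated, reference).ratio()

def outcome_only_reward(final_answer: str, gold_answer: str) -> float:
    # Classic outcome-based RL signal: all-or-nothing, so a model that never
    # reaches the full correct answer gets no learning signal at all.
    return 1.0 if final_answer.strip() == gold_answer.strip() else 0.0

def step_wise_reward(generated_steps, expert_steps) -> float:
    # SRL-flavored signal (simplified): credit every step that resembles the
    # expert's step, so partial progress still teaches the model something.
    scores = [step_similarity(g, e) for g, e in zip(generated_steps, expert_steps)]
    return sum(scores) / len(expert_steps)

gen = ["factor 12 = 2*2*3", "factor 18 = 2*3*3", "gcd = 2*3 = 5"]   # last step wrong
ref = ["factor 12 = 2*2*3", "factor 18 = 2*3*3", "gcd = 2*3 = 6"]
print(outcome_only_reward("5", "6"))   # 0.0 -> nothing to learn from
print(step_wise_reward(gen, ref))      # close to 1.0 -> correct early steps still earn credit
```

Dense, per-step credit is why a small model can make progress on multi-step problems even when it rarely nails the whole solution on its own.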