Pretrained time series models have enabled inference-only forecasting systems that produce accurate predictions without task-specific training. However, existing approaches largely focus on univariate forecasting, limiting their applicability in real-world scenarios where multivariate data and covariates play a crucial role. We present Chronos-2, a pretrained model capable of handling univariate, multivariate, and covariate-informed forecasting tasks in a zero-shot manner. Chronos-2 employs a group attention mechanism that facilitates in-context learning (ICL) through efficient information sharing across multiple time series within a group, which may represent sets of related series, variates of a multivariate series, or targets and covariates in a forecasting task. These general capabilities are achieved through training on synthetic datasets that impose diverse multivariate structures on univariate series. Chronos-2 delivers state-of-the-art performance across three comprehensive benchmarks: fev-bench, GIFT-Eval, and Chronos Benchmark II. On fev-bench, which emphasizes multivariate and covariate-informed forecasting, Chronos-2’s universal ICL capabilities lead to substantial improvements over existing models. On tasks involving covariates, it consistently outperforms baselines by a wide margin. Case studies in the energy and retail domains further highlight its practical advantages. The in-context learning capabilities of Chronos-2 establish it as a general-purpose forecasting model that can be used “as is” in real-world forecasting pipelines.

Chronos-2 model card: https://amzn.to/3LwRwkp
Deploy Chronos-2 on Amazon SageMaker: https://amzn.to/3LzIyD1
Chronos-2 technical report: https://amzn.to/3JrDsIs
Chronos GitHub Repository: https://amzn.to/3Vop4E5
Introducing Chronos-2: A Pretrained Model for Multivariate Forecasting
More Relevant Posts
A Visual Explanation of Why We Normalize Training Data

If your model training keeps exploding or learning too slowly, normalization might be the missing fix.

The Setup
I built a simple linear regression model to predict house prices from area:
- Input (X): 1,000–3,000 square meters
- Output (y): $50–90 million
- Model: y_predict = m * X + c
Everything looked fine, until gradient descent started overflowing.

What's Happening Inside Gradient Descent?
- Prediction error = actual − predicted. Example: 70 − (0.02 * 2000 + 10) = 20 (million)
- Cost = average(error²): 20² = 400, ballooning to 1,000,000,000 a few updates later => OVERFLOW
- Gradients: dm = −2 * (2000 * 20) = −80,000 and dc = −2 * 20 = −40
- The gradient for m is 2,000× larger than for c, because X ≈ 2000.

Without Normalization (Image 1: "Narrow Canyon" cost surface, zigzag path)
- The cost surface becomes a long, stretched valley.
- Changing m slightly skyrockets the cost; changing c barely affects it. Sensitivity ratio ≈ 2000:1.
- Gradient descent struggles: it needs a tiny learning rate, moves microscopically, zigzags, and fails.

The Fix: Normalize (Image 2: "Round Bowl" cost surface, smooth descent)
- X_norm = (X - X.mean()) / X.std()
- y_norm = (y - y.mean()) / y.std()
- The new X and y are centered at 0 with std ≈ 1, so all features contribute equally.
- In this example: X = (X − 1800)/500 => range −1.6 to 1.2, and y = (y − 70)/15 => range −1.3 to 1.3.
- The cost surface becomes a smooth, round bowl where every direction changes the cost proportionally: gradients are balanced, updates are stable, and learning is fast.
Gradient descent now converges smoothly in ~100 iterations with a learning rate of 0.01.

Comparing Results
- Without normalization: gradient ratio (m:c) 2,000:1; learning rate 0.0000001; convergence fails; cost surface is a narrow canyon.
- With normalization: gradient ratio (m:c) ~1:1; learning rate 0.01; convergence in ~100 iterations; cost surface is a round bowl.

Why It Works
Think of m and c as two knobs controlling the cost. Without normalization, one knob is 2,000× more sensitive than the other. With normalization, both respond equally, leading to smooth, predictable convergence.

Key Takeaways
- Unequal feature scales => imbalanced gradients => unstable training.
- Normalization balances magnitudes => stable, faster learning.
- It's not about "smaller numbers"; it's about "balanced sensitivity".
- Without it, you're steering with one wheel 2,000× sharper than the other. With it, you glide smoothly to the minimum.

Question: Have you ever seen normalization instantly fix a training issue? Share your experience below.
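The whole experiment fits in a few lines of NumPy. A sketch with made-up sample points in the post's ranges (area in square meters, price in millions of dollars; the exact values are illustrative):

```python
import numpy as np

# Illustrative points in the post's ranges: area (m^2) -> price ($ millions)
X = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
y = np.array([50.0, 58.0, 70.0, 81.0, 90.0])

def fit(X, y, lr, steps=1000):
    """Plain gradient descent on y_pred = m * X + c with squared-error cost."""
    m, c = 0.0, 0.0
    for _ in range(steps):
        err = y - (m * X + c)
        dm = -2.0 * np.mean(X * err)   # scaled by X ~ 2000 -> huge
        dc = -2.0 * np.mean(err)       # scaled by 1 -> tiny by comparison
        m, c = m - lr * dm, c - lr * dc
    return m, c

# Raw scale: only a microscopic learning rate avoids overflow,
# and then the intercept c barely moves at all.
m_raw, c_raw = fit(X, y, lr=1e-7)

# Normalized scale: both knobs are equally sensitive, lr = 0.01 just works.
Xn, yn = (X - X.mean()) / X.std(), (y - y.mean()) / y.std()
m_n, c_n = fit(Xn, yn, lr=0.01)
```

On the normalized data the slope converges to the correlation between X and y, while on the raw data c is still near zero after 1,000 steps, exactly the "one knob 2,000× sharper" effect described above.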
One-size-fits-all learning is broken. 🎯 DataCamp just changed that with an AI-native experience that adapts as you learn. Not just a chatbot. The lesson reshapes itself around you in real time. 🤖📚 Why it matters: • 🧭 Personalised in real time based on your level, goals, and pace • 🔧 Relevant to your work whether you’re in finance, marketing, engineering, or studying • 🧠 Human first, it feels like learning with a great tutor who knows where you struggle and what to try next What you can try today: • ✅ Introduction to AI and Introduction to SQL are live for Free and Premium learners • 🔁 See how the path shifts when you get something right or wrong, and how practice mirrors real tasks The best learning I’ve had looked like this: fast feedback and targeted practice. The worst was static videos that aged the moment they were recorded. This fixes that. ⚡ For teams, this is a foundation for truly customised learning at scale, aligned to your tools and priorities. If you want a quick walkthrough or a demo, message me. 💬 #AINativeLearning #DataCamp #NewWayToLearn
𝗜𝗖𝗖𝗩 𝟮𝟬𝟮𝟱 Machine learning assumes the training data is representative of test-time observations; this is often violated when test-time conditions differ from those of the training data, leading to covariate shift. One can address this with large-scale data collection, but that requires making assumptions about the test-time observations and inherently bakes them into the training data. Another option is to update the model to fit the test examples, i.e., test-time adaptation, but this typically requires access to the test-time distribution or a dataset, which is unrealistic. Instead, we consider adapting instantaneously to test-time observations, causally, in a stream.

But with only a single observation, how does one determine the update that aligns the model parameters with the testing distribution? Instead of modeling the test distribution, we model the distributional shift with an energy-based formulation, where high energy indicates a large distributional shift and vice versa. The 𝗲𝗻𝗲𝗿𝗴𝘆-𝗯𝗮𝘀𝗲𝗱 𝗺𝗼𝗱𝗲𝗹 allows us to reformulate the test-time adaptation process as energy minimization. Trained on the source (training) data, it assigns scalar values (energy) to regions of the model prediction, indicating the likelihood of error in each region of the output. We call our method Energy-based Test-time Adaptation (𝗘𝗧𝗔!)

But how does one train the energy model without access to the testing distribution? We leverage 𝗮𝗱𝘃𝗲𝗿𝘀𝗮𝗿𝗶𝗮𝗹 𝗽𝗲𝗿𝘁𝘂𝗿𝗯𝗮𝘁𝗶𝗼𝗻𝘀 to explore the data space and simulate predictions under distribution shift. We demonstrate our method on depth completion, a multimodal 3D reconstruction task.

🕒 When is the 𝗘𝗧𝗔 for more details? 🌺 Poster session 2 in Exhibition Hall 1, poster #92, 3:00 pm HST, Tuesday, Oct 21 (𝘁𝗼𝗱𝗮𝘆!). Thanks to my amazing co-authors, Younjoon Chung, Patrick Rim, Xiaoran Zhang, Jihe He, Ziyao Zeng, Safa Cicek, Byung-Woo Hong, James Duncan, and my advisor, Alex Wong! Please feel free to stop by our poster!
Patrick Rim and I would be happy to walk through the secret sauce of our method. Really looking forward to connecting, hearing your thoughts, and having some insightful discussions!
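For readers who want the flavor of energy minimization at test time, here is a deliberately minimal toy of my own construction, not the paper's method: ETA trains an energy network with adversarial perturbations for depth completion, whereas this sketch uses a scalar linear model and a hand-built Gaussian energy. It only illustrates the core loop: nudge the parameters, one streamed observation at a time, so the model's own prediction lands in a low-energy (source-like) region.

```python
import numpy as np

# Toy energy fitted on source predictions: low when a prediction
# looks like source outputs, high under distributional shift.
SRC_MEAN, SRC_VAR = 1.0, 1.0

def energy(y):
    return (y - SRC_MEAN) ** 2 / (2.0 * SRC_VAR)

def eta_step(w, x, lr=0.1):
    """One causal adaptation step: descend the energy of the model's
    own prediction y = w * x with respect to the parameter w."""
    y = w * x
    grad = (y - SRC_MEAN) / SRC_VAR * x   # dE/dy * dy/dw
    return w - lr * grad

# Stream of shifted test inputs (source inputs were near x = 1.0)
w = 1.0
for x in [2.0] * 50:      # covariate shift: inputs doubled at test time
    w = eta_step(w, x)    # adapt instantly, no test dataset needed
```

After the stream, the adapted weight maps the shifted input back into the low-energy region, without ever seeing the test distribution as a whole.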
A Day in the Life of a Machine Learning Project

Ever wondered what goes on behind the scenes of an ML project? It's not just about writing algorithms — it's a mix of creativity, data, and continuous learning. Here's what a typical day in the life of an ML project looks like:

1. Data Discovery: The journey begins with finding and collecting quality data, because in ML, bad data = bad predictions.
2. Data Cleaning & Preprocessing: Handling missing values, removing duplicates, and normalizing data — the unsung heroes of every successful model.
3. Feature Engineering: We identify patterns, create new variables, and make the data more meaningful for the model to understand.
4. Model Selection & Training: This is where the magic happens! Different algorithms are tested, trained, and tuned to find the right balance between accuracy and efficiency.
5. Evaluation & Validation: We measure performance using metrics like precision, recall, and F1 score — ensuring the model generalizes beyond the data it was trained on.
6. Deployment & Monitoring: Once ready, the model moves from the lab to the real world — continuously monitored for performance, drift, and improvement.

The truth is, an ML project is never "done." It keeps evolving as the data and business goals change. At Keen Solution, we thrive on this process — building ML systems that learn, adapt, and deliver value every single day.

#MachineLearning #ArtificialIntelligence #DataScience #AIProjects #MLEngineering #DigitalInnovation #KeenSolution
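Steps 2 and 3 above often come down to a few lines of pandas in practice. A small sketch with invented data (the column names, values, and engineered feature are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical raw data for step 2 (cleaning) and step 3 (feature engineering)
df = pd.DataFrame({
    "sqm":   [100.0, 120.0, np.nan, 120.0, 90.0],
    "rooms": [3, 4, 2, 4, 2],
    "price": [300.0, 380.0, 210.0, 380.0, 250.0],
})

df = df.drop_duplicates()                          # remove exact-duplicate rows
df["sqm"] = df["sqm"].fillna(df["sqm"].median())   # impute missing area values
df["price_per_sqm"] = df["price"] / df["sqm"]      # engineered, more meaningful feature
df["sqm_z"] = (df["sqm"] - df["sqm"].mean()) / df["sqm"].std()  # normalize
```

Each line maps to an "unsung hero" from step 2, plus one engineered feature from step 3; real pipelines wrap the same operations in tested, versioned code.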
💡 Day 5 of 100: Underfitting vs Overfitting — The Tug of War in ML

🧠 Why This Matters
Every machine learning journey hits this problem: the model either learns too little (underfitting) or too much (overfitting). Knowing how to identify and resolve both is the key to models that actually work in the real world.

🔹 Underfitting — The Model That Doesn't Learn Enough
Definition: the model is too simple to capture the underlying data pattern.
Cause: high bias — the model ignores important relationships in the data.
Symptoms:
- Poor accuracy on both training and test data
- Low variance in predictions
- Predictions that are too "flat"
Example: using linear regression to predict stock prices — it misses sudden fluctuations because it assumes a straight-line trend.
💬 Think of underfitting as a student who didn't study enough.

🔹 Overfitting — The Model That Learns Too Much
Definition: the model memorizes the training data instead of learning general patterns.
Cause: high variance — the model is too sensitive to the training data.
Symptoms:
- Excellent training accuracy but poor test accuracy
- Captures noise and outliers
- Fails to generalize to unseen data
Example: a deep decision tree that fits every small detail in the training data, even the irrelevant ones.
💬 Think of overfitting as a student who memorized every example but panics during exams.

⚖️ Visualizing It:
- Underfitting → model too simple → high bias, low variance
- Just right → balanced model → low bias, low variance
- Overfitting → model too complex → low bias, high variance

📈 Imagine three curves:
- Underfitting: a straight line missing key patterns.
- Just right: a smooth curve following the trend.
- Overfitting: a wiggly line hugging every data point.
🔧 How to Fix It
If your model is underfitting:
✅ Increase model complexity (deeper networks or more features)
✅ Reduce regularization strength
✅ Train longer or tune hyperparameters
If your model is overfitting:
✅ Use regularization (L1/L2, dropout)
✅ Collect more training data
✅ Apply data augmentation
✅ Use early stopping
✅ Try ensemble methods (Random Forest, bagging, etc.)

🧩 Real-World Analogy
A chef learning recipes:
🍳 Underfitting → learns one recipe and uses it everywhere.
🍽 Overfitting → memorizes every spice ratio exactly but fails when using new ingredients.
👨🍳 Balanced → understands flavor principles and adapts to new dishes easily.
That's your ideal ML model — adaptable, not memorizing.

✨ Key Takeaways
✔ Underfitting → high bias, low variance → not enough learning
✔ Overfitting → low bias, high variance → too much learning
✔ Balance both for optimal generalization

#100DaysOfMLxGenAI #MachineLearning #DataScience #AI #Underfitting #Overfitting #MLBasics #LearningJourney #MridulLearnsAI #ML
Department of CEA, GLA University, Mathura
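The three imagined curves are easy to reproduce with a plain polynomial fit. A sketch with an illustrative dataset and degrees of my own choosing (a noisy sine, polynomial degrees 1, 3, and 14):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)  # noisy sine
x_tr, y_tr = x[::2], y[::2]     # training split (15 points)
x_te, y_te = x[1::2], y[1::2]   # held-out test split

def train_test_mse(degree):
    """Fit a polynomial of the given degree; report train and test MSE."""
    coef = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: float(np.mean((np.polyval(coef, xs) - ys) ** 2))
    return mse(x_tr, y_tr), mse(x_te, y_te)

tr_line, te_line = train_test_mse(1)    # underfit: straight line, high bias
tr_cube, te_cube = train_test_mse(3)    # about right
tr_wild, te_wild = train_test_mse(14)   # overfit: hugs every noisy point
```

Training error falls monotonically with degree, but test error is U-shaped: degree 14 passes almost exactly through the 15 training points yet does worse than degree 3 on the held-out points — the "wiggly line" in numbers.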
Just wrapped up the Udemy course "The Complete Generative AI for Business Analysis" – and wow, it's a game-changer for any BA professional! 🚀 Here's what blew my mind and how it's leveling up my workflow: - **Stakeholder Requirements Analysis**: Feed meeting transcripts directly into AI prompts to evaluate and spot missing requirements in minutes. No more endless reviews! - **Effortless Visuals**: Generate polished workflow diagrams on the fly – saving hours of manual diagramming. - **Prompting Mastery**: The course's prompt cards are gold. Mastering prompts means getting precise outputs tailored to your needs. And if you're stuck? AI can even help craft better prompts for you. 🤯 But remember, AI is your ultimate sidekick, not the boss. It's a sophisticated prediction engine – it doesn't "think" like we do. Final decisions? That's on us. We're accountable for the output, not just blaming "ChatGPT said so." This tool is like upgrading from a handsaw to a Sawzall: faster, sharper, and way more efficient. Excited to apply this to deliver even better business analysis! Who's integrating AI into their BA toolkit? Share your tips below. 👇 #GenerativeAI #BusinessAnalysis #Udemy #AIforProfessionals #ProductivityBoost P.S. Did AI help refine this post? Absolutely – but I steered the ship! 😎
GEN-θ is an embodied foundation model trained on high-fidelity raw physical interaction data, not simulation or internet video, and it uses Harmonic Reasoning to think and act simultaneously under real-world physics. Scaling experiments show an intelligence threshold around 7B parameters, where smaller models ossify under high data load and larger models keep improving with more pretraining. GEN-θ exhibits clear scaling laws: downstream post-training performance follows a power law in the amount of pretraining data, which lets teams predict how much data and compute are needed to hit target error levels. The system is trained on more than 270,000 hours of real-world manipulation data, growing by about 10,000 hours per week, supported by custom multi-cloud infrastructure that can absorb 6.85 years of experience per training day. Large-scale ablations over 8 pretraining datasets and 10 long-horizon task sets show that data quality and mixture design, measured with validation MSE and reverse KL, are as important as scale, since different mixtures yield models better suited for supervised finetuning or reinforcement learning.
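The "power law lets teams predict data needs" claim is mechanically simple: on log-log axes a power law is a straight line, so a two-parameter linear fit recovers it and can be inverted for a target error. The numbers below are invented for illustration and are not GEN-θ's:

```python
import numpy as np

# Idealized measurements following loss L = a * D**(-b), with a = 2.0, b = 0.25
D = np.array([1e3, 1e4, 1e5, 1e6])   # pretraining hours (made up)
L = 2.0 * D ** -0.25                 # observed losses (noise-free here)

# Fit log L = log a - b * log D: a straight line in log-log space
slope, intercept = np.polyfit(np.log(D), np.log(L), 1)
a, b = np.exp(intercept), -slope

# How much data for a target loss? Invert the power law.
target_loss = 0.2
D_needed = (a / target_loss) ** (1.0 / b)   # -> 1e4 hours for this toy
```

Real scaling-law fits work the same way, just with noisy losses, held-out validation, and error bars on the extrapolation.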
**Breaking the Data Silos in Federated Learning: The Rise of Adaptive Federated Learning** As the adoption of federated learning (FL) continues to grow, one significant challenge that organizations face is the lack of robustness in learning across diverse data distributions. Traditional FL approaches aim to minimize the impact of non-IID (non-Independent and Identically Distributed) data by using techniques such as client sampling or weighted aggregation. However, these methods often lead to suboptimal results, especially in scenarios with limited available data. A novel approach known as Adaptive Federated Learning (AFL) emerges as a potential solution to this issue. AFL integrates concepts from transfer learning and meta-learning to enable the model to adapt to new data distributions without significant degradation in performance. By leveraging a pre-trained model and updating it based on new data, AFL facilitates faster convergence and increased robustness. The key takeaway is that Adaptive Federated Learning (AFL) offers a more effective solution to the data silo problem in federated learning by enabling models to adapt to diverse data distributions, ultimately resulting in better overall performance and reduced overfitting. This innovation opens up new possibilities for collaborative learning on distributed data and marks a significant step forward in addressing the challenges associated with data heterogeneity in federated settings.
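For context, the "weighted aggregation" baseline the post contrasts with is typically FedAvg-style averaging of client models weighted by local dataset size. A minimal sketch (the client parameter vectors and example counts are made up):

```python
import numpy as np

# Three clients' locally trained parameter vectors (illustrative values)
client_params = [
    np.array([1.0, 2.0]),
    np.array([3.0, 4.0]),
    np.array([5.0, 0.0]),
]
client_sizes = np.array([10, 30, 60])   # local example counts per client

# FedAvg: global model = size-weighted average of client models
shares = client_sizes / client_sizes.sum()
global_params = sum(s * p for s, p in zip(shares, client_params))
# With non-IID data, clients whose local distributions are skewed pull
# this average off-target; that weakness is what adaptive approaches
# like AFL aim to address.
```

The weighting keeps large clients from being drowned out, but it cannot compensate for heterogeneous distributions, which is exactly the gap the post describes.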
📘 Machine Learning System Foundations — Harvard ML Systems (Chapter 2)

Modern AI isn't just about building models — it's about designing systems that learn, scale, and adapt in the real world. Chapter 2 of Harvard's Machine Learning Systems lays down the pillars of what makes an ML system reliable, efficient, and deployable beyond the research notebook. This chapter explains the full lifecycle of a machine learning system — from data pipelines and training workflows to deployment and continuous improvement. It highlights why ML in production requires software engineering rigor, data engineering discipline, and ML theory combined.

🧠 Key Concepts Covered
✅ What defines a machine learning system vs a standalone model
✅ System architecture for ML applications (data → model → serving)
✅ Data collection, storage, and preprocessing for scalable pipelines
✅ Model training, evaluation, and versioning workflows
✅ Deployment patterns — batch vs real-time inference
✅ Monitoring drift, performance degradation & feedback loops
✅ Automation & MLOps principles for continuous learning systems

💡 Why This Matters
The chapter emphasizes that strong ML systems don't happen by accident — they are engineered. For students and practitioners, this means:
- Learn to think in systems, not models
- Build data-first pipelines, not only notebooks
- Prioritize robustness, reproducibility, and monitoring
- Measure success not by accuracy alone, but by real-world impact
Companies hire ML engineers who can take models to production, maintain them, and evolve them — and this chapter explains the fundamentals behind that journey.

🎯 Who Should Read It
- ML & data science students
- ML engineers & MLOps practitioners
- Software engineers transitioning into AI
- Anyone looking to build production-grade AI systems

#MachineLearning #MLOps #AIEngineering #DataEngineering #Harvard #MLSystems #AIinProduction #ModelDeployment #DataPipelines #MLInfrastructure #ContinuousLearning #TechEducation
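One concrete slice of the "monitoring drift" bullet: compare a live feature's mean against its training-time snapshot with a simple z-score. This is a toy check of my own, with invented data and an illustrative threshold; production systems usually add PSI, KS tests, and per-feature dashboards:

```python
import numpy as np

rng = np.random.default_rng(42)
train_feat = rng.normal(0.0, 1.0, 10_000)   # feature snapshot logged at training time
live_feat = rng.normal(0.5, 1.0, 1_000)     # simulated shifted production traffic

def mean_shift_z(train, live):
    """Z-score of the gap between live and training feature means."""
    se = np.sqrt(train.var() / train.size + live.var() / live.size)
    return abs(live.mean() - train.mean()) / se

drifted = mean_shift_z(train_feat, live_feat) > 3.0   # alert threshold (illustrative)
```

Wired into the serving path, a check like this turns "monitor for drift" from a slogan into an automated alert that triggers the retraining feedback loop the chapter describes.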
Enthusiastic about AI/ML driven innovation and digital transformation
Forecasting improvements have a solid positive impact on businesses. Chronos-2 represents a revolution for many industries worldwide: https://medium.com/@atabarezz/foundation-models-now-own-the-future-of-forecasting-037e55aa640d?sk=765b9e4c3522bced142f6947a1c81250