Evaluating The Accuracy Of Economic Models


Summary

Evaluating the accuracy of economic models involves assessing how well these models predict economic outcomes or reflect real-world patterns. This process is crucial for identifying flaws, improving forecast reliability, and ensuring informed decision-making in fields like finance, policymaking, and business planning.

  • Test with fresh data: Avoid over-relying on data within the model's training period and validate outcomes with new, unseen data to ensure genuine predictive capability.
  • Address biases: Differentiate between forecast accuracy (how close predictions are to reality) and bias (consistent over- or under-estimation) to avoid skewed results.
  • Focus on validation: Conduct experiments and use diverse datasets with robust metrics to verify the true performance and reliability of economic models.
  • Alejandro Lopez-Lira

    Assistant Professor of Finance at University of Florida - Warrington College of Business

    4,236 followers

    🚨 New research: “The Memorization Problem—Can We Trust LLMs’ Economic Forecasts?” with Yuehua Tang and Mingyin Zhu. Link: https://lnkd.in/dd6tGxRP

    Large language models are astonishing, but our new paper shows they also come with a hidden trap: perfect photographic memory of much of the economic data they were trained on.

    What we found
    • GPT‑4o recalls the exact S&P 500 close for many dates before its October 2023 training cut‑off, with <1% average error. After the cut‑off? Errors explode.
    • 99% directional accuracy for unemployment and 10‑year Treasury yields inside the training window; barely a coin flip outside.
    • Mask a firm’s name in an earnings call and the model still identifies Ethan Allen (ETH), Q1 2018. Masking ≠ safety.
    • Fake “please ignore post‑2010 data” prompts? The model still sneaks in future knowledge.

    Why it matters
    • In‑sample “AI beats the market” back‑tests may just be history regurgitated, not genuine foresight.
    • Regulators, asset managers, and researchers need post‑cut‑off data, or temporally consistent models, to evaluate LLM‑driven strategies.

    Takeaways for finance & econ pros
    1️⃣ Test your LLMs only on periods outside their training horizon.
    2️⃣ Document and share prompts; subtle wording can leak future information.
    3️⃣ Treat masking and date‑shifting as, at best, partial fixes.

    Read the full paper & replication code: 👉 [SSRN link]

    Feedback, replications, and critiques welcome; let’s keep the conversation (and our models) honest.

    Alejandro Lopez‑Lira, Yuehua Tang & Mingyin Zhu

    #AI #Finance #MachineLearning #FinTwit #EconResearch #LLM #ModelRisk
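
To make the post's first takeaway concrete, here is a minimal Python sketch, not code from the paper, of an out-of-sample check along the lines described: directional accuracy computed separately before and after an assumed training cut-off. The DataFrame column names ("date", "actual", "predicted") and the cut-off date are illustrative assumptions.

```python
# A minimal sketch, not code from the paper: evaluate directional accuracy
# separately inside and outside an assumed training cut-off. Column names
# and the cut-off date are illustrative assumptions.
import pandas as pd

ASSUMED_CUTOFF = pd.Timestamp("2023-10-01")  # assumption based on the post

def directional_accuracy(df: pd.DataFrame) -> float:
    """Share of periods where the forecast got the direction of change right."""
    actual_up = df["actual"].diff() > 0
    predicted_up = df["predicted"].diff() > 0
    return float((actual_up == predicted_up).iloc[1:].mean())

def accuracy_by_cutoff(df: pd.DataFrame, cutoff: pd.Timestamp = ASSUMED_CUTOFF) -> dict:
    """Only the post-cut-off number tests genuine foresight rather than recall."""
    df = df.sort_values("date")
    return {
        "in_sample": directional_accuracy(df[df["date"] < cutoff]),
        "out_of_sample": directional_accuracy(df[df["date"] >= cutoff]),
    }
```

A large gap between the two numbers returned here is the memorization symptom the paper warns about: strong "accuracy" inside the training window that collapses on genuinely unseen dates.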

  • Abhyuday Desai, Ph.D.

    AI Innovator | Founder & CEO of Ready Tensor | 20+ Years in Data Science & AI |

    15,946 followers

    We are happy to share the results of our exhaustive benchmarking study on forecasting models, in which we assessed 87 models across 24 varied datasets. The project evaluated univariate forecasting models ranging from naive baselines to sophisticated neural networks, using a comprehensive set of metrics: RMSE, RMSSE, MAE, MASE, sMAPE, WAPE, and R-squared.

    The 24 datasets span a range of frequencies: hourly (4 datasets), daily (5), weekly (2), monthly (4), quarterly (2), and yearly (3), plus 4 synthetic datasets without a specific frequency. Some of the datasets also contain covariates (exogenous features) of static, past, and/or future nature.

    For each model, we aimed to identify hyperparameters that were effective at a global level, across all datasets. Dataset-specific hyperparameter tuning for each model was not performed due to budget constraints on this project. We used a simple train/test split along the temporal dimension, ensuring models are trained on historical data and assessed on unseen future data.

    The attached chart shows a heatmap of the average RMSSE scores for each model, grouped by dataset frequency. The results are filtered to 43 models for brevity, excluding noticeably inferior models and redundant implementations. RMSSE is a scaled version of RMSE, in which a model's RMSE score is divided by the RMSE of a naive model. With RMSSE, the lower the score, the better the model's performance; a score of 1.0 indicates performance on par with the naive baseline.

    Key Findings:
    - Machine-Learning Dominance: Extra trees and random forest models demonstrate the best overall performance.
    - Neural Network Success: Variational Encoder, PatchTST, and MLP emerged as the top neural network models, with Variational Encoder showing the best results, notably including pretraining on synthetic data.
    - Efficacy of Simplicity: DLinear and Ridge regression models show strong performance, highlighting efficiency in specific contexts.
    - Statistical Models' Relevance: TBATS stands out among statistical models for its forecasting accuracy.
    - Yearly Datasets Insight: On yearly datasets, none of the advanced models surpassed the naive mean model, highlighting the difficulty of forecasting series that lack conspicuous seasonal patterns.
    - Pretraining Advantage: The improvement in models like Variational Encoder and NBeats through pretraining on synthetic data suggests a promising avenue for enhancing neural networks' forecasting abilities.

    All models and datasets are open source. For a detailed examination of models, datasets, and scores, visit https://lnkd.in/d6mMSudJ. Registration is free, requiring only your email. Our platform is open to anyone interested in benchmarking their models. Any feedback or questions are welcome. Let's raise the state of the art in forecasting!
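
For readers unfamiliar with the scaled metric used in the heatmap, here is a minimal sketch of RMSSE as the post describes it: the model's RMSE on the test window divided by the RMSE of a naive forecast on the same window. A last-value naive forecast is assumed here, the benchmark's exact baseline may differ, and the example series and forecast are made up.

```python
# A minimal sketch of RMSSE as described in the post: the model's RMSE on the
# test window divided by the RMSE of a naive forecast on the same window.
# The benchmark's exact naive baseline may differ; a last-value forecast is
# assumed here, and the example numbers are made up.
import numpy as np

def rmse(actual: np.ndarray, forecast: np.ndarray) -> float:
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def rmsse(train: np.ndarray, test: np.ndarray, forecast: np.ndarray) -> float:
    """Lower is better; 1.0 means on par with the naive baseline."""
    naive = np.full_like(test, fill_value=train[-1], dtype=float)
    return rmse(test, forecast) / rmse(test, naive)

# Temporal train/test split, as in the study: train on history, test on the future.
series = np.array([10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 16.0, 15.0])
train, test = series[:6], series[6:]
model_forecast = np.array([15.5, 15.2])  # hypothetical model output
print(round(rmsse(train, test, model_forecast), 3))  # ~0.54 -> beats the naive baseline
```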

  • Marcia D Williams

    Optimizing Supply Chain-Finance Planning (S&OP/ IBP) at Large Fast-Growing CPGs for GREATER Profits with Automation in Excel, Power BI, and Machine Learning | Supply Chain Consultant | Educator | Author | Speaker |

    97,163 followers

    Because forecast accuracy is NOT the same as forecast bias ... This infographic compares forecast accuracy % and forecast bias %:

    ✅ Objective
    👉 Forecast Accuracy %: measures how close the forecast is to the actual values
    👉 Forecast Bias %: measures the tendency to over-forecast or under-forecast

    ✅ Focus
    👉 Forecast Accuracy %: on minimizing the magnitude of errors
    👉 Forecast Bias %: on the direction of the error (too high or too low)

    ✅ Ideal Value
    👉 Forecast Accuracy %: 100% (perfect accuracy)
    👉 Forecast Bias %: 0% (no bias)

    ✅ Calculation Example
    👉 Forecast Accuracy % = 1 – Abs(Sales – Forecast) / Sales
    👉 Forecast Bias % = (Forecast / Sales) – 1, so that over-forecasting gives a positive value

    ✅ Timeframe
    👉 Forecast Accuracy %: can be short-term (daily, weekly) or long-term (monthly, quarterly)
    👉 Forecast Bias %: needs multiple periods to detect patterns of over- or under-forecasting

    ✅ Interpretation
    👉 Forecast Accuracy %: for example, forecast accuracy of 85% means the forecast was close to the actual demand
    👉 Forecast Bias %: positive bias indicates the forecast consistently overestimated demand; negative bias indicates underestimation

    ✅ Common Metrics
    👉 Forecast Accuracy %: MAPE (Mean Absolute Percentage Error), WMAPE (preferred; Weighted Mean Absolute Percentage Error), MAE (Mean Absolute Error)
    👉 Forecast Bias %: Mean Forecast Bias, Tracking Signal

    ✅ Good Levels
    👉 Forecast Accuracy %: over 85%, though it can be around 70% in highly unpredictable environments
    👉 Forecast Bias %: tracking signal between -4 and +4

    ✅ Business Impact
    👉 Forecast Accuracy %: low accuracy leads to stockouts or excess inventory, directly affecting service levels and inventory costs
    👉 Forecast Bias %: persistent bias leads to consistent overstocking or stockouts, impacting customer satisfaction and inventory planning

    Any other aspects to add to the comparison? #supplychain #salesandoperationsplanning #integratedbusinessplanning #procurement
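
Here is a minimal sketch of the three calculations in the comparison above, using the stated formulas aggregated over several periods. The sales and forecast figures are made up for illustration.

```python
# A minimal sketch of the comparison's calculations: accuracy %, bias %, and a
# tracking signal over multiple periods. Sample figures are hypothetical.
import numpy as np

def forecast_accuracy_pct(sales: np.ndarray, forecast: np.ndarray) -> float:
    """1 - |Sales - Forecast| / Sales, aggregated across periods (WMAPE-style)."""
    return float(1 - np.abs(sales - forecast).sum() / sales.sum())

def forecast_bias_pct(sales: np.ndarray, forecast: np.ndarray) -> float:
    """Positive = consistent over-forecasting, negative = under-forecasting."""
    return float(forecast.sum() / sales.sum() - 1)

def tracking_signal(sales: np.ndarray, forecast: np.ndarray) -> float:
    """Cumulative forecast error divided by the mean absolute deviation; keep within +/-4."""
    errors = forecast - sales
    return float(errors.sum() / np.abs(errors).mean())

sales = np.array([100.0, 120.0, 90.0, 110.0])
forecast = np.array([110.0, 115.0, 100.0, 115.0])
print(forecast_accuracy_pct(sales, forecast))  # ~0.93  -> 93% accuracy
print(forecast_bias_pct(sales, forecast))      # ~+0.05 -> consistent over-forecasting
print(tracking_signal(sales, forecast))        # ~+2.7  -> within the +/-4 band
```

Note how a forecast can score a respectable 93% accuracy while still carrying a persistent positive bias: the two metrics answer different questions.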

  • John Wallace

    Founder & CEO at LiftLab | Data-driven Innovator | Passionate about Marketing that Drives both Growth and Profitability | Changing the Nature of Measurement

    4,672 followers

    Your modeling team has just built a mixed model that’s 95.74% accurate. Here’s why that’s bad news (or, at the very least, why you should be highly suspicious):

    In statistical modeling (what we now call data science), there are 2 use cases:
    1. Prediction: forecasting specific outcomes
    2. Inference: understanding general patterns and their impacts

    You're exposed to inference whenever you read something in a medical journal such as: every additional cigarette you smoke takes 2 minutes off your life. It’s not predicting your lifespan; it’s generalizing about what happens if/when you smoke.

    The mixed models we use in marketing straddle both use cases, but the statistics and back-testing in marketing really describe overall model fit, while what we want is an inference about the true impact of specific marketing channels like Connected TV (CTV) or paid search.

    A high accuracy figure of 95.74% can be a misleading source of false security, because it tells you little about how well you’ve done on your media plan. For example, the model might achieve high overall accuracy by:
    - Overestimating the impact of paid search
    - Underestimating the impact of CTV

    These two errors can cancel each other out, so you don’t see them in the roughly 4% overall error reading; the offsetting mistakes are part of what produces that impressive accuracy number. Just because the model looks good mathematically doesn't mean it's representing the true performance of different marketing channels. You might be off by a factor of 5 or 10 for one of the paid channels.

    At LiftLab, we isolate data sets, or portions of a data set, that are suspicious so that we can tell advertisers: here are some parts of your media plan with low-signal data; here's where there’s some error in these models.

    Here’s what we recommend:
    1. Carefully examine the quality of data going into the model.
    2. Be willing to acknowledge that the model might be off by a significant margin for specific channels.
    3. Run additional experiments to validate and improve the model.
    4. Create new, more reliable data sets that have more signal to test and refine the model.

    Takeaway: You need to distinguish between an error in the model and an error in the marketing channel you’re studying. Be skeptical of high accuracy percentages. Instead, focus on dissecting the model so you can better understand the real-world impacts of different marketing channels.
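
A small numerical sketch of the cancellation effect described above, with entirely hypothetical channel figures: the aggregate prediction is nearly perfect even though the channel-level attributions are far off.

```python
# A minimal sketch with entirely hypothetical numbers: offsetting channel-level
# errors can hide behind a strong overall fit in a media mix model.
true_lift = {"paid_search": 100_000, "ctv": 300_000, "base": 600_000}     # what each driver actually contributed
modeled_lift = {"paid_search": 260_000, "ctv": 130_000, "base": 610_000}  # what the model attributes

actual_total = sum(true_lift.values())
predicted_total = sum(modeled_lift.values())
overall_error = abs(predicted_total - actual_total) / actual_total
print(f"overall fit error: {overall_error:.1%}")  # 0.0% -- looks like a near-perfect model

for channel in ("paid_search", "ctv"):
    channel_error = (modeled_lift[channel] - true_lift[channel]) / true_lift[channel]
    print(f"{channel} attribution error: {channel_error:+.0%}")
# paid_search +160%, ctv -57%: the channel-level story is badly wrong
# even though the aggregate accuracy looks excellent.
```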
