Modeling time series goes beyond just throwing features into a model. In time series data, each observation is tied to a specific point in time, and part of our goal is to harness those temporal dependencies. Enter autoregression and lagging, concepts that tap into the correlation between current and past observations to make forecasts.

At its core, autoregression models a time series as a function of its previous values: the current value depends on its historical counterparts. In practice, we use lagged values as features to predict the next data point. In a simple autoregressive model of order 1, AR(1), the current value is predicted as the previous value multiplied by a coefficient, and that coefficient determines how strongly the value one period back influences the present one.

One popular approach that builds on autoregression is the ARIMA (AutoRegressive Integrated Moving Average) model. ARIMA combines autoregression, differencing, and moving average components, and it is particularly effective for data with trends and seasonality. It can be tuned through the orders of its autoregressive, differencing, and moving average terms to achieve accurate predictions.

When I was building ARIMAs for econometric forecasting, in addition to the autoregression that lags the series itself, I was taught to lag individual economic variables as well. If I was building a model of residential energy consumption, the number of housing permits issued each month would be a relevant variable. But if a surge of permits is granted in January, you won't see the effect until later, once the houses are built and people are actually consuming energy! That variable needed to be lagged by several months (a quick sketch of this appears at the end of the post).

Another strategy for enhancing time series forecasts is neural networks, particularly Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. Both are designed for sequential data and can learn complex patterns and long-term dependencies, making them powerful tools for autoregressive forecasting: past time steps are fed in as inputs to predict future values.

I used lagging with neural networks too! When I built an hourly model to forecast electric energy consumption, I actually built 24 individual models, one for each hour, with each hour lagged on the previous one. The energy consumption and weather of the previous hour were very important in predicting the next forecasting period. (This model was actually used to determine where electricity should be shifted during peak load times.) Happy forecasting!
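To make the lagged-regressor idea concrete, here is a minimal sketch (not the author's actual model) of an ARIMA-style fit with a lagged exogenous variable, along the lines of the housing-permits example above. It assumes a hypothetical monthly DataFrame `df` with columns `energy_consumption` and `housing_permits`; the six-month lag and the (1, 1, 1) order are illustrative choices, not recommendations.

```python
# A minimal sketch: ARIMA-style model with a lagged exogenous regressor.
# `df`, its column names, and PERMIT_LAG are hypothetical/illustrative.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

PERMIT_LAG = 6  # assumed months between a permit being issued and energy use

# Lag the exogenous variable so January's permits line up with the month
# in which the finished homes actually start consuming energy.
df["permits_lagged"] = df["housing_permits"].shift(PERMIT_LAG)
df = df.dropna()

# ARIMA(1, 1, 1) with the lagged permits as an exogenous regressor.
model = SARIMAX(
    df["energy_consumption"],
    exog=df[["permits_lagged"]],
    order=(1, 1, 1),
)
result = model.fit(disp=False)
print(result.summary())

# A convenient side effect of lagging: the regressor values needed for the
# next PERMIT_LAG months are permit counts that have already been observed.
future_exog = df["housing_permits"].iloc[-PERMIT_LAG:].to_numpy().reshape(-1, 1)
forecast = result.forecast(steps=PERMIT_LAG, exog=future_exog)
print(forecast)
```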
Best Practices In Economic Forecasting Models
Summary
Economic forecasting models use historical and current data to predict future economic trends, such as inflation, energy consumption, or market behavior. Adopting best practices ensures these models remain accurate and reliable, especially in the face of complex data and evolving techniques.
- Incorporate time-based patterns: Use methods like autoregression and lagging to account for historical data, enabling the model to recognize seasonal trends and temporal dependencies accurately (see the lag-feature sketch after this list).
- Experiment with advanced techniques: Consider exploring neural networks like LSTMs or tree-based models such as XGBoost, which are effective for managing large datasets and identifying nuanced patterns.
- Leverage diverse data sources: Combine structured datasets with insights from unstructured sources like news or social media to produce more comprehensive and accurate forecasts.
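As a companion to the first two points above, below is a minimal sketch of turning a single series into lagged features and feeding them to a tree-based regressor (a scikit-learn RandomForest here as a stand-in; XGBoost is used the same way). The `series` variable, the 12-lag window, and the one-year holdout are all illustrative assumptions.

```python
# A minimal sketch: build lag features with pandas and fit a tree-based model.
# `series` is a hypothetical pandas Series of monthly values; 12 lags and a
# 12-month holdout are illustrative choices.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def make_lag_features(series: pd.Series, n_lags: int = 12) -> pd.DataFrame:
    """Return a frame with target y(t) and features y(t-1) ... y(t-n_lags)."""
    frame = pd.DataFrame({"y": series})
    for lag in range(1, n_lags + 1):
        frame[f"lag_{lag}"] = series.shift(lag)
    return frame.dropna()

frame = make_lag_features(series, n_lags=12)
X, y = frame.drop(columns="y"), frame["y"]

# Split along time, never randomly: train on history, test on the latest year.
X_train, X_test = X.iloc[:-12], X.iloc[-12:]
y_train, y_test = y.iloc[:-12], y.iloc[-12:]

model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # R-squared on the held-out year
```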
We are happy to share the results of our exhaustive benchmarking study on forecasting models, in which we assessed 87 models across 24 varied datasets. The project aimed to evaluate the performance of univariate forecasting models ranging from naive baselines to sophisticated neural networks, using a comprehensive set of metrics: RMSE, RMSSE, MAE, MASE, sMAPE, WAPE, and R-squared.

The 24 datasets cover a wide range of frequencies: hourly (4 datasets), daily (5), weekly (2), monthly (4), quarterly (2), and yearly (3), plus 4 synthetic datasets without a specific frequency. Some of the datasets also contain covariates (exogenous features) of static, past, and/or future nature.

For each model, we aimed to identify hyperparameters that were effective at a global level, across all datasets; dataset-specific hyperparameter tuning for each model was not performed due to budget constraints on this project. We used a simple train/test split along the temporal dimension, ensuring models are trained on historical data and assessed on unseen future data.

The attached chart shows a heatmap of the average RMSSE scores for each model, grouped by dataset frequency. The results are filtered to 43 models for brevity, excluding noticeably inferior models and redundant implementations. RMSSE is a scaled version of RMSE, in which a model's RMSE score is divided by the RMSE of a naive model; the lower the score, the better the model's performance, and a score of 1.0 indicates performance on par with the naive baseline (a short sketch of this calculation appears at the end of this post).

Key Findings:
- Machine-Learning Dominance: Extra trees and random forest models demonstrate the best overall performance.
- Neural Network Success: Variational Encoder, PatchTST, and MLP emerged as the top neural network models, with Variational Encoder showing the best results, notably including pretraining on synthetic data.
- Efficacy of Simplicity: DLinear and Ridge regression models show strong performance, highlighting efficiency in specific contexts.
- Statistical Models' Relevance: TBATS stands out among statistical models for its forecasting accuracy.
- Yearly Datasets Insight: On yearly datasets, none of the advanced models surpassed the naive mean model, highlighting the difficulty of forecasting data that lack conspicuous seasonal patterns.
- Pretraining Advantage: The improvement in models like Variational Encoder and NBeats through pretraining on synthetic data suggests a promising avenue for enhancing neural networks' forecasting abilities.

All models and datasets are open-source. For a detailed examination of models, datasets, and scores, visit https://lnkd.in/d6mMSudJ. Registration is free, requiring only your email. Our platform is open to anyone interested in benchmarking their models. Any feedback or questions are welcome. Let's raise the state of the art in forecasting!
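For readers who want to reproduce the scoring, here is a minimal sketch of the RMSSE as described in the post: the model's RMSE on the test window divided by the RMSE of a naive forecast over the same window. The naive benchmark shown (last observed training value repeated forward) is one common choice and an assumption here, as the study's exact naive model is not specified; array names and values are illustrative.

```python
# A minimal sketch of the RMSSE as described above: model RMSE on the test
# window divided by the RMSE of a naive forecast. Scores below 1.0 beat the
# naive baseline. The "repeat last training value" naive model is an assumption.
import numpy as np

def rmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def rmsse(train: np.ndarray, test: np.ndarray, forecast: np.ndarray) -> float:
    naive_forecast = np.full_like(test, fill_value=train[-1], dtype=float)
    return rmse(test, forecast) / rmse(test, naive_forecast)

# Illustrative example: a forecast that tracks the test data more closely
# than repeating the last training value scores well below 1.0.
train = np.array([10.0, 11.0, 12.0, 13.0])
test = np.array([14.0, 15.0, 16.0])
forecast = np.array([13.8, 15.2, 15.9])
print(rmsse(train, test, forecast))  # < 1.0, better than the naive baseline
```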