Modeling time series goes beyond just throwing features into a model. In time series data, each observation is tied to a specific time point, and part of our goal is to harness the power of temporal dependencies. Enter autoregression and lagging, concepts that tap into the correlation between current and past observations to make forecasts.

At its core, autoregression involves modeling a time series as a function of its previous values: the current value depends on its historical counterparts. To dive a bit deeper, we use lagged values as features to predict the next data point. For instance, in a simple autoregressive model of order 1 (AR(1)), we predict the current value from the previous value multiplied by a coefficient. The coefficient determines how strongly the value from one time period back influences the present one.

One popular approach that builds on autoregression is the ARIMA (AutoRegressive Integrated Moving Average) model. ARIMA is a powerful time series forecasting method that combines autoregression, differencing, and moving average components. It's particularly effective for data with trends and seasonality, and it can be fine-tuned with parameters for the order of autoregression, differencing, and moving average to achieve accurate predictions.

When I was building ARIMAs for econometric time series forecasting, in addition to autoregression, where you're lagging the whole series, I was also taught to lag the individual economic variables. If I was building a model for the energy consumption of residential homes, the number of housing permits issued each month would be a relevant variable. However, if a ton of housing permits are issued in January, you won't see the actual effect until later, when the houses are built and people are actually consuming energy! That variable needed to be lagged by several months.

Another strategy to enhance time series forecasting is the use of neural networks, particularly Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks. RNNs and LSTMs are designed to handle sequential data like time series. They can learn complex patterns and long-term dependencies within the data, making them powerful tools for autoregressive forecasting: the network is fed past time steps as inputs to predict future values.

In addition to autoregression, I used lagging with neural networks too! When I built an hourly model to forecast electric energy consumption, I actually built 24 individual models, one for each hour, with each hour lagged on the previous one. The energy consumption and weather of the previous hour were very important in predicting what would happen in the next forecasting period. (This model was actually used to determine where electricity should be shifted during peak load times.) Happy forecasting!
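To make the lagging idea concrete, here is a minimal sketch of that kind of setup using statsmodels' SARIMAX: an ARIMA(1, 1, 1) model on a monthly energy series with a lagged housing-permits regressor. The file name, column names (energy_monthly.csv, energy, permits), the six-month lag, and the ARIMA order are all placeholders for illustration, not the original model.

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Illustrative monthly data: 'energy' is the target, 'permits' is an
# exogenous driver (hypothetical file and column names).
df = pd.read_csv("energy_monthly.csv", parse_dates=["month"], index_col="month")

# Lag the permits series by 6 months so that January permits only
# influence consumption once homes are likely built and occupied.
df["permits_lag6"] = df["permits"].shift(6)
df = df.dropna()

# AR(1) with differencing and a moving-average term, plus the lagged
# exogenous regressor, i.e., ARIMA(1, 1, 1) on energy with permits_lag6.
model = SARIMAX(df["energy"], exog=df[["permits_lag6"]], order=(1, 1, 1))
result = model.fit(disp=False)

# Forecast the next 3 months. In practice you would supply the actual
# future values of the lagged regressor; the last rows are reused here
# only to keep the sketch self-contained.
future_exog = df[["permits_lag6"]].iloc[-3:]
print(result.forecast(steps=3, exog=future_exog))
```

The same pattern generalizes: create the lagged column first, then treat it like any other exogenous feature when fitting and forecasting.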
Economic Modeling Techniques For Better Forecasts
Explore top LinkedIn content from expert professionals.
Summary
Economic modeling techniques for better forecasts involve using statistical and machine learning models to analyze data trends, patterns, and seasonality to predict future outcomes, especially in areas like finance, business, and energy consumption.
- Understand your data: Identify key patterns like seasonality and trends within your data to provide valuable insights into future forecasts.
- Leverage historical information: Use techniques like autoregression or lagged variables to establish relationships between past and present data points for improved predictions.
- Incorporate machine learning: Enhance forecasting models with advanced tools like ARIMA or neural networks (e.g., RNNs and LSTMs) to capture and analyze complex data patterns more accurately.
Want to use machine learning for better forecasting? Your models must learn whether seasonality exists in your business and successfully predict it. Here's how.

First up, we need a working definition of seasonality: patterns that appear at regular intervals (e.g., weekly or monthly). Think of seasonality as a factor that modifies the KPI you are trying to forecast:
- Retailers make more sales in November and December.
- Customer service receives fewer calls on weekends.
- Airlines carry more passengers around holidays.
- Website visits are higher in the morning.

As always, the key to building a powerful machine learning model is knowledge of the business process. For this post, the business knowledge takes two forms:
1 - Knowing that seasonality is part of the business process.
2 - Understanding the nature of the seasonality.

For this post, we'll assume that seasonality exists and that its nature aligns with the calendar year, for example the classic seasonality of brick-and-mortar retail (i.e., Black Friday). As with any machine learning model, you must provide the algorithm with enough data so that patterns can be learned. I will cover one aspect of this in a later post, when I discuss lagged features.

A powerful way to help ML forecasting models learn seasonality is to provide features that explicitly detail seasonal aspects of the business process. This is a bit abstract, so let's explore the scenario of seasonality manifesting within a calendar year. Let's say you're trying to build an ML forecasting model for a monthly KPI (e.g., sales). Since you are aware that the business process exhibits seasonality within each calendar year, providing the month name as a feature often helps the algorithm learn this seasonality. For example, the resulting ML forecasting model can learn:
- Sales are highest in November and December.
- Sales are lowest in January and February.
- Sales bump in August.

However, keep this in mind. Months are categorical data, and you need to handle them correctly in your ML forecasting models. While you can use month numbers instead (e.g., January = 1), I prefer to use month names explicitly. Regardless of whether you use month numbers or month names, be sure to encode the data so that the ML algorithm treats it as categorical. For example, when using Python's scikit-learn library, use a OneHotEncoder on the month data before training your model (see the sketch below).

BTW - Millions of professionals now have access to the tools to craft powerful ML forecasting models. Python in Excel is included with M365 subscriptions and provides access to libraries such as scikit-learn and statsmodels. Everything you need to go far beyond Microsoft Excel's forecast worksheet.
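Here is a minimal sketch of the month-name feature and one-hot encoding described above, assuming a simple scikit-learn pipeline. The file name and column names (monthly_sales.csv, date, sales) are placeholders, and any regressor could stand in for the linear model used here.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Illustrative monthly sales data (hypothetical file and column names).
df = pd.read_csv("monthly_sales.csv", parse_dates=["date"])

# Explicit month-name feature so the model can learn calendar seasonality.
df["month"] = df["date"].dt.month_name()

X = df[["month"]]
y = df["sales"]

# One-hot encode the categorical month feature before training the model.
preprocess = ColumnTransformer(
    [("month", OneHotEncoder(handle_unknown="ignore"), ["month"])]
)
model = Pipeline([("prep", preprocess), ("reg", LinearRegression())])
model.fit(X, y)

# Predict the seasonal baseline for a future month.
print(model.predict(pd.DataFrame({"month": ["November"]})))
```

In a real model you would add the other drivers of the KPI alongside the month feature; the point of the sketch is simply that the month column goes through an encoder rather than being fed to the algorithm as raw text or as an ordinary number.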
We are happy to share the results of our exhaustive benchmarking study on forecasting models, in which we assessed 87 models across 24 varied datasets. This project aimed to evaluate the performance of univariate forecasting models, ranging from naive baselines to sophisticated neural networks, using a comprehensive set of metrics such as RMSE, RMSSE, MAE, MASE, sMAPE, WAPE, and R-squared.

The 24 datasets cover a wide range of frequencies: hourly (4 datasets), daily (5), weekly (2), monthly (4), quarterly (2), and yearly (3). Additionally, there are 4 synthetic datasets without a specific frequency. Some of the datasets also contain covariates (exogenous features) of static, past, and/or future nature.

For each model, we aimed to identify hyperparameters that were effective on a global level, across all datasets. Dataset-specific hyperparameter tuning for each model was not performed due to budget constraints on this project. We use a simple train/test split along the temporal dimension, ensuring models are trained on historical data and assessed on unseen future data.

The attached chart shows a heatmap of the average RMSSE scores for each model, grouped by dataset frequency. The results are filtered to 43 models for brevity, excluding noticeably inferior models and redundant implementations. RMSSE is a scaled version of RMSE, where a model's RMSE score is divided by the RMSE of a naive model. With RMSSE, the lower the score, the better the model's performance; a score of 1.0 indicates performance on par with the naive baseline.

Key Findings:
- Machine-Learning Dominance: Extra trees and random forest models demonstrate the best overall performance.
- Neural Network Success: Variational Encoder, PatchTST, and MLP emerged as the top neural network models, with Variational Encoder showing the best results, notably including pretraining on synthetic data.
- Efficacy of Simplicity: DLinear and Ridge regression models show strong performance, highlighting efficiency in specific contexts.
- Statistical Models' Relevance: TBATS stands out among statistical models for its forecasting accuracy.
- Yearly Datasets Insight: On yearly datasets, none of the advanced models surpassed the performance of the naive mean model, highlighting the difficulty of forecasting with datasets that lack conspicuous seasonal patterns.
- Pretraining Advantage: The improvement in models like Variational Encoder and NBeats through pretraining on synthetic data suggests a promising avenue for enhancing neural networks' forecasting abilities.

All models and datasets are open-source. For a detailed examination of models, datasets, and scores, visit https://lnkd.in/d6mMSudJ. Registration is free, requiring only your email. Our platform is open to anyone interested in benchmarking their models. Any feedback or questions are welcome. Let's raise the state of the art in forecasting!
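For readers who want to reproduce the scaling, here is a minimal sketch of RMSSE as defined above (the model's RMSE divided by the RMSE of a naive forecast over the same test window). The exact naive baseline and scaling window used in the benchmark may differ, so treat this as an illustration rather than the platform's implementation.

```python
import numpy as np

def rmsse(y_true, y_pred, y_naive):
    """RMSSE as described above: the model's RMSE divided by the RMSE of
    a naive forecast on the same test window. Scores below 1.0 beat the
    naive baseline; 1.0 is on par with it."""
    rmse_model = np.sqrt(np.mean((y_true - y_pred) ** 2))
    rmse_naive = np.sqrt(np.mean((y_true - y_naive) ** 2))
    return rmse_model / rmse_naive

# Toy example: the naive forecast simply repeats the last value observed
# in training (all numbers here are made up for illustration).
y_true = np.array([105.0, 110.0, 120.0])
y_pred = np.array([102.0, 112.0, 118.0])
y_naive = np.full_like(y_true, 100.0)  # last training value, repeated
print(rmsse(y_true, y_pred, y_naive))
```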