Beyond Machine Learning: Advantages of Ensemble Models for Interpretable Time Series Forecasting

Beyond Machine Learning: Advantages of Ensemble Models for Interpretable Time Series Forecasting

Time series forecasting continues to be a critical task in many industries, including retail, finance, healthcare, and manufacturing. Traditional forecasting methods have been successful, but advancements in machine learning (ML) have sparked interest in using ML algorithms for time series forecasting. However, the complexity of exogenous events such as a pandemic and inclement weather, can make time series forecasting challenging.

Using a single ML algorithm for time series forecasting may not always provide the best results, particularly when dealing with non-linear data correlations. ML algorithms such as artificial neural networks (ANNs), support vector machines (SVMs), and random forests (RFs) have been utilised for time series forecasting with varying degrees of success. But the issue of interpretability remains, and the accuracy of these algorithms can be affected by the choice of hyperparameters and model complexity.

Why are Ensemble Models Powerful?

Ensemble learning is a technique that combines the predictions of multiple models to improve the accuracy and stability of the forecasts. A popular technique used in ensemble learning for time series forecasting is exponential smoothing (ES), a method that gives more weight to recent observations in the dataset in combination with other Auto-Regressive time series models such as Box Jenkins and Multiple Linear Regression.

ES can be particularly effective for time series that exhibit trend and seasonality, but it may not account for complex relationships, non-linear data correlations and complexity of exogenous events such as pandemics and weather events, which can have a significant impact on time series forecasting. These events can create unexpected patterns and correlations in the data that traditional forecasting methods may not capture.

Neural basis expansion analysis (NBEA) can be effectively used for time series forecasting in such cases. NBEA is a nonparametric method that expands the basic functions in a neural network to capture the underlying structure of the data. This technique can be particularly effective for time series that exhibit complex patterns and nonlinear relationships.

NBEA uses a neural network to learn the weights of a linear combination of ML models. The neural network is trained on a set of data that has been labeled with the correct predictions. Once the neural network is trained, it can be used to generate a weight vector for each ML model. The predictions of the ML models are then weighted according to the weight vector, and the weighted predictions are combined to create a single prediction.

  • Linear regression: This is a simple model that uses a linear combination of the predictions of the ML models.
  • Ridge regression: This is a more complex model that penalises the weights of the ML models, which helps to reduce overfitting.
  • Lasso regression: This is another complex model that penalises the weights of the ML models, but it also shrinks the weights toward zero, which helps to improve interpretability.

An ensemble model that incorporates ES and NBEA with exogenous variables such as weather, local events and promotional events can help increase the accuracy and interpretability of the forecasts, especially when dealing with non-linear data correlations and events. ES can provide a simple and interpretable forecast, NBEA can capture complex patterns and non-linear relationships, and exogenous variables can account for external factors that may impact the time series.

Once an ensemble model has been created, it can be used to make predictions on new data. The ensemble model's predictions are typically more accurate than those of any individual ML model. Additionally, the ensemble model is typically more robust to noise in the data and is less likely to be biased.

In addition to using ensemble models, there are other ways to handle the disadvantages of pure ML models. These models can be further enhanced for accuracy by using post-processing and impact analysis techniques. Post-processing techniques such as exponential smoothing are applied to the predictions of the ML models after they have been made. These techniques can help to improve the accuracy and robustness of the predictions. ES can also help to improve the accuracy of predictions by smoothing out noise in the data.

Real Life Proof and Applications

The robustness of the ensemble models was further validated by the M5 competition. It was a forecasting challenge that aimed to predict the sales of 3049 products across 10 different stores for 28 days. The competition attracted over 5,000 participants from around the world and was a test of the latest ML techniques for time series forecasting. The M5 competition upheld the conclusions of previous M competitions and various other research studies by illustrating that combining forecasts derived from different techniques can improve accuracy, even with simple methods.

The winning participant utilised a straightforward, equally weighted combination of six models, each utilising a different learning approach and training set. Similarly, the second most accurate model employed an equally weighted combination of five models, with each providing a distinct trend estimate, while the third-best method utilised an equally weighted ensemble of 43 neural networks, demonstrating the efficacy of ensemble models.

One of the most interesting aspects identified during the M5 competition was the performance of the pure ML models. In total, there were six submissions that used only ML techniques to forecast the sales. However, all six submissions performed poorly and were not more accurate than the statistical benchmarks.

The poor performance of the pure ML models can be attributed to several factors, including but not limited to:

  1. Rapidly changing daily, weekly and intra-day seasonality, and trends that varied across different products and stores
  2. Not accounting for advanced pre-processing and clustering techniques needed for better pattern detection between similar clusters.
  3. Not accounting for statistical post-processing techniques which will factor in impact of exogenous events, recurring events, rounding and smoothing errors. 

ML models tend to get complex, with many hyperparameters and complex architectures. This complexity made it difficult to interpret the models and understand how they were making their predictions. Statistical benchmarks are usually derived from simple and interpretable methods, such as seasonal autoregression (SARIMA), exponential smoothing and seasonal decomposition.

Although these highlight some limitations of pure ML models for time series forecasting, particularly when dealing with complex data and exogenous events, this does not mean that ML techniques are not useful for time series forecasting. Ensemble models, which combine different techniques including but not limited to exponential smoothing, neural basis expansion analysis, ML, deep-learning and multiple-linear regression, have shown promising results in real-world applications.

Statistical Ensemble Models have been used in real life applications that use timeseries predictions for a variety of purposes such as workforce optimisation, demand forecasting, supply chain management, manufacturing and operations where a volume driver, or a time series data point is predicted at a quarter hour granularity at a location or a SKU level. The time series metric is predicted using an autoregressive model such as multiple linear regression and then post processed for rounding and smoothing for every quarter hour interval using moving average or exponential Smoothing while preserving the day totals. The results were found to be more accurate than the statistical benchmarks of generic regression models. 

To further enhance the accuracy of the predictions, a more elaborate analysis of event impact and seasonality done on historic data and the presence of any markers such as a weather event or a planned promotion is also considered during post processing to add or remove additional impact for the forecasted timeframe. The impact associated is predicted using the historic seasonality of the data around similar timeframes in history, resulting in more accurate models.

This also demonstrates the importance of considering the complexity of the data and the impact of exogenous events when using ML techniques for time series forecasting. Pure ML models may struggle to capture all of the relevant patterns in the data and are often overly complex, leading to poor performance.

Ensemble models can be used for a variety of time series forecasting applications, including:

  • Demand forecasting: Ensemble models with exponential smoothing can be used to predict future demand for products or services. This information can be used to plan production, inventory, and marketing campaigns.
  • Production planning: Ensemble models with exponential smoothing can be used to plan production schedules. This information can be used to ensure that the right amount of product is produced at the right time.
  • Strategic planning: Ensemble models with exponential smoothing can be used to make strategic decisions about the future of a business. This information can be used to identify new opportunities and avoid potential risks.
  • Operations and Execution: These models can be effective in predicting customer traffic and sales in retail organisations thereby helping with workforce management, labor demand for retail operations and execution.


There are several advantages of ensemble models for accurate and interpretable time series forecasting in complex environments.

  • Accuracy: Ensemble models have been shown to be more accurate than traditional ML methods for time series forecasting. This is because ensemble models combine the strengths of multiple forecasting methods, which can help to reduce forecast errors.
  • Robustness: Ensemble models are also more robust than traditional ML methods. This means they are less likely to be affected by outliers or noise in the data.
  • Interpretability: Ensemble models are more interpretable than traditional ML methods. This means that it is easier to understand how the forecast was generated, which can be important for decision-making and AI governance.

Ensemble models that combine different techniques can help overcome various limitations of pure ML models and provide accurate and interpretable forecasts. The overall forecast accuracy can be enhanced for any time series dataset by incorporating exogenous variables, particularly when dealing with complex events such as pandemics, weather, promotion, local events etc. Ensemble learning, which combines different techniques such as ES, NBEA, and exogenous variables, can help to overcome limitations in traditional forecasting methods and provide accurate forecasts that are easier to interpret and explain.


Wang, S., Li, T., Li, J., & Li, B. (2021). TSTNet: A Time Series Transformer Network for Interpretable Financial Forecasting. Proceedings of the International Conference on Learning Representations (ICLR). Retrieved from

Suvarna Krishnan is director of software engineering, Workforce Management AI at Zebra Technologies, where she leads a team of specialist software engineers. She has over 15 years of experience, a master's in computer and information science from the Birla Institute of Technology and Science, Pilani India, and a certificate in big data analytics from Harvard Extension School. She has filed a number of patents and won four inventor awards at Zebra for her contributions to forecasting and the application of blockchain to supply chain management.