Describing the forecasting models in Power View
Microsoft Thursday, May 8, 2014
Power View performs advanced statistical analysis of the data in your line charts to generate forecasts that incorporate trends and seasonal factors. If you want to learn more about these methods and how to customize your forecasts to get the best results, read on!
We've also provided some references at the end of this article if you are interested in the background behind these methods.
Forecasting in Power View is based on an established suite of methods for time series prediction called exponential smoothing. Over the years many methods have been developed for the analysis of time series, depending on whether the data is strongly seasonal or has no seasonality, how much noise there is in the data, and whether the data contains “surprises” or irregular peaks. The exponential smoothing method has a good track record in both academia and business, and has the advantage that it suppresses noise, or unwanted variation that can distort the model, while efficiently capturing trends.
For Power View in Excel, we provided two versions of exponential smoothing, one for seasonal data (ETS AAA), and one for non-seasonal data (ETS AAN). Power View uses the appropriate model automatically when you start a forecast for your line chart, based on an analysis of the historical data.
The formulas themselves are not displayed in Power View, but the general method is widely accepted in academia, and we’ve described the details here:
The seasonal algorithm (ETS AAA) models the time series using an equation that accounts for additive error, additive trend, and additive seasonality. This algorithm is also popularly known as the Holt-Winters algorithm, after the researchers who described the characteristics of the model. The Holt-Winters method is widely used, for example, in predicting and planning demand in businesses.
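To make the method concrete, here is a minimal sketch of the additive Holt-Winters recurrences. This is not Power View's actual implementation; the initialization and the smoothing parameters (alpha, beta, gamma) are illustrative.

```python
def holt_winters_additive(y, m, alpha=0.3, beta=0.05, gamma=0.1, horizon=4):
    """Minimal additive Holt-Winters (ETS AAA) sketch.

    y: observations, m: season length,
    alpha/beta/gamma: smoothing parameters for level, trend, season.
    Requires at least two full seasons of data.
    """
    # Naive initialization from the first two seasons.
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]

    for t in range(m, len(y)):
        last_level = level
        s = season[t % m]
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s

    # h-step-ahead forecasts reuse the most recent seasonal indices.
    return [level + (h + 1) * trend + season[(len(y) + h) % m]
            for h in range(horizon)]
```

On a perfectly repeating series the forecast simply continues the pattern; real data will mix in trend and noise through the smoothing parameters.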
For forecasting in Power View charts, we made several enhancements to the Holt-Winters algorithm to make it more resistant to noise in the data. Specifically, we have made the following changes:
Use of validation window for optimal parameter selection
The classical Holt-Winters method finds the optimal smoothing parameters by minimizing the mean sum of squares of errors for predictions in the training window, looking only at predictions that are one-step ahead. However, the errors you get from looking just one step ahead might not be representative of the errors you get when you want a longer horizon forecast. Therefore, to improve long-range forecasting error, we introduced a validation window, which contains the last few points of the training window. Within this validation window, we do not adjust the state at each and every step, but instead, compute the sum of squares of prediction errors for the window as a whole. This has the effect of dampening variation and preserving trend across the validation window.
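The idea can be illustrated with plain simple exponential smoothing: instead of minimizing one-step errors over the whole training window, choose the smoothing parameter by minimizing squared error over a held-out validation window at the end of the series, with the state frozen at the end of training. The function names, grid, and window size below are illustrative, not Power View's actual code.

```python
def ses_forecast(y, alpha):
    """Simple exponential smoothing: return the final level,
    which is the flat forecast for every future step."""
    level = y[0]
    for v in y[1:]:
        level = alpha * v + (1 - alpha) * level
    return level

def pick_alpha(y, val_size=4):
    """Choose alpha by minimizing squared error over a validation
    window at the end of the series; the state is NOT updated inside
    the validation window, so the error is a multi-step error."""
    train, val = y[:-val_size], y[-val_size:]
    best_alpha, best_sse = None, float("inf")
    for i in range(1, 100):              # coarse grid over (0, 1)
        alpha = i / 100
        fc = ses_forecast(train, alpha)  # state frozen at end of training
        sse = sum((v - fc) ** 2 for v in val)
        if sse < best_sse:
            best_alpha, best_sse = alpha, sse
    return best_alpha
```

Because the whole validation window is scored against one frozen forecast, a parameter that chases every wiggle one step ahead no longer looks artificially good.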
State vector correction at the end of training window when data is noisy
The original algorithm (ETS AAA) is a state-space-based forecasting method. Essentially, forecasts are weighted averages of past observations, with recent observations given more weight. A state vector is calculated throughout the training window and is used to compute the training fit. However, when the optimal smoothing parameters in the model are relatively high, the model can become sensitive to outliers. If the outliers appear in the latter part of the training window, this sensitivity is increased, because the most recent observations are weighted more heavily. In essence, an outlier in the wrong place can distort the model, pulling the training fit towards itself. As a result, forecasts can look very strange – for example, the forecast might move in a trend opposite to that in the input time series.
To avoid such distortions, we automatically track variations in the training state. When we detect large variations, we adjust the trend in the time window to more closely match the overall trend of the time series and adjust the forecast values accordingly.
The non-seasonal algorithm (ETS AAN) uses a simpler equation to model the time series, which includes only a term for additive trend and additive error, and does not consider seasonality at all. We assume data values increase or decrease in some way that can be described by a formula, but that the increase or decrease is not cyclical.
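ETS AAN corresponds to Holt's linear trend method: the same level and trend recurrences, with no seasonal term. A minimal sketch follows; the initialization and parameters are illustrative, not Power View's implementation.

```python
def holt_linear(y, alpha=0.5, beta=0.3, horizon=3):
    """Minimal Holt's linear trend (ETS AAN) sketch: a smoothed level
    plus a smoothed additive trend, and no seasonal component."""
    level, trend = y[1], y[1] - y[0]   # naive initialization
    for v in y[2:]:
        last_level = level
        level = alpha * v + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
    # h-step-ahead forecast: extend the current trend from the level.
    return [level + (h + 1) * trend for h in range(horizon)]
```

On exactly linear data the method tracks the line perfectly and the forecast extends it; on noisy data, alpha and beta control how much recent observations pull the level and trend around.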
Is this the same as the forecasting provided by SQL Server Analysis Services? No, this is a different tool, and a different (but similar) algorithm.
- Charting and forecasting in Power View do not require a connection to Analysis Services; instead they run on data stored in Excel, with processing done “behind the scenes” on the Excel Data Model.
- Forecasting on Power View charts does not use the ARTXP or ARIMA algorithms provided by SQL Server Analysis Services. Although those algorithms were developed by Microsoft Research in response to common theories and research on time series forecasting, and solve many of the same problems, they are implemented on a different code base and might yield different results.
We recommend that you experiment with various methods of forecasting and compare results. Typically data scientists will try out many models and evaluate their accuracy before accepting a prediction as valid for business application.
The forecasting chart in Power View lets you test the accuracy of forecasts by predicting past values. How does it work?
- The chart computes a data model based on the historical data preceding the currently selected point in time.
- The chart generates new predictions, so you can see how your forecast would have looked had you been predicting at some past point in time, with only the information available up to that point.
You can compare the predicted values against the real values and visually determine whether the model is doing a fair job of predicting.
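The mechanics of a hindcast can be sketched as: fit only on the prefix of the series, forecast forward, and compare against the held-out actual values. The simple linear-trend "model" below is a stand-in for illustration, not the ETS model Power View actually fits.

```python
def hindcast(y, split, horizon):
    """Fit only on y[:split], forecast `horizon` steps ahead, and
    compare against the actual values y[split:split + horizon]."""
    train = y[:split]
    # Stand-in model: average one-step change over the training prefix.
    trend = (train[-1] - train[0]) / (len(train) - 1)
    preds = [train[-1] + (h + 1) * trend for h in range(horizon)]
    actual = y[split:split + horizon]
    errors = [p - a for p, a in zip(preds, actual)]
    return preds, errors
```

Small errors with the right trend direction suggest the model generalizes; systematic errors suggest the model, or the amount of history, needs revisiting.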
It is important to understand that in Power View hindcasts are based only on the portion of the data preceding your selection, and not on the complete data. This can affect the quality of the predictions in several ways:
- Hindcasts will not necessarily mirror predictions based on the larger data set, simply because the data is different.
- The farther you go back in time, the less information you have, and the less it will represent your current forecast.
- If there have been any significant changes in the distribution of values throughout your historical data, you should take those into account when assessing the quality of the hindcasts.
In some cases, your timeline might be missing some historical values. Does this pose a problem?
Not usually – the forecasting chart can automatically fill in some values to provide a forecast. If the total number of missing values is less than 40% of the total number of data points, the algorithm will perform linear interpolation prior to performing the forecast.
If more than 40% of your values are missing, try to fill in more data, or perhaps aggregate values into larger time units, to ensure that a more complete data series is available for analysis.
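The fill-in behavior can be sketched as follows for interior gaps; the helper name and edge-case handling are illustrative, with only the 40% threshold taken from the description above.

```python
def fill_missing(y, max_missing_frac=0.4):
    """Linearly interpolate interior None gaps, refusing when more
    than 40% of the points are missing (threshold from the article).
    Assumes the first and last values are present."""
    missing = [i for i, v in enumerate(y) if v is None]
    if len(missing) > max_missing_frac * len(y):
        raise ValueError("too many missing values; aggregate or add data")
    known = [(i, v) for i, v in enumerate(y) if v is not None]
    filled = list(y)
    for i in missing:
        # Nearest known neighbors on each side of the gap.
        left = max((j, v) for j, v in known if j < i)
        right = min((j, v) for j, v in known if j > i)
        t = (i - left[0]) / (right[0] - left[0])
        filled[i] = left[1] + t * (right[1] - left[1])
    return filled
```

For example, a series sampled monthly with two isolated gaps is repaired in place, while a series that is mostly empty is rejected so you can aggregate to larger time units instead.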
Seasonality (or periodicity) is defined as the number of time steps that it takes for a full data cycle, after which the cycle repeats throughout the time series. For example, if you have sales data that changes over the course of a year but tends to look the same year over year, then the time series has a seasonality (or periodicity) of one year. If your historical data is presented in units of months, and of course 12 months comprise a year, then to get the best results you would set a seasonality value of 12, meaning 12 units makes one complete data cycle.
Seasonality estimation has a strong effect on time series forecasts, and is an important first step when predicting just about everything cyclical, from the weather to sales. Popular time series forecasting algorithms such as exponential smoothing (ETS) and Auto-Regressive Integrated Moving Average (ARIMA) require seasonality as an input and are often very sensitive to different input values.
Therefore, we’ve added algorithms to help detect seasonality accurately, and provided you with the ability to specify a seasonality value when you know what kind of data cycles to expect. Note that seasonality detection works best when there are several seasons of data.
Detecting seasonality in your data
When analyzing chart data to create a forecast, we use the following methods to detect seasonality and adjust the model appropriately:
- Preprocess data. We remove any linear trend we find, and normalize the time series values by using the Z normalization method – in other words, rather than using the raw values, we transform the input values to get a mean centered on 0 and standard deviation of close to 1.
- Identify candidate periods from the power spectrum. We compute the power spectrum, which is a way of decomposing a signal into its frequencies. From the spectrum, we identify the top 3 peaks and define intervals around each peak by including the adjacent bins in the frequency domain. Further, we modify the interval boundaries to include seasonality values that are common in business applications (such as months or fiscal quarters), assuming the interval boundary is close to the business seasonality. We use these intervals to find the candidate periods from the auto-correlation function.
- Rank candidate periods. Given a set of possible periods based on the autocorrelation function, we find the best one by applying the following procedure:
- We compute the Pearson and Spearman auto-correlation of the preprocessed data.
- We find peaks in the Pearson and Spearman auto-correlation and match those to the intervals found from the power spectrum. We treat the highest value of the auto-correlation function (ACF) in the interval as the peak.
- We test for hills in this interval by fitting two line segments between interval boundaries. If we don’t detect a hill, we check the next harmonic; if we do detect a hill, we report the peak as a harmonic peak.
- We arrange the detected peaks in decreasing order of power, using only the top k peaks, where power(peak_k) / power(peak_k+1) > 2.
- Among these top peaks, we pick the one with the highest correlation value.
- Choose which peak is best, from among the power spectrum and auto-correlation.
- If the candidate period from the autocorrelation function is 1, we use the candidate from the power spectrum as the final seasonality.
- If the candidate period from the autocorrelation function is less than 6, we calculate the powers of both candidates normalized by length, and use the value with the higher power.
- In all other cases, we use the candidate derived from the autocorrelation function.
- Look for specified or common seasonality values. Additionally, the algorithm includes a term that gives weight to a specified common seasonality value, if selected by the user. For example, if the methods described above identify the best seasonality as 11, 12, or 6, but you specify a yearly (12 month) seasonality that provides better confidence intervals, we apply the 12-month seasonality. In other words, the algorithm gives priority to user selections and to common seasonality values.
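The detection pipeline above can be sketched in drastically simplified form: detrend and z-normalize, take the dominant frequency from the power spectrum as a candidate period, then confirm it against the autocorrelation function. The search window and selection rule here are illustrative placeholders for the full ranking procedure, not Power View's code.

```python
import numpy as np

def detect_period(y, max_period=None):
    """Simplified seasonality-detection sketch: preprocess, get a
    candidate period from the power spectrum, confirm with the ACF."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    max_period = max_period or n // 2

    # 1. Preprocess: remove any linear trend, then z-normalize.
    t = np.arange(n)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)
    z = (resid - resid.mean()) / resid.std()

    # 2. Power spectrum: candidate period from the dominant frequency.
    power = np.abs(np.fft.rfft(z)) ** 2
    freq_idx = np.argmax(power[1:]) + 1     # skip the DC component
    candidate = round(n / freq_idx)

    # 3. Confirm with the autocorrelation function: pick the lag with
    # the highest ACF in a small interval around the candidate.
    def acf(lag):
        return np.corrcoef(z[:-lag], z[lag:])[0, 1]
    lags = range(max(2, candidate - 2), min(max_period, candidate + 2) + 1)
    return max(lags, key=acf)
```

On a noiseless trending sine with 8 full cycles in 96 points, both the spectrum and the ACF agree on a period of 12; on noisy business data, the interval matching and peak ranking described above do the heavy lifting.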
Note: Automatic detection of seasonality works best when the cycles being evaluated are not too long. The problem with very long data is that the auto-correlation values tend to carry less information (they vary more depending on adjacent sequences). The Pearson correlation coefficient that we use also tends to work best on linear relationships, and in a long data series linearity can easily be broken. Various heuristics have been proposed by researchers for assessing residuals in long time series, but in general, better results sometimes come from constraining the data series to a smaller range, or identifying probable data cycles by specifying a seasonality value.
Generally, you can get better results if you have at least 3-4 cycles or seasons of data to work with.
Specifying a seasonality value manually
If you know your data and domain well, you might be able to provide a hint to the algorithm to help it calculate seasonality more accurately. You do this by looking at the units of time used in the data, and determining how many units make up a complete data cycle.
For example, let’s say you have weekly rainfall data, and you expect that patterns of rainfall repeat on a yearly basis, more or less. In that case, you would set your seasonality value to 52, because 1 week = 1 unit of time, and there are 52 units of time in a data cycle.
When you specify a seasonality value, the number is incorporated in the formula that calculates trends overall, making the forecast better able to compensate for outliers such as extremely rainy days. However, if you have any doubt about what the data cycle is, you should leave it to the algorithm to test all possibilities and determine the most likely one.
Hindcasting and adjusting confidence intervals are two good ways to evaluate the quality of the forecast.
Hindcasting is one way to verify whether the model is doing a good job. If the observed value doesn’t exactly match the predicted value, it does not mean the forecast is all wrong – instead, consider both the amount of variation and the direction of the trend line. Predictions are a matter of probability and estimation, so if the predicted value is fairly close to but not exactly the same as the real value, it could be a better indicator of prediction quality than if the value exactly matched the real result. In general, when a model too closely mirrors the values and trends within the input dataset, it might be overfitted, meaning it likely won’t provide good predictions on new data.
Another way to assess accuracy is to view the probability error in the forecast by adjusting the confidence intervals. The shaded area shows you the range of predicted values at different confidence levels. Depending on your domain, you might require that values meet a very high confidence interval, or that possible predictions fall within a standard deviation of 0.05. However, in other cases, variations of plus or minus 30% might represent plausible scenarios.
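One common way such widening bands are formed, assuming Gaussian errors whose uncertainty accumulates with the forecast horizon, is sketched below. Power View's exact interval math is not published, so this is only an illustration of the general idea.

```python
import math

def prediction_interval(forecast, residual_std, horizon_step, z=1.96):
    """Symmetric prediction interval under a Gaussian-error assumption.
    The band widens with sqrt(h) as uncertainty accumulates over the
    horizon; z = 1.96 corresponds to a 95% interval."""
    half_width = z * residual_std * math.sqrt(horizon_step)
    return forecast - half_width, forecast + half_width
```

A tighter z (for example 1.0, roughly a 68% interval) shrinks the shaded band; a higher residual standard deviation, or a longer horizon, widens it.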
You are the best judge of how reliable the input data is, and what the real range of possible predictions might be.
If you are interested in learning more about time series models and the methods that have been developed over the years for forecasting, we recommend the following resources.
Box and Jenkins
This is the classic book on time series forecasting. Here you will find a detailed description of the many different types of problems posed by time series data, along with proposed solutions and formulas used by the researchers. Much later work in this field is based on Box-Jenkins and their work remains pertinent and valuable.
The Holt-Winters method is a specific implementation of exponential smoothing that is widely used in business and now has many variants. To get an idea of the arc of research, see Dr. Gardner’s published papers, Exponential smoothing: State of the Art (Part 1 and Part 2).
Exponential smoothing (Wikipedia)
This Wikipedia article describes the history and basic mechanisms of smoothing as applied to time series.
Overview of time series methods
Still confused about the terms and choices? This article by Microsoft Technical Support Specialist Tyler Chessman provides an overview of the different forecasting options, what they try to accomplish, and how they compare. He also walks through creation of a forecasting model in Excel.