Potential pitfalls with hindcasting as a proving ground for computer models, Part 3
Hindcasting is a method for validating the predictive ability of computer models. In Part 3, we will look at why even rigorous hindcasting may fail when the world reacts unpredictably to the results.
Hindcasting is one method to test the ability of a computer model to forecast the future.
Ideally, a credible hindcasting check of forecasting ability requires a period of time for which you have real-world, observed data of the system you are modeling. You divide this period into two: a “training data” period and a “testing data” period. The training data is used to tune and refine the computer model. The testing data is used to test the accuracy of the resulting model. The application of the computer model to the testing data is called hindcasting—basically, a simulation of forecasting, except with the benefit of actually having the past data to do the check.
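To make this concrete, here is a minimal sketch in Python of dividing an observed record into a training period and a testing period. The numbers and the 70/30 split are invented for illustration; nothing here refers to a real model or dataset.

```python
# Minimal sketch of a hindcasting setup (hypothetical numbers).
# The observed record is split chronologically: the earlier portion is used
# to tune the model, and the later portion is held back for the hindcast check.

observed = [102.0, 105.3, 101.8, 110.4, 115.0, 112.7,
            118.9, 121.5, 119.8, 125.2, 130.1, 128.4]  # toy yearly observations

split = int(len(observed) * 0.7)   # earlier 70% of the record -> training period
training_data = observed[:split]   # used to tune the adjustable parameters
testing_data = observed[split:]    # held back and used only for the hindcast check

print(f"training period: {len(training_data)} observations")
print(f"testing period:  {len(testing_data)} observations (hindcast target)")
```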
In Part 1 of this series, we examined the concept of blinded experiments as applied to hindcasting runs. In Part 2, we explored how hindcasting proof as a proxy for forecasting proof may break down when the system being modeled is complex, chaotic, and ever-changing. In Part 3, let’s look at how hindcasting validation can break down when the world is not blinded to the results.
Potential Pitfall: What if the world is not blinded to the results of the computer model?
During the creation of the computer model, researchers tune the model using a series of adjustable parameters. The tuning is done against the training data set: by comparing the real-world data to the model’s predictions, a researcher can adjust the parameters so that the model fits the observed data. Once the hindcasting check against the testing data set passes, the model is considered validated and can be used to forecast the future, with some degree of credibility. Using the model as a forecasting tool depends on the model’s conditions staying reasonably constant in the future.
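Continuing the toy sketch above (and reusing its training_data and testing_data), here is one way the tune-then-hindcast loop might look. The one-parameter growth model and the grid search over the growth rate are purely illustrative assumptions, not any particular real-world method.

```python
# Toy illustration: tune a single adjustable parameter (a growth rate) against
# the training data, then run the tuned model over the testing period as a hindcast.

def model(start, growth_rate, steps):
    """Project `steps` values forward from `start` at a fixed growth rate."""
    values, current = [], start
    for _ in range(steps):
        current *= (1 + growth_rate)
        values.append(current)
    return values

def sum_squared_error(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))

# Tune the adjustable parameter by picking the growth rate that best fits
# the training data (a simple grid search from -5% to +5%).
candidates = [r / 1000 for r in range(-50, 51)]
best_rate = min(
    candidates,
    key=lambda r: sum_squared_error(
        model(training_data[0], r, len(training_data) - 1), training_data[1:]
    ),
)

# Hindcast: run the tuned model over the testing period and compare to reality.
hindcast = model(training_data[-1], best_rate, len(testing_data))
error = sum_squared_error(hindcast, testing_data)
print(f"tuned growth rate: {best_rate:.3f}, hindcast error: {error:.1f}")
```

If the hindcast error is acceptably small, the model passes the check; if not, it is back to the drawing board.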
One major condition that may change after the model is released is the reaction of the public. If humans are part of the system you have hindcast-validated, then once the model’s results become publicly known, the model itself may perturb the very system you are trying to predict.
Say, for example, that it is possible to scientifically create a predictive model of stock market returns in a computer (this is clearly impossible, but let’s just roll with the example). A researcher cracks the code, trains a market predictor on the training data, and then uses hindcasting to evaluate its predictive performance. The hindcast demonstrates near-perfect accuracy over the testing data period. “Yes! I’m rich!” goes the researcher. He then, logically, partners with a hedge fund, since his predictive model is best leveraged by a business entity that can wield a huge amount of capital. For the next month, the researcher and the hedge fund earn a lot of money, as they know when to buy and sell. That is, until the success gets noticed. Everyone in the game sees what the hedge fund is doing and, through corporate espionage, starts emulating its trades. Soon, the computer model can’t predict anything because it has actually changed reality itself. In other words, the world was not blinded to the results of the predictive computer model, screwing up the forecast.
Another example is COVID-19 computer modeling. COVID forecasts attempt to predict the severity of the disease’s spread so as to inform health authorities how and when to deploy mitigation strategies. The modeling is usually released to the public. But what if the public reacts to the model results? Perhaps many people are spooked by the modeling and double down on personal efforts to contain the spread. Perhaps a deeply skeptical sector of the population scoffs at the model results and rebels against mitigation measures. In either case, it is entirely possible that publication of the model’s prediction impacts the public and consequently skews the future outcome, thus invalidating the computer model.
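As a purely hypothetical illustration of that feedback loop, here is a toy sketch of a published forecast changing the very behavior it was trying to predict. The simple exponential spread model, the numbers, and the "scary forecast" reaction threshold are all invented assumptions.

```python
# Toy feedback-loop illustration (hypothetical numbers throughout):
# a forecast is published, the public reacts to it, and the realized outcome
# no longer matches the forecast that triggered the reaction.

def project(cases, growth_rate, days):
    """Project daily case counts forward at a fixed growth rate."""
    out = []
    for _ in range(days):
        cases *= (1 + growth_rate)
        out.append(cases)
    return out

initial_cases, baseline_growth, horizon = 1_000, 0.10, 14

# The model's published forecast, assuming behavior stays the same.
forecast = project(initial_cases, baseline_growth, horizon)

# The public sees the forecast and reacts: if the projected numbers look scary,
# people cut their contacts and the effective growth rate falls.
reacted_growth = 0.04 if forecast[-1] > 2_000 else baseline_growth
realized = project(initial_cases, reacted_growth, horizon)

print(f"forecast day-{horizon} cases: {forecast[-1]:,.0f}")
print(f"realized day-{horizon} cases: {realized[-1]:,.0f}  (the forecast 'failed')")
```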
These scenarios, where humans are inherently part of the modeled system, may be a fatal flaw for a computer model that relies on hindcasting for validation. Once its predictions are publicized, the computer model itself may effect changes that it did not predict. In such a situation, if a correct forecast is considered scientific proof, these computer models are unfalsifiable! And it is through no fault of the scientists!
---
All of this is not to say that hindcasting is useless. In many cases it can be quite useful. But there are many potential problems that may arise, as outlined in this three-part series.
The credibility of hindcasting is not binary; that is, hindcasting is not a silver-bullet proof for computer models. Hindcasting validation sits on a spectrum of credibility.
It seems to me that there needs to be a reasonably straightforward way to evaluate a computer model's ability to predict the future. Anybody can cook up a computer model and say that it “works”. It is quite another thing to credibly demonstrate its accuracy.
Perhaps I will end by commenting on one of my favorite books of all time, which happens to be relevant to this topic: Isaac Asimov’s Foundation. In Asimov’s futuristic novel, Dr. Hari Seldon created a new branch of mathematics called psychohistory, which purported to calculate the future of humanity out to tens of thousands of years. Seldon predicted the fall of the Galactic Empire, followed by a brutal, chaotic Dark Age that would last 30,000 years. The creation of a Foundation consisting of a dedicated group of Seldon’s followers would shorten the chaos to a mere 1,000 years. Basically, Hari Seldon created a computer model to predict the future.
The funny thing is, Seldon’s computer model ended up being not that predictive. After a few hundred years, events deviated significantly from the predictions (largely due to a psychic mutant called the Mule, but also for other reasons). Perhaps this is because psychohistory was not properly validated with hindcasting. For example, how could Dr. Seldon have predicted out to 30,000 years when the available hindcasting time period was the lifespan of the Galactic Empire, which was only 12,067 years old at the time? Also, unpredictable changes in the conditions of the computer model would occur over time (such as the Mule)—how could he have worked that into his model? Finally, Hari Seldon was galactically famous, as was his psychohistory theory. Public knowledge of psychohistory would impact the results in unpredictable ways because humans would react to the prognostications, thus rendering psychohistory an unfalsifiable scientific theory. No wonder psychohistory failed as a computer model!
Hindcasting is a useful tool, but tools can be misused. I hope that this three-part series gave you some insight into how to evaluate the credibility of a claim that a given computer model is a digital crystal ball that can tell the future—which really is a rather large claim to be making.
— — —
Check out Part 1 and Part 2 of this series on computer model hindcasting, and please offer constructive comments below.