There have been only three other major historical validations that I am aware of that we were not involved in. They are 1) the validation of the ATLAS model against the France 1940 campaign, done in the 1970s; 2) the validation of the Vector model using the Golan Heights campaign of 1973; and 3) the validation of SIMNET/JANUS using the 73 Easting data from the 1991 Gulf War. I am not aware of any other major validation effort done in the last 25 years other than what we have done (there is one face validation done in 2017 that I will discuss in a later post).
I have never seen a validation report for the ATLAS model, nor any reference to research or data from the France 1940 campaign used for it. I suspect it does not exist. The validation of Vector covered only unit movement. They did not validate the attrition or combat functions; these were inserted from the actual battle. The validation was done in-house by Vector, Inc. I have seen the reports from that effort but am not aware of any databases or special research used. See Chapter 18 of War by Numbers for more details, and also our 1996 newsletter on the subject: http://www.dupuyinstitute.org/pdf/v1n4.pdf
So, I know of only one useful validation database out there that was not created by us. This is the Battle of 73 Easting database. It was created under contract and used for validation of the JTLS (Joint Theater-Level Simulation).
But the Battle of 73 Easting was a strange, one-sided affair. First, it was fought in a sandstorm, so visibility was severely limited. Our modern systems allowed us to see the Iraqis; the Iraqis could not see us. It was therefore a very one-sided fight in which the U.S. had maybe 6 soldiers killed and 19 wounded and lost one Bradley fighting vehicle. The Iraqis suffered perhaps 600-1,000 casualties and lost dozens of tanks to combat (and dozens more to aerial bombardment in the days and weeks before the battle). According to Wikipedia, they lost 160 tanks and 180 armored personnel carriers. It was a shooting gallery. I did have a phone conversation with some of the people who did the veteran interviews for this effort. They said the fight devolved to the point that U.S. troops were trying to fire in front of the Iraqi soldiers to encourage them to surrender. Over 1,300 Iraqis were taken prisoner.
This battle is discussed in the Wikipedia article here: https://en.wikipedia.org/wiki/Battle_of_73_Easting
I did get the validation report on this and it is somewhere in our files (although I have not seen it for years). I do remember one significant aspect of the validation effort: while the model indeed got the correct result (all the Iraqi forces were destroyed), it did so with the Americans using four times as much ammunition as they did historically. Does this mean that the model’s attrition calculation was off by a factor of four?
Anyhow, I gather the database and the validation report are available from the U.S. government. Of course, it is a very odd battle, and doing a validation against just one odd, one-sided battle runs the danger of the “N=1” problem. It is probably best to do validations against multiple battles.
A more recent effort (2017) that included some validation work is discussed in a report called “Using Combat Adjudication to Aid in Training for Campaign Planning.” I will discuss this in a later blog post.
Now, there are a number of other databases out there addressing warfare, for example the Correlates of War (COW) databases (see: COW), the databases maintained by the Stockholm International Peace Research Institute (SIPRI) (see: SIPRI), and other such efforts. We have never used these, but by their nature we do not think they are useful for validating combat models at the division, battalion, or company level.
It’s fascinating how these models are used but not checked!
Isn’t it? These models are used extensively across the DOD to help analyze force structure, weapons effectiveness, logistics requirements, training, etc.
Here’s a supporting view of validation, checking, and so on.
https://fivethirtyeight.com/features/when-we-say-70-percent-it-really-means-70-percent/
Here are a couple of excerpts.
I don’t want to make it sound like we’ve had a rough go of things overall. But we do think it’s important that our forecasts are successful on their own terms — that is, in the way that we have always said they should be judged. That’s what our latest project — “How Good Are FiveThirtyEight Forecasts?” — is all about.
That way is principally via calibration. Calibration measures whether, over the long run, events occur about as often as you say they’re going to occur. For instance, of all the events that you forecast as having an 80 percent chance of happening, they should indeed occur about 80 out of 100 times; that’s good calibration. If these events happen only 60 out of 100 times, you have problems — your forecasts aren’t well-calibrated and are overconfident. But it’s just as bad if they occur 98 out of 100 times, in which case your forecasts are underconfident.
and
The catch about calibration is that it takes a fairly large sample size to measure it properly. If you have just 10 events that you say have an 80 percent chance of happening, you could pretty easily have them occur five out of 10 times or 10 out of 10 times as the result of chance alone. Once you get up to dozens or hundreds or thousands of events, these anomalies become much less likely.
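To make the calibration idea concrete, here is a minimal sketch (my own illustration, not code from FiveThirtyEight or from any combat model) that bins hypothetical forecasts by predicted probability, compares each bin’s predicted frequency with the observed frequency, and shows why ten events are too few to judge an 80 percent forecast. The function name, the synthetic data, and the 80 percent example are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def calibration_table(probs, outcomes, bins=np.arange(0.0, 1.01, 0.1)):
    """Group forecasts into probability bins and compare the predicted
    frequency with the observed frequency of the event in each bin."""
    probs, outcomes = np.asarray(probs), np.asarray(outcomes)
    rows = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs >= lo) & (probs < hi)
        if mask.sum() == 0:
            continue
        rows.append((f"{lo:.1f}-{hi:.1f}", int(mask.sum()),
                     probs[mask].mean(), outcomes[mask].mean()))
    return rows

# Illustrate the sample-size point: forecasts that are perfectly
# calibrated at 80% can still look badly off with only 10 events.
for n in (10, 1000):
    outcomes = rng.random(n) < 0.80   # events truly occur 80% of the time
    probs = np.full(n, 0.80)          # and we forecast exactly 80%
    for label, count, predicted, observed in calibration_table(probs, outcomes):
        print(f"n={n:5d}  bin {label}: predicted {predicted:.2f}, "
              f"observed {observed:.2f} over {count} events")
```

With only 10 events the observed rate will often swing well away from 0.80 by chance alone, while with 1,000 events it settles close to 0.80, which is exactly the point of the second excerpt.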
Yep, need lots of cases. Large databases with lots of cases.
I think we have made a good start on that.
I think that if one understands the principles of warfare, better prognoses and models can be made (even in the face of limited data) than by merely relying on information from individuals who never saw anything beyond the edge of their own desk.
I’m glad to see the reference to FiveThirtyEight. I think they do a good job of making their methodology transparent, especially for a private-sector company, and that, unlike the vast majority of other pollsters, they represent future outcomes as probabilities. This is key for combat models as well, which we treat all too often as binary “answer machines” (a paraphrase of Dr. Paul Davis at RAND).