Category Validation

Data from the First Validation of the QJM

Christopher A. Lawrence
May 26, 2026

I do state in War by Numbers that there were three validations of the QJM/TNDM although the first was not published. That is not entirely correct. It was not published as a stand-alone validation, but significant parts of it were published. The actual engagements were all published as part of the Combat Data Subscription Service. It was eight volumes, with the first volume published in 1975. In there it listed all the engagements used by the QJM. For example (page 6):

Now, the actual results of these test runs were published in his 1977 book Numbers, Predictions and War (NPW). As Trevor Dupuy specifically notes on page 58 of his book:

“Appendix B contains a consolidated summary of HERO’s QJM Engagement Data Base. The first 81 examples in this consolidated statistical comparison show the theoretical results and actual results of these 81 World War II engagements (60 in the Development Data Base, 21 in the Validating Data Base; 61 in Italy, 19 in northwest Europe, and 1 in Russia). In all of these the P/P value reflects an average German combat effectiveness superiority factor of about 23 percent.”

This was done back in the day when Data Base was two words and could exist on paper, vice a computer.

Anyhow, in Appendix B of NPW is “HERO’s QJM Data Base (as of May 1977)”. The specific engagement in question is given as:

No. Year & Date Battle Designation Force X Designation Posture

1. 1943, Sep 9-11 Port of Salerno B 46 ID A

Air %:

Force Y Designation Posture Na Nd S/S W W P/P

G 16 PzD PD 12,917 4,250 1.83 0 22 0.73

% cas/day

Surp P/P PR/PR CEV x y I I SE SE

1.5 1.10 0.87 0.79 3.51 0.94 7.4 2.3 1.02 3.85

I left out the subscripts. But one can see Appendix B (pages 234-235) for these details.

Now, this still does not amount to a direct validation, in the sense of comparing model results to actual historical combat results, but one can see the data they used for their inputs and what the outcomes of these engagements were.

Dupuy's Theory of Combat Modeling, Simulation & Wargaming TNDM Validation War by Numbers

Two books

Christopher A. Lawrence
December 31, 2025

These two books are my two analytical books. Both quantitative in approach. Notice the use of the word “Understanding” in both titles.

America’s Modern Wars covers our analysis of insurgencies and counterinsurgencies based upon an analysis of 89 post-WWII cases. There has been very little quantitative analysis of insurgencies. This is the most extensive effort I am aware of. We were blessed with budget and a staff that at one point included ten people. It is amazing what you can do when you have manpower (read $$$).

War by Numbers is our analysis of conventional warfare. It was built from a series of studies we did over the years for the DOD and other contractors. Probably the most extensive quantitative analysis of aspects of conventional war that has been done in the last few decades. Again, helps to have budget.

These are my two “theoretical” books. I am halfway through a book called More War by Numbers. I have stopped work on it to concentrate on other tasks. May get back to it in 2027.

The analysis for America’s Modern Wars was based upon 89 post-WWII insurgencies, interventions and peacekeeping operations. We did expand the database to well over 100 cases but never went back and re-shot the analysis due to budget cuts. It would be my desire to expand the database up to around 120 cases, update the 20 or so that were on-going (our data collection stopped in 2008), and then re-shoot and expand the analysis. This would be a good time to do this instead of again waiting until we are in another insurgency and yet again chasing our tail. Our track record on these has not been good: we lost Vietnam, we lost Afghanistan, and Iraq was touch-and-go for a while. While we are not in the middle of another insurgency is a good time to study and learn about them based upon real world experience (AKA history).

Sorry to get preachy, but I really don’t like losing wars.

Afghanistan Casualty estimation Combat Databases Conventional warfare DuWar Databases Estimating Insurgent Force Size Force Ratios Insurgency & Counterinsurgency Iraq Modeling, Simulation & Wargaming Urban Warfare Validation War by Numbers

Let’s talk about artillery shells

Christopher A. Lawrence
February 27, 2024

Now, when we first did the validation database, the Ardennes Campaign Simulation Data Base, one of the fields we had to fill in for each division, and corps, and army was the tons of ammunition used each day, broken out by four types.

This was because the combat models that were supposed to be validated using this database were used in part to determine the number of shells needed for a predicted upcoming war. Back in 1987, when we started this database, it was a war in Europe versus the Soviet Union and the Warsaw Pact.

Now, my history with the Concepts Analysis Agency (CAA), later renamed the Center for Army Analysis (still CAA), goes back to 1973, when it was founded in Bethesda, MD. I was in my junior year in high school, my mother had just been promoted to a school principal and my father had just finished his three-year assignment in the Pentagon. My mother did not want to move again, so my father found another assignment in the DC area. This was with the newly forming CAA in 1973. He had just happened to have finished a master’s degree in Systems Analysis from USC, so was nominally qualified.

My father was working over at manpower in the Pentagon, under Col. John Brinkerhoff. They all reported to General Donn Starry (who I did have the pleasure of meeting). Those two transferred together over to CAA, with Col. (Dr.) John Brinkerhoff taking over a division with my father as his assistant. I therefore started hearing stories about CAA combat models in 1973 based upon my father’s hands-on experience. One of the stories he told was that the model tended to fire the longest-range weapons first as units were closing. This made the 8″ Howitzer very valuable. In fact, so valuable that the best wargaming strategy was to build an army of 8″ Howitzers and destroy the Warsaw Pact before they could ever get into engagement range. Obviously, there were a couple of flaws in that wargame.

But, the suite of models, some of which are still in use today, was used to determine the ammunition requirements for the U.S. Army. Therefore, a validation database needed to address these issues. The same fields also existed in the Kursk Data Base (1993-1996), which ended up never being used to validate a combat model. It was used to create a big-ass book.

Anyhow, CAA combat models did determine our ammunition requirements until the end of the cold war (22 or 25 or 26 December 1991 when the Soviet Union fell). They were also used to determine the requirements for the 1991 Gulf War. According to the story I heard in a meeting, CAA provided the Army general staff with the requirements for the Gulf War. The general staff doubled the figures CAA gave and then we stacked every dock in the Gulf with ammunition. Luckily none of Hussein’s missiles hit those docks. At the end of the war, it turns out we shipped at least ten times the ammunition we needed. As this was old dumb munitions dating back to World War II, it was cheaper to destroy them there than ship them back, which is what we did. Don’t have a count of what was destroyed in the Gulf, but guessing it was millions of rounds.

After that, I do not know what OSD PA&E or CAA or the U.S. Army did to determine ammunition requirements. We no longer had a neatly canned scenario like the Fulda Gap. We no longer had a clear enemy. How much ammunition is needed is driven by both the combat model used (which tends to “run hot”) and, more significantly, the scenarios used. If all the scenarios used a four-day combat scenario (like the Gulf War) then one will end up with very different needs than if one is planning for a 90-day or 180-day war (or three-year war in the case of Ukraine). I have no idea what scenarios were used, and it is probably classified. But the end result is that our production of ammunition over the decades since 1991 has dropped considerably while a lot of our reserves were destroyed in the Gulf.

This, of course, harkens back to a complaint I have made over the years, which is that we tend to focus on the missions and wars we think are most likely now, and not the entire spectrum of wars and conflicts that we can see are possible if one looks wider and deeper into history. Clearly, we were not ready for extended war in Ukraine, and this is not the first time in recent times we have not been properly prepared for certain conflicts. I do discuss this issue in America’s Modern Wars.

P.S. The West is underestimating Ukraine’s artillery needs – Defense One

Ardennes Campaign Simulation Data Base Combat Databases Kursk Data Bases Modeling, Simulation & Wargaming Validation

Dupuy Mentioned in Dispatches – 1

Christopher A. Lawrence
March 5, 2021

A friend just sent me a recent article from the Strategy Page that mentions Trevor Dupuy’s work: Leadership: Meaningful Measures of Military Might

Trevor Dupuy is mentioned four times in the eleventh paragraph of the article:

“One notable practitioner of this was military historian and World War II artillery officer Trevor Dupuy.”
“For example, Trevor Dupuy undertook a closer examination of combat records and found, and documented, that German troops generally outfought their opponents.”
“If it hadn’t been for the research of American historian Trevor Dupuy in the 1970s and 80s, these critical differences might still sit unnoticed in musty archives.”
“Dupuy’s calculations brought forth the reasons why some allied, German, Russian and Japanese divisions were better than others:”

Anyhow, don’t know who the author is, but appreciate the mention. The Strategy Page is run by Jim Dunnigan, Austin Bay, Al Nofi, Dan Masterson and Stephen V. Cole and others.

Lessons of History Net Assessment Validation

Time and the TNDM

Shawn Woodford
January 2, 2020

[The article below is reprinted from the December 1996 edition of The International TNDM Newsletter. It was referenced in the recent series of posts addressing the battalion-level validation of Trevor Dupuy’s Tactical Numerical Deterministic Model (TNDM).]

Time and the TNDM
by Christopher A. Lawrence

Combat models are designed to operate within their design parameters, but sometimes we forget what those are. A model can only be expected to perform well in those areas for which it was designed and those areas where it has been tested (meaning validated). Since most of the combat models used in the US Department of Defense have not been validated, this leaves open the question as to what their parameters might be. In the case of the TNDM, if the model is not giving a reasonable result, then you must ask, is it because the model is being operated outside of its parameters? The parameters of the model are pretty well defined by the 149 engagements of the QJM Database to which it was validated.

One of the areas where there is a problem with the TNDM is that while the analyst is capable of running a battle over any time period, the model was fundamentally validated to run 1- to 3-day engagements. This means that there should be a reduced confidence in the results of any engagement of less than 24 hours or over three days. The actual number of days used for each engagement in the original QJM data base is shown below:

By comparison, the 75 battalion level engagements that we are using to validate the TNDM for battalion-level engagements occur over the following time periods:

Three of the engagements used in the battalion-level validation are from the QJM database.

We did run sample engagements of 24 hours, 12 hours, 6 hours and 3 hours. The results of the 12-hour run were literally 1/2 the casualties and 1/2 of the advance for the 24-hour run. The same straight dividing effect was true for the 3- and 6-hour runs. For increments less than 24 hours the model just scaled the results in direct proportion to the number of hours. As Dave Bongard pointed out to me, there are various lighting choices, including daylight and night, and these could vary the results some if used. But the impact for daylight would be 1.1 additional casualties and the reduction for night is .7 or .8.
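
The straight proportional scaling described above can be sketched as follows. This is a minimal illustration and not the TNDM's actual code: the function name, its arguments and the sample 24-hour figures are all hypothetical, and the lighting factors are left out.

```python
def scale_to_duration(casualties_24h, advance_24h, hours):
    """Scale a 24-hour result to a shorter engagement in direct proportion to time.

    Hypothetical helper: it only mirrors the behavior described in the text
    for engagements of 24 hours or less.
    """
    if not 0 < hours <= 24:
        raise ValueError("this sketch only covers engagements of 24 hours or less")
    fraction = hours / 24.0
    return casualties_24h * fraction, advance_24h * fraction


# Made-up 24-hour result: 120 casualties and 8 km of advance.
for hours in (24, 12, 6, 3):
    casualties, advance = scale_to_duration(120, 8.0, hours)
    print(f"{hours:>2} hours: {casualties:6.1f} casualties, {advance:4.1f} km advance")
```

Under this rule the casualties per hour are the same at every duration, which is the behavior the rest of the article questions.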

The problem is that briefer battles will result in higher casualties per hour than extended battles. Also, in any extended battle, there are intense periods and un-intense periods, with the model giving the average result of those periods. For battles of less than 24 hours, there tends to be only intense periods. Therefore, it should be expected that battles lasting 3 hours should have more than 1/8 the losses of a 24-hour battle. This will be tested during the battalion-level validation.

For battles in excess of one day, there is a table in the TNDM that reduces the overall casualties and advance rate over time to account for fatigue.

Modeling, Simulation & Wargaming TNDM Validation

TDI Friday Read: Battalion-Level Combat Model Validation

Shawn Woodford
November 1, 2019

Today’s Friday Read summarizes a series of posts detailing a validation test of the Tactical Numerical Deterministic Model (TNDM) conducted by TDI in 1996. The test was conducted using a database of 76 historical battalion-level combat engagements ranging from World War I through the post-World War II era. It is provided here as an example of how such testing can be done and how useful it can be, despite skepticism expressed by some in the U.S. operations research and modeling and simulation community.

Validating A Combat Model

Validating A Combat Model (Part II)

Validating A Combat Model (Part III)

Validating A Combat Model (Part IV)

Validating A Combat Model (Part V)

Validating A Combat Model (Part VI)

Validating A Combat Model (Part VII)

Validating A Combat Model (Part VIII)

Validating A Combat Model (Part IX)

Validating A Combat Model (Part X)

Validating A Combat Model (Part XI)

Validating A Combat Model (Part XII)

Validating A Combat Model (Part XIII)

Modeling, Simulation & Wargaming TNDM Validation

Validating A Combat Model (Part XIII)

Shawn Woodford
October 28, 2019

Gun crew from Regimental Headquarters Company, U.S. Army 23rd Infantry Regiment, firing 37mm gun during an advance against German entrenched positions, 1918. [Wikipedia/NARA]

[The article below is reprinted from the June 1997 edition of The International TNDM Newsletter.]

The Second Test of the Battalion-Level Validation:
Predicting Casualties Final Scorecard
by Christopher A. Lawrence

While writing the article on the use of armor in the Battalion-Level Operations Database (BLODB), I discovered that I had really not completed my article in the last issue on the results of the second battalion-level validation test of the TNDM, casualty predictions. After modifying the engagements for time and fanaticism, I didn’t publish a final “scorecard” of the problem engagements. This became obvious when I needed that scorecard for the article on tanks. So the “scorecards” are published here and are intended to complete the article in the previous issue on predicting casualties.

As you certainly recall, amid the 40 graphs and charts were six charts that showed which engagements were “really off.” They showed this for unmodified engagements and CEV modified engagements. We then modified the results of these engagements by the formula for time and “casualty insensitive” systems; we are now listing which engagements were still “off” after making these adjustments.

Each table lists how far each engagement was off in gross percent of error. For example, if an engagement like North Wood I had 9.6% losses for the attacker, and the model (with CEV incorporated) predicted 20.57%, then this engagement would be recorded as +10 to +25% off. This was done rather than using a ratio, for having the model predict 2% casualties when there was only 1% is not as bad an error as having the model predict 20% when there was only 10%. These would be considered errors of the same order of magnitude if a ratio was used. So below are the six tables.
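
The scoring rule described above can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and the band edges of 5, 10 and 25 percentage points are an assumption pieced together from the thresholds mentioned in the text, not taken from the tables themselves.

```python
def error_band(actual_pct, predicted_pct, edges=(5.0, 10.0, 25.0)):
    """Classify a prediction by gross error (predicted minus actual), in percentage points.

    The band edges are assumed for illustration only.
    """
    diff = predicted_pct - actual_pct
    size = abs(diff)
    sign = "+" if diff >= 0 else "-"
    if size <= edges[0]:
        return f"within +/-{edges[0]:g}%"
    lower = edges[0]
    for upper in edges[1:]:
        if size <= upper:
            return f"{sign}{lower:g} to {sign}{upper:g}% off"
        lower = upper
    return f"more than {sign}{lower:g}% off"


# North Wood I, attacker: 9.6% actual losses versus 20.57% predicted.
print(error_band(9.6, 20.57))

# Both of these are off by a ratio of 2, but the gross error differs a lot.
print(error_band(1.0, 2.0))
print(error_band(10.0, 20.0))
```

Scoring by the difference rather than the ratio keeps small absolute misses, like 2% predicted against 1% actual, from being counted as badly as 20% against 10%.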

Seven of the World War I battles were modified to account for time. In the case of the attackers we are now getting results within plus or minus 5% in 70% of the cases. In the case of the defenders, we are now getting results within plus or minus 10% in 70% of the cases. As the model doesn’t fit the defender’s casualties as well as the attacker’s, I use a different scaling (10% versus 5%) for what is a good fit for the two.

Two cases remain in which the predictions for the attacker are still “really off” (over 10%), while there are six (instead of the previous seven) cases in which the predictions for the defender are “really off” (over 25%).

Seven of the World War II battles were modified to account for “casualty insensitive” systems (all Japanese engagements). Time was not an issue in the World War II engagements because all the battles lasted four hours or more. In the case of the attackers, we are now getting results within plus or minus 5% in almost 75% of the cases. In the case of the defenders, we are now getting results within plus or minus 10% in almost 75% of the cases. We are still maintaining the different scaling (5% versus 10%) for what is a good fit for the two.

Now in only two cases (used to be four cases) are the predictions for the attacker really off (over 10%), while there are still five cases in which the predictions for the defender are “really off” (over 25%).

Only 13 of the 30 post-World War II engagements were not changed. Two were modified for time, eight were modified for “casualty insensitive” systems, and seven were modified for both conditions.

In the case of the attackers we are now getting results within plus or minus 5% in 60% of the cases. In the case of the defenders, we are now getting results within plus or minus 10% in around 55% of the cases. We are still maintaining the different scaling (5% versus 10%) for what is a good fit for the two.

We have seven cases (used to be eight cases) in which the attacker’s predictions are “really off” (over 10%), while there are only five cases (used to be 10) in which the defender’s casualty predictions are “really off” (over 25%).

Repetitious Conclusion

To repeat some of the statistics from the article in the previous issue, in a slightly different format:

Modeling, Simulation & Wargaming TNDM Validation

Validating A Combat Model (Part XII)

Shawn Woodford
October 23, 2019

[The article below is reprinted from the April 1997 edition of The International TNDM Newsletter.]

The Second Test of the TNDM Battalion-Level Validations: Predicting Casualties
by Christopher A. Lawrence

FANATICISM AND CASUALTY INSENSITIVE SYSTEMS:

It was quite clear from looking at the battalion-level data before we did the validation runs that there appeared to be two very different loss patterns, based upon—dare I say it—nationality. See the article in issue 4 of the TNDM Newsletter, “Looking at Casualties Based Upon Nationality Using the BLODB.” While this is clearly the case with the Japanese in WWII, it does appear that other countries were also operating in a manner that produced similar casualty results. So, instead of using the word fanaticism, let’s refer to them as “casualty insensitive” systems. For those who really need a definition before going forward:

“Casualty Insensitive” System: A social or military system that places a high priority on achieving the objective or fulfilling the mission and a low priority on minimizing casualties. Such systems tend to be “mission obsessive” versus using some form of “cost benefit” method of weighing whether the objective is worth the losses suffered to take it.

EXAMPLES OF CASUALTY INSENSITIVE SYSTEMS:

For the purpose of the database, casualty insensitive systems were defined as the Japanese and all highly motivated communist-led armies. These include:

Japanese Army, WWII
Viet Minh
Viet Cong
North Vietnamese
Indonesian

We have included the Indonesians in this list even though it was based upon only one example.

In the WWII and post-WWII period, one would expect that the following armies would also be “casualty insensitive”:

Soviet Army in WWII
North Korean Army
Communist Chinese Army in Korea
Iranian “Pasdaran”

Data can certainly be found to test these candidates.

One could postulate that the WWI attrition multiplier of 4 that we used also incorporates the 2.5 “casualty insensitive” multiplier. This would imply that there was only a multiplier of 1.6 to account for other considerations, like adjusting to the impact of increased firepower on the battlefield. One could also postulate that certain nations, like Russia, have had “casualty insensitive” systems throughout their last 100 years of history. This could also be tested by looking at battles over time of Russians versus Germans compared to Germans versus British, U.S. or French. One could easily carry this analysis back to the Seven Years’ War. If this was the case, this would establish a clear cultural basis for the “casualty insensitive” multiplier, but to do so would require the TNDM to be validated for periods before 1900. This would all be useful analysis in the future, but is not currently budgeted for.

It was expected that the “casualty insensitive” multiplier of 2.5 derived from the Japanese data would be too high to apply directly to these armies. Much to our surprise, we found that this did not appear to be the case. This partially or wholly explained the under-prediction of 15 of our 20 significantly under-predicted post-WWII engagements. Time would explain another one. And four were not explained.

The model noticeably underestimated all the engagements under nine hours except Bir Gifgafa I (2 hours), Pearls AFB (4.5 hours) and Wireless Ridge (8 hours). It noticeably under-estimated all the 15 “fanatic” engagements. If the formulations derived from the earlier data were used here (engagements less than 4 hours and fanatic), then there are 17 engagements in which one side is “casualty insensitive” or in which the engagement time is less than 4 hours. Using the above formulations, these 17 engagements would have their casualty figures changed.

The modified percent loss figures are the CEV predicted percent loss times the factor for “casualty insensitive” systems (for those 15 cases where it applies) and times the formulation for battles less than 4 hours (for those 9 cases where it applies).
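
As a rough sketch of how the two adjustments combine, the function below multiplies a CEV-predicted percent loss by 2.5 when one side in the engagement is “casualty insensitive” and by 4 divided by the battle length when the battle lasted less than four hours. The function and constant names are hypothetical and the inputs are made-up figures, not engagements from the database.

```python
CASUALTY_INSENSITIVE_MULTIPLIER = 2.5  # derived in the series from the Japanese WWII engagements
SHORT_BATTLE_HOURS = 4.0               # battles shorter than this get the time formulation


def modified_percent_loss(cev_predicted_pct, hours, casualty_insensitive):
    """Apply the two adjustments to a CEV-predicted percent loss.

    casualty_insensitive is True when either side in the engagement is treated
    as casualty insensitive; the same multiplier is applied to both sides' losses.
    """
    if hours <= 0:
        raise ValueError("battle length must be positive")
    factor = 1.0
    if casualty_insensitive:
        factor *= CASUALTY_INSENSITIVE_MULTIPLIER
    if hours < SHORT_BATTLE_HOURS:
        factor *= SHORT_BATTLE_HOURS / hours
    return cev_predicted_pct * factor


# Illustrative inputs only.
print(modified_percent_loss(3.0, 6.0, False))  # unchanged
print(modified_percent_loss(4.0, 8.0, True))   # casualty insensitive only
print(modified_percent_loss(1.5, 1.0, False))  # short battle only
print(modified_percent_loss(2.0, 2.0, True))   # both adjustments
```

The two factors are treated here as independent multipliers, which is how the text describes the engagements that were modified for both conditions.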

Looking at the table at the top of the next page, it would appear that we are on the correct path. But to be safe, on the next page let’s look at the predictive value of the 13 engagements for which we didn’t redefine the attrition multipliers.

The 13 engagements left unchanged:

So, we are definitely heading in the right direction now. We have identified two model changes—time and “casualty insensitive.” We have developed preliminary formulations for time and for “casualty insensitive” forces. Unfortunately, the time formulation was based upon seven WWI engagements. The “casualty insensitive” formulation was based upon seven WWII engagements. Let’s use all our data in the first validation database here for the moment to come up with figures with which we can be more comfortable:

The highlighted entries in the table above indicate “casualty insensitive” forces. We are still struggling with the concept that having one side being casualty insensitive increases both sides’ losses equally. We highlighted them in an attempt to find any other patterns we were missing. We could not.

Now, there may be a more sophisticated measurement of this other than the brute force method of multiplying both sides by 2.5. This might include different multipliers depending on whether one is the fanatic vs non-fanatic side or different multipliers for attack or defense. First, I cannot find any clear indication that there should be a different multiplier for the attacker or defender. A general review of the data confirms that. Therefore, we are saying that the combat relationships between attacker and defender do not change in high intensity or casualty insensitive battles from those experienced in the norm.

What is also clear is that our multiplier of 2.5 appears to be about as good a fit as we can get from a straight multiplier. It does not appear that there is any significant difference between the attrition multiplier for types of “casualty insensitive” systems, whether they are done because of worship of the emperor or because the commissar will shoot slackers. Apparently the mode of fighting is more significant for measuring combat results than how one gets there, although certainly having everyone worship the emperor is probably easier to “administer.”

This still leaves us having to look at whether we should develop a better formulation for time.

Non-fanatic engagement of less than 4 hours:

For fairly obvious reasons, we are still concerned about this formulation for battles of less than one hour, as we have only one example, but until we conduct the second validation, this formulation will remain as is.

Now the extreme cases:

List of all engagements less than 4 hours where one side was fanatic:

It would appear that these formulations of time and “casualty insensitivity” have passed their initial hypothesis formulations tests. We are now willing to make changes to the model based upon this and run the engagements from the second validation data base to test it.

Next: Predicting casualties: Conclusions

Validating A Combat Model (Part XI)

Shawn Woodford
October 21, 2019

Dead Japanese soldiers lie on the sandbar at the mouth of Alligator Creek on Guadalcanal on 21 August 1942 after being killed by U.S. Marines during the Battle of the Tenaru. [Wikipedia]

[The article below is reprinted from the April 1997 edition of The International TNDM Newsletter.]

The Second Test of the TNDM Battalion-Level Validations: Predicting Casualties
by Christopher A. Lawrence

SO WHERE WERE WE REALLY OFF? (WWII)

In the case of the WWII results, we were getting results in the ball park in less than 60% of the cases for the attacker and in less than 50% of the cases in the case of the defenders. We were often significantly too low. Knowing that we were dealing with a number of Japanese engagements (seven), and they clearly fought in a manner that was different from most western European nations, we expected that they would be under-predicted, and some casualty adjustment would be necessary to reflect this.

We also examined whether time was an issue (it was not). The under-predicted battles are listed in the next table.

We temporarily defined the Japanese mode of fighting as “fanaticism.” We decided to find a factor for fanaticism by looking at all the battles with the Japanese. They are listed below:

Looking at what multiplier was needed, one notes that .39 times 2.5 = .975 while .34 times 2.5 = .85. This argues for a “fanatic” multiplier of 2.5. The non-fanatic opponent attrition multiplier is also 2.5. There was no indication that both sides should not be affected by the same multiplier.

We had now tentatively identified two “fixes” to the data. I am sure someone will call them “fudges,” but I am comfortable enough with the logic behind them (especially the fanaticism) that I would dismiss such criticism. It was now time to look at the modern data, and see what would happen if these fixes were applied to it.

SO WHERE WERE WE REALLY OFF? (Post-WWII)

A total of 20 battles were noticeably under-predicted. We examined them to see if there was a pattern in this under-prediction.

Next: “Casualty insensitive” systems

Modeling, Simulation & Wargaming TNDM Validation

Validating A Combat Model (Part X)

Shawn Woodford
October 16, 2019

French Army soldiers recover the dead from trenches during World War I [Library of Congress]

[The article below is reprinted from the April 1997 edition of The International TNDM Newsletter.]

The Second Test of the TNDM Battalion-Level Validations: Predicting Casualties
by Christopher A. Lawrence

TIME AND THE TNDM:

Before this validation was even begun, I knew we were going to have a problem with the fact that most of the engagements were well below 24 hours in length. This problem was discussed in depth in “Time and the TNDM,” in Volume 1, Number 3 of this newsletter. The TNDM considers the casualties for an engagement of less than 24 hours to be reduced in direct proportion to that time. I postulated that the relationship was geometric and came up with a formulation that used the square root of that fraction (i.e. instead of 12 hours being .5 times casualties, it was now .75 times casualties). Being wedded to this idea, I tested this formulation in all ways and for several days, but I really wasn’t getting a better fit. All I really did was multiply all the points so that the predicted average was closer. The top-level statistics were:

TF=Time Factor

I also looked at how the losses matched up by one of three periods (WWI, WWII, and post-WWII). When we used the time factor multiplier for the attackers, the WWI engagements average became too high, and the standard deviation increased; same with WWII, while the post-WWII averages were still too low, but the standard deviations got better. For the defender, we got pretty much the same pattern, except now the WWII battles were under-predicting, but the standard deviation was about the same. It was quite clear that all I had with this time factor was noise.
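
To make the discarded time factor concrete, here is a small sketch that compares straight proportional scaling with a square-root version for a few battle lengths. Both function names are hypothetical, and the square-root form is only my reading of the description above. Note that the square root of one half is about .71, so the “.75” quoted for a 12-hour battle is presumably rounded or comes from a slightly different formulation.

```python
import math


def linear_factor(hours):
    """Fraction of the 24-hour casualties under straight proportional scaling."""
    return hours / 24.0


def sqrt_factor(hours):
    """Fraction of the 24-hour casualties if the square root of the time fraction is used."""
    return math.sqrt(hours / 24.0)


for hours in (3, 6, 12, 24):
    print(f"{hours:>2} hours: linear {linear_factor(hours):.3f}, square root {sqrt_factor(hours):.3f}")
```

The square-root version raises every prediction for battles shorter than 24 hours, which is consistent with the observation that it mostly shifted the predicted averages rather than improving the fit.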

Like any good chef, my failed experiment went right down the disposal. This formulation died a natural death. But looking by period where the model was doing well, and where it wasn’t doing well is pretty telling. The results were:

Looking at the basic results, I could see that the model was doing just fine in predicting WWI battles, although its standard deviation for the defenders was still poor. It wasn’t doing very well with WWII, and performed quite poorly with modern engagements. This was the exact opposite effect to our test on predicting winners and losers, where the model did best with the post-WWII battles and worst with the WWI battles. Recall that we implemented an attrition multiplier of 4 for the WWI battles. So it was now time to look at each battle, and figure out where we were really off. In this case, I looked at casualty figures that were off by a significant order of magnitude. The reason I looked at significant orders of magnitude instead of percent error is that making a mistake like predicting 2% instead of 1% is not a very big error, whereas predicting 20%, and having the actual casualties 10%, is pretty significant. Both would be off by 100%.

SO WHERE WERE WE REALLY OFF? (WWI)

In the case of the attackers, we were getting a result in the ball park in two-thirds of the cases, and only two cases—N Wood 1 and Chaudun—were really off. Unfortunately, for the defenders we were getting a reasonable result in only 40% of the cases, and the model had a tendency to under- or over-predict.

It is clear that the model understands attacker losses better than defender losses. I suspect this is related to the model having no breakpoint methodology. Also, defender losses may be more variable. I was unable to find a satisfactory explanation for the variation. One thing I did notice was that all four battles that were significantly under-predicted on the defender sides were the four shortest WWI battles. Three of these were also noticeably under-predicted for the attacker. Therefore, I looked at all 23 WWI engagements related to time.

Looking back at the issue of time, it became clear the model was clearly under-predicting in battles of less than four hours. I therefore came up with the following time scaling formula:

If time of battle less than four hours, then multiply attrition by (4/(Length of battle in hours)).

What this formula does is make all battles less than four hours equal to a four-hour engagement. This intuitively looks wrong, but one must consider how we define a battle. A “battle” is defined by the analyst after the fact. The start time is usually determined by when the attack starts (or when the artillery bombardment starts) and end time by when the attack has clearly failed, or the mission has been accomplished, or the fighting has died down. Therefore, a battle is not defined by time, but by resolution.

As such, any battle that only lasts a short time will still have a resolution, and as a result of achieving that resolution there will be considerable combat experience. Therefore, a minimum casualty multiplier of 1/6 must be applied to account for that resolution. We shall see if this is really the case when we run the second validation using the new battles, which have a considerable number of brief engagements. For now, this seems to fit.
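
The link between the time scaling formula and the minimum multiplier of 1/6 can be checked with a short sketch. It assumes the model's straight proportional scaling for battles under 24 hours and then applies the four-hour correction on top of it; the function name is hypothetical.

```python
def effective_fraction(hours):
    """Fraction of the 24-hour casualties after the four-hour correction.

    Linear scaling gives hours / 24. For battles under four hours, the
    correction multiplies attrition by 4 / hours, so the product is always
    4 / 24, or 1/6.
    """
    if not 0 < hours <= 24:
        raise ValueError("this sketch only covers engagements of 24 hours or less")
    linear = hours / 24.0
    correction = 4.0 / hours if hours < 4 else 1.0
    return linear * correction


for hours in (0.5, 1, 2, 3, 4, 6, 12, 24):
    print(f"{hours:>4} hours: {effective_fraction(hours):.4f} of the 24-hour casualties")
```

Every battle shorter than four hours comes out at the same 1/6 of the 24-hour result, which is what treating it as a four-hour engagement means in practice.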

As for all the other missed predictions, including the over-predictions, I could not find a magic formula that connected them. My suspicion was that the multiplier of x4 would be a little too robust, but even after adjusting for the time equation, this left 14 of the attacker’s losses under-predicted and six of the defender actions under-predicted. If the model is doing anything, it is under-predicting attacker casualties and over-predicting defender casualties. This would argue for a different multiplier for the attacker than for the defender (higher one for the attacker). We had six cases where the attacker’s and defender’s predictions were both low, nine where they were both high, and eight cases where the attacker’s prediction was low while the defender’s prediction was high. We had no cases where the attacker’s prediction was high and the defender’s prediction was low. As all these examples were from the western front in 1918, U.S. versus Germans, then the problem could also be that the model is under-predicting the effects of fortifications, or the terrain for the defense. It could also be indicative of a fundamental difference in the period that gave the attackers higher casualty rates than the defenders. This is an issue I would like to explore in more depth, and I may do so after I have more WWI data from the second validation.

Next: “Fanaticism” and casualties