Military History and Validation of Combat Models

Soldiers from Britain’s Royal Artillery train in a “virtual world” during Exercise Steel Sabre, 2015 [Sgt Si Longworth RLC (Phot)/MOD]

A Presentation at MORS Mini-Symposium on Validation, 16 Oct 1990

By Trevor N. Dupuy

In the operations research community there is some confusion as to the respective meanings of the words “validation” and “verification.” My definition of validation is as follows:

“To confirm or prove that the output or outputs of a model are consistent with the real-world functioning or operation of the process, procedure, or activity which the model is intended to represent or replicate.”

In this paper the word “validation” with respect to combat models is assumed to mean assurance that a model realistically and reliably represents the real world of combat. Or, in other words, given a set of inputs which reflect the anticipated forces and weapons in a combat encounter between two opponents under a given set of circumstances, the model is validated if we can demonstrate that its outputs are likely to represent what would actually happen in a real-world encounter between these forces under those circumstances.

Thus, in this paper, the word “validation” has nothing to do with the correctness of computer code, or the apparent internal consistency or logic of relationships of model components, or with the soundness of the mathematical relationships or algorithms, or with satisfying the military judgment or experience of one individual.

True validation of combat models is not possible without testing them against modern historical combat experience. And so, in my opinion, a model is validated only when it will consistently replicate a number of military history battle outcomes in terms of: (a) Success-failure; (b) Attrition rates; and (c) Advance rates.

“Why,” you may ask, “use imprecise, doubtful, and outdated history to validate a modern, scientific process? Field tests, experiments, and field exercises can provide data that is often instrumented, and certainly more reliable than any historical data.”

I recognize that military history is imprecise; it is only an approximate, often biased and/or distorted, and frequently inconsistent reflection of what actually happened on historical battlefields. Records are contradictory. I also recognize that there is an element of chance or randomness in human combat which can produce different results in otherwise apparently identical circumstances. I further recognize that history is retrospective, telling us only what has happened in the past. It cannot predict, if only because combat in the future will be fought with different weapons and equipment than were used in historical combat.

Despite these undoubted problems, military history provides more, and more accurate, information about the real world of combat, and how human beings behave and perform under varying circumstances of combat, than is possible to derive or compile from any other source. Despite some discrepancies, patterns are unmistakable and consistent. There is always a logical explanation for any individual deviations from the patterns. Historical examples that are inconsistent, or that are counter-intuitive, must be viewed with suspicion as possibly being poor or false history.

Of course absolute prediction of a future event is practically impossible, although not necessarily so theoretically. Any speculations which we make from tests or experiments must have some basis in terms of projections from past experience.

Training or demonstration exercises, proving ground tests, field experiments, all lack the one most pervasive and most important component of combat: Fear in a lethal environment. There is no way in peacetime, or non-battlefield, exercises, tests, or experiments to be sure that the results are consistent with what would have been the behavior or performance of individuals or units or formations facing hostile firepower on a real battlefield.

We know from the writings of the ancients (for instance Sun Tze [pronounced Sun Dzuh] and Thucydides) that have survived to this day that human nature has not changed since the dawn of history. The human factor, the way in which humans respond to stimuli or circumstances, is the most important basis for speculation and prediction. What about the “scientific” approach of those who insist that we can have no confidence in the accuracy or reliability of historical data, that it is therefore unscientific, and therefore that it should be ignored? These people insist that only “scientific” data should be used in modeling.

In fact, every model is based upon fundamental assumptions that are intuitive and unprovable. The first step in the creation of a model is a step away from scientific reality in seeking a basis for an unreal representation of a real phenomenon. I have shown that the unreality is perpetuated when we use other imitations of reality as the basis for representing reality. History is less than perfect, but to ignore it, and to use only data that is bound to be wrong, assures that we will not be able to represent human behavior in real combat.

At the risk of repetition, and even of protesting too much, let me assure you that I am well aware of the shortcomings of military history:

The record which is available to us, which is history, only approximately reflects what actually happened. It is incomplete. It is often biased, it is often distorted. Even when it is accurate, it may be reflecting chance rather than normal processes. It is neither precise nor consistent. But, it provides more, and more accurate, information on the real world of battle than is available from the most thoroughly documented field exercises, proving ground tests, or laboratory or field experiments.

Military history is imperfect. At best it reflects the actions and interactions of unpredictable human beings. We must always realize that a single historical example can be misleading for either of two reasons: (1) The data may be inaccurate, or (2) The data may be accurate, but untypical.

Nevertheless, history is indispensable. I repeat that the most pervasive characteristic of combat is fear in a lethal environment. For all of its imperfections, military history and only military history represents what happens under the environmental condition of fear.

Unfortunately, and somewhat unfairly, the reported findings of S.L.A. Marshall about human behavior in combat, which he reported in Men Against Fire, have been recently discounted by revisionist historians who assert that he never could have physically performed the research on which the book’s findings were supposedly based. This has raised doubts about Marshall’s assertion that 85% of infantry soldiers didn’t fire their weapons in combat in World War II. That dramatic and surprising assertion was first challenged in a New Zealand study which found, on the basis of painstaking interviews, that most New Zealanders fired their weapons in combat. Thus, either Americans were different from New Zealanders, or Marshall was wrong. And now American historians have demonstrated that Marshall had neither the time nor the opportunity to conduct the battlefield interviews which he claimed were the basis for his findings.

I knew Marshall moderately well. I was fully as aware of his weaknesses as of his strengths. He was not a historian. I deplored the imprecision and lack of documentation in Men Against Fire. But the revisionist historians have underestimated the shrewd journalistic assessment capability of “SLAM” Marshall. His observations may not have been scientifically precise, but they were generally sound, and his assessment has been shared by many American infantry officers whose judgments I also respect. As to the New Zealand study, how many people will, after the war, admit that they didn’t fire their weapons?

Perhaps most important, however, in judging the assessments of SLAM Marshall, is a recent study by a highly-respected British operations research analyst, David Rowland. Using impeccable OR methods Rowland has demonstrated that Marshall’s assessment of the inefficient performance, or non-performance, of most soldiers in combat was essentially correct. An unclassified version of Rowland’s study, “Assessments of Combat Degradation,” appeared in the June 1986 issue of the Royal United Services Institution Journal.

Rowland was led to his investigations by the fact that soldier performance in field training exercises, using the British version of MILES technology, was not consistent with historical experience. Even after allowances for degradation from theoretical proving ground capability of weapons, defensive rifle fire almost invariably stopped any attack in these field trials. But history showed that attacks were often, in fact usually, successful. He therefore began a study in which he made both imaginative and scientific use of historical data from over 100 small unit battles in the Boer War and the two World Wars. He demonstrated that when troops are under fire in actual combat, there is an additional degradation of performance by a factor ranging between 10 and 7. A degradation virtually of an order of magnitude! And this, mind you, on top of a comparable built-in degradation to allow for the difference between field conditions and proving ground conditions.

Not only does Rowland’s study corroborate SLAM Marshall’s observations, it shows conclusively that field exercises, training competitions, and demonstrations give results so different from real battlefield performance as to render them useless for validation purposes.

Which brings us back to military history. For all of the imprecision, internal contradictions, and inaccuracies inherent in historical data, at worst the deviations are generally far less than a factor of 2.0. This is at least four times more reliable than field test or exercise results.

I do not believe that history can ever repeat itself. The conditions of an event at one time can never be precisely duplicated later. But, bolstered by the Rowland study, I am confident that history paraphrases itself.

If large bodies of historical data are compiled, the patterns are clear and unmistakable, even if slightly fuzzy around the edges. Behavior in accordance with this pattern is therefore typical. As we have already agreed, sometimes behavior can be different from the pattern, but we know that it is untypical, and we can then seek for the reason, which invariably can be discovered.

This permits what I call an actuarial approach to data analysis. We can never predict precisely what will happen under any circumstances. But the actuarial approach, with ample data, provides confidence that the patterns reveal what is to happen under those circumstances, even if the actual results in individual instances vary to some extent from this “norm” (to use the Soviet military historical expression).

It is relatively easy to take into account the differences in performance resulting from new weapons and equipment. The characteristics of the historical weapons and the current (or projected) weapons can be readily compared, and adjustments made accordingly in the validation procedure.

In the early 1960s an effort was made at SHAPE Headquarters to test the ATLAS Model against World War II data for the German invasion of Western Europe in May 1940. The first excursion had the Allies ending up on the Rhine River. This was apparently quite reasonable: the Allies substantially outnumbered the Germans, they had more tanks, and their tanks were better. However, despite these Allied advantages, the actual events in 1940 had not matched what ATLAS was now predicting. So the analysts did a little “fine tuning” (a splendid term for fudging). After the so-called adjustments, they tried again, and ran another excursion. This time the model had the Allies ending up in Berlin. The analysts (may the Lord forgive them!) were quite satisfied with the ability of ATLAS to represent modern combat. (Or at least they said so.) Their official conclusion was that the historical example was worthless, since weapons and equipment had changed so much in the preceding 20 years!

As I demonstrated in my book, Options of Command, the problem was that the model was unable to represent the German strategy, or to reflect the relative combat effectiveness of the opponents. The analysts should have reached a different conclusion. ATLAS had failed validation because a model that cannot with reasonable faithfulness and consistency replicate historical combat experience, certainly will be unable validly to reflect current or future combat.

How, then, do we account for what I have said about the fuzziness of patterns, and the fact that individual historical examples may not fit the patterns? I will give you my rules of thumb:

  1. The battle outcome should reflect historical success-failure experience about four times out of five.
  2. For attrition rates, the model average of five historical scenarios should be consistent with the historical average within a factor of about 1.5.
  3. For the advance rates, the model average of five historical scenarios should be consistent with the historical average within a factor of about 1.5.
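
As a rough illustration, the three rules of thumb above could be applied mechanically along the following lines. This is only a sketch under stated assumptions: the record layout, the function names, and every number in `SCENARIOS` are hypothetical placeholders rather than historical data, and "within a factor of about 1.5" is read here as symmetric (the model average may be up to 1.5 times higher or 1.5 times lower than the historical average).

```python
from statistics import mean

# Hypothetical record layout: one dict per historical engagement, pairing the
# model's output with the historical record. All values below are invented
# placeholders for illustration only; they are NOT historical data.
SCENARIOS = [
    {"model_win": True,  "hist_win": True,  "model_attrition": 2.0,
     "hist_attrition": 1.5, "model_advance": 4.0, "hist_advance": 5.0},
    {"model_win": True,  "hist_win": True,  "model_attrition": 1.0,
     "hist_attrition": 1.2, "model_advance": 3.0, "hist_advance": 2.0},
    {"model_win": False, "hist_win": False, "model_attrition": 3.0,
     "hist_attrition": 2.5, "model_advance": 1.0, "hist_advance": 1.5},
    {"model_win": True,  "hist_win": False, "model_attrition": 1.5,
     "hist_attrition": 1.0, "model_advance": 6.0, "hist_advance": 4.0},
    {"model_win": False, "hist_win": False, "model_attrition": 2.5,
     "hist_attrition": 2.0, "model_advance": 2.0, "hist_advance": 2.5},
]


def within_factor(model_value, hist_value, factor=1.5):
    """Return True if two positive values agree within `factor` either way."""
    ratio = model_value / hist_value
    return 1.0 / factor <= ratio <= factor


def check_rules_of_thumb(scenarios, min_match_rate=0.8, factor=1.5):
    """Apply the three rules of thumb to a set of scenarios.

    Rule 1: success-failure matches history about four times out of five.
    Rules 2 and 3: the model *average* over the scenarios agrees with the
    historical *average* within the given factor.
    """
    matches = sum(s["model_win"] == s["hist_win"] for s in scenarios)
    return {
        "outcome": matches / len(scenarios) >= min_match_rate,
        "attrition": within_factor(
            mean(s["model_attrition"] for s in scenarios),
            mean(s["hist_attrition"] for s in scenarios),
            factor,
        ),
        "advance": within_factor(
            mean(s["model_advance"] for s in scenarios),
            mean(s["hist_advance"] for s in scenarios),
            factor,
        ),
    }


if __name__ == "__main__":
    print(check_rules_of_thumb(SCENARIOS))
```

Note that rules 2 and 3 compare averages, so a single scenario may deviate by more than 1.5 without failing the check, which matches the allowance for individual examples that do not fit the pattern.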

Just as the heavens are the laboratory of the astronomer, so military history is the laboratory of the soldier and the military operations research analyst. The scientific basis for both astronomy and military science is the recording of the movements and relationships of bodies, and then analysis of those movements. (In the one case the bodies are heavenly, in the other they are very terrestrial.)

I repeat: Military history is the laboratory of the soldier. Failure of the analyst to use this laboratory will doom him to live with the scientific equivalent of Ptolemaic astronomy, whereas he could use the evidence available in his laboratory to progress to the military science equivalent of Copernican astronomy.

Shawn Woodford

Shawn Robert Woodford, Ph.D., is a military historian with nearly two decades of research, writing, and analytical experience on operations, strategy, and national security policy. His work has focused on special operations, unconventional and paramilitary warfare, counterinsurgency, counterterrorism, naval history, quantitative historical analysis, nineteenth and twentieth century military history, and the history of nuclear weapon development. He has a strong research interest in the relationship between politics and strategy in warfare and the epistemology of wargaming and combat modeling.

All views expressed here are his and do not reflect those of any other private or public organization or entity.


  1. A marginal note.
    First of all, most combat prediction models are flawed and frequently fail, and the US Army’s criticism of Dupuy’s model does not differ substantially from the listed problems in your post (or at least their “data fitting” explanation).
    I have talked to “tank combat enthusiasts” about the correct assessment of their potential, and the literature (referring specifically to the WW2 and Cold War eras) has failed to assess that potential correctly. Reasons for this are manifold, such as a general lack of understanding and bias (currently, I work on my own “tank potential and combat rating evaluator”), but it would take a long time to elaborate.
    If someone wants to analyse the conflict of 1940 and that merely from the narrow lens of focusing on tanks, then this specific approach can be already regarded as quite bad. Tanks played an overall small role and represent a small fraction of firepower and only inflict a small fraction of casualties anyway.
    Aside from that, technology is a matter of trade-offs with the respective system being specialized and calibrated to the armed forces individual demands. Technically speaking, observing artillery and its utilization would be a far better approach (at least for WW2).
    For Germany (at least initially) tanks played a rather insignificant role in their initial successes and general art of war, until vehicles with higher staying and combat power arrived. Other than that they always possessed a solid level of quality engineering, their projectiles were superior during the entire war (shatter gap, nose hardness, BC etc.), early utilization of radios, 3-man turrets, superior coated optics/MILS, first strike capability, etc. During WW2 the correlation between plate and projectile was undoubtedly decisive for “tank battles” or AT defense but that is hardly relevant for the entire outcome of WW2.
    Whenever I read an assessment throwing words around like “superiority” I quickly lose interest and they simply cannot be useful – they only exist to fuel chauvinistic, nationalistic and ethnocentric convictions anyway (Internet discussions are a clear testimony to that but to encounter this in military studies is simply embarrassing and frustrating). The most important thing to remember is that there are ad- and disadvantages and an individual focus on certain areas, e.g. the Soviets had a high interest in AFVs, specifically tanks. One could argue that Russia suffers from a tank obsession, while being interested (and scouting) in the most modern tank concepts which exist in the world. They were (and the USSR was no exception) constrained by their underdeveloped industry and unskilled labour. Result: Poor quality of manufacturing, reliability, lack of adequate radios, optics and quality of projectiles despite introducing modern concepts and relying on license builds they acquired from countries like the US, France and Britain prewar. Despite this, WW2 literature is overfilled with Soviet “tank superiority”, a rather ridiculous assessment, reflected by their loss rates and overall chemical analysis, even carrying over into contingencies past WW2 (Korea or Six Days).
    To sum up: Paper characteristics and anecdotal evidence of individual weapon systems inherit various problems that are often of the same nature as found in combat prediction models.
    Everyone will state: The Katana is the perfect blade. How many will realize that Japanese steel was inferior and the preferred weapons of the battlefield were the yari and yumi (and matchlocks)?
    Everyone will state that a Kopis or Falx has more slashing power than a Gladius, yet the Romans conquered the Mediterranean. Combat models will never be able to correctly assess this, because so far they barely tried. War is an orchestra and not for soloists.
    History has revealed that predicting the outcomes of conflicts is rather difficult. From the Second Punic War to the Nazi-Soviet war – there were many unexpected developments which could only be understood through the eye of “hindsight”.
    Overall (theoretical) warmaking potential is still the best indicator, being a product of military effectiveness/traditions, geostrategy (including resource availability or weather), the economic power (including development levels/infrastructure, GDP/GNP per capita) and population size with the maximum amount of potentially recruitable soldiers, but independent of diplomacy, mentality or ideologies and decision making.
    If I tell people that Italy could have had a substantial influence on the outcome of WW2 (judging by their performance of WW1), people immediately dismiss this statement as a bad joke. It is clear that it was a matter of decisions, something a model cannot predict.

    Another thing: “They had more tanks”.
    Weapons stand in relation to the resource allocations and choices of the respective faction. The number of weapons is based on the number of hands in the armed forces (labour force). A numerical advantage usually results in more tanks, should there be no limitation by the resource base and economy (and choices), ignoring qualitative factors for now. This is also a common misconception, frequently repeated over the net and most literature.

  2. Chris,

    I understand your post, but prediction is not the reason most Army simulations exist. They exist as a vehicle to enable students/staffs to maintain and improve readiness. They are used to let units improve their staff skills, SOPs, reporting procedures, and planning, specifically in the MDMP. As such, in terms of achieving a learning objective, it doesn’t really matter if you win a battle in a simulation or at the NTC so long as you identify mistakes and strong points and seek to minimize the former and maintain the latter. The US Army did pretty well in Desert Storm, defying most predictions, but many of those same units did not do so well at the National Training Center a year later in spite of their combat experiences, some leadership continuity, and training both in the field and in simulation.

    As the National Simulations Center Training with Simulations Handbook puts it:

    “C2 training simulations should not be employed to analyze plans in specific terms of outcome. They can be used in a training environment, however, to assist in learning about generalizations on maneuver and logistics, but they should not be relied on to provide specifics on how much and what type of ‘widgets’ to use in a particular scenario or operational situation.

    C2 training simulations should not be relied upon to validate war plans and they must not be used for that purpose. C2 training simulations simulate the real world and provide a simulation of the real-world conditions of approximately 80-85%, depending on which simulation is used. However, this 80-85% simulation is not 100% and should not be seen as an exact replication of the real-world; it is not. However, the 80-85% “solution” that C2 training simulations represent means that they can perform their assistance and support role to C2 elements very well if the exercise is designed around legitimate training objectives.

    It is a mistake, repeat mistake, and a misuse of these simulations to attempt to validate war plans. The algorithms used in training simulations provide sufficient fidelity for training, not validation of war plans. This is due to the fact that important factors (leadership, morale, terrain, weather, level of training of the units) and a myriad of human and environmental impacts are not modeled in sufficient detail to provide the types of plans analysis usually associated with warfighting decisions used to validate war plans.”

    —Chapter 3, page 13. Written in approximately 1997-1998. (This is from an old copy I have. I don’t know if it has been updated yet.)

    Thus, training simulations have a different purpose than an analytical simulation.

    As I am sure you know, prediction is hard. If it were easy, the National Hurricane Center would not be using large ensembles of simulations for hurricane prediction. As the storm lasts longer and nears land, they are able to narrow the “cone of death” for fairly accurate landfall predictions. But they frequently are unable to predict last minute turns prompted by some “burble” in the ocean or air that causes a storm that looks like it is going to go up the West Coast of Florida to instead go up the center and exit on the East Coast.

    And the storm is not purposefully trying to thwart the forecast, whereas in a battle prediction, we may have even less knowledge of enemy doctrine, command personalities, and even weapon capabilities than the NHC has of the weather.

    As I would say to commanders of units I evaluated, “Winning is more fun, but you learn more when you lose.”

    This does not mean the analytical side of simulations cannot strive for more precision, and maybe one day the two will converge, but I don’t think that day is here yet.

  3. Mike,

    See the blog post imaginatively called “response” for my response. I figured it was too important of a discussion to leave buried in the comments. Also, I would recommend that you take a look at Chapter 18 (Modeling Warfare) in my book War by Numbers.
