
Validating A Combat Model (Part III)

[The article below is reprinted from the April 1997 edition of The International TNDM Newsletter.]

Numerical Adjustment of CEV Results: Averages and Means
by Christopher A. Lawrence and David L. Bongard

As part of the battalion-level validation effort, we made two runs with the model for each test case—one without CEV [Combat Effectiveness Value] incorporated and one with the CEV incorporated. The printout of a TNDM [Tactical Numerical Deterministic Model] run has three CEV figures for each side: CEVt, CEVl, and CEVad. CEVt shows the CEV as calculated on the basis of battlefield results as a ratio of the performance of side a versus side b. It measures performance based upon three factors: mission accomplishment, advance, and casualty effectiveness. CEVt is calculated according to the following formula:

P′ = Refined Combat Power Ratio (sum of the modified OLIs [Operational Lethality Indices]). The ′ in P′ indicates that this ratio has been “refined” (modified) by two behavioral values already: the factor for Surprise and the Set Piece Factor.

CEVd = 1/CEVa (the reciprocal)

In effect, the formula is relative results multiplied by the modified combat power ratio. This is basically the formulation that was used for the QJM [Quantified Judgement Model].

In the TNDM Manual, there is an alternate CEV method based upon comparative effective lethality. This methodology has the advantage that the user doesn’t have to evaluate mission accomplishment on a ten-point scale. The CEVl is calculated according to the following formula:

In effect, CEVt is a measurement of the difference between the results predicted by the model and actual historical results, based upon assessments of three different factors (mission success, advance rates, and casualties), while CEVl is a measurement of the difference between predicted casualties and actual casualties. The CEVt and the CEVl of the defender are the reciprocals of those of the attacker.

Now the problem comes in when one creates the CEVad, which is the average of the two CEVs above. I simply do not know why it was decided to create an alternate CEV calculation from the old QJM method, and then average the two, but this is what is currently being done in the model. This averaging results in revised CEVs for the attacker and for the defender that are not reciprocals of each other, unless the CEVt and the CEVl happen to be the same. We even have some cases where both sides had a CEVad of greater than one. Also, by averaging the two, we have heavily weighted casualty effectiveness relative to mission accomplishment and advance.
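
To make the reciprocity problem concrete, here is a minimal sketch in Python. All sample values are invented for illustration, and this is not the TNDM’s internal code:

```python
# Illustration of why averaging CEVt and CEVl breaks reciprocity.
# Sample values are invented; this is not TNDM code.

def cev_ad(cev_t: float, cev_l: float) -> float:
    """CEVad as described above: the average of CEVt and CEVl."""
    return (cev_t + cev_l) / 2.0

# Hypothetical attacker values where the two measures disagree.
att_t, att_l = 2.0, 1.2

# The defender's CEVt and CEVl are the reciprocals of the attacker's.
def_t, def_l = 1.0 / att_t, 1.0 / att_l

att_ad = cev_ad(att_t, att_l)   # (2.0 + 1.2) / 2   = 1.600
def_ad = cev_ad(def_t, def_l)   # (0.5 + 0.833) / 2 = 0.667

# If the averages were still reciprocals, this product would be 1.0.
print(att_ad * def_ad)          # 1.067 -> not reciprocals

# Algebraically, ((a + b)/2) * ((1/a + 1/b)/2) = (a + b)**2 / (4*a*b) >= 1,
# so both sides' averages can even exceed one at the same time:
print(cev_ad(3.0, 0.4), cev_ad(1/3.0, 1/0.4))   # 1.70 and 1.42
```

The last line also shows why the cases mentioned above, where both sides have a CEVad greater than one, can occur at all: the product of the two averages is always at least one.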

What was done in these cases (again, based more on TDI tradition or habit than on any specific rule) was:

(1.) If the two CEVad values are reciprocals, then use them as is.

(2.) If one CEV is greater than one while the other is less than one, then add the higher CEV to the reciprocal of the lower CEV (1/x) and divide by two. This result is the CEV for the superior force, and its reciprocal is the CEV for the inferior force.

(3.) If both CEVad values are above one, then divide the larger value by the smaller, and use the result as the superior force’s CEV.

In the case of (3.) above, this methodology usually results in a slightly higher CEV for the attacker than if we used the average of the reciprocals (usually 0.1 or 0.2 higher). While the mathematical and logical inconsistency of the procedure bothered me, the logic for the different procedure in (3.) was that the model was clearly having a problem predicting the engagement to start with, but that in most cases when this happened before (meaning before the validation), a higher CEV usually produced a better fit than a lower one. As this is what was done before, I accepted it as is, especially if one looks at the example of Mediah Farm. If one averages the reciprocal with the US’s CEV of 8.065, one would get a CEV of 4.13. By the methodology in (3.), one comes up with a more reasonable US CEV of 1.58.
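
To make the procedure concrete, here is a minimal sketch of the three rules in Python. This is an illustration, not TDI’s actual implementation, and it uses one inferred number: the article gives the US CEVad of 8.065 for Mediah Farm but not the opposing side’s value, so the ~5.1 below is back-calculated from the quoted results (8.065 / 5.1 ≈ 1.58, and averaging 8.065 with 1/5.1 gives ≈ 4.13):

```python
# A sketch of the three CEVad reconciliation rules described above.
# Illustrative only; this is not TDI's actual implementation.

def reconcile_cev(att_ad: float, def_ad: float, tol: float = 1e-6):
    """Return (attacker CEV, defender CEV) per rules (1.)-(3.)."""
    # Rule (1.): the two values are already reciprocals -- use as is.
    if abs(att_ad * def_ad - 1.0) < tol:
        return att_ad, def_ad

    if (att_ad > 1.0) != (def_ad > 1.0):
        # Rule (2.): one value above one, the other below one.
        hi, lo = max(att_ad, def_ad), min(att_ad, def_ad)
        sup = (hi + 1.0 / lo) / 2.0     # higher CEV plus 1/x, halved
    else:
        # Rule (3.): both values above one -- divide larger by smaller.
        sup = max(att_ad, def_ad) / min(att_ad, def_ad)

    # The side with the higher raw CEVad is the superior force.
    return (sup, 1.0 / sup) if att_ad > def_ad else (1.0 / sup, sup)

# Mediah Farm: 8.065 (US) is from the article; ~5.1 for the other side
# is inferred from the quoted results, not a published figure.
print(reconcile_cev(8.065, 5.1))    # rule (3.): (~1.58, ~0.63)
print((8.065 + 1.0 / 5.1) / 2.0)    # naive averaging: ~4.13
```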

The interesting aspect is that the TNDM rules manual explains how CEVt, CEVl and CEVad are calculated, but never is it explained which CEVad (attacker or defender) should be used. This is the first explanation of this process, and was based upon the “traditions” used at TDI. There is a strong argument to merge the two CEVs into one formulation. I am open to another methodology for calculating CEV. I am not satisfied with how CEV is calculated in the TNDM and intend to look into this further. Expect another article on this subject in the next issue.

Validating A Combat Model (Part II)

[The article below is reprinted from the October 1996 edition of The International TNDM Newsletter.]

Validation of the TNDM at Battalion Level
by Christopher A. Lawrence

The original QJM (Quantified Judgement Model) was created and validated primarily using division-level engagements from WWII and the 1967 and 1973 Mid-East Wars. For a number of reasons, we are now using the TNDM (Tactical Numerical Deterministic Model) to analyze lower-level engagements. With the changed environment in the world, we expect this trend to continue.

The model, while designed to handle battalion-level engagements, was never validated for engagements of that size. There were only 16 engagements in the original QJM Database with fewer than 5,000 people on one side, and only one with fewer than 2,000 people on a side. The sixteen smallest engagements are:

While it is not unusual in the operations research community to use unvalidated models of combat, it is a very poor practice. As TDI is starting to use this model for battalion-level engagements, it is time it was formally validated for that use. A model that is validated at one level of combat is not thereby validated to represent sizes, types, and forms of combat for which it has not been tested. TDI is undertaking a battalion-level validation effort for the TNDM. We intend to publish the material used and the results of the validation in The International TNDM Newsletter. As part of this battalion-level validation we will also be looking at a number of company-level engagements. Right now, my intention is to simply throw all the engagements into the same hopper and see what comes out.

By battalion-level, I mean any operation consisting of the equivalent of two or fewer reinforced battalions on one side. Three or more battalions imply a regiment- or brigade-level operation. A battalion in combat can range widely in strength, but usually does not have an authorized strength in excess of 900. Therefore, the upper limit for a battalion-level engagement is 2,000 people, while its lower limit can easily go below 500 people. Only one engagement in the original QJM Database fits that definition of a battalion-level engagement. HERO, DMSI, TND & Associates, and TDI (all companies founded by Trevor N. Dupuy) examined a number of small engagements over the years. HERO assembled 23 WWI engagements for the Land Warfare Database (LWDB); TDI has done 15 WWII small unit actions for the Suppression contract, and Dave Bongard has assembled four others from that period for the Pacific; DMSI did 14 battalion-level engagements from Vietnam for a study on low-intensity conflict 10 years ago; Dave Bongard has been independently looking into the Falkland Islands War and other post-WWII sources to locate 10 more engagements; and we have three engagements that Trevor N. Dupuy did for South Africa. We added two other World War II engagements and the three smallest engagements from the list to the left (those marked with an asterisk). This gives us a list of 74 additional engagements that can be used to test the TNDM.
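
As a purely hypothetical illustration of the size definition above (the function and its name are invented; the 2,000-person ceiling is the one stated in the text):

```python
# Hypothetical helper encoding the battalion-level definition above.
# The 2,000-person ceiling comes from the text; the code is invented.

def is_battalion_level(side_a_strength: int, side_b_strength: int) -> bool:
    """True if at least one side fields the equivalent of two or fewer
    reinforced battalions -- in practice, no more than 2,000 people."""
    return min(side_a_strength, side_b_strength) <= 2000

print(is_battalion_level(1200, 800))    # True
print(is_battalion_level(8679, 725))    # True: one side is below 2,000
print(is_battalion_level(5336, 3270))   # False: brigade-level or larger
```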

The smallest of these engagements is 220 people on both sides (100 vs. 120), while the largest is 5,336 versus 3,270 in one case and 8,679 versus 725 in another. These 74 engagements consist of 23 engagements from WWI, 22 from WWII, and 29 post-1945 engagements. There are three engagements where both sides have over 3,000 men and three more where both sides are above 2,000 men. In the other 68 engagements, at least one side is below 2,000, while in 50 of the engagements, both sides are below 2,000.

This leaves the following force sizes to be tested:

These engagements have been “randomly” selected only in the sense that the researchers grabbed whatever had been done and whatever else was conveniently available. It is not a proper random selection, in the sense that every war in this century was analyzed and a representative number of engagements taken from each conflict. That is not practical, so we settle for less-than-perfect data selection.

Furthermore, as many of these conflicts involve countries that do not have open archives (and in many cases have limited unit records), some of the opposing forces’ strengths and losses had to be estimated. This is especially true of the Vietnam engagements. It is hoped that the errors in estimation deviate equally on both sides of the norm, but there is no way of knowing that until countries like the People’s Republic of China and Vietnam open their archives to free independent research.

TDI intends to continue looking for battalion-level and smaller engagements for analysis, and may add to this database over time. If any of our readers have other data assembled, we would be interested in seeing it. In the next issue we will publish the preliminary results of our validation.

Note that in the above table, for World War II, German, Japanese, and Axis forces are listed in italics, while US, British, and Allied forces are listed in regular typeface. Also, in the VERITABLE engagements, the 5/7th Gordons’ action continued the assault of the 7th Black Watch, and the 9th Cameronians assumed the attack begun by the 2d Gordon Highlanders.

Tu-Vu is described in some detail in Fall’s Street Without Joy (pp. 51-53). The remaining Indochina/SE Asia engagements listed here are drawn from a QJM-based analysis of low-intensity operations (HERO Report 124, Feb 1988).

The coding for source and validation status, on the extreme right of each engagement line in the D Cas column, is as follows:

  • n indicates an engagement which has not been employed for validation, but for which good data exists for both sides (35 total).
  • Q indicates an engagement which was part of the original QJM database (3 total).
  • Q+ indicates an engagement which was analyzed as part of the QJM low-intensity combat study in 1988 (14 total).
  • T indicates an engagement analyzed with the TNDM (20 total).

Validating A Combat Model

The question of validating combat models—which Trevor Dupuy defined as “[t]o confirm or prove that the output or outputs of a model are consistent with the real-world functioning or operation of the process, procedure, or activity which the model is intended to represent or replicate”—has taken up a lot of space on the TDI blog this year. What this discussion did not address is what an effort to validate a combat model actually looks like. This will be the first in a series of posts that will do exactly that.

Under the guidance of Christopher A. Lawrence, TDI undertook a battalion-level validation of Dupuy’s Tactical Numerical Deterministic Model (TNDM) in late 1996. This effort tested the model against 76 engagements from World War I, World War II, and the post-1945 world, including Vietnam, the Arab-Israeli Wars, the Falklands War, Angola, and Nicaragua. It was probably one of the more independent and better-documented validations of a casualty estimation methodology conducted to date, in that:

  • The data was independently assembled (assembled for other purposes before the validation) by a number of different historians.
  • There were no calibration runs or adjustments made to the model before the test.
  • The data included a wide range of material from different conflicts and times (from 1918 to 1983).
  • The validation runs were conducted independently (Susan Rich conducted the validation runs, while Christopher A. Lawrence evaluated them).
  • The results of the validation were fully published.
  • The people conducting the validation were independent, in the sense that:

a) there was no contract, management, or agency requesting the validation;
b) none of the validators had previously been involved in designing the model, and had only very limited experience in using it; and
c) the original model designer was not able to oversee or influence the validation. (Dupuy passed away in July 1995 and the validation was conducted in 1996 and 1997.)

The validation was not truly independent, as the model tested was a commercial product of TDI, and the person conducting the test was an employee of the Institute. On the other hand, this was an independent effort in the sense that the effort was employee-initiated and not requested or reviewed by the management of the Institute.

Descriptions and outcomes of this validation effort were first reported in The International TNDM Newsletter. Chris Lawrence also addressed validation of the TNDM in Chapter 19 of War by Numbers (2017).

What is Lethality?

Shawn Woodford did a blog post last month about Trevor Dupuy’s Definitions of Lethality:

Trevor Dupuy’s Definitions of Lethality

As he noted in a recent email to me:

I went back to look at the blog post on how TND defined lethality and it dawned on me that he actually stated it in at least two different ways:


Well, I am not sure that Trevor invested a whole lot of time in the definition or discussion of the meaning of lethality. I did work directly with him for several years and I don’t recall it ever coming up in conversation.

I think lethality is both: the destructive power of weapons and the ability to injure and kill people. It depends on the weapon and what you are shooting at. It also depends on the measuring construct you are using. Trevor Dupuy’s models, the QJM/TNDM, were focused on estimating human losses in combat. Other combat models are built around an SSPK (Single-Shot Probability of Kill) calculation and “lethal area” calculations. This certainly includes CAA’s COSAGE/ATCAL/CEM and the RAND/CAA COSAGE/ATCAL/JICM hierarchies of models. This approach is oriented toward measuring weapons-system losses; personnel casualties are then calculated from there. I think both approaches are trying to measure lethality, just using slightly different metrics.
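
For readers unfamiliar with these constructs, here is a minimal sketch of the textbook SSPK and “lethal area” arithmetic, with invented numbers. It is a generic formulation, not the actual COSAGE/ATCAL code:

```python
# Textbook SSPK and "lethal area" arithmetic, with invented values.
# A generic illustration; this is not the COSAGE/ATCAL/CEM code.

def p_kill(sspk: float, shots: int) -> float:
    """Probability of at least one kill in n independent shots."""
    return 1.0 - (1.0 - sspk) ** shots

def lethal_area_casualties(rounds: int, lethal_area_m2: float,
                           troops: int, target_area_m2: float) -> float:
    """Expected personnel casualties from area fire: each round covers
    its lethal area, multiplied by the troop density in the target."""
    density = troops / target_area_m2       # personnel per square meter
    return rounds * lethal_area_m2 * density

# Weapon-system lethality: SSPK of 0.30 against one target, four shots.
print(p_kill(0.30, 4))                      # ~0.76

# Personnel lethality: 100 rounds with a 50 m^2 lethal area each,
# against 120 men dispersed over 100,000 m^2.
print(lethal_area_casualties(100, 50.0, 120, 100_000.0))   # 6.0
```

Either calculation is a measure of lethality; the first is oriented toward weapon-system losses and the second toward personnel losses, which is the distinction drawn above.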

Lethality is clearly not the same as combat effectiveness. There is a lot more to combat effectiveness than what comes out of the barrel of a gun.

Counting Holes in Tanks in Tunisia

M4A1 Sherman destroyed in combat in Tunisia, 1943.

[NOTE: This piece was originally posted on 23 August 2016]

A few years ago, I came across a student battle analysis exercise prepared by the U.S. Army Combat Studies Institute on the Battle of Kasserine Pass in Tunisia in February 1943. At the time, I noted the diagram below, which showed the locations of U.S. tanks knocked out during a counterattack conducted by Combat Command C (CCC) of the U.S. 1st Armored Division against elements of the German 10th and 21st Panzer Divisions near the village of Sidi Bou Zid on 15 February 1943. Without reconnaissance and in the teeth of enemy air superiority, the inexperienced CCC attacked directly into a classic German tank ambush. CCC’s drive on Sidi Bou Zid was halted by a screen of German anti-tank guns, while elements of the two panzer divisions attacked the Americans on both flanks. By the time CCC withdrew several hours later, it had lost 46 of 52 M4 Sherman medium tanks, along with 15 officers and 298 men killed, captured, or missing.

During a recent conversation with my colleague, Chris Lawrence, I recalled the diagram and became curious where it had originated. It identified the location of each destroyed tank, which company it belonged to, and what type of enemy weapon apparently destroyed it; significant battlefield features; and the general locations and movements of the enemy forces. What it revealed was significant. None of CCC’s M4 tanks were disabled or destroyed by a penetration of their frontal armor. Only one was hit by a German 88mm round from either the anti-tank guns or from the handful of available Panzer Mk. VI Tigers. All of the rest were hit with 50mm rounds from Panzer Mk. IIIs, which constituted most of the German force, or by 75mm rounds from Mk. IVs. The Americans were not defeated by better German tanks. The M4 was superior to the Mk. III and equal to the Mk. IV; the dreaded 88mm anti-tank guns and Tiger tanks played little role in the destruction. The Americans had succumbed to superior German tactics and their own errors.

Counting dead tanks and analyzing their cause of death would have been an undertaking conducted by military operations researchers, at least in the early days of the profession. As Chris pointed out, however, the Kasserine battle took place before the inception of operations research in the U.S. Army.

After a bit of digging online, I still have not been able to establish paternity of the diagram, but I think it was created as part of a battlefield survey conducted by the headquarters staff of either the U.S. 1st Armored Division or one of its subordinate combat commands. The only reference I can find for it is as part of a historical report compiled by Brigadier General Paul Robinett, submitted to support the preparation of Northwest Africa: Seizing the Initiative in the West by George F. Howe, the U.S. Army Center of Military History’s (CMH) official history volume on U.S. Army operations in North Africa, published in 1956. Robinett was the commander of Combat Command B, U.S. 1st Armored Division during the Battle of Kasserine Pass, but did not participate in the engagement at Sidi Bou Zid. His report is excerpted in a set of readings (pp. 103-120) provided as background material for a Kasserine Pass staff ride prepared by CMH. (Curiously, the account of the 15 February engagement at Sidi Bou Zid in Northwest Africa [pp. 419-422] does not reference Robinett’s study.)

Robinett’s report appeared to include an annotated copy of a topographical map labeled “approximate location of destroyed U.S. tanks (as surveyed three weeks later).” This suggests that the battlefield was surveyed in late March 1943, after U.S. forces had defeated the Germans and regained control of the area.

The report also included a version of the schematic diagram later reproduced by CMH. The notes on the map seem to indicate that the survey was the work of staff officers, perhaps at Robinett’s direction, possibly as part of an after-action report.

If anyone knows more about the origins of this bit of battlefield archaeology, I would love to hear about it. As far as I know, this assessment was unique, at least in the U.S. Army in World War II.

Peter Perla on Prediction

Col. Trevor Nevitt Dupuy, Arlington, Virginia, 2 June 1995. Photograph by Gary S. Schofield.

Peter Perla has been around the industry a while, although I have never crossed paths with him. He was the keynote speaker at the Connections Wargaming Conference in 2017. His presentation was “Peter Perla on Prediction,” which has great alliteration. It is here: https://paxsims.files.wordpress.com/2017/12/connections-us-2017-wargaming-conference-proceedings.pdf

Early in his presentation (on page 5), he recounts saying to Trevor Dupuy, “Good grief, Trevor, we can’t even predict the outcome of a Super Bowl game much less that of a battle!” Trevor Dupuy responded, “Well, if that is true, what are we doing? What’s the point?”

He then quotes Jim Dunnigan as saying (on page 7): “If you want your wargame to predict the future, you better make sure it can predict the past.”

Of course, this last point is why The Dupuy Institute has developed databases on the Battle of the Bulge, Kursk, the Battle of Britain, some 1,200 battles since 1600, and over 100 post-WWII insurgencies.

Now, I do happen to agree with those two gentlemen. Dr. Perla’s presentation then goes on for a while (and I have gotten into the shameful habit of speed-reading most things now) and finally concludes (on page 43), in response to the question “Why do we wargame?,” with “We do it to help us all make more accurate predictions by leveraging all our combined knowledge, experience and creativity, so that we can make more effective decisions in complex and uncertain situations.”

Let me quote his entire paragraph, so I don’t look like I am just cherry-picking the phrases I want (as opposed to how some people are using our report The Historical Combat Effectiveness of Lighter-Weight Armored Forces):

We do it to help us all make more accurate predictions by leveraging all our combined knowledge, experience and creativity, so that we can make more effective decisions in complex and uncertain situations. We do it to question, to learn and to understand. We do it because wargames entertain; they stir the imagination. Wargames engage; they stimulate the intellect. And wargames enlighten; they create synthetic experience. And it is experience, both real and synthetic, that makes abstract risks tangible and effective planning possible.

And as Matt Caffrey has said on so many occasions, we do it because wargames save money, and most importantly, wargames save lives. That’s why I have been doing this for forty years. I hope you all will continue to do it for forty more.

I gather this is different from what he used to state.

Anyhow, the next Connections Wargaming Conference is in Carlisle, PA, on 13-16 August 2019. See: https://connections-wargaming.com/. I probably will not be attending. Still, this is a worthwhile effort that has been run for decades by Matt Caffrey, now of the Air Force Research Laboratory, along with many others.

P.S. In Peter Perla’s presentation he uses this picture of Trevor Dupuy. The photograph was taken by Gary S. Schofield on 2 June 1995.

No Action on Validation in the 2020 National Defense Authorization Act

Well, I got my hopes up that the Department of Defense modeling and simulation community was finally going to be forced, kicking and screaming, to move forward: ensuring that their models were properly validated, not built upon a “base of sand,” and not assembled like some “house of cards.” This was to come about through four paragraphs in the Senate’s initial markup of the National Defense Authorization Act (NDAA) of 2020 that instructed DOD to assemble a team “…to assess the quality of these models and make recommendations…not later than December 31, 2020.”

The original four paragraphs are here:

U.S. Senate on Model Validation

Well, it looks like this is not going to happen!!!

According to a little research done by Shawn Woodford, it turns out that the modeling and simulation validation proposal in the original Senate Armed Services Committee report for the 2020 NDAA, dated 11 June 2019, did not make it into the final Senate 2020 NDAA bill, passed on 2 July 2019. The proposal was also not included in the House version. The House and Senate versions are now being reconciled in committee, and the final 2020 NDAA will probably be approved soon, now that there is a general bipartisan overall budget agreement. There will be a defense budget, but it appears that it won’t address validation. There is a slim possibility this could change if the language is added back in by the committee at the last minute.

The 2020 NDAA SASC Report, 11 June 2019:
https://www.congress.gov/116/crpt/srpt48/CRPT-116srpt48.pdf

The 2020 NDAA S. 1790 SASC final markup, 2 July 2019:
https://www.congress.gov/116/bills/s1790/BILLS-116s1790es.pdf

The 2020 NDAA HR 2500 HASC final markup, 12 July 2019:
https://www.congress.gov/bill/116th-congress/house-bill/2500/text

We would love to know who got those four paragraphs placed into the original Senate NDAA markup to start with, and afterwards, why they were stripped out of the final bill. Clearly someone felt it was important enough to put in there (as do we). We do not know who that someone is. And who was it that stepped in from wherever and made sure those four paragraphs were removed?

If anyone knows anything further about this, please let us know.


P.S.

Source for the 1991 RAND “Base of Sand” paper: https://www.rand.org/pubs/notes/N3148.html

We used the phrase “house of cards” in a report we did on casualty estimation methodologies (Casualty Estimation Methodologies Studies, 25 July 2005, The Dupuy Institute, page 32). To quote:

In 1991, Paul Davis and Donald Blumenthal employed the term “base of sand” to describe the essential modeling problem of the day. They described one of the core problems as a lack of a vigorous military science.

Unfortunately, this was the responsibility of the operations research community. Understanding military science was part of what ORO [Operations Research Office] was attempting to do in its early days. It was the operations research community that proposed the models, felt they could develop them, sold them to the military, spent the budget, and finally produced the models built upon a “base of sand.” As such, they are the community that needs to correct the problem and produce this theory of combat. They are the scientists.

Yet the problem is bigger than a “base of sand.” That phrase implies merely a shortfall of data to start with. Yet every complex model (and most of these models are complex) is built from a number of interrelationships within the model. This is even more the case for hierarchies of models. Each of these interrelationships, which are often model-unique constructs, is often built upon “expert judgment.” Therefore, the “base of sand” does not just sit at the bottom, but carries through to each individual function within a model. As such, what has been built upon this “base of sand” is a “house of cards.”

Validation of Wargaming Simulation Models – Confusion!!

[Clinton Reilly has been a regular commentator on this blog. We present here a guest blog post from him.]

Originally, I was heartened to read in your blog that the U.S. Congress was setting up a committee to oversee the validation of wargaming models, which were seen to be of doubtful validity. Validation is obviously a “good thing,” as it improves models so that they produce more useful and reliable results. Your blog has put forward several articles to this effect.

I hastened to communicate this to a senior member of the Australian Defence Science and Technology (DST) Group, expecting an enthusiastic response, in anticipation of the Australian government following suit. However, much to my surprise, the said member addressed the MORS Modelling and Simulation Community of Practice (CoP), saying that this was a matter of some concern, as the testing and validation might not be tailored to the objectives of the individual models. Members were asked to comment on this alarming prospect. There was no comment.

While this is a possibility, it seems to me that in such a rational field the logical argument that tests must be tailored to objectives would prevail. It seems hardly worth saying!

So, I replied with a more heartening email to CoP members that validation was only to be seen as a boon and should be welcomed and encouraged. Wargames would be improved. I also emailed the MORS Wargaming CoP with a similar message, again asking for comment on the posts in the MORS Modelling and Simulation CoP.

Now this is where the confusion sets in. In the weeks since the emails were sent, no one has replied. There has been no direct comment to me either. Why, I ask? In a military community where modelling and wargaming are central activities, why has no one replied on either the validity of current models or the need for greater validation?

I am submitting this to your blog in the hope that someone in the worldwide wargaming community has an opinion. Is there a problem with extensive validation of existing models? Is a program of validation needed to improve a low level of validity? Does anyone care about the standard of current models and their outputs?

Does anyone reading this blog have a comment?

Clinton Reilly

Managing Director

Computer Strategies Pty Ltd

Sydney, Australia

Status of Defense Act

A month ago, I flagged pages 253-254 of Report 116-48, which accompanies the National Defense Authorization Act for Fiscal Year 2020. This report is here: https://www.congress.gov/116/crpt/srpt48/CRPT-116srpt48.pdf

The kicker was the statement that “The committee is concerned that…these models…has not been adequately validated…using real world data…[and] are simplistic by comparison…” The entire four paragraphs are quoted in this blog post:

U.S. Senate on Model Validation

The current text of the actual Defense Act, dated 6/27/19, is here: https://www.congress.gov/116/bills/s1790/BILLS-116s1790es.pdf

Now, I don’t know how these two 609- and 1,726-page documents connect, but I gather the requirement still exists to have a team “…to assess the quality of these models and make recommendations…not later than December 31, 2020.”

Does anyone know anything further about this effort?

Today – Speaking at Historicon in Lancaster, PA, Friday, 12 July

I will be speaking at Historicon in Lancaster, PA, on Friday, 12 July, at 6 PM. Historicon is one of the three major annual wargaming conventions run by the Historical Miniatures Gaming Society (HMGS). It runs from 10-14 July 2019. Their website is here: https://www.hmgs.org/general/custom.asp?page=HconHome

As part of this large convention, they have organized a “War College.” This is an impressive effort that includes 18 lectures on Thursday, Friday, and Saturday. I have the last lecture on Friday, from 6 to 7 PM. The speakers for this series include published authors Paul Westermeyer, Pete Panzeri, Steve R. Waddell, and John Prados, among others. Lecture descriptions are here: https://cdn.ymaws.com/www.hmgs.org/resource/resmgr/historicon/hcon_19/pels/19_war_college_pel_6-19-2019.pdf

I will be doing a presentation similar to the one I did at the New York Military Affairs Symposium (NYMAS). It is based upon part of my book War by Numbers: Understanding Conventional Combat.