Mystics & Statistics

SMEs

Continuing my comments on the article in the December 2018 issue of the Phalanx by Alt, Morey and Larimer (this is part 6 of 7; see Part 1, Part 2, Part 3, Part 4, Part 5).

SME….is a truly odd-sounding acronym that means Subject Matter Expert. The authors talk about SMEs extensively in their article, and this I have no problem with. I do want to make three points related to that:

  1. A SME is not a substitute for validation.
  2. In some respects, the QJM (Quantified Judgment Model) is a quantified and validated SME.
  3. How do you know that the SME is right?

If you can substitute a SME for a proper validation effort, then perhaps you could just substitute the SME for the model. This would save time and money. If your SME is knowledgeable enough to sprinkle holy water on the model and bless its results, why not just skip the model and ask the SME? We could certainly simplify and speed up analysis by removing the models and just asking our favorite SME. The weaknesses of this approach are obvious.

Then there is Trevor N. Dupuy’s Quantified Judgment Model (QJM) and Quantified Judgment Method of Analysis (QJMA). This is, in some respects, a SME quantified. Actually, it was a board of SMEs working with a series of historical studies (the list of studies starts here: http://www.dupuyinstitute.org/tdipubs.htm ). These SMEs developed a set of values for different situations and then inserted them into a model. They then validated the model to historical data (also known as real-world combat data). While the QJM has come under considerable criticism from elements of the Operations Research community…..if you are using SMEs, then in fact, you are using something akin to, but less rigorous than, Trevor Dupuy’s Quantified Judgment Method of Analysis.

This last point, how do we know that the SME is right, is significant. How do you test your SMEs to ensure that what they are saying is correct? Another SME, a board of SMEs? Maybe a BOGSAT? Can you validate SMEs? There are limits to SMEs. In the end, you need a validated model.

 

Historical Demonstrations?

Photo from the 1941 Louisiana Maneuvers

Continuing my comments on the article in the December 2018 issue of the Phalanx by Alt, Morey and Larimer (this is part 5 of 7; see Part 1, Part 2, Part 3, Part 4).

The authors of the Phalanx article then make the snarky statement that:

Combat simulations have been successfully used to replicate historical battles as a demonstration, but this is not a requirement or their primary intended use.

So, they say in three sentences that combat models using human factors are difficult to validate, they then say that physics-based models are validated, and then they say that running a battle through a model is a demonstration. Really?

Does such a demonstration show that the model works or does not work? Does such a demonstration show that they can get a reasonable outcome when using real-world data? The definition of validation that they gave on the first page of their article is:

The process of determining the degree to which a model or simulation with its associated data is an accurate representation of the real world from the perspective of its intended use is referred to as validation.

This is a perfectly good definition of validation. So where does one get that real-world data? If you are using the model to measure combat effects (as opposed to physical effects) then you probably need to validate it to real-world combat data. This means historical combat data, whether it is from 3,400 years ago or 1 second ago. You need to assemble the data from a (preferably recent) combat situation and run it through the model.
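To make the idea concrete, here is a minimal sketch of what "running the data through the model" amounts to: feed each historical engagement to the model and compare the predicted losses to the recorded ones. Everything in it is an assumption for illustration. The `Engagement` record, the `validate` function, the `toy_model` stand-in and the sample numbers are all hypothetical; they are not drawn from any real database and do not represent any actual combat model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable


@dataclass(frozen=True)
class Engagement:
    """One historical engagement; the fields are illustrative placeholders."""
    name: str
    attacker_strength: int
    defender_strength: int
    actual_attacker_losses: int  # would come from unit records in a real study


def validate(model: Callable[[Engagement], float],
             engagements: Iterable[Engagement]) -> dict:
    """Run each engagement through the model and summarize prediction errors."""
    errors = [model(e) - e.actual_attacker_losses for e in engagements]
    return {
        "n": len(errors),
        "bias": mean(errors),  # signed: negative means under-prediction
        "mean_abs_error": mean([abs(x) for x in errors]),
    }


def toy_model(e: Engagement) -> float:
    """Hypothetical stand-in model: the attacker loses a flat 2% of strength."""
    return e.attacker_strength * 2 / 100


# Made-up numbers for illustration only.
sample = [
    Engagement("Engagement A", 10_000, 6_000, 250),
    Engagement("Engagement B", 5_000, 5_000, 80),
    Engagement("Engagement C", 20_000, 8_000, 400),
]

report = validate(toy_model, sample)
print(report)
```

A real validation effort would involve far more engagements, more output measures than attacker losses, and some judgment about what size of error is acceptable for the model's intended use. The comparison step itself, though, is not much more complicated than this.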

This has been done. The Dupuy Institute does not exist in a vacuum. We have assembled four sets of combat databases for use in validation. They are:

  1. The Ardennes Campaign Simulation Data Base
  2. The Kursk Data Base
  3. The Battle of Britain Data Base
  4. Our various division-level, battalion-level and company-level engagement databases.

Now, the reason we have mostly used World War II data is that you can get detailed data from the unit records of both sides. To date….this is not possible for almost any war since 1945. But, if your high-tech model cannot predict lower-tech combat….then you probably also have a problem modeling high-tech combat. So, it is certainly a good starting point.

More to the point, this was work that was funded in part by the Center for Army Analysis, the Deputy Under Secretary of the Army (Operations Research) and the Office of the Secretary of Defense, Program Analysis and Evaluation. Hundreds of thousands of dollars were spent developing some of these databases. This was not done just for “demonstration.” This was not done as a hobby. If their sentence was meant to belittle the work of TDI, which is how I do interpret that sentence, then it also belittles the work of CAA, DUSA(OR) and OSD PA&E. I am not sure that is the three authors’ intent.

Physics-based Aspects of Combat

Continuing my comments on the article in the December 2018 issue of the Phalanx by Alt, Morey and Larimer (this is part 4 of 7; see Part 1, Part 2, Part 3).

The next sentence in the article is interesting. After saying that validating models incorporating human behavior is difficult (and therefore should not be done?) they then say:

In combat simulations, those model components that lend themselves to empirical validation, such as the physics-based aspects of combat, are developed, validated, and verified using data from an accredited source.

This is good. But the problem is that it limits one to validating only models that do not include humans. If one is comparing a weapon system to a weapon system, as they discuss later, this is fine. On the other hand, if one is comparing units in combat to units in combat…then there are invariably humans involved. Even if you are comparing weapon systems versus weapon systems in an operational environment, there are humans involved. Therefore, you have to address human factors. Once you have gone beyond simple weapon versus weapon comparisons, you need to use models that are gaming situations that involve humans. I gather from the previous sentence (see part 3 of 7) and this sentence, that means that they are using un-validated models. Their extended discussions of SMEs (Subject Matter Experts) that follow just reinforce that impression.

But TRADOC is the Training and Doctrine Command. They are clearly modeling something other than just the “physics-based aspects of combat.”

Validating Attrition

Continuing to comment on the article in the December 2018 issue of the Phalanx by Alt, Morey and Larimer (this is part 3 of 7; see Part 1, Part 2).

On the first page (page 28) in the third column they make the statement that:

Models of complex systems, especially those that incorporate human behavior, such as that demonstrated in combat, do not often lend themselves to empirical validation of output measures, such as attrition.

Really? Why can’t you? In fact, isn’t that exactly the model you should be validating?

More to the point, people have validated attrition models. Let me list a few cases (this list is not exhaustive):

1. Done by Center for Army Analysis (CAA) for the CEM (Concepts Evaluation Model) using Ardennes Campaign Simulation Study (ARCAS) data. Take a look at this study done for Stochastic CEM (STOCEM): https://apps.dtic.mil/dtic/tr/fulltext/u2/a489349.pdf

2. Done in 2005 by The Dupuy Institute for six different casualty estimation methodologies as part of Casualty Estimation Methodologies Studies. This was work done for the Army Medical Department and funded by DUSA (OR). It is listed here as report CE-1: http://www.dupuyinstitute.org/tdipub3.htm

3. Done in 2006 by The Dupuy Institute for the TNDM (Tactical Numerical Deterministic Model) using Corps and Division-level data. This effort was funded by Boeing, not the U.S. government. This is discussed in depth in Chapter 19 of my book War by Numbers (pages 299-324) where we show 20 charts from such an effort. Let me show you one from page 315:

 

So, this is something that multiple people have done on multiple occasions. It is not so difficult that The Dupuy Institute was not able to do it. TRADOC is an organization with around 38,000 military and civilian employees, plus who knows how many contractors. I think this is something they could also do if they had the desire.

 

Validation

Continuing to comment on the article in the December 2018 issue of the Phalanx by Jonathan Alt, Christopher Morey and Larry Larimer (this is part 2 of 7; see part 1 here).

On the first page (page 28) top of the third column they make the rather declarative statement that:

The combat simulations used by military operations research and analysis agencies adhere to strict standards established by the DoD regarding verification, validation and accreditation (Department of Defense, 2009).

Now, I have not reviewed what has been done on verification, validation and accreditation since 2009, but I did do a few fairly exhaustive reviews before then. One such review is written up in depth in The International TNDM Newsletter. It is Volume 1, No. 4 (February 1997). You can find it here:

http://www.dupuyinstitute.org/tdipub4.htm

The newsletter includes a letter dated 21 January 1997 from the Scientific Advisor to the CG (Commanding General) at TRADOC (Training and Doctrine Command). This is the same organization that the three gentlemen who wrote the article in the Phalanx work for. The Scientific Advisor sent a letter out to multiple commands to try to flag the issue of validation (the letter is on page 6 of the newsletter). My understanding is that he received few responses (I saw only one; it was from Leavenworth). After that, I gather there was no further action taken. This was a while back, so maybe everything has changed, as I gather they are claiming with that declarative statement. I doubt it.

The issue to me is validation. Verification is often done. Actual validations are a lot rarer. In 1997, this was my list of combat models in the industry that had been validated (the list is on page 7 of the newsletter):

1. Atlas (using 1940 Campaign in the West)

2. Vector (using undocumented tuning runs)

3. QJM (by HERO using WWII and Middle-East data)

4. CEM (by CAA using Ardennes Data Base)

5. SIMNET/JANUS (by IDA using 73 Easting data)

 

Now, in 2005 we did a report on Casualty Estimation Methodologies (it is report CE-1, listed here: http://www.dupuyinstitute.org/tdipub3.htm). We reviewed the listing of validation efforts, and from 1997 to 2005…nothing new had been done (except for a battalion-level validation we had done for the TNDM). So am I now to believe that since 2009, they have actively and aggressively pursued validation? Especially as most of this time was in a period of severely declining budgets, I doubt it. One of the arguments against validation made in meetings I attended in 1987 was that they did not have the time or budget to spend on validating. The budget during the Cold War was luxurious by today’s standards.

If there have been meaningful validations done, I would love to see the validation reports. The proof is in the pudding…..send me the validation reports that will resolve all doubts.

Engaging the Phalanx

The Military Operations Research Society (MORS) publishes a periodical journal called the Phalanx. In the December 2018 issue was an article that referenced one of our blog posts. This took us by surprise. We only found out about it thanks to one of the viewers of this blog. We are not members of MORS. The article is paywalled and cannot be easily accessed if you are not a member.

It is titled “Perspectives on Combat Modeling” (page 28) and is written by Jonathan K. Alt, U.S. Army TRADOC Analysis Center, Monterey, CA.; Christopher Morey, PhD, Training and Doctrine Command Analysis Center, Ft. Leavenworth, Kansas; and Larry Larimer, Training and Doctrine Command Analysis Center, White Sands, New Mexico. I am not familiar with any of these three gentlemen.

The blog post that appears to be generating this article is this one:

Wargaming Multi-Domain Battle: The Base Of Sand Problem

Simply by coincidence, Shawn Woodford recently re-posted this in January. It was originally published on 10 April 2017 and was written by Shawn.

The opening two sentences of the article in the Phalanx read:

Periodically, within the Department of Defense (DoD) analytic community, questions will arise regarding the validity of the combat models and simulations used to support analysis. Many attempts (sic) to resurrect the argument that models, simulations, and wargames “are built on the thin foundation of empirical knowledge about the phenomenon of combat.” (Woodford, 2017).

It is nice to be acknowledged, although in this case, it appears that we are being acknowledged because they disagree with what we are saying.

Probably the word that gets my attention is “resurrect.” It is an interesting word that implies that this is an old argument that has somehow or other been put to bed. Granted, it is an old argument. On the other hand, it has not been put to bed. If a problem has been identified and not corrected, then it is still a problem. Age has nothing to do with it.

On the other hand, maybe they are using the word “resurrect” because recent developments in modeling and validation have changed the environment significantly enough that these arguments no longer apply. If so, I would be interested in what those changes are. The last time I checked, the modeling and simulation industry was using many of the same models they had used for decades. In some cases, they were going back to using simpler hex-games for their modeling and wargaming efforts. We have blogged a couple of times about these efforts. So, in the world of modeling, unless there have been earthshaking and universal changes made in the last five years that have completely revamped the landscape….then the decades-old problems still apply to the decades-old models and simulations.

More to come (this is the first of at least 7 posts on this subject).

Afghan Security Forces Deaths Top 45,000 Since 2014

The President of Afghanistan, Ashraf Ghani, speaking with CNN’s Fareed Zakaria, at the World Economic Forum in Davos, Switzerland, 25 January 2019. [Office of the President, Islamic Republic of Afghanistan]

Last Friday, at the World Economic Forum in Davos, Switzerland, Afghan President Ashraf Ghani admitted that his country’s security forces had suffered over 45,000 fatalities since he took office in September 2014. This total far exceeds the total of 28,000 killed since 2015 that Ghani had previously announced in November 2018. Ghani’s cryptic comment in Davos did not indicate how the newly revealed total relates to previously released figures, whether it was based on new accounting, a sharp increase in recent casualties, or more forthrightness.

This revised figure casts significant doubt on the validity of analysis based on the previous reporting. Correcting it will be difficult. At the request of the Afghan government in May 2017, the U.S. military has treated security forces attrition and loss data as classified and has withheld it from public release.

If Ghani’s figure is, in fact, accurate, then it reinforces the observation that the course of the conflict is tilting increasingly against the Afghan government.

 

What Multi-Domain Operations Wargames Are You Playing? [Updated]

Source: David A. Shlapak and Michael Johnson. Reinforcing Deterrence on NATO’s Eastern Flank: Wargaming the Defense of the Baltics. Santa Monica, CA: RAND Corporation, 2016.


[UPDATE] We had several readers recommend games they have used or that would be suitable for simulating Multi-Domain Battle and Operations (MDB/MDO) concepts. These include several classic campaign-level board wargames:

The Next War (SPI, 1976)

NATO: The Next War in Europe (Victory Games, 1983)

For tactical level combat, there is Steel Panthers: Main Battle Tank (SSI/Shrapnel Games, 1996- )

There were also a couple of naval/air oriented games:

Asian Fleet (Kokusai-Tsushin Co., Ltd. (国際通信社) 2007, 2010)

Command: Modern Air Naval Operations (Matrix Games, 2014)

Are there any others folks are using out there?


A Mystics & Statistics reader wants to know what wargames are being used to simulate and explore Multi-Domain Battle and Operations (MDB/MDO) concepts.

There is a lot of MDB/MDO wargaming going on at all levels in the U.S. Department of Defense. Much of this appears to use existing models, simulations, and wargames, such as the U.S. Army Center for Army Analysis’s unclassified Wargaming Analysis Model (C-WAM).

Chris Lawrence recently looked at C-WAM and found that it uses a lot of traditional board wargaming elements, including methodologies for determining combat results, casualties, and breakpoints that have been found unable to replicate real-world outcomes (aka “The Base of Sand” problem).

C-WAM 1

C-WAM 2

C-WAM 3

C-WAM 4 (Breakpoints)

There is also the wargame used by RAND to look at possible scenarios for a potential Russian invasion of the Baltic States.

Wargaming the Defense of the Baltics

Wargaming at RAND

What other wargames, models, and simulations are being used out there? Are there any commercial wargames incorporating MDB/MDO elements into their gameplay? What methodologies are being used to portray MDB/MDO effects?

An Administrative Weakness

Another post in response to the comments on this blog post:

The Afghan Insurgents

The comment was “…the insurgents are one side of the coin and the other is the credibility of the government we are trying to create in Afghanistan…If the central government is seen as corrupt and self serving then this also inspires the insurgents and may in fact be the decisive factor….”

This immediately brought to mind David Galula’s construct, which was based upon four major points (see pages 210-211 of America’s Modern Wars):

  1. Insurgents need a cause
  2. A police and administrative weakness
  3. A non-hostile geographic environment
  4. Outside support in the middle to late stages.

He specifically states that: “the first two are musts. The last is a help that may become a necessity.”

Now, the problem is that we never took the time to measure an “administrative weakness” or even define what it was. Nor did David Galula. Furthermore, there is also probably an “administrative weakness” or two on the guerrilla side. If the culture of Iraq/Afghanistan/Vietnam makes it difficult to create government structures and armed forces that are highly motivated, unified and not corrupted, well, I suspect some of those same problems exist among the guerrillas drawn from that same culture. Therefore, measuring this requires not only some way of defining what these “administrative weaknesses” are, but also quantifying them, and then determining how they affected both (or more) sides. Needless to say, this was not going to be done in the initial phase of our analysis. We were never funded to conduct follow-up analysis.

This is the problem with David Galula’s construct. There is no easy way to measure it or analyze it. Galula offers no definition of what an “administrative weakness” is. If he does not define it, then how do I define it for his “theory?”

One does note that Galula in his description of the Viet Cong in 1963 states that:

The insurgent has really no cause at all: he is exploiting the counterinsurgent’s weaknesses and mistakes….The insurgent’s program is simply: “Throw the rascals out.” If the “rascals” (whoever is in power in Saigon) amend their ways, the insurgent would lose his cause.

As I note on page 48 of my book:

This was a war that eventually resulted in over 2 million deaths and insurgent force in excess of 300,000. As it is, one could infer from Galula’s statement that he felt that the insurgency could be easily defeated since it was based upon “no real cause.”  We believe that this view has been proven incorrect by historical events.

Clearly identifying insurgent cause and administrative weakness was also a challenge for David Galula.

Hausser Wielding Chalk

The Battle of Prokhorovka took place on 12 July 1943 (and for several days after, depending on definition). The most famous part of the fighting was the attack from the Soviet XVIII Tank Corps and XXIX Tank Corps against the Leibstandarte SS Adolf Hitler Division.

Several stories posted on the web and I gather a few books mention something like: “Several German accounts mention that SS-Obergruppenführer Paul Hausser, commander of the SS Panzer Corps, had to use chalk to mark and count the huge jumble of 93 knocked-out Soviet tanks in the Leibstandarte sector alone.”

Now, this makes for an interesting scene: General Hausser, the 62-year-old founder of the Waffen SS, is crawling around the battlefield marking up 93 tanks with chalk. With the Totenkopf SS Division having to continue the offensive on the 13th, and Das Reich SS Division in the days after that, I would think that the SS Panzer Corps commander would have a few more important things to do at this moment. I also suspect that significant parts of the battlefield were still under enemy observation. It gets a little hard to imagine that Hausser was out there with chalk counting tanks.

Does anyone know the original source of this story?