Technology

The DoD Needs Strategy Robot’s Strategic Reasoning AI Capability

Most military settings are imperfect-information games. They are games not in the recreational sense but in the sense that there is at least one adversary. They are games of imperfect information in that there is fog of war: the player does not know the state of the world exactly—for example, an adversary’s resources, capabilities, locations, and readiness. Imperfect-information games are much more difficult than perfect-information games, and algorithms designed for perfect-information games (such as those behind Deep Blue, AlphaGo, AlphaZero, and MuZero) do not apply at all, because they cannot address issues such as deceiving effectively, understanding deception by others, and reveal/conceal decisions. [1]

Revolutionizing Military Planning with Advanced Strategies

Today, military planning settings are not treated in a sophisticated way as imperfect-information games. Instead, in military planning, war gaming, command and control, and doctrine generation, plans (that is, strategies in game theory terminology) are generated manually based on gut feel, with simulation or table-top exercises sometimes serving as support tools. In military settings, Red and Blue typically each have more possible strategies than there are atoms in the universe, yet only a tiny number of strategies are tested—typically 1-5 for Red and 1-100 for Blue. So, only a vanishingly small portion of the strategy space is evaluated. This creates enormous risk and overestimation of the strength of Blue’s strategies (because Red’s strategy space is not fully explored) and leaves significant opportunities on the table (because Blue’s strategy space is not fully explored). Also, today’s military plans are typically deterministic, while randomization is sometimes needed to make strategies non-exploitable (as a simple example, consider Rock-Paper-Scissors; see the sketch below). Furthermore, hiring people to play Red in training, and hiring humans to do planning, is costly and slow.
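
To make the randomization point concrete, the following is a minimal, self-contained sketch in Python (for illustration only; it is not Strategy Robot’s GSS, and the payoff convention of +1 win, 0 tie, -1 loss is an assumption of the example). It shows that in Rock-Paper-Scissors a deterministic plan is fully exploitable by a best-responding adversary, while the uniform randomized strategy concedes nothing.

    import numpy as np

    # Row player's payoffs in Rock-Paper-Scissors (+1 win, 0 tie, -1 loss).
    A = np.array([
        [ 0, -1,  1],   # Rock      vs  Rock, Paper, Scissors
        [ 1,  0, -1],   # Paper
        [-1,  1,  0],   # Scissors
    ], dtype=float)

    def exploitability(strategy):
        """How much a best-responding adversary gains against a fixed row strategy.

        The adversary picks the column that minimizes the row player's expected
        payoff; since the game's value is 0, anything above 0 is pure loss.
        """
        return max(0.0, -(strategy @ A).min())

    deterministic_plan = np.array([1.0, 0.0, 0.0])   # "always play Rock"
    equilibrium_mix    = np.array([1/3, 1/3, 1/3])   # uniform randomization

    print(exploitability(deterministic_plan))  # 1.0 -> adversary wins every round with Paper
    print(exploitability(equilibrium_mix))     # 0.0 -> no counter-strategy gains anything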

In much of DoD’s planning today, attention is paid to Blue’s logistical constraints, timing, etc., but very little attention is given to Red’s strategy, which affects Blue’s outcomes. The weaknesses of that approach have only been exposed to a limited extent because we have been fighting against adversaries that are much smaller and have drastically weaker assets. However, now with attention appropriately shifting to peer adversaries, that approach is woefully inadequate.

Pioneering Superhuman Decision Making in High-Stakes Settings

In contrast, Strategy Robot’s computational game theory tools compute optimal—and usually novel—strategies for Blue and Red simultaneously, taking into account that Blue’s optimal strategy depends on Red’s strategy and vice versa. This is called strategic reasoning in game theory because each side has to reason about the other side’s strategy in order to decide intelligently what to do. So, unlike in simulation, the players’ strategies are outputs rather than inputs. There is significant opportunity to improve planning, replanning, training, and command and control using modern computational game theory AI techniques. This kind of AI technology can also be used to evaluate given strategies for Blue and/or Red, regardless of whether those strategies were generated by hand or computationally. These techniques can thus also be used to evaluate and train officers by having them play against the computed strategies.
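
As a minimal illustration of “strategies are outputs, not inputs,” the sketch below uses regret-matching self-play, one of the simplest no-regret methods from the same broad family as the counterfactual-regret-minimization algorithms used in modern game solving (this is an illustrative sketch, not Strategy Robot’s GSS). The 3x3 zero-sum payoff matrix is purely hypothetical; both Blue’s and Red’s equilibrium strategies emerge from the computation.

    import numpy as np

    # Hypothetical 3x3 zero-sum game: entry [i, j] is Blue's payoff when Blue plays
    # course of action i and Red plays course of action j (Red receives the negative).
    A = np.array([
        [ 0.0, -2.0,  3.0],
        [ 3.0,  0.0, -4.0],
        [-3.0,  4.0,  0.0],
    ])

    def regret_matching(cumulative_regret):
        """Current strategy: play actions in proportion to their positive regret."""
        positive = np.maximum(cumulative_regret, 0.0)
        total = positive.sum()
        n = len(cumulative_regret)
        return positive / total if total > 0 else np.full(n, 1.0 / n)

    blue_regret = np.zeros(3)
    red_regret = np.zeros(3)
    blue_sum = np.zeros(3)
    red_sum = np.zeros(3)

    for _ in range(200_000):
        blue = regret_matching(blue_regret)
        red = regret_matching(red_regret)
        blue_sum += blue
        red_sum += red
        # Utility of each action against the opponent's current mixed strategy.
        blue_utils = A @ red            # Blue's payoff per Blue action
        red_utils = -(blue @ A)         # Red's payoff per Red action
        blue_regret += blue_utils - blue @ blue_utils
        red_regret += red_utils - red @ red_utils

    # The averages of the iterates converge to a Nash equilibrium: both players'
    # strategies are outputs of the computation, not inputs.
    print("Blue equilibrium strategy:", blue_sum / blue_sum.sum())
    print("Red equilibrium strategy: ", red_sum / red_sum.sum())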

Computational game theory AI techniques have enjoyed a dramatic increase in computational scalability recently. With his team, Dr. Sandholm has developed the fastest algorithms for most game classes, including extensive-form imperfect-information games, normal-form games, games with just simulator access, and many structured DoD games of significance. The team he leads is the multi-time world champion [Association for the Advancement of Artificial Intelligence (AAAI) Annual Poker Competition] in AI-versus-AI heads-up no-limit Texas hold’em, which was the main benchmark and a decades-open challenge problem for testing algorithms for solving imperfect-information games. That game has 10^161 situations that a player can face. Then, in well-recognized AI milestone events, his AIs reached superhuman level in the two-player [Science, 2018] and multiplayer [Science, 2019] settings. These milestones were recognized by many of the leading awards.

That was a tipping point. There are two broad reasons why AI gets deployed: labor savings and superhuman decision making. In high-stakes settings, the main reason is the latter. Now that this kind of AI has reached superhuman level, there is a great opportunity to apply the technology broadly across the DoD, Department of State, and other government agencies. Strategy Robot is focused on such applications. The technology, productized into Strategy Robot’s Game Solving System (GSS), has been developed with over $45M of funding over 19 years.


  1. Typically other players have information that we do not have, and vice versa. Therefore, a player must consider what others’ actions signal about the others’ private information. Conversely, the player needs to consider what his actions signal about his private information to others. He has to strike a tradeoff between exploiting his information and hiding some of it for future use. Game theory (the seminal Nash equilibrium solution concept and its refinements) provides the sound definitions of what optimal strategies for the players are and how the players’ actions should be interpreted as signals. A Nash equilibrium is a profile of strategies, one strategy per player, such that no player can increase his own expected utility by deviating to a different strategy. This is the only sound approach in the sense that it does not require any ad hoc assumptions about the meaning of signals.
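
For reference, the standard formal statement of the Nash equilibrium condition described above (a textbook formulation, not specific to this document) is:

    % Nash equilibrium: no player can gain by deviating unilaterally.
    \[
      u_i(\sigma_i^*, \sigma_{-i}^*) \;\ge\; u_i(\sigma_i, \sigma_{-i}^*)
      \quad \text{for every player } i \text{ and every alternative strategy } \sigma_i,
    \]
    where \(\sigma^* = (\sigma_1^*, \ldots, \sigma_n^*)\) is the equilibrium strategy profile
    and \(u_i\) is player \(i\)'s expected utility.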

Better than ML

Why Is Strategy Robot’s Computational Game Theory AI Better than Machine Learning (Including Reinforcement Learning) for These Problems?

  1. Doesn’t require training data: A major advantage of computational game theory AI over machine learning (specifically supervised learning such as deep learning) in these problems is that it does not necessarily require any data about how others have played in the past or how they will play in the future. In contrast, machine learning would typically require more data than is available.

  2. Doesn’t assume enemies will act as in the past: Machine learning assumes that others will behave the same way as they have behaved in the past. The game-theoretic approach does not assume that.

  3. Only way to generate nonexploitable strategies: Machine learning techniques (including both supervised learning and reinforcement learning) are brittle: they generate strategies that the opponent can exploit. This has been proven analytically, and it has been observed in practice even in landmark cases in war games such as DOTA2 (by OpenAI) and Starcraft II (by Google DeepMind). In DOTA2, while a reinforcement-learning AI was able to beat a top human team, within a few weeks even average humans were able to beat the AI once they found its weaknesses. [1] In Starcraft II, five AIs were generated using reinforcement learning, hard-coded modules, and other techniques. When two top humans were allowed to play once against each of the AIs, the AIs won 10 straight games. However, as soon as one of the humans was allowed to play one of the AIs a second time, the human found a hole in its strategy and won. [2] Since then, humans have found many further weaknesses in those Starcraft II strategies. [3] This is in sharp contrast to computational game theory. When Dr. Sandholm’s AI Libratus played against the top humans in heads-up no-limit Texas hold’em in January 2017, the match consisted of 120,000 repetitions of the game to achieve statistical significance, and the humans—even though they coordinated their extensive exploration—were not able to learn to beat the AI. Then, before a competition in April 2017, a team of strong Chinese poker players scraped those 120,000 hands (around 1,000,000 example actions of the AI’s play) from a Twitch entertainment stream. Even with that substantial advantage, the Chinese players lost to Libratus very significantly. Furthermore, Dr. Sandholm’s more recent AI Pluribus beat top humans at multi-player poker with statistical significance. These tests showed that the robustness of the game-theoretic approach (a proven theorem for two-player zero-sum games [von Neumann 1928], stated below) carries over to computational approximations of equilibrium in practice, and even to multi-player games.
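
For reference, the two-player zero-sum guarantee cited above is von Neumann’s minimax theorem. In its matrix-game form (a standard statement, included here only for clarity) it reads:

    % von Neumann's minimax theorem (1928), matrix-game form: for any payoff matrix A,
    % the row player's maximin value equals the column player's minimax value.
    \[
      \max_{x \in \Delta_m} \; \min_{y \in \Delta_n} \; x^{\top} A \, y
      \;=\;
      \min_{y \in \Delta_n} \; \max_{x \in \Delta_m} \; x^{\top} A \, y ,
    \]
    where \(\Delta_m\) and \(\Delta_n\) are the players' sets of mixed strategies (probability
    distributions over actions). A maximin strategy therefore guarantees the game's value no
    matter what the opponent does, which is the formal sense in which it cannot be exploited.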