Learning Automata with Artificial Reflecting Barriers in Games with Limited Information
Peer reviewed, Journal article
Published version
Date
2022-05-04Metadata
Show full item recordCollections
Original version
https://doi.org/10.32473/flairs.v35i.130850Abstract
This paper deals with the problem of solving stochastic games (which have numerous business and economic applications), using the interesting tools of Learning Automata (LA), the precursors to Reinforcement Learning (RL). Classical LA systems that possess properties of absorbing barriers, have been used as powerful tools in game theory to devise solutions that converge to the game's Nash equilibrium under limited information(Sastry, Phansalkar, and Thathachar 1994). Games with limited information are intrinsically hard because the player does not know the actions chosen of other players, neither their outcomes. The player might not be even aware of the fact that he/she is playing against an opponent. With the state-of-the-art, the numerous works in LA applicable for solving game theoretical problems, can merely solve the case where the game possesses a Saddle Point in a pure strategy. They are unable to reach mixed Nash equilibria when a Saddle Point is non-existent in pure strategies. Additionally, within the field of LA and RL in general, the theoretical and applied schemes of LA with artificial barriers are scarce, even though incorporating artificial barriers in LA has served as a powerful and yet under-explored concept, since its inception in the 1980’s. More recently, the phenomenon of introducing artificial non-absorbing barriers was pioneered, and this renders the LA schemes to be resilient to being trapped in absorbing barriers. In this paper, we devise a LA with artificial barriers for solving a general form of stochastic bimatrix games. The problem’s complexity has been augmented with the scenario that we consider games in which there is no Saddle Point. By resorting to the above-mentioned powerful concept of artificial reflecting barriers, we propose a LA that converges to an optimal mixed Nash equilibrium even though there may be no Saddle Point when a pure strategy is invoked.