New classes of load balancing learning automata methods: A dynamical system approach applied to wind turbine fatigue distribution
Abstract
New classes of Load Balancing Learning Automata methods, which equalize the rewards or pay-offs ($R$) of all possible actions, are introduced in this study. The automata are designed to operate in an S-Model environment, which responds to any action with a continuous reward bounded in the unit interval ($R \in [0,1]$). Inspired by momentum-based stochastic gradient descent in deep learning, two methods are established: the Momentum-based Two Time Scale (MTTS) Type-1 and Type-2 approaches. These methods, which differ in mathematical form, incorporate the concept of momentum ($v$) as reinforcement in the action probability update process. This parallels an earlier state-of-the-art Two Time Scale (TTS) approach, defined for a P-Model environment with a discrete binary reward ($R \in \{0,1\}$), in which the reinforcement used in the action probability update is the difference between the moving reward of the chosen action and the average of the moving rewards of all actions. Similarly, the learning process of the MTTS methods is split into two parts. In one, the action probabilities are updated with the momentum as reinforcement, after which the momentum itself is updated. In the other, the difference in reward gradient between two successive steps is used. It is shown that the MTTS approaches converge to the optimal solution faster than both benchmarks: the simple, traditional S-Model probability update and the state-of-the-art TTS load balancing method. However, it is also found that if the hyper-parameters of the TTS method are chosen in a manner that goes against the convention with which the method was originally defined, TTS outperforms the MTTS methods. Furthermore, within the MTTS Type-1 and Type-2 framework, an implementation that adjusts the learning rate based on action-reward history is also introduced.
Similar approaches have been published in the field of Reinforcement Learning, but such an application is missing from the Learning Automata literature. In this implementation, termed Prediction Enforced Adaptive Rate of Learning (PEARL), the automaton uses polynomial regression to predict, before executing the chosen action, the reward that would be gained, and compares the prediction with the actual reward observed after the action is executed. This difference between predicted and observed reward is used to augment the learning process. Essentially, in PEARL the automaton “learns the curve” for each action and uses the information gained to make decisions. Finally, the Jump To Optimal (JTO) method is presented. In this approach, action probability-reward curves are constructed from historical iterative data to determine, at every iteration, the action probability distribution that yields equal rewards for all actions, which in turn is used to guide the update process. The methods are compared against each other by assessing their speed of convergence to the optimal action probability distribution and their stability at that point.
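As a rough illustration of the prediction step in PEARL, the following minimal sketch fits a polynomial to one action's history and compares the predicted reward with the observed one. It rests on several assumptions that are not taken from this study: the regression input is taken to be the action probability, the polynomial degree is 2, and the rule in `adapted_learning_rate` (including its `gain` parameter) is purely hypothetical. All function names are illustrative.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients ordered from the constant term upwards.
    Needs at least degree + 1 distinct x values, otherwise the
    system is singular and the division below fails.
    """
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i)
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for j in range(col, n):
                a[row][j] -= factor * a[col][j]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict_reward(coeffs, probability):
    """Evaluate the fitted reward curve at a given action probability."""
    return sum(c * probability ** i for i, c in enumerate(coeffs))


def adapted_learning_rate(base_rate, predicted, observed, gain=1.0):
    """Hypothetical rule (an assumption, not the rule used in this study):
    enlarge the learning rate in proportion to the absolute prediction error."""
    error = abs(observed - predicted)
    return min(1.0, base_rate * (1.0 + gain * error))


# Synthetic history for one action, generated from R = 1 - p^2 (illustrative only)
history_p = [0.1, 0.2, 0.3, 0.4, 0.5]
history_r = [1.0 - p ** 2 for p in history_p]

curve = fit_polynomial(history_p, history_r, degree=2)
predicted = predict_reward(curve, 0.6)   # prediction made before acting
observed = 0.54                          # reward returned by the environment
rate = adapted_learning_rate(0.05, predicted, observed)
```

In this synthetic case the fitted curve recovers the generating quadratic, so the prediction error reflects only the gap between the curve and the observed reward. With noisy rewards, the fit would instead smooth over the history.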
As a test of practical utility, the methods are applied to the wind farm turbine fatigue distribution problem. Turbines in wind farms are subjected to wind stresses that are dynamic in nature, in addition to other dynamic and static stresses that depend on whether the turbines are onshore or offshore. The learning automata methods are used to achieve the optimal distribution of turbine power capture in a wind farm, in an effort to equalize thrust. This equalizes the fatigue on all turbines, thereby increasing the overall life span of the wind farm and reducing maintenance costs. In particular, the developed load balancing automata are able to handle the stochastic and non-stationary environments that turbines experience due to changing wind directions and rapidly fluctuating wind speeds, also called gusts. The environments are modelled using a publicly available dataset from the British company Shell, the wakes are modelled using the well-known Jensen model, and the turbulence in the wind is modelled using the Kaimal wind turbulence spectrum.
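For orientation, the Jensen wake model mentioned above can be sketched for the simplest case of a single upstream turbine and a downstream rotor fully inside its wake. The function name, argument names, and the default wake decay constant `k = 0.075` (a value commonly quoted for onshore sites) are assumptions made for this illustration. The wake superposition and partial-overlap handling needed for a full wind farm are omitted.

```python
import math


def jensen_wake_speed(u_inf, thrust_coefficient, distance, rotor_diameter, k=0.075):
    """Wind speed inside a single Jensen (top-hat) wake.

    u_inf              : free-stream wind speed [m/s]
    thrust_coefficient : C_T of the upstream turbine, 0 <= C_T < 1
    distance           : downstream distance from the upstream rotor [m]
    rotor_diameter     : rotor diameter of the upstream turbine [m]
    k                  : wake decay constant (assumed default, site dependent)
    """
    if not 0.0 <= thrust_coefficient < 1.0:
        raise ValueError("thrust coefficient must lie in [0, 1)")
    if distance <= 0.0:
        # Upstream of, or at, the rotor plane: no wake deficit in this sketch
        return u_inf
    # Fractional velocity deficit; the wake diameter grows linearly as D + 2*k*x
    deficit = (1.0 - math.sqrt(1.0 - thrust_coefficient)) / (
        1.0 + 2.0 * k * distance / rotor_diameter
    ) ** 2
    return u_inf * (1.0 - deficit)


# Example: 10 m/s free stream, C_T = 0.75, ten rotor diameters downstream, k = 0.05
u_wake = jensen_wake_speed(10.0, 0.75, distance=1000.0, rotor_diameter=100.0, k=0.05)
```

With these example values the deficit is 0.5 / 2² = 0.125, giving a waked speed of 8.75 m/s. A lower thrust coefficient on the upstream turbine reduces the deficit seen downstream, which is the coupling that makes the power capture distribution a load balancing problem.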