This example uses discrete learning automata to find the best bandit,
displaying the selection probability of each bandit as the probabilities evolve.
Many different learning rules for updating the probability distributions have been developed.
The learning rule used here is the linear reward-inaction rule.
The selection probabilities are initially set to be equal for every action and evolve to select the optimal action
(which is shown with a black line underneath it).
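
As a rough illustration of how such an update can work, here is a minimal Python sketch of the textbook continuous form of linear reward-inaction. It is not the code behind this example: the function names, the learning rate of 0.05, the step count, and the Bernoulli reward probabilities are all assumptions made for illustration. On a reward, the chosen action's probability moves toward 1 and the others shrink proportionally; on a penalty, nothing changes (the "inaction" part).

```python
import random


def lri_update(probs, chosen, rewarded, rate=0.05):
    """Return updated selection probabilities (hypothetical helper).

    Reward: move probability mass toward the chosen action.
    Penalty: leave the probabilities unchanged.
    """
    if not rewarded:
        return list(probs)
    return [
        p + rate * (1.0 - p) if i == chosen else p * (1.0 - rate)
        for i, p in enumerate(probs)
    ]


def run(reward_probs, steps=5000, rate=0.05, seed=0):
    """Simulate one automaton on Bernoulli bandits (assumed environment)."""
    rng = random.Random(seed)
    n = len(reward_probs)
    probs = [1.0 / n] * n  # start with equal selection probability
    for _ in range(steps):
        # Sample an action according to the current probabilities.
        chosen = rng.choices(range(n), weights=probs)[0]
        # The bandit pays out with its own (hidden) reward probability.
        rewarded = rng.random() < reward_probs[chosen]
        probs = lri_update(probs, chosen, rewarded, rate)
    return probs
```

Calling `run([0.2, 0.5, 0.8])` returns the final selection probabilities for three bandits. With a small learning rate the automaton usually, though not always, ends up concentrating on the bandit with the highest reward probability.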
I plan to add options for selecting different learning rules and the number of bandits in the near future.
For further information on learning automata, please refer to my publications
or contact me at mnwhowell@gmail.com.