Distilled from Sigevolution
magazine interview of with Stewart W. Wilson
He mentions four key points of classifier systems
My interest is in their ability to discover control actions that can solve real-world reinforcement learning problems
I have a collection of benchmark examples for control problems in Matlab.
It is useful to try different algorithms on these to determine their strengths and limitation.
The benchmarks I have are typically low dimensional nonlinear dynamic systems .
These have typically been used by many researchers for optimal control and reinforcement learning systems.
The include the acrobot; the inverted pendulum; ball and beam dynamics - I will evenually upload the m files for these
which in some cases include a moving graphic of the dynamics.
I have just found that Nathan Sturtevant an assistant professor in the Computer Science Department of the University of Denver has a benchmarks page with a large collection of mazes and commercial game benchmarks for pathfinding algorithm evaluation.
The Continuous Action Reinforcement Learning algorithm (CARLA) is now available im Matlab format to download.
It is also available as a demonstration written in Java so should run on most computer systems.
One of the demos shows a simple 2 parameter optimization problem and the other shows the effect of
the moving end point algorithm where the range over which the probability distribution covers can
change. This enables the algorithm to zoom in and out as the environment changes as the demo show.
CARLA Java Demo for function optimization
CARLA Java Demo for optimization in non-stationary environments
The code (in Matlab) is available here: CARLA.zip
Please email me ( mnwhowell@gmail.com) if you want the Java code.
I have recently extended it to work with multiple cost functions so that the algorithm finds the pareto optimal solution. ( I will share this soon)
Happy Christmas everybody
The book I am (still) reading
Out of Control: The new biology of machines, social systems and the economic world
by Kevin Kelly
describes an experiment carried out using the game of pong where Loren Carpenter
(see video here)
demonstrated that an audience of many thousand could make collective decisons to play the game of pong by
moving paddles. The only feedback to the player was through watching the game unfold.
I thought I would try this with learning automata with 2 actions each.
This can be achieved with just a single learning automata on each side or using a collection of learning automata.
I can even use the CARLA algorithm but discretize the continuous action set.
It will be interesting to see whether the decisions of multiple automata have a smaller average error and better performance than
a single automata.