My Blog



Learning Classifier Systems

Distilled from a SIGEVOlution magazine interview with Stewart W. Wilson

He mentions four key points about classifier systems:

  1. Classifier Systems: A learning system based on evolving a population of condition-action rules (Holland). The rules (classifiers) compete and/or cooperate to obtain maximum payoff (reinforcement) from the environment. They evolve based on their ability to get payoff, so the system comes to contain rules that fit the environment better and better, including an environment that changes with time.
  2. Bucket-brigade technique: A solution to delayed payoff from the environment, where payoff is not provided after each action of the system but only at the end of a sequence of actions. The bucket-brigade technique built on an idea of Samuel's for checkers and, in effect, passed payoff back to classifiers that set the stage for later payoff-achieving actions. The bucket brigade helped inspire the field of reinforcement learning and was one of its first examples.
  3. Determining classifier fitness: Originally the payoff amount a classifier received determined its fitness. However, basing fitness on the accuracy of a classifier's prediction of payoff, rather than on the payoff itself, resulted in much better performance and led to the first systems that worked robustly and reliably. These systems also generalized accurately over environmental regularities (a simplified update along these lines is sketched after this list).
  4. Introduction of classifiers that compute their payoff prediction instead of simply asserting a scalar value. The computation is usually a linear function of the input vector, but can be higher-order as well. This leads to stronger generalization ability as well as use of the system as an adaptive function approximator.
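
As a rough illustration of points 1, 3 and 4, a single classifier and its accuracy-based update might look something like the Matlab sketch below. This is my own simplified illustration, not Wilson's actual code; the field names and constants are assumptions.

    % Minimal sketch of one accuracy-based classifier, for illustration only.
    cl.condition = [1 0 NaN];    % NaN plays the role of a "don't care" symbol
    cl.action    = 1;
    cl.p         = 10;           % payoff prediction
    cl.err       = 0;            % estimate of the prediction error
    cl.F         = 0.1;          % fitness, based on accuracy rather than payoff

    beta = 0.2;                  % learning rate (assumed)
    eps0 = 1;                    % error below which a rule counts as accurate (assumed)

    input = [1 0 1];
    match = all(isnan(cl.condition) | cl.condition == input);   % does the rule apply?

    % One reinforcement update after the environment returns payoff P:
    P      = 100;
    cl.p   = cl.p   + beta*(P - cl.p);
    cl.err = cl.err + beta*(abs(P - cl.p) - cl.err);
    kappa  = min(1, (eps0/max(cl.err, eps0))^5);   % accuracy falls off as error grows
    cl.F   = cl.F + beta*(kappa - cl.F);           % fitness tracks accuracy (point 3)
    % In the computed-prediction variant (point 4), cl.p would instead be a linear
    % function of the input vector, e.g. cl.p = w'*[1; input(:)].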

My interest is in their ability to discover control actions that can solve real-world reinforcement learning problems.

Benchmarks

I have a collection of benchmark examples for control problems in Matlab. It is useful to try different algorithms on these to determine their strengths and limitations.
The benchmarks are typically low-dimensional nonlinear dynamic systems that have been used by many researchers for optimal control and reinforcement learning. They include the acrobot, the inverted pendulum, and ball-and-beam dynamics. I will eventually upload the m-files for these, which in some cases include a moving graphic of the dynamics.
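
To give a flavour of what these benchmarks look like before the m-files go up, here is a minimal Matlab sketch of an inverted pendulum simulation; the parameter values are assumptions for illustration, not the values used in my benchmark files.

    % Minimal inverted pendulum benchmark sketch: state x = [theta; theta_dot],
    % control u is a torque applied at the pivot.
    m = 1; l = 1; g = 9.81; b = 0.05;   % mass, length, gravity, friction (assumed)
    dt = 0.02;                          % integration step

    f = @(x,u) [x(2); (u - b*x(2) - m*g*l*sin(x(1)))/(m*l^2)];  % pendulum dynamics

    x = [pi; 0];                        % start hanging straight down
    for k = 1:500
        u = 0;                          % a controller would choose u from x here
        x = x + dt*f(x,u);              % simple Euler integration
    end
    disp(x)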

I have just found that Nathan Sturtevant, an assistant professor in the Computer Science Department at the University of Denver, has a benchmarks page with a large collection of mazes and commercial game benchmarks for pathfinding algorithm evaluation.

CARLA Algorithm

The Continuous Action Reinforcement Learning Algorithm (CARLA) is now available in Matlab format to download.
It is also available as a demonstration written in Java, so it should run on most computer systems. One of the demos shows a simple two-parameter optimization problem, and the other shows the effect of the moving end point algorithm, where the range the probability distribution covers can change. This lets the algorithm zoom in and out as the environment changes, as the demo shows.

CARLA Java Demo for function optimization

CARLA Java Demo for optimization in non-stationary environments


The code (in Matlab) is available here: CARLA.zip
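
For anyone wanting a feel for the basic idea before downloading, the heart of CARLA can be sketched roughly as below. This is a simplified illustration with assumed constants, not the code in CARLA.zip.

    % Rough sketch of a CARLA-style update over a continuous action range.
    a_min = -1; a_max = 1;                     % action range (assumed)
    n  = 200;
    a  = linspace(a_min, a_max, n);            % discretised support of the density
    f  = ones(1,n) / (a_max - a_min);          % start with a uniform density

    lambda = 0.1;                              % reinforcement gain (assumed)
    sigma  = 0.1*(a_max - a_min);              % spread of the reinforcement (assumed)

    % One learning iteration:
    cdf  = cumtrapz(a, f);
    act  = interp1(cdf/cdf(end), a, rand);     % sample an action from the density
    beta = rand;                               % stand-in for the normalised performance
                                               % signal returned by the cost function
    f = f + lambda*beta*exp(-0.5*((a - act)/sigma).^2);   % bump the density at the action
    f = f / trapz(a, f);                       % renormalise so it remains a density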


Please email me ( mnwhowell@gmail.com) if you want the Java code.

I have recently extended it to work with multiple cost functions so that the algorithm finds the Pareto optimal solution. (I will share this soon.)
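
Since that extension is not posted yet, here is just the Pareto dominance test it builds on, as a small illustrative sketch (assuming all cost functions are to be minimised).

    % True if cost vector c1 Pareto-dominates c2: no worse in every objective and
    % strictly better in at least one (all objectives minimised).
    dominates = @(c1, c2) all(c1 <= c2) && any(c1 < c2);

    dominates([1 2], [2 2])   % true:  the first solution dominates the second
    dominates([1 3], [2 2])   % false: the two solutions are incomparable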

Our Christmas Tree

picture

Happy Christmas everybody

Pong

The book I am (still) reading

Out of Control: The New Biology of Machines, Social Systems, and the Economic World

by Kevin Kelly

describes an experiment carried out using the game of Pong, where Loren Carpenter (see video here) demonstrated that an audience of many thousands could make collective decisions to play Pong by moving paddles. The only feedback to the players was through watching the game unfold.
I thought I would try this with learning automata with two actions each. This can be achieved with just a single learning automaton on each side, or with a collection of learning automata. I could even use the CARLA algorithm but discretize the continuous action set. It will be interesting to see whether the decisions of multiple automata have a smaller average error and better performance than a single automaton.
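
As a starting point, a single two-action learning automaton with a standard linear reward-inaction update could look roughly like the Matlab sketch below; play_pong_step is a hypothetical stand-in for the game feedback, and the learning rate is an assumption.

    % Two-action learning automaton with a linear reward-inaction (L_R-I) update.
    p = [0.5 0.5];      % probabilities of action 1 (paddle up) and action 2 (paddle down)
    a = 0.05;           % learning rate (assumed)

    for t = 1:1000
        act = 1 + (rand > p(1));        % pick action 1 or 2 according to p
        reward = play_pong_step(act);   % hypothetical 0/1 feedback from the game
        if reward == 1                  % reward: move probability towards the chosen action
            other = 3 - act;
            p(act)   = p(act) + a*(1 - p(act));
            p(other) = (1 - a)*p(other);
        end                             % inaction: leave p unchanged on penalty
    end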