Computer Science

Melissa Perrin '05

Using Inductive Logic Programming to Improve the Effectiveness of Negative Advice in a Reinforcement Learning Framework

As a Surdna Undergraduate Research Fellow, Melissa spent the summer of 2004 working with Professor Stephen Majercik exploring the potential to improve the performance of a reinforcement learning agent by incorporating human advice in the learning process.  In particular, she studied possible techniques for incorporating negative advice (advice about what not to do).

An ordinary reinforcement learning agent learns how to perform a task by trial and error. By repeatedly choosing actions for the states it finds itself in and receiving feedback in the form of rewards (both positive and negative), the agent learns which actions lead to more favorable outcomes, and eventually, learns the optimal behavior for that environment. For large worlds (worlds with many states the agent could be in) with sparsely distributed rewards, the time it takes for the agent to converge on the correct behavior can become prohibitively long. In addition, the agent can harm, or even destroy, itself in the process of learning which actions to avoid.
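The trial-and-error learning described above can be sketched with a minimal Q-learning agent. The setup below is purely illustrative and not from the project: a hypothetical five-state chain world with a sparse reward at the rightmost state, and made-up names (`N_STATES`, `step`, `q_learn`, the learning parameters) chosen for clarity.

```python
import random

# Hypothetical 5-state chain world: the agent starts at state 0 and the
# only reward sits at state 4, illustrating sparsely distributed rewards.
N_STATES = 5
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Apply an action; the reward is sparse (only at the rightmost state)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def q_learn(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values by trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            # Epsilon-greedy: occasionally explore, otherwise exploit
            # the current value estimates.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q

q = q_learn()
# After enough episodes, the greedy policy should prefer moving right
# (toward the reward) from every non-terminal state.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
```

Even in this tiny world, the agent must stumble rightward by chance before any reward signal propagates back; in a large world with sparse rewards, that stumbling phase is exactly what makes convergence prohibitively slow.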

Human advice has proved helpful in reducing the convergence time, but thus far, the bulk of the research on using advice has concerned positive advice. Much available human advice is negative (particularly advice that keeps an agent safe). As Lehigh Professor Mark Bickhard states, "negative information is a form of knowledge that requires less information than positive knowledge; it will tend to be an earlier aspect of knowledge construction. Knowledge of error precedes knowledge of how to avoid error." In the same way, providing negative advice is less difficult than providing positive advice, but gives the agent less guidance about how to act. Not surprisingly, preliminary results suggested that negative advice is significantly less effective than positive advice in accelerating the learning process.

The idea for this project began with the observation that a sufficient amount of negative advice about a particular state is equivalent to positive advice: ruling out every action but one amounts to recommending the action that remains. Increasing the amount of negative advice by generalizing the negative advice provided (by using, for example, inductive logic programming) would increase its effectiveness by allowing it to be applied to a larger subset of states. For example, if action A is not a good idea in states 1, 2, and 3, perhaps it is not a good idea in other states that share certain characteristics with these states. To what extent can these generalizations of negative advice amplify its power? When are generalizations safe to use? Should a generalization be used in the same way as direct advice from the human?
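The generalization step can be illustrated with a toy sketch, which is not the project's actual ILP system: states are described by hypothetical features, and the features shared by all the specifically advised states form a pattern (a crude least-general generalization, in the spirit of inductive logic programming) that extends the negative advice to unadvised states. All feature names (`near_cliff`, `carrying_load`, `terrain`) and helper names are invented for illustration.

```python
def shared_features(states):
    """Return the feature/value pairs common to every advised state."""
    common = dict(states[0])
    for s in states[1:]:
        common = {k: v for k, v in common.items() if s.get(k) == v}
    return common

def advice_applies(pattern, state):
    """The generalized negative advice covers any state matching the pattern."""
    return all(state.get(k) == v for k, v in pattern.items())

# Specific negative advice: action A is a bad idea in these three states
# (hypothetical feature descriptions, invented for this sketch).
advised = [
    {"near_cliff": True, "carrying_load": True, "terrain": "ice"},
    {"near_cliff": True, "carrying_load": False, "terrain": "ice"},
    {"near_cliff": True, "carrying_load": True, "terrain": "rock"},
]
pattern = shared_features(advised)

# A new, never-advised state that matches the shared pattern is now
# also covered by the generalized advice.
new_state = {"near_cliff": True, "carrying_load": False, "terrain": "sand"}
```

The open questions in the paragraph above show up directly in this sketch: the broader the pattern, the more states the advice covers, but the greater the risk that the generalization rules out an action that was actually safe in some of those states.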
