Reinforcement Learning for Selfish Load Balancing
in a Distributed Memory Environment
Stephen M. Majercik and Michael L. Littman
December 1996
Abstract
Load balancing is a difficult problem whose solution can greatly
increase the speedup one achieves in a parallel distributed memory
environment. The necessity for load balancing can arise not only
from the structure or dynamics of one's problem, but also from the
need to compete for processor time with other users. Given a lengthy
computation, the ability to exploit changes in processor loads when
allocating work or deciding whether to reallocate work is critical in
making the computation time-feasible. We show how this aspect of the
load balancing problem can be formulated as a Markov decision process
(MDP), and describe some preliminary attempts to solve this MDP using
guided off-line Q-learning and a linear value-function approximator.
In particular, we describe difficulties with value-function
approximator divergence and techniques we applied to correct this
problem.
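To make the method named in the abstract concrete, the following is a minimal sketch of off-line Q-learning with a linear value-function approximator. The two-state MDP, its rewards, and all parameter values here are hypothetical illustrations, not the load balancing MDP or the settings studied in the paper; with the one-hot features used below, the linear approximator reduces to the tabular case, so the divergence issues the abstract mentions do not arise in this toy example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 2, 2
n_features = n_states * n_actions

def features(s, a):
    # One-hot encoding of the (state, action) pair; a real load
    # balancer would use informative features of processor loads.
    phi = np.zeros(n_features)
    phi[s * n_actions + a] = 1.0
    return phi

def step(s, a):
    # Hypothetical deterministic dynamics: reward 1 for landing in
    # state 1, reward 0 otherwise.
    next_s = (s + a) % n_states
    reward = 1.0 if next_s == 1 else 0.0
    return next_s, reward

w = np.zeros(n_features)   # weights of the linear approximator
alpha, gamma = 0.1, 0.9    # step size and discount factor

def q(s, a):
    return w @ features(s, a)

# Off-line Q-learning: sample (s, a) pairs uniformly, observe the
# transition, and move the weights toward the bootstrapped target.
for _ in range(2000):
    s = int(rng.integers(n_states))
    a = int(rng.integers(n_actions))
    s2, r = step(s, a)
    target = r + gamma * max(q(s2, b) for b in range(n_actions))
    w += alpha * (target - q(s, a)) * features(s, a)
```

Under these assumed dynamics the optimal policy always moves to state 1, so after training the learned values satisfy, for example, q(0, 1) > q(0, 0).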