Parallel Distributed Processing:

Chapter 1 — The Appeal of Parallel Distributed Processing

 

The publication of Parallel Distributed Processing in 1986 signaled the beginning of the connectionist revolution in cognitive science. Connectionism can trace its roots back to the work of Donald Hebb in the 1940s (even to William James in the 1890s) and to McCulloch and Pitts' formal neurons of the early 1940s; the perceptron itself was Frank Rosenblatt's contribution in the late 1950s. Unfortunately, interest in connectionist-style models all but died in 1969, when Marvin Minsky and Seymour Papert published Perceptrons, a book which showed the severe limitations of the state of the art at the time (e.g. perceptrons were not capable of computing a simple exclusive OR). The PDP volumes showed conclusively that perceptrons were just the tip of the iceberg with regard to the power of connectionist architectures.
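To see the XOR limitation concretely, here is a minimal sketch (Python with NumPy; my illustration, not code from the original text, and the function name and parameters are hypothetical) of Rosenblatt's perceptron learning rule. Run on AND, which is linearly separable, the rule converges to zero errors; run on XOR, no single-layer perceptron can succeed, so errors remain no matter how long it trains.

import numpy as np

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Rosenblatt's perceptron learning rule on binary data.

    Returns the number of misclassified points after training.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            # Weights change only when the prediction is wrong
            w += lr * (target - pred) * xi
            b += lr * (target - pred)
    preds = (X @ w + b > 0).astype(int)
    return int(np.sum(preds != y))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
print("AND errors:", train_perceptron(X, np.array([0, 0, 0, 1])))  # 0: linearly separable
print("XOR errors:", train_perceptron(X, np.array([0, 1, 1, 0])))  # >= 1: no line separates XOR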

Perceptrons had always operated on a simple input-output model, similar to the behaviorist stimulus-response paradigm. The PDP group showed that changing the structure of the networks can free the resulting systems from the limitations identified by Minsky and Papert. The key change was adding additional layers of units to the network architecture (see figure below). These models, called "feed-forward" because information always travels from the input layer toward the output layer, in conjunction with a learning rule called "back-propagation" (sketched in code at the end of this section), have been used in countless models of cognition and are what most people think of when discussing neural networks.

PDP models are additionally seductive because of their supposed relationship to the brain's architecture. Unfortunately, this correspondence has been shown to be quite weak in the case of feed-forward networks. Do not make the same mistake that Minsky and Papert did, however! Feed-forward networks represent only one type of connectionist architecture, and their weaknesses do not necessarily apply to other connectionist systems. Regardless, the feed-forward paradigm remains the dominant form of neural modeling in use to this day; the algorithms have been improved, but the basic ideas remain the same.

This chapter serves as a nice introduction to why PDP models have become so popular. As you read it, think (particularly if you are a computer scientist) about how the processing styles described differ from traditional computer programs. The authors go on at length about the advantages of PDP models. What might some of the disadvantages be? If back-propagation is not how learning works in the brain, are these models still worth studying?
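Whatever its biological plausibility, back-propagation is easy to state concretely. The sketch below (Python with NumPy; my illustration, not code from the PDP volumes) trains a feed-forward network with one hidden layer to compute XOR, the very function a single-layer perceptron cannot learn. The hidden-layer size, sigmoid units, squared-error gradient, and learning rate are ordinary textbook choices, not necessarily those of the original authors.

import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of four sigmoid units; two suffice in principle,
# but a few extra units make convergence more reliable.
W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

lr = 0.5
for _ in range(10000):
    # Forward pass: activation flows input -> hidden -> output
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: the output error is propagated back to assign
    # blame to the hidden units, then each weight is nudged downhill
    d_out = (out - y) * out * (1 - out)   # error signal at the output unit
    d_h = (d_out @ W2.T) * h * (1 - h)    # error signal at the hidden units

    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]] for most initializations

Note that the error signal travels in the opposite direction from activation, which is one reason the rule's biological plausibility is questioned.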