Early knowledge-based systems did not incorporate high-bandwidth I/O because of the performance limitations of the computers of that era. Today, intelligent agents and robots running on much more powerful computers can incorporate vision, sound, network, sonar, and other modes of input. These additional inputs provide much more information about the environment, but they also raise problems related to the control of perception. Perceptual input streams (called modes in the psychology literature) can vary greatly in bandwidth. In people, the sense of touch has a low bandwidth, while the sense of vision has a very high bandwidth. The human brain cannot completely process all of the information from one high-bandwidth mode, much less simultaneously process all the information available from all modes. To control the amount of perceptual input processed, humans use selective attention (Treisman 1993). Computer vision is a difficult problem which, if solved, could provide robots with a large amount of useful information. However, visual input cannot be processed efficiently without a top-down attention mechanism (Tsotsos 1987). Several computational models of visual selective attention have been developed (e.g., Reece 1992).