R. Bharat Rao, Scott Rickard, and Frans Coetzee
This paper describes our work in learning on-line models that forecast real-valued variables in a high-dimensional space. A 3 GB database was collected by sampling 421 real-valued sensors in a cement manufacturing plant, once every minute, for several months. The goal is to learn models that, every minute, forecast the values of all 421 sensors for the next hour. The underlying process is highly non-stationary: there are abrupt changes in sensor behavior (time-frame: minutes), semi-periodic behavior (time-frame: hours to days), and slow long-term drift in plant dynamics (time-frame: weeks to months). Therefore, the models need to adapt on-line as new data is received; all learning and prediction must occur in real time (i.e., within one minute). The learning methods must also deal with two forms of data corruption: large amounts of data are missing, and what is recorded is very noisy. We have developed a framework with multiple levels of adaptation in which several thousand incremental learning algorithms that adapt on-line are automatically evaluated (also on-line) to arrive at the "best" predictions. We present experimental results showing that, by combining multiple learning methods, we can automatically learn good models for time-series prediction without being provided with any physical models of the underlying dynamics.
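
To make the idea of evaluating several incremental learners on-line more concrete, the following is a minimal Python sketch for a single sensor. It is not the framework described above: the class names (LastValue, ExpSmoother, OnlineSelector), the two toy learners, the exponentially discounted absolute error used for scoring, and the decay value are all assumptions made for illustration. The sketch also forecasts only one step ahead rather than a full hour, and treats a missing reading as NaN that is simply skipped.

```python
import math


class LastValue:
    """Hypothetical persistence learner: predicts the last observed value."""

    def __init__(self):
        self.last = 0.0

    def predict(self):
        return self.last

    def update(self, x):
        self.last = x


class ExpSmoother:
    """Hypothetical exponential-smoothing learner with a fixed rate alpha."""

    def __init__(self, alpha):
        self.alpha = alpha
        self.level = None

    def predict(self):
        return 0.0 if self.level is None else self.level

    def update(self, x):
        if self.level is None:
            self.level = x
        else:
            self.level = self.alpha * x + (1.0 - self.alpha) * self.level


class OnlineSelector:
    """Scores each learner on-line and forecasts with the current best one.

    Assumption: the score is an exponentially discounted mean absolute
    error, so that recent performance counts more under non-stationarity.
    """

    def __init__(self, models, decay=0.95):
        self.models = models
        self.decay = decay
        self.errors = [0.0] * len(models)

    def best_index(self):
        # Ties are broken in favour of the earliest learner in the list.
        return min(range(len(self.models)), key=lambda i: self.errors[i])

    def predict(self):
        return self.models[self.best_index()].predict()

    def update(self, x):
        # Missing reading: neither score nor train on it.
        if x is None or math.isnan(x):
            return
        for i, model in enumerate(self.models):
            # Score the forecast made before seeing x, then adapt the learner.
            err = abs(model.predict() - x)
            self.errors[i] = self.decay * self.errors[i] + (1.0 - self.decay) * err
            model.update(x)


if __name__ == "__main__":
    selector = OnlineSelector([LastValue(), ExpSmoother(0.2)])
    for reading in [1.0, 2.0, 3.0, 4.0, float("nan"), 5.0, 6.0]:
        selector.update(reading)
    print(selector.best_index(), selector.predict())
```

Discounting the error is one plausible way to let the choice of learner follow abrupt changes and slow drift; a real system would need one such selector per sensor and per forecast horizon, and a more careful treatment of missing and noisy data.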