Changhua Yu, Michael T. Manry, and Pramod Lakshmi Narasimha, The University of Texas at Arlington
In this paper, the effects of nonsingular affine transforms on various nonlinear network training algorithms are analyzed. It is shown that gradient-related methods are quite sensitive to an affine transform of the inputs, while Newton-related methods are invariant to it. These results establish a connection between pre-processing techniques and weight-initialization methods. They also explain an advantage of Newton-related methods over other algorithms. Numerical results validate the theoretical analyses.
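The contrast claimed in the abstract can be illustrated with a small numerical sketch (an assumed toy setup, not taken from the paper): for a linear least-squares model with a bias term, one exact Newton step produces fitted predictions that are unchanged by a nonsingular affine transform of the inputs, whereas a fixed-step gradient-descent run of the same length is visibly affected.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# Nonsingular affine transform of the inputs: x' = A x + c.
# A is deliberately ill-conditioned to expose gradient descent's sensitivity.
A = np.array([[2.0, 0.0, 0.0],
              [0.0, 0.1, 0.0],
              [0.5, 0.0, 3.0]])
c = np.array([1.0, -1.0, 0.5])
X2 = X @ A.T + c

def design(X):
    # Append a bias column, so the model is affine in the inputs.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def newton_fit(X, y):
    # For the quadratic least-squares loss, one Newton step from w = 0
    # reaches the exact minimizer: w = H^{-1} (-g).
    Z = design(X)
    H = Z.T @ Z          # Hessian of the loss
    g = -Z.T @ y         # gradient at w = 0
    return -np.linalg.solve(H, g)

def gd_fit(X, y, lr=1e-3, steps=200):
    # Fixed-step gradient descent on the same loss.
    Z = design(X)
    w = np.zeros(Z.shape[1])
    for _ in range(steps):
        w -= lr * Z.T @ (Z @ w - y)
    return w

# Newton's fitted predictions agree on the original and transformed inputs,
# since both design matrices span the same column space...
p_newton  = design(X)  @ newton_fit(X, y)
p_newton2 = design(X2) @ newton_fit(X2, y)

# ...while gradient descent with the same step size and step count does not:
# the transform changes the loss surface's conditioning, so the iterates
# converge along the poorly scaled direction far more slowly.
p_gd  = design(X)  @ gd_fit(X, y)
p_gd2 = design(X2) @ gd_fit(X2, y)

print(np.max(np.abs(p_newton - p_newton2)))  # near zero
print(np.max(np.abs(p_gd - p_gd2)))          # clearly nonzero
```

The weights themselves are expected to differ under the transform (they are related by the inverse transform); the comparison is therefore made on the fitted predictions, which is the quantity the invariance argument concerns.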