Peter Tino and Barbara Hammer
We have recently shown that, when initialized with ``small'' weights, many connectionist models with feedback connections are inherently biased towards Markov models, i.e., even prior to any training, the dynamics of the models can be readily used to extract finite memory machines. In this study we briefly outline the core arguments for such claims and generalize the results to recursive neural networks capable of processing ordered trees. In the early stages of learning, the compositional organization of recursive activations has a Markovian structure: trees sharing a top subtree are mapped close to each other. The deeper the shared subtree, the closer together the trees are mapped.
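
The Markovian bias described above can be illustrated with a minimal sketch. It is not the construction used in the study; everything in it is an assumption made for illustration: an untrained recursive network over binary trees with a tanh activation, a 4-dimensional state, recursive weights drawn uniformly from a small interval, an all-zero state for the empty tree, and distances measured in the max-norm. All names (`make_net`, `encode`, `graft`, `shared_depth_distances`) are hypothetical. Because tanh is 1-Lipschitz, the distance between the root activations of two trees that agree on their top d levels is at most 2 r^d, where r is the sum of the max-row-sum norms of the two recursive weight matrices; small weights give r < 1.

```python
import math
import random

N = 4        # state dimension (assumed for illustration)
A = 0.05     # recursive weights are uniform in [-A, A], i.e. "small"
LABELS = 2   # number of distinct node labels (assumed)


def rand_matrix(rng, rows, cols, scale):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def make_net(seed=0):
    """Untrained recursive network: two recursive matrices and a label embedding."""
    rng = random.Random(seed)
    return {
        "W_left": rand_matrix(rng, N, N, A),
        "W_right": rand_matrix(rng, N, N, A),
        "V": rand_matrix(rng, N, LABELS, 1.0),
    }


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def encode(net, tree):
    """Root activation of a tree given as None or (label, left, right)."""
    if tree is None:
        return [0.0] * N  # assumed state of the empty tree
    label, left, right = tree
    l = matvec(net["W_left"], encode(net, left))
    r = matvec(net["W_right"], encode(net, right))
    inp = [row[label] for row in net["V"]]
    return [math.tanh(a + b + c) for a, b, c in zip(l, r, inp)]


def random_tree(rng, depth):
    """Full binary tree of the given depth with random labels."""
    if depth == 0:
        return None
    return (rng.randrange(LABELS),
            random_tree(rng, depth - 1),
            random_tree(rng, depth - 1))


def graft(rng, tree, shared, depth):
    """Keep the top `shared` levels of `tree`; resample everything below them."""
    if tree is None:
        return None
    if shared == 0:
        return random_tree(rng, depth)
    label, left, right = tree
    return (label,
            graft(rng, left, shared - 1, depth - 1),
            graft(rng, right, shared - 1, depth - 1))


def max_dist(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


def contraction_rate(net):
    """Sum of the max-row-sum norms of the two recursive weight matrices."""
    norm = lambda M: max(sum(abs(w) for w in row) for row in M)
    return norm(net["W_left"]) + norm(net["W_right"])


def shared_depth_distances(net, depth=5, seed=0):
    """For d = 0..depth, return (d, distance of root activations, bound 2 * r**d)."""
    rng = random.Random(seed)
    base = random_tree(rng, depth)
    rate = contraction_rate(net)
    rows = []
    for d in range(depth + 1):
        other = graft(rng, base, d, depth)
        dist = max_dist(encode(net, base), encode(net, other))
        rows.append((d, dist, 2.0 * rate ** d))
    return rows


if __name__ == "__main__":
    net = make_net(seed=1)
    for d, dist, bound in shared_depth_distances(net, depth=5, seed=2):
        print(f"shared depth {d}: distance {dist:.2e}, bound {bound:.2e}")
```

With the scale chosen here, each recursive matrix has max-row-sum norm at most 0.2, so r is at most 0.4 and the bound shrinks geometrically with the depth of the shared top subtree. Increasing `A` until r exceeds 1 removes the guarantee, which is one way to see why the claim is tied to small initial weights.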