Translation by Confusion

Hinrich Schuetze

A new representational scheme for semantic information about words in different languages is introduced. Each word is represented as a vector in a multidimensional space. To derive the representations, basis vectors for one language are computed as linear approximations of 5,000-dimensional vectors of cooccurrence counts. Then, using an aligned corpus, the confusion vector of a target word in one of the languages under consideration is computed by summing the basis vectors of the words that occur close to it. The paper describes the derivation of the representations for English and French and their application to identifying translation pairs.
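The pipeline in the abstract has three steps: reduce cooccurrence counts to basis vectors, sum basis vectors over aligned context into confusion vectors, and compare vectors to find translation pairs. These steps can be sketched in a few lines of Python. This is a toy sketch under stated assumptions, not the paper's implementation. The abstract does not name the linear approximation, so we assume a rank-k truncated SVD, computed here by power iteration to stay within the standard library. We also assume a sentence-aligned corpus in which "close to" means "in the aligned English sentence", and we rank candidate translations by cosine similarity. The counts use 4 hypothetical context words rather than 5,000, and all function names are hypothetical.

```python
import math
import random
import re
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def top_right_singular_vectors(rows, k, iters=200, seed=0):
    """Top-k right singular vectors of the count matrix C (given as rows),
    via power iteration on C^T C. ASSUMPTION: a truncated SVD stands in for
    the paper's unspecified 'linear approximation'."""
    n = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    found = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [dot(g_row, v) for g_row in gram]
            for u in found:  # stay orthogonal to the vectors already found
                p = dot(w, u)
                w = [wi - p * ui for wi, ui in zip(w, u)]
            length = math.sqrt(dot(w, w))
            if length == 0.0:
                raise ValueError("k exceeds the rank of the count matrix")
            v = [wi / length for wi in w]
        found.append(v)
    return found


def basis_vectors(counts, k):
    """Project each word's count row onto the top-k directions (rows of U_k S_k)."""
    words = list(counts)
    vs = top_right_singular_vectors([counts[w] for w in words], k)
    return {w: [dot(counts[w], v) for v in vs] for w in words}


def tokens(sentence):
    return re.findall(r"\w+", sentence.lower())


def confusion_vectors(aligned, basis, k):
    """ASSUMPTION: 'close to' = same aligned sentence pair. The English basis
    vectors of each English sentence are added to the confusion vector of
    every English and French word in that pair."""
    en_conf = defaultdict(lambda: [0.0] * k)
    fr_conf = defaultdict(lambda: [0.0] * k)
    for en_sent, fr_sent in aligned:
        en_words = tokens(en_sent)
        context = [0.0] * k
        for word in en_words:
            if word in basis:
                context = [c + b for c, b in zip(context, basis[word])]
        for table, words in ((en_conf, en_words), (fr_conf, tokens(fr_sent))):
            for word in set(words):
                table[word] = [c + x for c, x in zip(table[word], context)]
    return dict(en_conf), dict(fr_conf)


def best_translation(french_word, fr_conf, en_conf, candidates):
    """Pick the English candidate whose confusion vector has the highest cosine."""
    target = fr_conf[french_word]
    return max(candidates, key=lambda e: cosine(target, en_conf[e]))


COUNTS = {  # hypothetical counts over context words: fur, pet, loan, cash
    "dog": [4, 3, 0, 1],
    "cat": [3, 4, 1, 0],
    "bank": [0, 1, 4, 3],
    "money": [1, 0, 3, 4],
}
ALIGNED = [  # hypothetical sentence-aligned English/French corpus
    ("The dog chased the cat.", "Le chien chassait le chat."),
    ("The dog barked.", "Le chien aboyait."),
    ("The bank lends money.", "La banque prête de l'argent."),
    ("Money stays in the bank.", "L'argent reste à la banque."),
]

K = 2
basis = basis_vectors(COUNTS, K)
en_conf, fr_conf = confusion_vectors(ALIGNED, basis, K)
for fr in ("chien", "banque"):
    print(fr, "->", best_translation(fr, fr_conf, en_conf, list(COUNTS)))
```

In this toy run, dog and cat receive the same rank-2 basis vector, and so do bank and money. The sketch can therefore only narrow chien to the animal pair and banque to the finance pair. The low-rank approximation deliberately merges words whose contexts are similar.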


This page is copyrighted by AAAI. All rights reserved.