Heidi H. T. Yeung and Peter W. M. Tsang
Representing lexicons and sentences with the subsymbolic approach (using techniques such as Self-Organizing Maps (SOMs) or Artificial Neural Networks (ANNs)) is a relatively new but important research area in natural language processing. The performance of this approach, however, depends heavily on whether the representations are well formed, that is, whether the members of each cluster correspond to sentences or phrases of similar meaning. Despite moderate success and rapid advances in computing power, it remains difficult to establish an efficient learning method that represents natural language in a way close to the benchmark set by human beings. A major obstacle is the general lack of effective methods for encapsulating semantic information in quantitative expressions or structures. In this paper, we propose a novel technique based on Tensor Product Representation and Non-linear Compression to alleviate this problem. The method encodes sentences into distributed representations that are closely associated with their semantic content and are therefore more comprehensible and analyzable from the perspective of human intelligence.
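As a rough illustration of the Tensor Product Representation idea the abstract builds on, the sketch below shows the generic role-filler binding scheme (often attributed to Smolensky). Each word vector (filler) is bound to a structural role vector through an outer product, and the sentence is the sum of these bindings. This is a minimal sketch, not the authors' implementation. The role names (`agent`, `verb`, `patient`), the one-hot vectors, and the helper functions `outer`, `encode`, and `unbind` are hypothetical. The sketch assumes orthonormal role vectors, which make unbinding exact, and it omits the Non-linear Compression step entirely.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def outer(f: Vector, r: Vector) -> Matrix:
    """Outer product f (x) r, giving a len(f) x len(r) matrix."""
    return [[fi * rj for rj in r] for fi in f]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def encode(bindings: Dict[str, str],
           fillers: Dict[str, Vector],
           roles: Dict[str, Vector]) -> Matrix:
    """Hypothetical encoder: sum of filler (x) role over all role -> filler pairs."""
    dim_f = len(next(iter(fillers.values())))
    dim_r = len(next(iter(roles.values())))
    t = [[0.0] * dim_r for _ in range(dim_f)]
    for role, filler in bindings.items():
        t = add(t, outer(fillers[filler], roles[role]))
    return t


def unbind(t: Matrix, r: Vector) -> Vector:
    """Matrix-vector product T . r; recovers the filler exactly if roles are orthonormal."""
    return [sum(tij * rj for tij, rj in zip(row, r)) for row in t]


# Hypothetical toy vocabulary: orthonormal (one-hot) roles and fillers.
roles = {
    "agent": [1.0, 0.0, 0.0],
    "verb": [0.0, 1.0, 0.0],
    "patient": [0.0, 0.0, 1.0],
}
fillers = {
    "john": [1.0, 0.0, 0.0, 0.0],
    "loves": [0.0, 1.0, 0.0, 0.0],
    "mary": [0.0, 0.0, 1.0, 0.0],
}

sentence = encode({"agent": "john", "verb": "loves", "patient": "mary"}, fillers, roles)
swapped = encode({"agent": "mary", "verb": "loves", "patient": "john"}, fillers, roles)

print(unbind(sentence, roles["patient"]))  # [0.0, 0.0, 1.0, 0.0], i.e. "mary"
print(sentence != swapped)                 # True: word order changes the encoding
```

Unlike a bag-of-words vector, the tensor distinguishes "john loves mary" from "mary loves john". The cost is size: the representation has `len(filler) * len(role)` components. This growth is one plausible motivation for a subsequent compression step, although the sketch does not claim to reproduce how the paper performs it.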