Vladislav Kubon, Marketa Lopatkova, Martin Platek, Patrice Pognan
The paper describes a method of dividing complex sentences into segments: easily detectable, linguistically motivated units that may provide a basis for further processing of complex sentences. The method has been developed for Czech, as a representative of languages with a relatively high degree of word-order freedom. The paper introduces important terms and describes the segmentation chart, a data structure used to describe the mutual relationships between individual segments and separators. It presents a simple set of rules applied to the segmentation of a small set of Czech sentences. Issues of segment annotation based on an existing corpus are also mentioned.
Subjects: 13. Natural Language Processing
Submitted: Feb 19, 2007