Matthew Miller, Alexander Stoytchev
This paper extends the Voting Experts (VE) algorithm to segment hierarchically structured sequences. The original algorithm was tested on text segmentation and relies on two proposed characteristics of chunks: low internal entropy and high boundary entropy. VE looks for these two properties and uses them to segment sequences of tokens. It is surprisingly powerful given its simplicity, which suggests that segmenting on the basis of low internal entropy and high boundary entropy is a promising principle. Real-world data often exhibits an inherently hierarchical structure, and it is well known that humans tend to chunk the world hierarchically. It is therefore interesting to explore the applicability of a modified version of VE to hierarchically structured data. We show that VE can be generalized to work on hierarchical data, and that the higher-order models can be used to improve the accuracy of segmentation at lower levels.
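The two cues named in the abstract can be illustrated with a small sketch. This is not the authors' implementation and not the full VE voting procedure. It only computes the two quantities under simplifying assumptions: internal entropy is taken as the surprisal of a chunk among n-grams of the same length, and boundary entropy as the Shannon entropy of the token that follows the chunk. The function names `ngram_counts`, `internal_entropy`, and `boundary_entropy` are hypothetical, chosen for this illustration.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count every contiguous n-gram of length n in seq."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def internal_entropy(seq, chunk):
    """Surprisal -log2 P(chunk) among n-grams of the same length (assumed definition).

    Frequent chunks score low, matching the 'low internal entropy' cue.
    """
    chunk = tuple(chunk)
    counts = ngram_counts(seq, len(chunk))
    total = sum(counts.values())
    if counts[chunk] == 0:
        return math.inf  # unseen chunk: treated as maximally surprising
    return -math.log2(counts[chunk] / total)


def boundary_entropy(seq, chunk):
    """Shannon entropy (bits) of the token that immediately follows chunk in seq.

    A high value means the next token is hard to predict, which hints at a boundary.
    """
    chunk = tuple(chunk)
    n = len(chunk)
    followers = Counter(
        seq[i + n] for i in range(len(seq) - n) if tuple(seq[i:i + n]) == chunk
    )
    total = sum(followers.values())
    if total == 0:
        return 0.0  # chunk never followed by a token in seq
    return -sum((c / total) * math.log2(c / total) for c in followers.values())
```

On a toy sequence such as `"thecatthedogthecow"`, the chunk `"the"` is more frequent than `"cat"` (lower internal entropy), and the token after `"the"` is less predictable than the token after `"th"` (higher boundary entropy), so both cues favor a boundary after `"the"`.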
Subjects: 13. Natural Language Processing; 12. Machine Learning and Discovery
Submitted: Apr 7, 2008