Robert Van Dam
ADtrees, a data structure useful for caching sufficient statistics, have been successfully adapted to grow lazily when memory is limited and to update sequentially with an incrementally updated dataset. For low arity symbolic features, ADtrees trade a slight increase in query time for a reduction in overall tree size. Unfortunately, for high arity features, the same technique can often result in a very large increase in query time and a nearly negligible tree size reduction. In the dynamic (lazy) version of the tree, both query time and tree size can increase for some applications. Here we present two modifications to the ADtree which can be used separately or in combination to achieve the originally intended space-time tradeoff in the ADtree when applied to datasets containing very high arity features.
Subjects: 12. Machine Learning and Discovery; 13. Natural Language Processing
Submitted: Apr 15, 2008