Layout and Language: A Corpus of Documents Containing Tables

Authors

Matthew Hurst

Abstract:

Though the field of information extraction (IE) has been one of the most successful to come from the NLP/CL community, it has thus far concentrated on documents with simple logical structure. It seems appropriate, however, to consider more complex documents -- not solely due to the improvement in extraction technologies, but also due to the content and meta-information that complex structure can offer the IE task. One often used but never exploited component of a more complex document model is the table. Its utility is compact information presentation, a definite boon for IE processes; however, it can also offer more information for discourse and domain knowledge sub-processes. A table processing system (TabPro) is being developed, and part of that research requires the construction of a corpus of documents containing tables. This corpus is used for training classification processes and evaluating the performance of the system as a whole. A complete description of the model of tables mentioned in this paper can be found in Hurst (1999).
