One question, different annotation depths
A case study in Early Slavic
Abstract
This paper addresses some of the challenges of carrying out corpus-based linguistic analyses on historical corpora of different sizes and annotation depths. Data from the TOROT Treebank is collected to carry out a case study on Early Slavic dative absolutes, showing the extent to which methodology and results may change depending on the amount of data and the levels of linguistic annotation available. The analysis indicates that deeply-annotated treebanks of limited size can be exploited to establish a solid guideline to analyze a phenomenon in shallowly-annotated corpora and even new, unannotated texts. This is particularly encouraging for historical languages, such as Early Slavic, showing very high diatopic and diachronic variation, which significantly undermines corpus-annotation automation and therefore calls for alternative strategies to counteract data scarcity.
Copyright (c) 2022 Nilo Pedrazzini
This work is licensed under a Creative Commons Attribution 4.0 International License.
Articles appearing in Journal of Historical Syntax are published under a Creative Commons Attribution License. Authors retain copyright.