Towards a historical treebank of Middle and Modern Welsh

Syntactic parsing

  • Marieke Meelen, University of Cambridge
  • David Willis, University of Oxford
Keywords: Middle Welsh language, Historical corpora, Historical syntax


This article examines various issues involved in constructing a parsed Penn-style representative historical corpus of Middle and Modern Welsh. Specifically, it focuses on what structures to adopt for constituency-based structural descriptions in three case studies: (i) whether to adopt relatively more or less hierarchical structures at the phrasal level and above; (ii) how to deal with complex prepositional phrases, typically containing a grammaticalizing or grammaticalized noun as one of their elements; and (iii) how to deal with coordination of main clauses and omission of elements shared between clauses. In each case, we see how conventions need to be adopted that facilitate maximal ease of searching for potential users of the corpus; that are robust across many centuries of language change; and that permit efficient and consistent parsing by a team of annotators.