The Abkhaz National Corpus project

Paul Meurer (UiB) will give a seminar on the following topic: The Abkhaz National Corpus project

Time: Friday March 29, 2019 at 10:15
Venue: HF building, room 400

The Abkhaz National Corpus is a comprehensive, open, grammatically annotated text corpus comprising more than 10 million words. It was built in a joint effort with partners in Abkhazia, Georgia and Germany, with Paul Meurer in charge of the linguistic and computational work in the project. The corpus is hosted at the CLARINO Bergen Centre.
Abkhaz is a lesser-resourced language; prior to this work, virtually no computational resources for the language were available. As a member of the West Caucasian language family, which is characterized by an extremely rich, polysynthetic morphological structure, Abkhaz poses serious challenges to morphosyntactic analysis, the main problem being the high degree of homonymy and morphological ambiguity.
The talk will show how these challenges can be met, focusing on the construction of the finite-state morphological analyzer, including the implementation of word stress, and on strategies for coping with ambiguity. It will conclude with a demonstration of the corpus as a tool for linguists and for learners of the language.
