Research

A sketch grammar of Northeast Geg Albanian from southern Kosovo

Abstract

Northeast Geg is a variety of Albanian spoken mainly in northeastern Albania and Kosovo. Albanian forms its own branch within the Indo-European language family. Spoken Geg is especially under-researched, in part because it is almost absent from Standard Albanian, which is based on the southern Tosk dialect. Geg differs from Tosk in several respects: it has an infinitive consisting of the particle ‘me’ and a participle, where Tosk uses subjunctive constructions; it builds the future tense with the auxiliary ‘to have’ and the infinitive, where Tosk uses the auxiliary ‘to want’ and a subjunctive construction; and it uses certain tenses, such as the imperfect, differently (and with forms that differ from those in Tosk). This sketch grammar describes the variety of the two cities Suhareka and Zaqishti in southern Kosovo, based on data collected from twelve native speakers of that region, including monologic narrations as well as elicitations. These data also served as the basis for Geglex, a trilingual online dictionary (Albanian – German – English) of about 2800 entries. The grammar covers phonetics and phonology, morphology, and syntax, and includes three transcribed sample texts as well as further noun and verb paradigms.

Book available here: https://lincom-shop.eu

UD Gheg Pear Stories: An annotated treebank of Gheg Albanian as spoken in Switzerland

Abstract

This paper presents the Gheg Albanian Pear Stories treebank, to be released in Nov. 2022, which is the first resource for Gheg in the Universal Dependencies (UD) treebank collection (Nivre et al. 2020). It also provides a special combination of spoken modality and heritage language, both of which are underrepresented in UD and in corpus resources in general. We provide a short description of the grammatical features of Gheg and of how they map to categories in the UD annotation scheme, in contrast with the Standard Albanian resources of Kote et al. (2019) and Toska, Nivre, and Zeman (2020). Special attention is given to the challenges arising from the spoken modality and the multilingual context, such as disfluency, repair, and code-switching.

Full text available here: https://doi.org/10.21203/rs.3.rs-2056973/v1