The code-switching corpus consists of five 30-minute conversations, each between four speakers (20 speakers in total). The speakers are bilingual in Papiamento (a creole language spoken in the Dutch Antilles) and Dutch. In the course of their free conversations they engage in code-switching, that is, they use both languages within the same utterance in systematic ways. The corpus is fully transcribed and glossed in ELAN and coded for language and word class.
The Czech Named Entity Corpus 1.0 is the first publicly available corpus providing a large body of manually annotated named entities in Czech sentences, including a fine-grained classification. [...] and 1ET101120503 (Integrace jazykových zdrojů za účelem extrakce informací z přirozených textů, "Integration of language resources for information extraction from natural-language texts").
The dictionary is based on the Lithuanian-Latvian dictionary (1995) by Jons Balkevičs, Laimute Balode, Apolonija Bojāte, and Valters Subatnieks, edited by Alberts Sarkanis. It contains ca. 60 00 lexical entries. The inclusion of morphological analysis tools allows searching by word form.