The database contains audio and video material related to traditional culture: songs, folktales, legends, life stories, and various collective or individual folklore-related performances. The content has either been contributed specifically to the Archives of Latvian Folklore or collected by its staff members.
Parallel treebanks with annotation of syntax, discourse, coreference, morphology, and semantics. Version 3 also includes the Danish Dependency Treebank (version 1) and the Danish-English Parallel Dependency Treebank (version 2).
140 million words; the Corpus of the Contemporary Lithuanian Language, which comprises 160 million words, is a collection of texts designed to represent current Lithuanian. The corpus is compiled from material printed during Lithuania's independence period (since 1990) and is designed to represent as wide a range of contemporary written Lithuanian as possible. The largest part of the corpus consists of General Press (texts from regional and national newspapers), Popular Press, and Special Press (specialized newspapers and magazines). These texts are intended for general readers as well as specialists. The rest of the corpus consists of Fiction, Memoirs, other literature (scientific and popular), and various official texts. The larger part of the corpus is freely accessible for online search at http://donelaitis.vdu.lt.
written general monolingual synchronic (1959-) reference corpus archive; 5.4 billion words; structural information down to sentence level, rich bibliographic metadata, partial layout information, fully morpho-syntactically annotated
The dictionary is based on the Lithuanian-Latvian dictionary (1995) by Jons Balkevičs, Laimute Balode, Apolonija Bojāte, and Valters Subatnieks, edited by Alberts Sarkanis. It contains ca. 60 000 lexical entries. Integrated morphological analysis tools allow searching for inflected word forms.
Diachronic Corpus of Early Written Latvian Texts (16th-18th c.). More than 1 million running words (work is ongoing). The main data are ecclesiastical texts, secular texts (laws, fiction), and some of the first bilingual (Latvian-German) dictionaries. A KWIC-based concordancer, an inverse vocabulary, frequency lists, and word lists are provided. Some source facsimiles are available.