Format: application/octet-stream / Harvested from: LINDAT/CLARIAH-CZ repository

1. Audio and video database of Latvian folklore

Publisher:: Archives of Latvian Folklore, Institute of Literature, Folklore and Art, University of Latvia
Format:: application/octet-stream
Type:: corpus
Language:: Latvian
Description:: The database contains audio and video material related to traditional culture: songs, folktales, legends, life stories, and various collective or individual folklore-related performances. The content has either been contributed specifically to the Archives of Latvian Folklore or collected by its staff members.
Rights:: Not specified

2. Beaver corpus

Format:: application/octet-stream
Type:: corpus
Description:: Documentation of the Beaver project (DoBeS project)
Rights:: Code of conduct

3. Comparable Russian-Finnish corpus of juridical texts

Publisher:: University of Tampere
Format:: application/octet-stream
Type:: corpus
Language:: Finnish and Russian
Description:: Juridical texts in Russian and Finnish arranged as a comparable text corpus
Rights:: Not specified

4. Copenhagen Dependency Treebanks versions 1-3

Publisher:: Copenhagen Business School
Format:: application/octet-stream
Type:: corpus
Subject:: parallel treebank, POS annotation, discourse annotation, morphological annotation, syntactic annotation, and semantic annotation
Language:: Danish, English, German, Italian, and Spanish
Description:: Parallel treebanks with annotation of syntax, discourse, coreference, morphology, and semantics. Version 3 also includes the Danish Dependency Treebank (version 1) and the Danish-English Parallel Dependency Treebank (version 2).
Rights:: GNU General Public License

5. Corpus of the Contemporary Lithuanian Language

Publisher:: Center of Computational Linguistics, Vytautas Magnus University
Format:: application/octet-stream
Type:: corpus
Language:: Lithuanian
Description:: 140 million words; the Corpus of the Contemporary Lithuanian Language, which comprises 160 million words, is a collection of texts designed to represent current Lithuanian. The corpus is compiled from material printed during Lithuania's independence period (since 1990) and is designed to represent as wide a range of contemporary written Lithuanian as possible. The largest part of the corpus consists of General Press (texts from regional and national newspapers), Popular Press, and Special Press (specialized newspapers and magazines). These texts are intended for general readers as well as specialists. The rest of the corpus consists of Fiction, Memoirs, other literature (scientific and popular), and various official texts. The larger part of the corpus is freely accessible for online search at http://donelaitis.vdu.lt.
Rights:: Not specified

6. Croatian Dependency Treebank

Publisher:: University of Zagreb, Faculty of Humanities and Social Sciences
Format:: application/octet-stream
Type:: corpus
Language:: Croatian
Description:: Manually annotated dependency treebank; analytical layer according to the PDT formalism, adapted for Croatian
Rights:: Not specified

7. Croatian Frequency Dictionary

Publisher:: University of Zagreb, Faculty of Humanities and Social Sciences
Format:: application/octet-stream
Type:: lexicalConceptualResource
Language:: Croatian
Description:: 38,573 lemmas, plain text; database file
Rights:: Not specified

8. Deutsches Referenzkorpus (DeReKo)

Publisher:: Institut für Deutsche Sprache
Format:: application/octet-stream
Type:: corpus
Subject:: Germanistik
Language:: German
Description:: written general monolingual synchronic (1959-) reference corpus archive; 5.4 billion words; structural information down to sentence level, rich bibliographic metadata, partial layout information, fully morpho-syntactically annotated
Rights:: non-commercial, non-download license, EULA: http://www.ids-mannheim.de/cosmas2/projekt/registrierung/

9. Dictionaries of Luxembourgish

Publisher:: University of Luxembourg
Format:: application/octet-stream
Type:: corpus
Language:: Luxembourgish
Description:: Online database of three older dictionaries of Luxembourgish from 1849, 1905, and 1950
Rights:: Not specified

10. Digital Morphology Archives for Finnish Dialects

Publisher:: CSC - the Finnish IT Center for Science and University of Helsinki
Format:: application/octet-stream
Type:: corpus
Language:: Finnish
Description:: A morphologically annotated digital database of 159 Finnish parish dialects containing transcribed sentences of spontaneous dialectal speech
Rights:: Not specified

11. Estonian Dialect Corpus

Publisher:: University of Tartu
Format:: application/octet-stream
Type:: corpus
Language:: Estonian
Description:: Recordings of different Estonian dialects, 900,000 words, transcribed and partly (400,000 words) morphologically annotated
Rights:: Not specified

12. Estonian-Latvian dictionary

Publisher:: Tilde
Format:: application/octet-stream
Type:: lexicalConceptualResource
Language:: Estonian and Latvian
Description:: The Estonian-Latvian dictionary is based on the dictionary by K. Aben and supplemented with new lexical entries from the modern lexicon; ca. 26,000 lexical entries
Rights:: Not specified

13. Eurotermbank

Publisher:: Tilde and Eurotermbank consortium
Format:: application/octet-stream
Type:: lexicalConceptualResource
Language:: English, Estonian, French, German, Hungarian, Latvian, and Lithuanian
Description:: EuroTermBank is a single access point to European multilingual terminology resources. It contains more than 1.9 million terms in 25 languages.
Rights:: Not specified

14. Helsinki annotated corpus of Russian language HANCO

Publisher:: The Department of Modern Languages, University of Helsinki and University of Helsinki
Format:: application/octet-stream
Type:: corpus
Subject:: Corpus linguistics
Language:: Russian
Description:: Morphologically and syntactically annotated corpus of the modern Russian language.
Rights:: Not specified

15. HNC (Hellenic National Corpus)

Publisher:: Institute for Language and Speech Processing
Format:: application/octet-stream
Type:: corpus
Language:: Modern Greek (1453-)
Description:: General language corpus of standard Modern Greek; 47 million words
Rights:: Not specified

16. Latvian-Lithuanian Web dictionary

Publisher:: Tilde
Format:: application/octet-stream
Type:: lexicalConceptualResource
Language:: Latvian and Lithuanian
Description:: The dictionary is based on the Latvian-Lithuanian dictionary by A. Butkus; ca. 43,000 entries
Rights:: Not specified

17. Lithuanian-Latvian dictionary

Publisher:: Tilde
Format:: application/octet-stream
Type:: lexicalConceptualResource
Language:: Latvian and Lithuanian
Description:: The dictionary is based on the Lithuanian-Latvian dictionary (1995) by Jons Balkevičs, Laimute Balode, Apolonija Bojāte, and Valters Subatnieks, edited by Alberts Sarkanis. It contains ca. 60 00 lexical entries; the inclusion of morphological analysis tools allows searching for word forms.
Rights:: Not specified

18. Luxogramm - Grammatisches Informationssystem zum Luxemburgischen

Publisher:: University of Luxembourg
Format:: application/octet-stream
Type:: languageDescription
Language:: Luxembourgish
Description:: Luxogramm provides grammatical information (paradigms, rules, categories) for all Luxembourgish verbs
Rights:: Not specified

19. Multilingual corpus of juridical texts

Publisher:: University of Tampere
Format:: application/octet-stream
Type:: corpus
Subject:: parallel corpus and multilingual
Language:: English, German, Russian, and Swedish
Description:: International conventions and treaties arranged as a parallel corpus aligned at the paragraph level
Rights:: Not specified

20. Nederlandse Familienamen Databank (Dutch Database of Family Names)

Publisher:: Meertens Institute KNAW The Netherlands
Format:: application/octet-stream
Type:: toolService
Language:: Dutch
Description:: Enriched database of (mainly) Dutch family names, based on the 1947 census (in progress; currently 90,000 of a maximum of 140,000 entries)
Rights:: Meertens Institute KNAW The Netherlands

21. OmegaWiki

Publisher:: Universität Bamberg, World Language Documentation Centre
Format:: application/octet-stream
Type:: lexicalConceptualResource
Language:: Afrikaans, Arabic, Basque, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, Modern Greek (1453-), Hebrew, Hungarian, Icelandic, Indonesian, Interlingua (International Auxiliary Language Association), Irish, Italian, Japanese, Khmer, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swedish, Turkish, Ukrainian, and Welsh
Rights:: GFDL or CC and http://www.omegawiki.org/Licensing

22. ParRus, Russian-Finnish parallel corpus of literary texts

Publisher:: University of Tampere
Format:: application/octet-stream
Type:: corpus
Language:: Finnish and Russian
Description:: Russian literary texts (classical literature and 20th century) and their translations into Finnish, aligned at the paragraph level
Rights:: Not specified

23. Progress test on Russian language KARTTU

Publisher:: The Department of Modern Languages, University of Helsinki and University of Helsinki
Format:: application/octet-stream
Type:: toolService
Language:: Russian
Description:: Progress test on language competence in Russian
Rights:: Not specified

24. SENIE

Publisher:: Department of Baltic Languages, University of Latvia and Institute of Mathematics and Computer Science, University of Latvia
Format:: application/octet-stream
Type:: corpus
Subject:: diachronic corpus
Language:: Latvian
Description:: Diachronic Corpus of Early Written Latvian Texts (16th-18th c.). More than 1 million running words (work is ongoing). The main data are ecclesiastical texts, secular texts (laws, fiction), and some of the first bilingual (Latvian-German) dictionaries. A KWIC-based concordancer, an inverse vocabulary, frequency lists, and word lists are provided. Some source facsimiles are available.
Rights:: Not specified
