Format: application/xml / Harvested from: LINDAT/CLARIAH-CZ repository / Original context has metadata only: true

Publisher:: Center of Computational Linguistics, Vytautas Magnus University
Format:: application/xml
Type:: corpus
Language:: Czech, English, and Lithuanian
Description:: A collection of parallel corpora: English-Lithuanian (2m words), Lithuanian-English (0.06m words), Czech-Lithuanian (0.8m words), Lithuanian-Czech (0.02m words). All the corpora are searchable online via one interface at http://donelaitis.vdu.lt/main_en.php?id=4&nr=1_2. The corpus is still being updated with new texts.
Rights:: Not specified

Publisher:: Institute of Mathematics and Computer Science, University of Latvia
Format:: application/xml
Type:: lexicalConceptualResource
Language:: Latvian
Description:: More than 25,000 entries
Rights:: Not specified

Publisher:: University of the Basque Country
Format:: application/xml
Type:: lexicalConceptualResource
Language:: Basque
Description:: EDBL (Lexical DataBase for Basque) is the lexical basis needed for the automatic treatment of Basque. It consists of about 120,000 entries divided into dictionary entries (the same as those found in a conventional dictionary), verb forms, and dependent morphemes, all with their respective morphological information.
Rights:: Only for research and demonstrative purposes

Publisher:: Academy of Sciences
Format:: application/xml
Type:: corpus
Language:: Hungarian
Description:: Containing 27 million running words, the Hungarian Historical Corpus provides a valuable basis for research on the history of Hungarian words between the second half of the 18th century and 2000.
Rights:: Not specified

Publisher:: Academy of Sciences
Format:: application/xml
Type:: corpus
Subject:: synchronic corpus
Language:: Hungarian
Description:: Written general synchronic reference corpus; 190m tokens; POS annotated XML
Rights:: Not specified

Publisher:: Research Institute for Artificial Intelligence, Romanian Academy of Sciences
Format:: application/xml
Type:: lexicalConceptualResource
Language:: Polish
Description:: Currently about 18,600 lexical units and about 11,000 synsets; planned (by the end of 2008): 25,000-30,000 lexical units
Rights:: Free for non-commercial use

Publisher:: Department of Informatics, Human Language Technology Group, University of Szeged
Format:: application/xml
Type:: corpus
Subject:: monolingual corpus, annotated corpus, and POS annotation
Language:: Hungarian
Description:: written, monolingual, general, manually POS annotated reference corpus; 1,247,546 tokens; MSD tagset, XML (TEIxLite) files
Rights:: Not specified

Publisher:: Department of Informatics, Human Language Technology Group, University of Szeged
Format:: application/xml
Type:: corpus
Subject:: monolingual corpus, annotated corpus, and POS annotation
Language:: Hungarian
Description:: written, monolingual, general, manually POS annotated reference corpus; 1,459,288 tokens; MSD tagset, XML (TEI P4) files
Rights:: Not specified

Publisher:: Department of Informatics, Human Language Technology Group, University of Szeged
Format:: application/xml
Type:: corpus
Language:: Hungarian
Description:: 82,000 sentences with shallow syntactic annotation (NP-level).
Rights:: Not specified

Publisher:: Department of Informatics, Human Language Technology Group, University of Szeged
Format:: application/xml
Type:: corpus
Language:: Hungarian
Description:: 82,000 sentences with full syntactic annotation.
Rights:: Not specified
