Full literary works (e-text, pdf, facsimile) in selected editions provided with scientific commentary and additional secondary materials; both copyright-free older works (still the lion's share) and new works (by licensing agreement with IPR holders' organizations); appr. 150 titles; planned to grow by 80-100 titles annually
The Netherlands Veterans Institute (VI) hosts about 250 interviews (audio) in which Dutch former military personel speak about their experiences during World War II (interviews about the years 1935-1945) and decolonisation in the Dutch East Indies (1945-1950) and Dutch New Guinea (1960-1962). In the project Living Oral History Workbench these interviews have been indexed by automatic speech recognition techniques. The list of interviews and their metadata are available at the CLARIN Center; researchers may apply to VI for access to the data.
Searchable multilingual text collection (700+ mwd) and a dictionary database of 251 languages and dialects. The Dictionary (ca. 8 mwd) provides translation of a word, definition, grammar, synonym, antonym, image, pronunciation, etc.
As a sub-section of MATEO, MARABU (Mannheimer Reihe Altes Buch) includes illustrated books, (manu)scripts and texts on the history of the Electoral Palatinate. Als Unterkategorie von MATEO beinhaltet MARABU (Mannheimer Reihe Altes Buch) illustrierte Bücher, Handschriften und Rarissima, Quellen zur Geschichte der Kurpfalz sowie Beiträge über Frauen des Humanismus.
MEBA is a lexical aligner, implemented in C#, based on an iterative algorithm that uses pre-processing steps: sentence alignment ([[http://www.clarin.eu/tools/sal-sentence-aligner|SAL]]), tokenization, POS-tagging and lemmatization (through [[http://www.clarin.eu/tools/ttl-tokenizing-tagging-and-lemmatizing-free-running-texts|TTL]], sentence chunking. Similar to YAWA aligner, MEBA generates the links step by step, beginning with the most probable (anchor links). The links to be
added at any later step are supported or restricted by the links created in the previous iterations. The aligner has different weights and different significance thresholds on each feature and iteration. Each of the iterations can be configured to align different categories of tokens (named entities, dates and numbers, content words, functional words, punctuation) in decreasing order of statistical evidence.
MEBA has an individual F-measure of 81.71% and it is currently integrated in the platform [[http://www.clarin.eu/tools/cowal-combined-word-aligner|COWAL]].
More detailed descriptions are available in [[http://www.racai.ro/~tufis/papers|the following papers]]:
-- Dan Tufiş (2007). Exploiting Aligned Parallel Corpora in Multilingual Studies and Applications. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Intercultural Collaboration. First International Workshop (IWIC 2007), volume 4568 of Lecture Notes in Computer Science, pp. 103-117. Springer-Verlag, August 2007. ISBN 978-3-540-73999-9.
-- -- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2006). Improved Lexical Alignment by Combining Multiple Reified Alignments. In Toru Ishida, Susan R. Fussell, and Piek T.J.M. Vossen (eds.), Proceedings of the 11th Conference EACL2006, pp. 153-160, Trento, Italy, April 2006. Association for Computational Linguistics. ISBN 1-9324-32-61-2.
-- Dan Tufiş, Radu Ion, Alexandru Ceauşu, and Dan Ştefănescu (2005). Combined Aligners. In Proceedings of the ACL Workshop on Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond, pp. 107-110, Ann Arbor, USA, June 2005. Association for Computational Linguistics. ISBN 978-973-703-208-9.
On Mediaevum.de, a collection of links to Middle High German texts can be found. These texts are made available via the University of Virginia. Auf Mediaevum.de findet sich eine Linksammlung zu diversen mittelhochdeutschen Texten, welche als Volltexte über die University of Virginia erreichbar sind.