ANNIS2 is an open-source, versatile, web browser-based search and visualization architecture for complex multilevel linguistic corpora with diverse types of annotation. ANNIS, which stands for ANNotation of Information Structure, was designed to provide access to the data of the SFB 632 - "Information Structure: The Linguistic Means for Structuring Utterances, Sentences and Texts". Since information structure interacts with linguistic phenomena on many levels, ANNIS2 addresses the SFB's need to concurrently annotate, query and visualize data from such varied areas as syntax, semantics, morphology, prosody, referentiality, lexis and more. For projects working with spoken language, support for audio / video annotations is also required.
Transcribed narrative interviews with people from East and West Berlin about the events of November 9. 282,000 tokens. TEI XML, lemma and POS. Normalized version also available.
Chronology of German literature (Old High German literature, Middle High German literature, Early New High German literature, New High German literature)
Digital, morphologically annotated (N, V, A) part of the Bonn Corpus of Early New High German; served as the material basis for volumes 3, 4 and 6 of the "Grammatik des Frühneuhochdeutschen" (III. Nouns; IV. Verbs; VI. Adjectives)
Provides orthographic, morphological (inflection and word formation) and semantic information (synonymy; hypernymy/hyponymy); assigns each word to its syntactic category (for nouns, grammatical gender is also given)
Sanskrit lexicons. The data is made available as scanned images of the works as well as a digitization of the scanned images, which permits computer-aided analyses and displays of the work. Can be downloaded or queried online.
A co-occurrence database, developed by the Institut für Deutsche Sprache, for research in the field of collocation analysis in modern German. The database holds over 200,000 analysed words that can be browsed or searched and shown in context.
Regesta and texts on the history of Prussia and the Teutonic Order
A historical dictionary of older legal German. Documents legal terms and words with legal connotations up to about 1800.
written general monolingual synchronic (1959-) reference corpus archive; 5.4 billion words; structural information down to sentence level, rich bibliographic metadata, partial layout information, fully morpho-syntactically annotated
Digitized images of books and journals from the historical holdings of the ULB Münster, as well as collections from the region of Westphalia
The volumes of the Polytechnic Journal are available in two forms: as digitized images and as full texts.
Documents on German history (e.g. German Empire; Weimar Republic; National Socialism; Federal Republic of Germany; German Democratic Republic)
Provides spelling, an overview of meanings, synonyms, pronunciation (audio file), etymology, grammar, typical collocations (computer-generated), as well as meanings, examples and idiomatic phrases (additionally lists the words that precede and follow in alphabetical order)
German reference corpus. Ca. 100 million words, 20th century. Searchable online. Part of the 'Digitales Wörterbuch der deutschen Sprache des 20. Jahrhunderts' project; corpus of the BBAW; basis of the DWDS
Composed of the Deutsches Wörterbuch, the Wörterbuch der Deutschen Gegenwartssprache (WDG) and the Etymologisches Wörterbuch des Deutschen (EtymWb)
Focus: description of meaning and usage; additionally provides orthography, hyphenation and grammatical information; still under construction
EMU is a collection of software tools for the creation, manipulation and analysis of speech databases. At the core of EMU is a database search engine which allows the researcher to find various speech segments based on the sequential and hierarchical structure of the utterances in which they occur. EMU includes an interactive labeller which can display spectrograms and other speech waveforms, and which allows the creation of hierarchical, as well as sequential, labels for a speech utterance.
Web-based information system on the scientific community (news, events, persons, job market, mailing list, database on research projects and corpora, bibliography, glossary and links) and on recording equipment/software; disciplinary scope: research on conversation and discourse analysis and spoken language
Written German from 1920-39. 500,000 tokens, 392 texts. POS and lemma, TEI XML. Part of Das digitale Wörterbuch der deutschen Sprache des 20. Jahrhunderts
Diachronic corpus with focus on annotation and lemmatization of verbal categories
Philosophical texts of the 18th century: Full text of the authoritative "Akademie-Ausgabe" (excluding most footnotes and editorial notes) and reference texts like A.G. Baumgarten's "Metaphysica".
Articles from the 'Berliner Zeitung' online edition from 3.1.1994 to 31.12.2005. About 252 million tokens in 869,000 articles. Part of the DWDS project.
1970s "representative" corpus of German created by the research group "Linguistik und Maschinelle Sprachbearbeitung" (linguistics and language processing); time-slice corpus of written German from 1970; cross-section of various text types
As a sub-section of MATEO, MARABU (Mannheimer Reihe Altes Buch) includes illustrated books, manuscripts and rare items, sources on the history of the Electoral Palatinate, and contributions on women of Humanism.
Integrated tool for corpus linguists built on Eclipse, Vex, Subversive, etc. for creating and editing transcriptions and annotations, querying, managing version controlled data, and building a shippable corpus.
Dictionary of sayings, figures of speech, idiomatic expressions and fixed multi-word expressions; search results are displayed along four dimensions: expression, explanation, examples, additions
The Ridges herbology corpus can be downloaded as a whole or as individual documents
SMOR is a wide-coverage German computational morphology with inflection, derivation, and compounding. The SMOR code, except for the stem lexicon, is available under the GNU license. SMOR (without a stem lexicon) comes with the SFST tools.
SpeechRecorder is a platform-independent multi-channel audio recording software. Its main features are a configurable recording script, Unicode text, image and audio prompts, hardware independence and localized language interfaces.
SFST is a finite state transducer toolkit for the implementation of morphologies and other applications of finite state transducers. SFST comprises a compiler and several tools for transforming, printing and applying transducers.
A collection of pointers to teaching and learning materials on linguistics and linguistic tools, including quick starts, how-tos, technical documentation, short teaching modules (2h), and full courses. This resource is collaboratively built by its users.
TextGrid has purchased the Zeno.org online library (literary, historical, scientific, ... texts) and is gradually converting it to valid TEI.
Includes, among other things, a collection of old herbal books, old cookery books and texts on the history of the German language in print media