Database of three inter-related early Irish glossaries. The texts, compiled from the eighth century, comprise several thousand headwords followed by entries that range from single-word explanations to whole narratives running to several pages.
A selection of nearly 400 literary compositions recorded on sources from ancient Mesopotamia dating to the late third and early second millennia BCE. The corpus contains Sumerian texts in transliteration, English prose translations and bibliographical information for each composition. The transliterations and translations can be searched, browsed and read online using the website's tools. The corpus is tagged for parts of speech.
Speech corpus comprising 4608 spoken sentences recorded for speech timing research. The complete archive, available for downloading, includes a structured list of the sentences, the speech recordings and the label files, plus full documentation.
French emblem books (27 in total) of the 16th century, together with Latin versions where appropriate. Transcribed and facsimile versions, and extensive search functionality.
Seven French L2 corpora. Digital sound files and related transcripts formatted using CHILDES software. The database currently contains over 4000 files (sound files, transcripts and morphosyntactically tagged transcripts).
GATE-ANNIE, developed by the GATE group at the University of Sheffield (http://www.gate.ac.uk; Cunningham et al., 2002), is an Information Extraction (IE) web service for English. It consists of the following main language processing tools: tokeniser, sentence splitter, POS tagger, coreference resolver and named entity recogniser.
The named entity recogniser identifies and categorizes entity names (such as persons, organizations, and location names), temporal expressions (dates and times), and certain types of numerical expressions (monetary values and percentages).
GATE-ANNIE returns the fully annotated document in GATE XML format. The file saved by the client contains ANNIE's output in the default AnnotationSet and the input document's HTML or XML mark-up in the "Original markups" AnnotationSet.
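As a rough illustration of how a client might read such a file, the following Python sketch separates the default AnnotationSet from the "Original markups" AnnotationSet and recovers the text covered by each annotation. The sample document is hand-written for illustration and is not actual service output; the element and attribute names (`GateDocument`, `TextWithNodes`, `Node`, `AnnotationSet`, `Annotation`, `StartNode`, `EndNode`) reflect the general shape of GATE XML as assumed here, and the function name `read_annotations` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed shape of a GATE XML document
# (illustrative only; not real GATE-ANNIE output).
SAMPLE = """<GateDocument>
<TextWithNodes><Node id="0"/>Hamish Cunningham<Node id="17"/> works in <Node id="27"/>Sheffield<Node id="36"/>.<Node id="37"/></TextWithNodes>
<AnnotationSet>
<Annotation Id="1" Type="Person" StartNode="0" EndNode="17"/>
<Annotation Id="2" Type="Location" StartNode="27" EndNode="36"/>
</AnnotationSet>
<AnnotationSet Name="Original markups">
<Annotation Id="3" Type="p" StartNode="0" EndNode="37"/>
</AnnotationSet>
</GateDocument>"""


def read_annotations(xml_text):
    """Return {annotation set name: [(type, covered text), ...]}.

    The default (unnamed) AnnotationSet is stored under the key "".
    """
    root = ET.fromstring(xml_text)
    twn = root.find("TextWithNodes")

    # Rebuild the plain text and note where each Node sits in it.
    text = twn.text or ""
    offsets = {}
    for node in twn.findall("Node"):
        offsets[node.get("id")] = len(text)
        text += node.tail or ""

    result = {}
    for aset in root.findall("AnnotationSet"):
        name = aset.get("Name", "")  # default set carries no Name attribute
        items = []
        for ann in aset.findall("Annotation"):
            start = offsets[ann.get("StartNode")]
            end = offsets[ann.get("EndNode")]
            items.append((ann.get("Type"), text[start:end]))
        result[name] = items
    return result


if __name__ == "__main__":
    for set_name, items in read_annotations(SAMPLE).items():
        print(set_name or "(default)", items)
```

Node positions are computed by walking the text rather than read from the `id` values, so the sketch does not depend on ids being character offsets.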
H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. 2002. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL-02).
ANNIE-RDF, developed by the GATE group at the University of Sheffield (http://www.gate.ac.uk; Cunningham et al., 2002), is an Information Extraction (IE) web service for English. It consists of the following main language processing tools: tokeniser, sentence splitter, POS tagger, coreference resolver and named entity recogniser.
The named entity recogniser identifies and categorizes entity names (such as persons, organizations, and location names), temporal expressions (dates and times), and certain types of numerical expressions (monetary values and percentages).
The text spans and annotations are exported into an RDF-XML ontology, in which the recognized named entities are instances according to the PROTON ontology (http://proton.semanticweb.org/).
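To give a sense of how a client might consume such an export, the following Python sketch groups the subjects in an RDF/XML document by their class, handling both typed node elements and explicit `rdf:type` statements. The sample data is invented for illustration, the namespace `ONTO_NS` is a placeholder rather than the real PROTON namespace URI, the class names are assumed, and the function name `instances_by_class` is hypothetical. A full RDF parser would be more robust than this element-level reading.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Placeholder namespace for illustration; NOT the real PROTON namespace URI.
ONTO_NS = "http://example.org/proton-placeholder#"

# Invented sample (not actual ANNIE-RDF output); class names are assumptions.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:onto="{ONTO_NS}">
  <onto:Person rdf:about="http://example.org/doc#e1"/>
  <rdf:Description rdf:about="http://example.org/doc#e2">
    <rdf:type rdf:resource="{ONTO_NS}Location"/>
  </rdf:Description>
</rdf:RDF>"""


def instances_by_class(rdf_xml):
    """Return {class URI: [subject URI, ...]} for top-level RDF nodes."""
    root = ET.fromstring(rdf_xml)
    found = {}
    for node in root:
        subject = node.get(f"{{{RDF_NS}}}about")
        classes = []
        if node.tag != f"{{{RDF_NS}}}Description":
            # Typed node element: the tag "{namespace}Local" names the class.
            classes.append(node.tag[1:].replace("}", "", 1))
        for type_el in node.findall(f"{{{RDF_NS}}}type"):
            classes.append(type_el.get(f"{{{RDF_NS}}}resource"))
        for cls in classes:
            found.setdefault(cls, []).append(subject)
    return found


if __name__ == "__main__":
    for cls, subjects in instances_by_class(SAMPLE).items():
        print(cls, subjects)
```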
H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. 2002. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL-02).
The ultimate aim of the project is to compile a representative historical corpus of written German for the years 1650-1800. The complete GerManC corpus will contain 2000-word samples from nine genres