French emblem books (27 in total) of the 16th century, together with Latin versions where appropriate. Transcribed and facsimile versions, and extensive search functionality.
Seven French L2 corpora. Digital sound files and related transcripts formatted using CHILDES software. The database currently contains over 4000 files (sound files, transcripts and morphosyntactically tagged transcripts).
GATE-ANNIE, developed by the GATE group at the University of Sheffield (http://www.gate.ac.uk; Cunningham et al., 2002), is an Information Extraction (IE) web service for English. It consists of the following main language processing tools: tokeniser, sentence splitter, POS tagger, coreference resolver and named entity recogniser.
The named entity recogniser identifies and categorizes entity names (such as persons, organizations, and location names), temporal expressions (dates and times), and certain types of numerical expressions (monetary values and percentages).
GATE-ANNIE returns the fully annotated document in GATE XML format. The file saved by the client contains ANNIE's output in the default AnnotationSet and the input document's HTML or XML mark-up in the "Original markups" AnnotationSet.
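As a rough illustration of how a client might read such a file, the sketch below pulls the annotations out of the default AnnotationSet using only the Python standard library. It assumes the usual GATE XML layout: a TextWithNodes element carrying the text, Node ids that equal character offsets, and a default AnnotationSet with no Name attribute. The sample document and the function name default_set_annotations are made up for the example and are not real ANNIE output.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the assumed GATE XML layout (illustrative only,
# not output produced by the GATE-ANNIE service).
SAMPLE = """<GateDocument>
<TextWithNodes><Node id="0"/>Hamish<Node id="6"/> works in <Node id="16"/>Sheffield<Node id="25"/>.</TextWithNodes>
<AnnotationSet>
<Annotation Id="1" Type="Person" StartNode="0" EndNode="6"/>
<Annotation Id="2" Type="Location" StartNode="16" EndNode="25"/>
</AnnotationSet>
<AnnotationSet Name="Original markups">
<Annotation Id="3" Type="p" StartNode="0" EndNode="25"/>
</AnnotationSet>
</GateDocument>"""


def default_set_annotations(xml_text):
    """Return (type, covered text) pairs from the unnamed AnnotationSet."""
    root = ET.fromstring(xml_text)
    # The document text is spread over the tails of the empty Node elements.
    text = "".join(root.find("TextWithNodes").itertext())
    results = []
    for aset in root.findall("AnnotationSet"):
        if aset.get("Name") is not None:
            continue  # skip named sets such as "Original markups"
        for ann in aset.findall("Annotation"):
            # Assumption: StartNode/EndNode are character offsets into the text.
            start = int(ann.get("StartNode"))
            end = int(ann.get("EndNode"))
            results.append((ann.get("Type"), text[start:end]))
    return results


entities = default_set_annotations(SAMPLE)
print(entities)
```

Filtering on the absence of the Name attribute keeps the original HTML or XML mark-up separate from the annotations that ANNIE itself produced.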
H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. 2002. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL-02).
ANNIE-RDF, developed by the GATE group at the University of Sheffield (http://www.gate.ac.uk; Cunningham et al., 2002), is an Information Extraction (IE) web service for English. It consists of the following main language processing tools: tokeniser, sentence splitter, POS tagger, coreference resolver and named entity recogniser.
The named entity recogniser identifies and categorizes entity names (such as persons, organizations, and location names), temporal expressions (dates and times), and certain types of numerical expressions (monetary values and percentages).
The text spans and annotations are exported into an RDF-XML ontology, in which the recognized named entities are instances according to the PROTON ontology (http://proton.semanticweb.org/).
H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. 2002. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL-02).
The ultimate aim of the project is to compile a representative historical corpus of written German for the years 1650-1800. The complete GerManC corpus will contain 2000-word samples from nine genres
Web-based information system for the scientific community (news, events, persons, job market, mailing list, database of research projects and corpora, bibliography, glossary and links) and on recording equipment/software; disciplinary scope: research on conversation and discourse analysis and spoken language.