GATE-ANNIE, developed by the GATE group at the University of Sheffield (http://www.gate.ac.uk; Cunningham et al., 2002), is an Information Extraction (IE) web service for English. It consists of the following main language processing tools: tokeniser, sentence splitter, POS tagger, coreference resolver and named entity recogniser.
The named entity recogniser identifies and categorizes entity names (such as persons, organizations, and location names), temporal expressions (dates and times), and certain types of numerical expressions (monetary values and percentages).
GATE-ANNIE returns the fully annotated document in GATE XML format. The file saved by the client contains ANNIE's output in the default AnnotationSet and the input document's HTML or XML mark-up in the "Original markups" AnnotationSet.
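As a rough illustration of how a client might read the two annotation sets from such a file, the following Python sketch parses a small hand-written document. The sample is an assumption that only imitates the general shape of GATE XML (a TextWithNodes element plus AnnotationSet elements, with the default set carrying no Name attribute). It is not real service output, and the helper names node_offsets and annotations are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample imitating the general shape of GATE XML.
# This is an illustrative assumption, not actual GATE-ANNIE output.
SAMPLE = """<GateDocument>
<TextWithNodes><Node id="0"/>Sheffield<Node id="9"/> is in <Node id="16"/>England<Node id="23"/>.</TextWithNodes>
<AnnotationSet>
<Annotation Id="1" Type="Location" StartNode="0" EndNode="9"/>
<Annotation Id="2" Type="Location" StartNode="16" EndNode="23"/>
</AnnotationSet>
<AnnotationSet Name="Original markups">
<Annotation Id="3" Type="p" StartNode="0" EndNode="23"/>
</AnnotationSet>
</GateDocument>"""


def node_offsets(text_with_nodes):
    """Rebuild the plain text and map each Node id to its character offset."""
    parts = [text_with_nodes.text or ""]
    pos = len(parts[0])
    offsets = {}
    for node in text_with_nodes:
        offsets[node.get("id")] = pos
        tail = node.tail or ""  # text that follows this Node marker
        parts.append(tail)
        pos += len(tail)
    return "".join(parts), offsets


def annotations(root, set_name=None):
    """Yield (type, covered text) pairs for one annotation set.

    set_name=None selects the default set, i.e. the one without a Name.
    """
    text, offsets = node_offsets(root.find("TextWithNodes"))
    for aset in root.findall("AnnotationSet"):
        if aset.get("Name") != set_name:
            continue
        for ann in aset.findall("Annotation"):
            start = offsets[ann.get("StartNode")]
            end = offsets[ann.get("EndNode")]
            yield ann.get("Type"), text[start:end]


root = ET.fromstring(SAMPLE)
default_set = list(annotations(root))                      # ANNIE-style output
markup_set = list(annotations(root, "Original markups"))   # original mark-up
print(default_set)
print(markup_set)
```

Offsets are recomputed from the text itself rather than read from the Node ids, so the sketch does not depend on how the ids are numbered.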
H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. 2002. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL-02).
ANNIE-RDF, developed by the GATE group at the University of Sheffield (http://www.gate.ac.uk; Cunningham et al., 2002), is an Information Extraction (IE) web service for English. It consists of the following main language processing tools: tokeniser, sentence splitter, POS tagger, coreference resolver and named entity recogniser.
The named entity recogniser identifies and categorizes entity names (such as persons, organizations, and location names), temporal expressions (dates and times), and certain types of numerical expressions (monetary values and percentages).
The text spans and annotations are exported into an RDF-XML ontology, in which the recognized named entities are instances according to the PROTON ontology (http://proton.semanticweb.org/).
The ultimate aim of the project is to compile a representative historical corpus of written German for the years 1650-1800. The complete GerManC corpus will contain 2000-word samples from nine genres
Web-based information system for the scientific community (news, events, persons, job market, mailing list, database on research projects and corpora, bibliography, glossary and links), also covering recording equipment and software. Disciplinary scope: research on conversation and discourse analysis and spoken language.
Glossa is a web-based system for corpus search and results management. It comes with built-in support for CLARIN federated content search as well as for corpora encoded with the IMS Corpus Workbench. It also has a plugin architecture that enables other search engines to be used once a wrapper has been created. Glossa can be freely downloaded and installed on the user's server. It currently supports only monolingual written corpora, but support for multilingual corpora is under development, as is support for spoken corpora with audio, video and maps.
70K words. Non-validated sentence segmentation. Non-validated POS tagging. Manual annotation of syntactic dependencies and dependency labels. Manual annotation of semantic roles. Manual annotation of events based on a shallow domain-specific ontology (only for a 31K-word subset of GDT).