TextGrid has purchased the Zeno.org online library (literary, historical, scientific and other texts) and is gradually converting it into valid TEI.
Includes, among other things, a collection of old herbal books, old cookery books and texts on the history of the language of the German press.
It calculates the term frequency-inverse document frequency (TF-IDF) of a word in a given corpus, a statistical measure of how important a word is to a document within a collection or corpus.
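As an illustration of the measure itself (not necessarily the exact weighting this tool applies), here is a minimal Python sketch. It assumes documents are already tokenised into lists of words, takes term frequency as the raw count divided by document length, and uses the unsmoothed inverse document frequency log(N / df) with the natural logarithm. The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(term, doc, corpus):
    """Return the TF-IDF score of `term` in `doc` relative to `corpus`.

    Assumptions (one common variant among several):
    - `doc` is a list of tokens and `corpus` is a list of such lists;
    - TF = occurrences of `term` in `doc` / number of tokens in `doc`;
    - IDF = ln(N / df), where N is the number of documents and df is the
      number of documents containing `term` (no smoothing).
    """
    if not doc:
        return 0.0  # an empty document gives no evidence for any term
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0  # term never occurs in the corpus
    idf = math.log(len(corpus) / df)
    return tf * idf


# Hypothetical toy corpus of three tokenised documents.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "and", "the", "dog"],
]

for word in ["the", "cat", "sat"]:
    print(word, round(tf_idf(word, corpus[0], corpus), 4))
```

In this toy corpus "the" occurs in every document, so its IDF (and hence its TF-IDF) is zero, while "sat", which occurs only in the first document, scores higher than "cat", which occurs in two. Libraries such as scikit-learn apply smoothing and normalisation by default, so their scores will differ from this unsmoothed variant.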
The ACL RD-TEC 2.0 was developed to provide a benchmark for evaluating methods for terminology extraction and classification, as well as entity recognition, on specialised text from the computational linguistics domain. This release of the corpus consists of 300 abstracts from articles in the ACL Anthology Reference Corpus published between 1978 and 2006. In these abstracts, terms (i.e., single- or multi-word lexical units with a specialised meaning) are manually annotated. In addition to their boundaries in running text, annotated terms are classified into one of seven categories: method, tool, language resource (LR), LR product, model, measures and measurements, and other. To assess the quality of the annotations and to determine the difficulty of the task, more than 171 of the abstracts were annotated twice, independently, by each of the two annotators. In total, 6,818 terms are identified and annotated, yielding a specialised vocabulary of 3,318 lexical forms mapped to 3,471 concepts.
The segment shows the Bakulův ústav pro výchovu životem a prací (Bakula Institute for Education through Life and Work) in Prague's Smíchov district. It contains the first-ever film footage of the physically disabled writer František Filip, known as "Handless Frantík", and shows František Bakula conducting his choir, Bakula's Little Singers (Bakulovi zpěváčci).
The archive consists of an audio collection and written texts. It currently contains approximately 2,000 hours of digitised audio recordings and more than 2,000 audio recordings that have not yet been digitised; 400,000 index cards with information on dialect vocabulary, morphology, syntax, etc.; and transcripts and notes.
An annotated corpus of literary Ancient Greek sourced from the Perseus Canonical Greek Lit repository (https://github.com/PerseusDL/canonical-greekLit), “The Little Sailing” digital library (http://www.mikrosapoplous.gr/en/texts1en.html), and the Bibliotheca Augustana digital library (http://www.hs-augsburg.de/~harsch/augustana.html#gr).
The corpus consists of 820 texts spanning the period from the beginnings of the Ancient Greek literary tradition (Homer) to the fifth century AD, and it contains 10,206,421 words.
In addition to referencing this resource, please use the following citation when citing the corpus:
Vatri, A., & McGillivray, B. (2018). The Diorisis Ancient Greek Corpus, Research Data Journal for the Humanities and Social Sciences, 3(1), 55-65. doi: https://doi.org/10.1163/24523666-01000013