Harvested from: LINDAT/CLARIAH-CZ repository - LINDAT/CLARIAH-CZ Catalog Search Results

411. French-Croatian Parallel Corpus

Type:: corpus
Language:: Croatian and French
Description:: written; domain-specific (fiction); diachronic (the French side); bilingual; parallel; ca 263,000 tokens (148 Kw French; 115 Kw Croatian); XML; S-alignment
Rights:: Not specified

412. Frequency list: Early Modern Finnish

Publisher:: The Research Institute for the Languages of Finland
Type:: toolService
Subject:: word frequencies
Language:: Finnish
Description:: Frequency list of the Corpus of Early Modern Finnish, 4 862 190 words
Rights:: Not specified

413. Frequency list: Old Literary Finnish

Publisher:: The Research Institute for the Languages of Finland
Type:: toolService
Language:: Finnish
Description:: Frequency list of the Corpus of Old Literary Finnish, 3 425 382 words
Rights:: Not specified

414. Functional Morphology

Creator:: Forsberg, Markus and Ranta, Aarne
Publisher:: Språkbanken, Dept. of Swedish Language, Göteborg University
Type:: toolService
Subject:: morphology
Description:: Functional Morphology is a development environment for computational morphologies.
Rights:: Not specified

415. GATE-ANNIE

Publisher:: University of Sheffield
Type:: toolService
Description:: GATE-ANNIE, developed by the GATE group at the University of Sheffield (http://www.gate.ac.uk; Cunningham et al., 2002), is an Information Extraction (IE) web service for English. It consists of the following main language processing tools: tokeniser, sentence splitter, POS tagger, coreference resolver and named entity recogniser. The named entity recogniser identifies and categorizes entity names (such as persons, organizations, and location names), temporal expressions (dates and times), and certain types of numerical expressions (monetary values and percentages). GATE-ANNIE returns the fully annotated document in GATE XML format. The file saved by the client contains ANNIE's output in the default AnnotationSet and the input document's HTML or XML mark-up in the "Original markups" AnnotationSet. Reference: H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. 2002. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL-02).
Rights:: Not specified

416. GATE-ANNIE-RDF

Publisher:: University of Sheffield
Type:: toolService
Description:: ANNIE-RDF, developed by the GATE group at the University of Sheffield (http://www.gate.ac.uk; Cunningham et al., 2002), is an Information Extraction (IE) web service for English. It consists of the following main language processing tools: tokeniser, sentence splitter, POS tagger, coreference resolver and named entity recogniser. The named entity recogniser identifies and categorizes entity names (such as persons, organizations, and location names), temporal expressions (dates and times), and certain types of numerical expressions (monetary values and percentages). The text spans and annotations are exported into an RDF-XML ontology, in which the recognized named entities are instances according to the PROTON ontology (http://proton.semanticweb.org/). Reference: H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. 2002. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL-02).
Rights:: Not specified

417. GECCC Grammar Error Correction Corpus for Czech

Creator:: Náplava, Jakub, Straka, Milan, Straková, Jana, and Rosen, Alexandr
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: gec, grammatical error correction, and dataset
Language:: Czech
Description:: Grammar Error Correction Corpus for Czech (GECCC) consists of 83 058 sentences and covers four diverse domains: essays written by native students, informal website texts, essays written by Romani ethnic minority children and teenagers, and essays written by nonnative speakers. All domains are professionally annotated for GEC errors in a unified manner, and errors were automatically categorized with a Czech-specific version of ERRANT released at https://github.com/ufal/errant_czech. The dataset was introduced in the paper "Czech Grammar Error Correction with a Large and Diverse Corpus", which was accepted to TACL. Until it is published in TACL, see the arXiv version: https://arxiv.org/pdf/2201.05590.pdf
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), PUB, and http://creativecommons.org/licenses/by-sa/4.0/

418. GECCC Grammar Error Correction Corpus for Czech (2022-09-28)

Creator:: Náplava, Jakub, Straka, Milan, Straková, Jana, and Rosen, Alexandr
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: gec, grammatical error correction, and dataset
Language:: Czech
Description:: Grammar Error Correction Corpus for Czech (GECCC) consists of 83 058 sentences and covers four diverse domains: essays written by native students, informal website texts, essays written by Romani ethnic minority children and teenagers, and essays written by nonnative speakers. All domains are professionally annotated for GEC errors in a unified manner, and errors were automatically categorized with a Czech-specific version of ERRANT released at https://github.com/ufal/errant_czech. The dataset was introduced in the paper "Czech Grammar Error Correction with a Large and Diverse Corpus", which was accepted to TACL. Until it is published in TACL, see the arXiv version: https://arxiv.org/pdf/2201.05590.pdf. This version fixes double annotation errors in the train and dev M2 files and also contains more metadata.
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), PUB, and http://creativecommons.org/licenses/by-sa/4.0/

419. Generator of Czech lyrics according to structure

Creator:: Štěpánková, Barbora
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: Song lyrics generation
Language:: Czech
Description:: Fine-tuned Czech TinyLlama model (https://huggingface.co/BUT-FIT/CSTinyLlama-1.2B) and Czech GPT2 small model (https://huggingface.co/lchaloupsky/czech-gpt2-oscar) for generating lyrics of song sections based on the provided syllable counts, keywords and rhyme scheme. The TinyLlama-based model yields better results; however, the GPT2-based model can run locally. Both models are discussed in the bachelor thesis "Generation of Czech Lyrics to Cover Songs".
Rights:: The MIT License (MIT), http://opensource.org/licenses/mit-license.php, and PUB

420. GerManC : A representative historical corpus of German 1650-1800

Type:: corpus
Language:: German
Description:: The ultimate aim of the project is to compile a representative historical corpus of written German for the years 1650-1800. The complete GerManC corpus will contain 2000-word samples from nine genres
Rights:: Not specified

