Contributor: Ministerstvo školství, mládeže a tělovýchovy České republiky (project LM2018101, LINDAT/CLARIAH-CZ: Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy, national funds) and Univerzita Karlova (mimo GAUK) (project SVV 260 575, Specifický vysokoškolský výzkum, national funds) / Creator: Náplava, Jakub


1. GECCC Grammar Error Correction Corpus for Czech

Creator:: Náplava, Jakub, Straka, Milan, Straková, Jana, and Rosen, Alexandr
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: gec, grammatical error correction, and dataset
Language:: Czech
Description:: Grammar Error Correction Corpus for Czech (GECCC) consists of 83 058 sentences and covers four diverse domains: essays written by native students, informal website texts, essays written by Romani ethnic minority children and teenagers, and essays written by non-native speakers. All domains are professionally annotated for GEC errors in a unified manner, and errors were automatically categorized with a Czech-specific version of ERRANT released at https://github.com/ufal/errant_czech. The dataset was introduced in the paper "Czech Grammar Error Correction with a Large and Diverse Corpus", which was accepted to TACL. Until it is published in TACL, see the arXiv version: https://arxiv.org/pdf/2201.05590.pdf
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), PUB, and http://creativecommons.org/licenses/by-sa/4.0/

2. GECCC Grammar Error Correction Corpus for Czech (2022-09-28)

Creator:: Náplava, Jakub, Straka, Milan, Straková, Jana, and Rosen, Alexandr
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: gec, grammatical error correction, and dataset
Language:: Czech
Description:: Grammar Error Correction Corpus for Czech (GECCC) consists of 83 058 sentences and covers four diverse domains: essays written by native students, informal website texts, essays written by Romani ethnic minority children and teenagers, and essays written by non-native speakers. All domains are professionally annotated for GEC errors in a unified manner, and errors were automatically categorized with a Czech-specific version of ERRANT released at https://github.com/ufal/errant_czech. The dataset was introduced in the paper "Czech Grammar Error Correction with a Large and Diverse Corpus", which was accepted to TACL. Until it is published in TACL, see the arXiv version: https://arxiv.org/pdf/2201.05590.pdf. This version fixes double annotation errors in the train and dev M2 files and also contains more metadata.
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), PUB, and http://creativecommons.org/licenses/by-sa/4.0/

