dc.contributor.author Popel, Martin
dc.date.accessioned 2022-06-14T10:10:38Z
dc.date.available 2022-06-14T10:10:38Z
dc.date.issued 2020-07-06
dc.identifier.uri http://hdl.handle.net/11234/1-4774
dc.description CzEng is a sentence-parallel Czech-English corpus compiled at the Institute of Formal and Applied Linguistics (ÚFAL). The full CzEng 2.0 is freely available for non-commercial research purposes from the project website (https://ufal.mff.cuni.cz/czeng). This release contains only its originally monolingual news-text parts: csmono (53M Czech sentences) and enmono (79M English sentences), each paired with an automatic (synthetic) translation produced by CUBBITT. See the attached README for further details, such as the file format.
dc.language.iso ces
dc.language.iso eng
dc.publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
dc.relation.isreferencedby https://arxiv.org/abs/2007.03006
dc.rights Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.rights.uri http://creativecommons.org/licenses/by-sa/4.0/
dc.source.uri https://ufal.mff.cuni.cz/czeng
dc.subject parallel corpus
dc.title Synthetic part of CzEng 2.0
dc.type corpus
metashare.ResourceInfo#ContentInfo.mediaType text
dc.rights.label PUB
has.files yes
branding LINDAT / CLARIAH-CZ
contact.person Martin Popel popel@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
sponsor Ministry of Education, Youth and Sports of the Czech Republic LM2018101 LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities nationalFunds
sponsor Ministry of Education, Youth and Sports of the Czech Republic LM2015071 LINDAT/CLARIN: Institute for Analysis, Processing and Distribution of Linguistic Data nationalFunds
sponsor Czech Science Foundation GX20-16819X LUSyD – Language Understanding: from Syntax to Discourse nationalFunds
size.info 131537252 sentences
files.size 12798377982 bytes
files.count 3
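The two archives listed under Files in this item total about 12 GB compressed, so streaming them is often more practical than decompressing them to disk. The minimal Python sketch below reads only the first few lines of a gzip file, which is a quick way to inspect the layout that the README documents. It assumes the text is UTF-8 encoded. `peek_lines` is a hypothetical helper name. The sketch deliberately makes no assumption about the column layout, since the README defines it.

```python
import gzip
import itertools


def peek_lines(path, n=5):
    """Return the first n lines of a gzip-compressed text file.

    gzip.open in text mode ("rt") decompresses lazily, so only the
    beginning of a multi-gigabyte archive is actually read.
    Assumption: the file content is UTF-8 encoded.
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as f:
        # islice stops after n lines without touching the rest of the file
        return [line.rstrip("\n") for line in itertools.islice(f, n)]
```

For example, `peek_lines("czeng20-csmono.gz", 3)` returns the first three lines of the Czech-source file, assuming it has been downloaded to the current directory.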


 Files in this item

This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
README
Size: 2.99 KB
Format: Unknown
Description: readme.txt
MD5: ab2d71950b2e51acdc461bec8674b164

czeng20-csmono.gz
Size: 4.31 GB
Format: application/x-gzip
Description: filtered Czech news crawl from 2013-2018, translated to English by CUBBITT
MD5: b80333bef7cc9db8610daaae0e2186ea

czeng20-enmono.gz
Size: 7.61 GB
Format: application/x-gzip
Description: filtered English news crawl from 2016-2018, translated to Czech by CUBBITT
MD5: bf5941d6de35af9cbd7f0f0efd190e1f
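Because the archives are large, it is worth checking each download against the MD5 checksums listed above. The following is a minimal Python sketch that hashes each file in 1 MiB chunks, so a multi-gigabyte archive never has to fit in memory. It assumes the downloaded files keep the names shown in the listing (`README`, `czeng20-csmono.gz`, `czeng20-enmono.gz`). The helpers `md5_of_file` and `verify` are hypothetical names chosen for illustration.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed on this item page.
# Assumption: downloaded files keep these exact names.
EXPECTED_MD5 = {
    "README": "ab2d71950b2e51acdc461bec8674b164",
    "czeng20-csmono.gz": "b80333bef7cc9db8610daaae0e2186ea",
    "czeng20-enmono.gz": "bf5941d6de35af9cbd7f0f0efd190e1f",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each expected file name to True (match), False (mismatch) or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.exists() else None
    return results
```

Calling `verify("downloads")` checks the files in a `downloads` directory. A `False` entry suggests that the download is incomplete or corrupted and should be repeated.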

Show simple item record