dc.contributor.author | Hajič, Jan |
dc.contributor.author | Mareček, David |
dc.contributor.author | Fučíková, Eva |
dc.contributor.author | Cinková, Silvie |
dc.contributor.author | Štěpánek, Jan |
dc.contributor.author | Mikulová, Marie |
dc.contributor.author | Popel, Martin |
dc.date.accessioned | 2021-10-15T13:57:12Z |
dc.date.available | 2021-10-15T13:57:12Z |
dc.date.issued | 2021-09-20 |
dc.identifier.uri | http://hdl.handle.net/11234/1-3775 |
dc.description | This machine translation test set contains 2223 Czech sentences collected within the FAUST project (https://ufal.mff.cuni.cz/grants/faust, http://hdl.handle.net/11234/1-3308). Each original (noisy) sentence was normalized (clean1 and clean2) and translated to English independently by two translators. |
dc.language.iso | eng |
dc.language.iso | ces |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.relation.isbasedon | http://hdl.handle.net/11234/1-3308 |
dc.rights | Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ |
dc.subject | noisy texts |
dc.subject | parallel corpus |
dc.subject | machine translation |
dc.title | FAUST cs-en 0.5 |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Martin Popel popel@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
sponsor | Ministry of Education, Youth and Sports of the Czech Republic LM2018101 LINDAT/CLARIAH-CZ: Digital Research Infrastructure for Language Technologies, Arts and Humanities nationalFunds |
sponsor | Czech Science Foundation (GAČR) GX20-16819X LUSyD – Language Understanding: from Syntax to Discourse nationalFunds |
size.info | 2223 sentences |
files.size | 917004 |
files.count | 1 |
Files in this item
License category:
Licence: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
Publicly Available
- Name: faust-csen.zip
- Size: 895.51 KB
- Format: application/zip
- Description: Unknown
- MD5: ddb9093027913f1883d25dfafc1ecb1a
- scripts
  - faust-extract-tmx.pl (1 kB)
  - faust-merge-tsv.pl (1 kB)
- original-tmx
  - faust-csen-rs.tmx (1 MB)
  - faust-csen-mu.tmx (1 MB)
- README.txt (979 B)
- faust-csen-noisy-cs.txt (160 kB)
- faust-csen-noisy-en.txt (338 kB)
- faust-csen-clean2-cs.txt (159 kB)
- faust-csen-clean1-cs.txt (159 kB)
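
The record lists an MD5 checksum and a byte size (files.size) for faust-csen.zip, so a downloaded copy can be checked against both. The following is a minimal sketch in Python, not part of the distributed scripts; the function names `md5_of_file` and `verify_download` are hypothetical, and the local path assumes the archive was saved under its listed name in the current directory.

```python
import hashlib
from pathlib import Path

# Values copied from the record above (MD5 and files.size).
EXPECTED_MD5 = "ddb9093027913f1883d25dfafc1ecb1a"
EXPECTED_SIZE = 917004  # bytes


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5=EXPECTED_MD5, expected_size=EXPECTED_SIZE):
    """Return True only if the file matches both the expected size and MD5."""
    path = Path(path)
    if path.stat().st_size != expected_size:
        return False
    return md5_of_file(path) == expected_md5.lower()


if __name__ == "__main__":
    # Assumption: the archive sits in the current directory under its listed name.
    archive = Path("faust-csen.zip")
    if archive.exists():
        print("OK" if verify_download(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

The size check runs first because it is cheap and catches truncated downloads before the file is hashed.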