dc.contributor.author | Barančíková, Petra |
dc.contributor.author | Bojar, Ondřej |
dc.date.accessioned | 2020-06-19T09:13:14Z |
dc.date.available | 2020-06-19T09:13:14Z |
dc.date.issued | 2020-06-15 |
dc.identifier.uri | http://hdl.handle.net/11234/1-3248 |
dc.description | Costra 1.1 is a new dataset for testing geometric properties of sentence embedding spaces. In particular, it examines how well sentence embeddings capture complex phenomena such as paraphrases, tense, or generalization. The dataset directly extends Costra 1.0 with more sentences and sentence comparisons. |
dc.language.iso | ces |
dc.publisher | Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/825303 |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ |
dc.subject | paraphrases |
dc.subject | sentence embeddings |
dc.subject | evaluation |
dc.subject | sentence |
dc.title | COSTRA 1.1: A Dataset of Complex Sentence Transformations and Comparisons |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
dc.rights.label | PUB |
has.files | yes |
branding | LINDAT / CLARIAH-CZ |
contact.person | Petra Barančíková barancikova@ufal.mff.cuni.cz Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL) |
sponsor | European Union EC/H2020/825303 Bergamot - Browser-based Multilingual Translation euFunds info:eu-repo/grantAgreement/EC/H2020/825303 |
sponsor | Czech Science Foundation 19-26934X Neural Representations in Multi-modal and Multi-lingual Modelling nationalFunds |
size.info | 6968 sentences |
files.size | 819686 |
files.count | 2 |
Files in this record
Download all files in this record (800.47 KB)
License category:
License: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Publicly Available
Name | Size | Format | Description | MD5 |
README | 3.94 KB | Unknown | README | ec1d7ad7c25a11b40f9496433a632a3f |
data.tsv | 796.54 KB | Unknown | data | e30cd60188074f3006eb5f976eddb993 |
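The MD5 checksums listed for the two files can be used to confirm that a download is intact. Below is a minimal Python sketch, assuming both files were saved under their listed names (`README`, `data.tsv`) in a single directory. The helper names `md5_of` and `verify` are hypothetical and are not part of the dataset.

```python
import hashlib
import sys
from pathlib import Path

# MD5 checksums copied from this record's file listing.
EXPECTED_MD5 = {
    "README": "ec1d7ad7c25a11b40f9496433a632a3f",
    "data.tsv": "e30cd60188074f3006eb5f976eddb993",
}


def md5_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path) -> dict:
    """Map each expected file name to True (match), False (mismatch) or None (missing).

    Assumes (hypothetically) that the files keep their original names.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = directory / name
        results[name] = (md5_of(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    # Directory to check: first command-line argument, else the current directory.
    target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for name, ok in verify(target).items():
        status = "missing" if ok is None else ("OK" if ok else "MISMATCH")
        print(f"{name}: {status}")
```

Run it as, for example, `python verify_costra.py path/to/download` (the script name is arbitrary). Reading in chunks keeps memory use constant, although both files here are small enough to hash in one read.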