This package contains the eye-tracker recordings of 8 subjects evaluating English-to-Czech machine translation quality using the WMT-style ranking of sentences.
We provide the set of evaluated sentences, the exact screens presented to the annotators (including bounding-box information for every area of interest and even for individual letters in the text), and finally the raw EyeLink II files with gaze trajectories.
The description of the experiment can be found in the paper:
Ondřej Bojar, Filip Děchtěrenko, Maria Zelenina. A Pilot Eye-Tracking Study of WMT-Style Ranking Evaluation.
Proceedings of the LREC 2016 Workshop “Translation Evaluation – From Fragmented Tools and Data Sets to an Integrated Ecosystem”, Georg Rehm, Aljoscha Burchardt et al. (eds.), pp. 20-26. May 2016, Portorož, Slovenia.
This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 645452 (QT21). This work was partially financially supported by the Government of the Russian Federation, Grant 074-U01.
This work has used language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).
The GrandStaff-LMX dataset is based on the GrandStaff dataset described in the "End-to-end optical music recognition for pianoform sheet music" paper by Antonio Ríos-Vila et al., 2023, https://doi.org/10.1007/s10032-023-00432-z .
The GrandStaff-LMX dataset contains MusicXML and Linearized MusicXML encodings of all systems from the original dataset, suitable for evaluation with the TEDn metric. It also contains the official GrandStaff train/dev/test split.
OLiMPiC: OpenScore Lieder Linearized MusicXML Piano Corpus is a dataset containing synthetic and scanned images of pianoform music scores. The scores and the scanned images originate from the OpenScore Lieder Corpus https://github.com/OpenScore/Lieder .
OLiMPiC contains the scores in MusicXML and Linearized MusicXML encoding, suitable for evaluation with the TEDn metric. The official train/dev/test split is also provided.