Manual classification of errors in Czech-Slovak translation, following the error classification introduced by Vilar et al. [1]. The first 50 sentences of the WMT 2010 test set [2] were translated by five MT systems (Česílko, Česílko2, Google Translate and two Moses setups), and the MT errors were manually marked and classified. The classification was applied in an MT system comparison [3]. The reference translation is included.
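To make the structure of such annotations concrete, the following Python sketch tallies manually marked errors per system and per error class. It assumes a hypothetical record layout of (system, sentence_id, category) tuples, which is not the dataset's actual file format. It also uses only the top-level error classes of Vilar et al. [1] (missing words, word order, incorrect words, unknown words, punctuation) and omits their subclasses. The toy annotations are invented for illustration.

```python
from collections import Counter, defaultdict

# Top-level error classes of Vilar et al. (2006); their finer subclasses are omitted.
CATEGORIES = ("missing words", "word order", "incorrect words", "unknown words", "punctuation")


def tally_errors(annotations):
    """Count marked errors per system and per error class.

    `annotations` is a HYPOTHETICAL list of (system, sentence_id, category)
    tuples, one per marked error; it is not the dataset's real file format.
    """
    table = defaultdict(Counter)
    for system, _sentence_id, category in annotations:
        if category not in CATEGORIES:
            raise ValueError(f"unknown error category: {category!r}")
        table[system][category] += 1
    return table


def print_table(table):
    """Print one row per error class and one column per system."""
    systems = sorted(table)
    print("category".ljust(18) + "".join(s.ljust(18) for s in systems))
    for category in CATEGORIES:
        counts = "".join(str(table[s][category]).ljust(18) for s in systems)
        print(category.ljust(18) + counts)


if __name__ == "__main__":
    # Invented toy annotations, for illustration only.
    toy = [
        ("Česílko", 1, "incorrect words"),
        ("Česílko", 1, "word order"),
        ("Google Translate", 2, "missing words"),
        ("Moses (setup 1)", 2, "incorrect words"),
        ("Moses (setup 1)", 3, "unknown words"),
    ]
    print_table(tally_errors(toy))
```

Because a `Counter` returns 0 for missing keys, classes that a system never triggered still appear in the table as zeros rather than as gaps.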
References:
[1] David Vilar, Jia Xu, Luis Fernando D’Haro and Hermann Ney. Error Analysis of Machine Translation Output. In International Conference on Language Resources and Evaluation, pages 697-702. Genoa, Italy, May 2006.
[2] http://matrix.statmt.org/test_sets/list
[3] Ondřej Bojar, Petra Galuščáková, and Miroslav Týnovský. Evaluating Quality of Machine Translation from Czech to Slovak. In Markéta Lopatková, editor, Information Technologies - Applications and Theory, pages 3-9, September 2011.
This work has been supported by the grants EuroMatrixPlus (FP7-ICT-2007-3-231720 of the EU) and 7E09003 of the Czech Republic.
Manually ranked outputs of Czech-Slovak translations. Three annotators manually ranked the outputs of five MT systems (Česílko, Česílko2, Google Translate and two Moses setups) on three data sets: 100 sentences randomly selected from books, 100 sentences randomly selected from the Acquis corpus, and the first 50 sentences of the WMT 2010 test set. The ranking was applied in the MT system comparison in [1].
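One common way to summarize such rankings is to report each system's average rank together with the share of pairwise comparisons it wins. The Python sketch below does this under stated assumptions. The input format is hypothetical: one dict per (annotator, sentence) pair, mapping a system name to its rank, with 1 as the best rank and ties allowed. The function name is also hypothetical, and this is not necessarily how the rankings were aggregated in [1].

```python
from collections import defaultdict
from itertools import combinations


def summarize_rankings(rankings):
    """Summarize manual rankings of MT outputs.

    `rankings` is a HYPOTHETICAL list of dicts, one per (annotator, sentence),
    mapping system name -> rank (1 = best; tied systems share a rank).
    Returns (average rank per system, fraction of non-tied pairwise
    comparisons in which the system was ranked strictly better).
    """
    rank_sum = defaultdict(float)
    rank_cnt = defaultdict(int)
    wins = defaultdict(int)
    decided = defaultdict(int)  # pairwise comparisons that were not ties
    for ranking in rankings:
        for system, rank in ranking.items():
            rank_sum[system] += rank
            rank_cnt[system] += 1
        for a, b in combinations(ranking, 2):
            if ranking[a] == ranking[b]:
                continue  # a tie counts for neither system
            winner = a if ranking[a] < ranking[b] else b
            wins[winner] += 1
            decided[a] += 1
            decided[b] += 1
    avg_rank = {s: rank_sum[s] / rank_cnt[s] for s in rank_cnt}
    win_rate = {s: (wins[s] / decided[s] if decided[s] else 0.0) for s in rank_cnt}
    return avg_rank, win_rate


if __name__ == "__main__":
    # Invented toy rankings for three systems, for illustration only.
    toy = [
        {"Česílko": 2, "Google Translate": 1, "Moses": 1},
        {"Česílko": 3, "Google Translate": 1, "Moses": 2},
    ]
    avg, win = summarize_rankings(toy)
    for system in sorted(avg):
        print(f"{system:18} avg rank {avg[system]:.2f}  wins {win[system]:.2f}")
```

Excluding ties from the win rate keeps a system from being rewarded or penalized when annotators could not tell two outputs apart. Other conventions, such as counting ties as half a win, are equally plausible.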
References:
[1] Ondřej Bojar, Petra Galuščáková, and Miroslav Týnovský. Evaluating Quality of Machine Translation from Czech to Slovak. In Markéta Lopatková, editor, Information Technologies - Applications and Theory, pages 3-9, September 2011.
This work has been supported by the grants EuroMatrixPlus (FP7-ICT-2007-3-231720 of the EU) and 7E09003 of the Czech Republic.
Mapping table for the article Hajič et al., 2024: Mapping Czech Verbal Valency to PropBank Argument Labels, in LREC-COLING 2024, as preprocessed by the algorithm described in the paper. This dataset is meant for verification (replication) purposes only. It will be processed further by hand to arrive at a workable Czech PropBank, which will be used in Czech UMR annotation and further updated during the annotation. The resulting PropBank frame files for Czech are expected to become available either with a future release of UMR containing Czech UMR annotation, or separately.
Actress Marie Hübnerová with her colleagues Andula (Anna) Sedláčková and Hugo Haas in the garden of Sedláčková's villa in Černošice. Hübnerová holding Sedláčková's daughter Marcela on her lap.
Writer Marie Majerová in the Vinohrady Theatre in Prague on the day of her 70th birthday in a fragmented segment from Československé filmové noviny (Czechoslovak Film News) 1952, issue no. 8.
Opera singer Marie Podvalová taking bows after performing in Moscow in a fragmented segment from Československý filmový týdeník (Czechoslovak Film Weekly Newsreel) 1955, issue no. 30. Minister Zdeněk Nejedlý is in the audience. Marie Podvalová on Bohumil Veselý's balcony.
Film director Martin Frič behind the camera and reading a film newspaper. Frič with an unidentified man in a city courtyard. Frič accepting the Order of the Republic in a fragmented segment from Československý filmový týdeník (Czechoslovak Film Weekly Newsreel) 1955, issue no. 34. Frič with his wife Suzanne Marwille and daughter Marta Fričová on Bohumil Veselý's balcony.
Painter Max Švabinský speaking at the opening of his exhibition at the Mánes Exhibition Hall in Prague in a segment from Československý filmový týdeník (Czechoslovak Film Weekly Newsreel) 1933, issue no. 39. Švabinský with Taťjana Nilovna Jablonská in a segment from Československé filmové noviny (Czechoslovak Film News) 1951, issue no. 47. Švabinský at the exhibition held to mark his 80th birthday in a fragmented segment from Československý filmový týdeník (Czechoslovak Film Weekly Newsreel) 1953, issue no. 42.
This package provides an evaluation framework together with training and test data for the semi-automatic recognition of sections in historical diplomatic manuscripts. The data collection consists of 57 Latin charters of 7 different types, issued by the Royal Chancellery. The documents were created in the era of John the Blind, King of Bohemia (1310–1346) and Count of Luxembourg. The manuscripts were digitized and transcribed, and the typical sections of medieval charters ('corroboratio', 'datatio', 'dispositio', 'inscriptio', 'intitulatio', 'narratio', and 'publicatio') were manually tagged. The manuscripts also carry additional metadata, such as manually marked named entities and short abstracts in Czech.
Recognition models are first trained on the manually marked sections of the training documents; the trained model can then be used to recognize the sections in the test data. The parsing script supports methods based on cosine distance, TF-IDF weighting, and an adapted Viterbi algorithm.
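The general idea of combining these three methods can be illustrated with a short Python sketch. This is not the package's parsing script, and the sketch rests on several assumptions. Each document is taken to be already split into text segments, such as lines or sentences. Each section type is represented by the summed TF-IDF vector of its training segments, and a segment is scored against each section by cosine similarity. A Viterbi search over section labels then uses add-one smoothed transition probabilities estimated from the order of sections in the training documents. Using log(cosine + eps) as an emission score is a heuristic of this sketch. The function names (`train`, `viterbi`, and so on) and the toy Latin segments are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenization; real charter transcriptions need more care.
    return text.lower().split()


def idf_weights(segments):
    """Smoothed inverse document frequency, computed over training segments."""
    df = Counter()
    for seg in segments:
        df.update(set(tokenize(seg)))
    n = len(segments)
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def tfidf(text, idf):
    """Sparse TF-IDF vector; words unseen in training are ignored."""
    tf = Counter(w for w in tokenize(text) if w in idf)
    return {w: c * idf[w] for w, c in tf.items()}


def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def train(docs):
    """`docs`: list of documents, each a list of (segment_text, section_label)."""
    idf = idf_weights([text for doc in docs for text, _ in doc])
    centroids = defaultdict(Counter)  # summed TF-IDF vectors per section label
    for doc in docs:
        for text, label in doc:
            centroids[label].update(tfidf(text, idf))
    labels = sorted(centroids)
    counts = {a: Counter() for a in ["<s>"] + labels}  # "<s>" marks document start
    for doc in docs:
        prev = "<s>"
        for _, label in doc:
            counts[prev][label] += 1
            prev = label
    # Add-one smoothed log transition probabilities between section labels.
    log_trans = {
        a: {b: math.log((c[b] + 1) / (sum(c.values()) + len(labels))) for b in labels}
        for a, c in counts.items()
    }
    return idf, centroids, labels, log_trans


def viterbi(segments, model, eps=1e-3):
    """Most likely section label sequence for the segments of one document."""
    if not segments:
        return []
    idf, centroids, labels, log_trans = model
    # Heuristic emission score: log of cosine similarity to each section centroid.
    emit = [{lab: math.log(cosine(tfidf(s, idf), centroids[lab]) + eps) for lab in labels}
            for s in segments]
    best = {lab: log_trans["<s>"][lab] + emit[0][lab] for lab in labels}
    back = []  # back[t - 1][label at t] = best label at t - 1
    for t in range(1, len(segments)):
        new, ptr = {}, {}
        for lab in labels:
            prev = max(labels, key=lambda p: best[p] + log_trans[p][lab])
            new[lab] = best[prev] + log_trans[prev][lab] + emit[t][lab]
            ptr[lab] = prev
        best = new
        back.append(ptr)
    path = [max(labels, key=best.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Invented toy "charters"; real training data would come from the package.
    toy_docs = [
        [("nos iohannes dei gratia rex boemie", "intitulatio"),
         ("notum facimus universis presentes litteras inspecturis", "publicatio"),
         ("datum prage anno domini", "datatio")],
        [("iohannes dei gratia boemie rex", "intitulatio"),
         ("notum facimus tenore presencium universis", "publicatio"),
         ("datum in wratislavia anno domini", "datatio")],
    ]
    model = train(toy_docs)
    print(viterbi(["iohannes dei gratia rex", "notum facimus universis", "datum anno domini"], model))
```

On the toy data, the vocabulary alone separates the sections. The transition scores become decisive only when a segment is lexically ambiguous; in that case, the section order learned from training (here, 'intitulatio' before 'publicatio' before 'datatio') tips the decision.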