Search Results
52. ESIC 1.0 -- Europarl Simultaneous Interpreting Corpus
- Creator:
- Macháček, Dominik, Žilinec, Matúš, and Bojar, Ondřej
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- audio and corpus
- Subject:
- simultaneous interpreting, interpreting, ASR evaluation, automatic machine translation evaluation, and Europarl
- Language:
- English, Czech, and German
- Description:
- ESIC (Europarl Simultaneous Interpreting Corpus) is a corpus of 370 speeches (10 hours) in English, with manual transcripts, transcribed simultaneous interpreting into Czech and German, and parallel translations. The corpus contains the source English video and audio. The interpreters' voices are not published within the corpus, but there is a tool that downloads them from the European Parliament website, where they are publicly available. The transcripts are punctuated, carry word-level timestamps, and are annotated with metadata (disfluencies, mixing of voices and languages, read or spontaneous speech, etc.). The speeches in the corpus come from European Parliament plenary sessions in the period 2008-11. Most of the speakers are MEPs, both native and non-native speakers of English. The corpus contains metadata about the speakers (name, surname, id, fraction) and about the speeches (date, topic, read or spontaneous). The current version of ESIC is v1.0. It has validation and evaluation parts.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
53. ESIC 1.1 -- Europarl Simultaneous Interpreting Corpus (2024-02-05)
- Creator:
- Macháček, Dominik, Žilinec, Matúš, and Bojar, Ondřej
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- audio and corpus
- Subject:
- simultaneous interpreting, interpreting, ASR evaluation, automatic machine translation evaluation, and Europarl
- Language:
- English, Czech, and German
- Description:
- ESIC (Europarl Simultaneous Interpreting Corpus) is a corpus of 370 speeches (10 hours) in English, with manual transcripts, transcribed simultaneous interpreting into Czech and German, and parallel translations. The corpus contains the source English video and audio. The interpreters' voices are not published within the corpus, but there is a tool that downloads them from the European Parliament website, where they are publicly available. The transcripts are punctuated, carry word-level timestamps, and are annotated with metadata (disfluencies, mixing of voices and languages, read or spontaneous speech, etc.). The speeches in the corpus come from European Parliament plenary sessions in the period 2008-11. Most of the speakers are MEPs, both native and non-native speakers of English. The corpus contains metadata about the speakers (name, surname, id, fraction) and about the speeches (date, topic, read or spontaneous). ESIC has validation and evaluation parts. The current version is ESIC v1.1; it extends v1.0 with manual sentence alignment of the tri-parallel texts and with bi-parallel sentence alignment of the English original transcripts and the German interpreting.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
54. Etalon 1.0
- Creator:
- Skoumalová, Hana
- Publisher:
- Charles University, Faculty of Arts, Institute of Theoretical and Computational Linguistics
- Type:
- text and corpus
- Subject:
- annotated corpus and morphological annotation
- Language:
- Czech
- Description:
- Etalon is a manually annotated corpus of contemporary Czech. The corpus contains 1,885,589 words (2,265,722 tokens) and is annotated in the same way as SYN2020 of the Czech National Corpus. The corpus includes fiction (ca 24%), professional and scientific literature (ca 40%) and newspapers (ca 36%). The corpus is provided in a vertical format, where sentence boundaries are marked with a blank line. Every word form is written on a separate line, followed by five tab-separated attributes: syntactic word, lemma, sublemma, tag and verbtag. The texts are shuffled in random chunks of at most 100 words (respecting sentence boundaries).
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
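The vertical format described for Etalon 1.0 can be read with a few lines of Python. The following is only a minimal sketch, assuming exactly six tab-separated columns per token line (the word form plus the five listed attributes) and blank lines between sentences. The names `Token` and `read_sentences` are hypothetical, and the sample rows are invented placeholders, not real corpus data or real SYN2020 tags.

```python
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    """One token line: the word form plus the five attributes from the description."""
    form: str
    syntactic_word: str
    lemma: str
    sublemma: str
    tag: str
    verbtag: str


def read_sentences(lines: Iterable[str]) -> Iterator[list[Token]]:
    """Yield sentences as lists of tokens; a blank line closes a sentence."""
    sentence: list[Token] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: sentence boundary (skip repeated blank lines).
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        if len(fields) != 6:
            # Assumption: every token line has exactly six columns.
            raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}: {line!r}")
        sentence.append(Token(*fields))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


# Invented placeholder rows (TAG_A, VT_A, ... are not real annotation values).
sample = (
    "Psi\tPsi\tpes\tpes\tTAG_A\tVT_A\n"
    "spali\tspali\tspát\tspát\tTAG_B\tVT_B\n"
    "\n"
    "Ano\tAno\tano\tano\tTAG_C\tVT_C\n"
)

sentences = list(read_sentences(sample.splitlines()))
for sent in sentences:
    print(" ".join(tok.form for tok in sent))
```

For a real file, the same function can be fed an open file handle, for example `read_sentences(open(path, encoding="utf-8"))`, where `path` points to the downloaded corpus file.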
55. EvaLatin 2020 models for UDPipe 2 (2020-08-31)
- Creator:
- Straka, Milan
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- POS tagger, lemmatization, and tagger
- Language:
- Latin
- Description:
- POS tagger and lemmatizer models for the EvaLatin 2020 data (https://github.com/CIRCSE/LT4HALA). The model documentation, including performance, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#evalatin20_models . To use these models, you need UDPipe version 2.0 or later, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
56. EVALD 2.0
- Creator:
- Novák, Michal, Rysová, Kateřina, Mírovský, Jiří, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, UFAL
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and native speakers
- Language:
- Czech
- Description:
- EVALD 2.0 serves for automatic evaluation of surface coherence (cohesion) in Czech texts written by native speakers of Czech.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
57. EVALD 2.0 for Foreigners
- Creator:
- Novák, Michal, Rysová, Kateřina, Mírovský, Jiří, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, UFAL
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and non-native speakers
- Language:
- Czech
- Description:
- EVALD 2.0 for Foreigners is software for automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
58. EVALD 3.0 – Evaluator of Discourse
- Creator:
- Mírovský, Jiří, Novák, Michal, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, UFAL
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and native speakers
- Language:
- Czech
- Description:
- EVALD 3.0 serves for automatic evaluation of surface coherence (cohesion) in Czech texts written by native speakers of Czech.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
59. EVALD 3.0 for Foreigners – Evaluator of Discourse
- Creator:
- Mírovský, Jiří, Novák, Michal, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, UFAL
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and non-native speakers
- Language:
- Czech
- Description:
- EVALD 3.0 for Foreigners is software for automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
60. EVALD 4.0 – Evaluator of Discourse
- Creator:
- Novák, Michal, Mírovský, Jiří, Rysová, Kateřina, Rysová, Magdaléna, and Hajičová, Eva
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- tool and toolService
- Subject:
- text coherence, discourse, automatic evaluation, and native speakers
- Language:
- Czech
- Description:
- EVALD 4.0 serves for automatic evaluation of surface coherence (cohesion) in Czech texts written by native speakers of Czech.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB