Harvested from: LINDAT/CLARIAH-CZ repository / Rights: PUB and http://creativecommons.org/licenses/by-nc-sa/4.0/

121. STAZKA – Speech recordings from vehicles

Creator:: Šmídl, Luboš, Stanislav, Petr, and Radová, Vlasta
Publisher:: University of West Bohemia, Department of Cybernetics
Type:: audio and corpus
Subject:: speech corpus, noisy speech, voice activity detector, and speech recognition
Language:: Czech
Description:: The database contains two sets of recordings, both made in moving or stationary vehicles (passenger cars or trucks). All data were recorded within the project “Intelligent Electronic Record of the Operation and Vehicle Performance”, whose aim is to develop voice-operated software for registering vehicle operation data. The first set (full_noises.zip) consists of relatively long recordings from the vehicle cabin, containing spontaneous speech from the vehicle crew. The recordings are accompanied by detailed transcripts in the Transcriber XML-based format (.trs). Because of the recording settings, the audio contains many different noises, only sparsely interspersed with speech. As such, the set is suitable for robust estimation of voice activity detector parameters. The second set (prompts.zip) consists of short prompts recorded in a controlled setting: the speakers either answered simple questions or repeated commands and short phrases. The prompts were recorded by 26 different speakers. Each speaker recorded at least two sessions (with an identical set of prompts): the first in a stationary vehicle with a low level of noise (those recordings are marked by –A_ in the file name) and the second while actually driving the car (marked by –B_ or, since several speakers recorded three sessions, by –C_). The recordings in this set are suitable mainly for training a robust domain-specific speech recognizer and for ASR testing.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
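
Note: the session markers described for prompts.zip can be used to split the prompts by recording condition. The following is a minimal sketch only, not part of the dataset. It assumes the marker appears in the file name as a dash (ASCII hyphen or en dash) followed by A, B, or C and an underscore; the function name and the example file names are hypothetical, and the actual layout of the archive should be checked first.

```python
import re
from typing import Optional

# Assumption: the marker is a hyphen or en dash, then A/B/C, then "_".
_SESSION_RE = re.compile(r"[-\u2013]([ABC])_")

# A = stationary vehicle; B and C = recorded while driving (per the description).
_CONDITIONS = {"A": "stationary", "B": "driving", "C": "driving"}


def session_condition(file_name: str) -> Optional[str]:
    """Return 'stationary' or 'driving', or None if no session marker is found."""
    match = _SESSION_RE.search(file_name)
    if match is None:
        return None
    return _CONDITIONS[match.group(1)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ["spk01-A_0001.wav", "spk01-B_0001.wav", "unmarked.wav"]:
        print(name, session_condition(name))
```

Files for which the function returns None do not match the assumed pattern and would need manual inspection.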

122. SynSemClass 1.0

Creator:: Urešová, Zdeňka, Fučíková, Eva, Hajičová, Eva, and Hajič, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: verbal valency, predicate argument structure, semantic roles, bilingual corpus annotation, translational equivalence, comparative syntax, and comparative semantics
Language:: English and Czech
Description:: The SynSemClass synonym verb lexicon is the result of a project investigating the semantic ‘equivalence’ of verb senses and their valency behavior in parallel Czech-English language resources, i.e., relating verb meanings with respect to contextually based verb synonymy. The lexicon entries are linked to PDT-Vallex (http://hdl.handle.net/11858/00-097C-0000-0023-4338-F), EngVallex (http://hdl.handle.net/11858/00-097C-0000-0023-4337-2), CzEngVallex (http://hdl.handle.net/11234/1-1512), FrameNet (https://framenet.icsi.berkeley.edu/fndrupal/), VerbNet (http://verbs.colorado.edu/verbnet/index.html), PropBank (http://verbs.colorado.edu/%7Empalmer/projects/ace.html), Ontonotes (http://verbs.colorado.edu/html_groupings/), and English Wordnet (https://wordnet.princeton.edu/). The dataset also includes files reflecting inter-annotator agreement.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

123. SynTagRus gapping test set

Creator:: Droganova, Kira, Ponomareva, Maria, Smurov, Ivan, and Shavrina, Tatiana
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: linguistic data, gapping, and ellipsis
Language:: Russian
Description:: A test set of manually annotated sentences with gapping. The test set was compiled from SynTagRus (v. 2015), the dependency treebank for Russian that provides comprehensive, manually corrected morphological and syntactic annotation.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

124. Teaching practicum and its role in the professional training of English teachers at the Faculty of Arts, Charles University

Creator:: Jiránková, Lucie
Publisher:: Charles University, Faculty of Arts
Type:: text, other, and languageDescription
Subject:: teaching practicum, english, uajd, reflection, questionnaire, teaching, english language teaching, and teaching trainees
Language:: Czech
Description:: The presented data and metadata include answers to a questionnaire focused on the experience of teaching practicums and their role in the practical preparation of English language teachers at the Faculty of Arts, Charles University, as well as a basic quantitative analysis of the answers. The analysis of the questionnaires shows that trainees are, in most cases, prepared for their teaching practicum both professionally and in terms of pedagogy and psychology, and the use of reflective teaching methods seems very useful. The benefits of the teaching practicum include, in particular, getting to know the real situation of teaching in secondary schools and working with a larger group of pupils, getting to know oneself as a teacher, gaining self-confidence, and becoming aware of one's own limits and areas for improvement. The downsides of the current system of teaching practice are mainly the low time allocation, the lack of integration of the practice into the curriculum, the lack of involvement of the trainee in the daily running of the school (administrative work, supervision, meetings), and the lack of quality feedback from the faculty teacher.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

125. The ACL RD-TEC 2.0

Creator:: QasemiZadeh, Behrang and Schumann, Anne-Kathrin
Publisher:: DFG Collaborative Research Centre 991, University of Duesseldorf and Department of Applied Linguistics, Translation and Interpreting, Saarland University
Type:: text and corpus
Subject:: Terminology, Term Extraction, Term Classification, Entity Recognition, Evaluation Corpus, Language Resource, Gold Dataset, and Evaluation of Automatic Terminology Construction Methods
Language:: English
Description:: The ACL RD-TEC 2.0 has been developed to provide a benchmark for evaluating methods for terminology extraction and classification, as well as entity recognition tasks, based on specialised text from the computational linguistics domain. This release of the corpus consists of 300 abstracts from articles in the ACL Anthology Reference Corpus, published between 1978 and 2006. In these abstracts, terms (i.e., single or multi-word lexical units with a specialised meaning) are manually annotated. In addition to their boundaries in running text, annotated terms are classified into one of seven categories: method, tool, language resource (LR), LR product, model, measures and measurements, and other. To assess the quality of the annotations and to determine the difficulty of this task, more than 171 of the abstracts are annotated twice, independently, by each of the two annotators. In total, 6,818 terms are identified and annotated, resulting in a specialised vocabulary of 3,318 lexical forms, mapped to 3,471 concepts.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

126. Translation Models (en-de) (v1.0)

Creator:: Variš, Dušan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: machine translation, neural machine translation, and transformer
Language:: English and German
Description:: En-De translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). Models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2020 (BLEU): en->de: 25.9, de->en: 33.4 (evaluated using multeval: https://github.com/jhclark/multeval).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

127. Translation Models (en-ru) (v1.0)

Creator:: Variš, Dušan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: machine translation, neural machine translation, and transformer
Language:: English and Russian
Description:: En-Ru translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). Models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2020 (BLEU): en->ru: 18.0, ru->en: 30.4 (evaluated using multeval: https://github.com/jhclark/multeval).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

128. UFAL Parallel Corpus of North Levantine 1.0

Creator:: Sellat, Hashem, Saleh, Shadi, Krubiński, Mateusz, Pospíšil, Adam, Zemánek, Petr, and Pecina, Pavel
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: multilingual, machine translation, parallel corpus, north levantine, and corpus
Language:: North Levantine Arabic, English, French, Spanish, Standard Arabic, Modern Greek (1453-), and German
Description:: This is the first release of the UFAL Parallel Corpus of North Levantine, compiled by the Institute of Formal and Applied Linguistics (ÚFAL) at Charles University within the Welcome project (https://welcome-h2020.eu/). The corpus consists of 120,600 multiparallel sentences in English, French, German, Greek, Spanish, and Standard Arabic selected from the OpenSubtitles2018 corpus [1] and manually translated into the North Levantine Arabic language. The corpus was created for the purpose of training machine translation for North Levantine and the other languages.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

129. Universal Dependencies 1.2 Models for Parsito

Creator:: Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: parser and dependency parser
Language:: English
Description:: Parsing models for all Universal Dependencies 1.2 treebanks, created solely using UD 1.2 data (http://hdl.handle.net/11234/1-1548). To use these models, you need the Parsito binary, which you can download from http://hdl.handle.net/11234/1-1584.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

130. Universal Dependencies 1.2 Models for UDPipe

Creator:: Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: tool and toolService
Subject:: tokenizer, POS tagger, lemmatization, tagger, parser, and dependency parser
Language:: English
Description:: Tokenizer, POS tagger, lemmatizer, and parser models for all Universal Dependencies 1.2 treebanks, created solely using UD 1.2 data (http://hdl.handle.net/11234/1-1548). To use these models, you need the UDPipe binary, which you can download from http://ufal.mff.cuni.cz/udpipe.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
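
Note: Universal Dependencies data is distributed in the ten-column, tab-separated CoNLL-U format, and output in that format is what one would typically expect from running these models. The following is a minimal sketch of reading such output. It is not part of the distribution; the function and field names are illustrative, and it assumes well-formed input with comment lines starting with "#" and blank lines between sentences.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in order (lower-cased here for use as dict keys).
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def read_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dictionaries."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments are skipped in this sketch.
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 tab-separated columns, got {len(columns)}")
        current.append(dict(zip(CONLLU_FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    # A hand-written two-token example, for illustration only.
    sample = (
        "# example sentence\n"
        "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_\n"
        "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_\n"
        "\n"
    )
    for token in read_conllu(sample)[0]:
        print(token["form"], token["lemma"], token["upos"], token["head"], token["deprel"])
```

All values are kept as strings, so multiword-token ranges in the ID column pass through unchanged.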
