Number of results to display per page
Search Results
602. English gustatory adjectives and lexical synaesthesia - data analysis
- Creator:
- Jurčević, Jana
- Publisher:
- Faculty of Humanities and Social Sciences, University of Rijeka
- Type:
- text, wordList, and lexicalConceptualResource
- Subject:
- lexical synaesthesia, metaphorical collocations, metonymy, cross-modal mapping, and embodiment
- Language:
- English
- Description:
- Data collection has been done by the means of Sketch Engine program. Data were extrapolated from the annotated English web corpus enTenTen20. Data collection and analysis has been done during the period of two months: April and May 2023. Recently, the enTenTen20 corpus has been updated to a newer version - enTenTen21. Nevertheless, the older version is still available, can be worked on and can be compared with the newer one. It has been noticed that the differences between the two versions of the English web corpus did not affect the results of this study. The only apparent difference was seen in slightly different numbers in frequency values for specific collocations. This was expected since the older version of web corpus consists of 36 billion words, while the new version counts 52 billion words. On the other hand, as noted above, these frequency deviations were not significant enough to refute the hypotheses. They have rather confirmed them once again. This study is one of the results of work on a larger scientific-research project called "Metaphorical collocations - syntagmatic relations between semantics and pragmatics". More information about the project is available on the following link: https://metakol.uniri.hr/en/opis-projekta/ The study has been financed by the Croatian science foundation. Working with the data/replicating the study: Data collected for the purposes of this study is available in CSV format. Data for each gustatory adjective (collocate) is presented in a separate CSV file. Upon opening each file, stretch the borders of every column for better visibility of data. Tables show different collocational bases (nouns) which are found in the corpus, in combination with a specific gustatory adjective, their collocate. These nouns are listed by their score number (The Mutual Information score expresses the extent to which words co-occur compared to the number of times they appear separately). Tables show what type of mapping is present in a certain collocation (e.g., intra-modal or cross-modal). Tables show what type of meaning or cognitive process is working in the background of the meaning formation (e.g., metonymic or metaphoric). For every analyzed collocation, we provided a contextualized example of its use from the corpus, along with the hyperlink where it can be found.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
603. English Model (CoNLL-2003) for NameTag
- Creator:
- Straka, Milan and Straková, Jana
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, mlmodel, and languageDescription
- Subject:
- NameTag, English, and named entity recognition
- Language:
- English
- Description:
- English model for NameTag, a named entity recognition tool. The model is trained on CoNLL-2003 training data. Recognizes PER, ORG, LOC and MISC named entities. Achieves F-measure 84.73 on CoNLL-2003 test data.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
604. English Models (Morphium + WSJ) for MorphoDiTa
- Creator:
- Straka, Milan and Straková, Jana
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text, languageDescription, and mlmodel
- Subject:
- MorphoDiTa, English, morphological analysis, morphological generation, and PoS tagging
- Language:
- English
- Description:
- English models for MorphoDiTa, providing morphological analysis, morphological generation and part-of-speech tagging. The morphological dictionary is created from Morphium and SCOWL (Spell Checker Oriented Word Lists), the PoS tagger is trained on WSJ (Wall Street Journal). and This work has been using language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education of the Czech Republic (project LM2010013). The morphological POS analyzer development was supported by grant of the Ministry of Education, Youth and Sports of the Czech Republic No. LC536 "Center for Computational Linguistics". The morphological POS analyzer research was performed by Johanka Spoustová (Spoustová 2008; the Treex::Tool::EnglishMorpho::Analysis Perl module). The lemmatizer was implemented by Martin Popel (Popel 2009; the Treex::Tool::EnglishMorpho::Lemmatizer Perl module). The lemmatizer is based on morpha, which was released under LGPL licence as a part of RASP system (http://ilexir.co.uk/applications/rasp). The tagger algorithm and feature set research was supported by the projects MSM0021620838 and LC536 of Ministry of Education, Youth and Sports of the Czech Republic, GA405/09/0278 of the Grant Agency of the Czech Republic and 1ET101120503 of Academy of Sciences of the Czech Republic. The research was performed by Drahomíra "johanka" Spoustová, Jan Hajič, Jan Raab and Miroslav Spousta.
- Rights:
- Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB
605. English TTS speech corpus of air traffic (pilot) messages - Czech accent
- Creator:
- Matoušek, Jindřich and Tihelka, Daniel
- Publisher:
- University of West Bohemia, Department of Cybernetics
- Type:
- audio and corpus
- Subject:
- speech corpus, text-to-speech (TTS), and pitch-marks
- Language:
- English
- Description:
- The corpus contains recordings of male speaker, native in Czech, talking in English. The sentences that were read by the speaker originate in the domain of air traffic control (ATC), specifically the messages used by plane pilots during routine flight. The text in the corpus originates from the transcripts of the real recordings, part of which has been released in LINDAT/CLARIN (http://hdl.handle.net/11858/00-097C-0000-0001-CCA1-0), and individual phrases were selected by special algorithm described in Jůzová, M. and Tihelka, D.: Minimum Text Corpus Selection for Limited Domain Speech Synthesis (DOI 10.1007/978-3-319-10816-2_48). The corpus was used to create a limited domain speech synthesis system capable of simulating a pilot communication with an ATC officer.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
606. English TTS speech corpus of air traffic (pilot) messages - German accent
- Creator:
- Matoušek, Jindřich and Tihelka, Daniel
- Publisher:
- University of West Bohemia, Department of Cybernetics
- Type:
- audio and corpus
- Subject:
- speech corpus, text-to-speech (TTS), and pitch-marks
- Language:
- English
- Description:
- The corpus contains recordings of male speaker, native in German, talking in English. The sentences that were read by the speaker originate in the domain of air traffic control (ATC), specifically the messages used by plane pilots during routine flight. The text in the corpus originates from the transcripts of the real recordings, part of which has been released in LINDAT/CLARIN (http://hdl.handle.net/11858/00-097C-0000-0001-CCA1-0), and individual phrases were selected by special algorithm described in Jůzová, M. and Tihelka, D.: Minimum Text Corpus Selection for Limited Domain Speech Synthesis (DOI 10.1007/978-3-319-10816-2_48). The corpus was used to create a limited domain speech synthesis system capable of simulating a pilot communication with an ATC officer.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
607. English TTS speech corpus of air traffic (pilot) messages - Serbian accent
- Creator:
- Matoušek, Jindřich and Tihelka, Daniel
- Publisher:
- University of West Bohemia, Department of Cybernetics
- Type:
- audio and corpus
- Subject:
- speech corpus, text-to-speech (TTS), and pitch-marks
- Language:
- English
- Description:
- The corpus contains recordings of male speaker, native in Serbian, talking in English. The sentences that were read by the speaker originate in the domain of air traffic control (ATC), specifically the messages used by plane pilots during routine flight. The text in the corpus originates from the transcripts of the real recordings, part of which has been released in LINDAT/CLARIN (http://hdl.handle.net/11858/00-097C-0000-0001-CCA1-0), and individual phrases were selected by special algorithm described in Jůzová, M. and Tihelka, D.: Minimum Text Corpus Selection for Limited Domain Speech Synthesis (DOI 10.1007/978-3-319-10816-2_48). The corpus was used to create a limited domain speech synthesis system capable of simulating a pilot communication with an ATC officer.
- Rights:
- Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB
608. English TTS speech corpus of air traffic (pilot) messages - Taiwanese accent
- Creator:
- Matoušek, Jindřich and Tihelka, Daniel
- Publisher:
- University of West Bohemia, Department of Cybernetics
- Type:
- audio and corpus
- Subject:
- speech corpus, text-to-speech (TTS), and pitch-marks
- Language:
- English
- Description:
- The corpus contains recordings of male speaker, native in Taiwanese, talking in English. The sentences that were read by the speaker originate in the domain of air traffic control (ATC), specifically the messages used by plane pilots during routine flight. The text in the corpus originates from the transcripts of the real recordings, part of which has been released in LINDAT/CLARIN (http://hdl.handle.net/11858/00-097C-0000-0001-CCA1-0), and individual phrases were selected by special algorithm described in Jůzová, M. and Tihelka, D.: Minimum Text Corpus Selection for Limited Domain Speech Synthesis (DOI 10.1007/978-3-319-10816-2_48). The corpus was used to create a limited domain speech synthesis system capable of simulating a pilot communication with an ATC officer.
- Rights:
- Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB
609. English-Bulgarian INTERA
- Type:
- corpus
- Language:
- Bulgarian and English
- Description:
- Alignment – TMX, structural – XCES, morphosyntactic – XCES, MTE tagset
- Rights:
- Not specified
610. English-Czech Corpus from Wikipedia
- Creator:
- Štromajerová, Adéla, Baisa, Vít, and Blahuš, Marek
- Publisher:
- Masaryk University, NLP Centre
- Type:
- text and corpus
- Subject:
- Wikipedia
- Language:
- English and Czech
- Description:
- Sentence-parallel corpus made from English and Czech Wikipedias based on translated articles from English into Czech. The work done is described in the paper: ŠTROMAJEROVÁ, Adéla, Vít BAISA a Marek BLAHUŠ. Between Comparable and Parallel: English-Czech Corpus from Wikipedia. In RASLAN 2016 Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2016. s. 3-8, 6 s. ISBN 978-80-263-1095-2.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB