Harvested from: LINDAT/CLARIAH-CZ repository - LINDAT/CLARIAH-CZ Catalog Search Results

114. Antonín Pelc (painter)

Creator:: Krátký film and Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: painter's studio, Antonín Pelc's 60th birthday, Galerie osobností, People::Pelc Antonín (1895-1967), People::Záhořová Jarmila (1924-1958), and Československé filmové noviny 1952/43
Language:: Czech
Description:: Painter Antonín Pelc with his wife Jarmila Záhořová in the studio in a segment from Československé filmové noviny (Czechoslovak Film News) 1952, issue no. 43. The painter in his studio on the day of his 60th birthday in a segment from Československý filmový týdeník (Czechoslovak Film Weekly Newsreel) 1955, issue no. 4.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

115. Antonín Přecechtěl (otolaryngologist)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: otorhinolaryngology, doctor at work, Galerie osobností, Places::Praha::Klinika nemocí ušních::ústních a hrtanových, and People::Přecechtěl Antonín (1885-1971)
Language:: Czech
Description:: Professor and otolaryngologist Antonín Přecechtěl working at the clinic.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

116. Antonín Růžička (painter)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Virginia cigar, paintings by Antonín Růžička, nude painting, Galerie osobností, Places::Praha::Nové Město::Školská::pavlač domu, and People::Růžička Antonín (1887-1951)
Language:: No linguistic content
Description:: Footage of paintings by Antonín Růžička. Painter Růžička on Bohumil Veselý's balcony.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

117. Antonín Svoboda (inventor)

Creator:: Krátký film and Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností, People::Svoboda Antonín (1907-1980), and Československý filmový týdeník 1954/23
Language:: No linguistic content
Description:: Professor Antonín Svoboda, a pioneer of Czechoslovak computer design, with his colleagues in the design office in a fragmentary segment from Československý filmový týdeník (Czechoslovak Film Weekly Newsreel) 1954, issue no. 23.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

118. Antonín Vaverka (actor)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: excerpts from the film Tam na horách, Galerie osobností, People::Vaverka Antonín (1868-1937), People::Romona Julietta (neuvedeno-), and Tam na horách
Language:: No linguistic content
Description:: Actor Antonín Vaverka with his colleague Julietta Romona in Tam na horách (Up There in the Mountains, dir. Sidney M. Goldin, 1920). Vaverka with his colleague Theodor Pištěk.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

119. Antonín Vávra (opera singer)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností and People::Vávra Antonín (1847-1932)
Language:: No linguistic content
Description:: Footage of opera singer Antonín Vávra.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

120. Antonín Vlas (film entrepreneur)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: city park, park bench, Galerie osobností, and People::Vlas Antonín (1885-1946)
Language:: No linguistic content
Description:: Film entrepreneur Antonín Vlas in a park.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

121. Antonín Vlas (film professional)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností and People::Vlas Antonín (1885-1946)
Language:: No linguistic content
Description:: Footage of film professional Antonín Vlas.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

122. APE Shared Task WMT17: Human Post-edits Test Data DE-EN

Creator:: Turchi, Marco, Chatterjee, Rajen, and Negri, Matteo
Publisher:: Fondazione Bruno Kessler, Trento, Italy
Type:: text and corpus
Subject:: Human post-edits, machine translation, shared task, automatic post-editing, and post-editing
Language:: English
Description:: Human post-edited test sentences for the WMT 2017 Automatic Post-Editing task. It consists of 2,000 tokenized English sentences from the IT domain. Source and target segments can be downloaded from https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2132. All data is provided by the EU project QT21 (http://www.qt21.eu/).
Rights:: AGREEMENT ON THE USE OF DATA IN QT21 APE Task, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21, and PUB

123. APE Shared Task WMT17: Human Post-edits Test Data EN-DE

Creator:: Turchi, Marco, Chatterjee, Rajen, and Negri, Matteo
Publisher:: Fondazione Bruno Kessler, Trento, Italy
Type:: text and corpus
Subject:: machine translation, human post-edits, shared task, automatic post-editing, and post-editing
Language:: German
Description:: Human post-edited test sentences for the WMT 2017 Automatic Post-Editing task. It consists of 2,000 tokenized German sentences from the IT domain. Source and target segments can be downloaded from https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2133. All data is provided by the EU project QT21 (http://www.qt21.eu/).
Rights:: AGREEMENT ON THE USE OF DATA IN QT21 APE Task, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21, and PUB

124. APE Shared Task WMT18: Human Post-edits and References Test Data EN-DE PBSMT

Creator:: Turchi, Marco, Negri, Matteo, and Chatterjee, Rajen
Publisher:: Fondazione Bruno Kessler, Trento, Italy
Type:: text and corpus
Subject:: automatic post-editing, post-editing, phrase-based MT, and reference translation
Language:: German
Description:: Human post-edited and reference test sentences for the EN-DE PBSMT WMT 2018 Automatic Post-Editing task. Each file consists of 2,000 tokenized German sentences from the IT domain. All data is provided by the EU project QT21 (http://www.qt21.eu/).
Rights:: AGREEMENT ON THE USE OF DATA IN QT21 APE Task, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21, and PUB

125. Apertium Old Catalan morphological analyzer

Publisher:: Universidad de Alicante
Type:: toolService
Subject:: morphological analyzer
Language:: Catalan
Description:: A RESTful morphological analyzer for Old Catalan.
Rights:: Not specified

126. Aquén - Toponimia galega

Publisher:: TALG Research Group (University of Vigo)
Type:: lexicalConceptualResource
Language:: Galician
Description:: Galician Toponymy Database, 40,000 entries
Rights:: Not specified

127. Arabic ACL corpus

Creator:: Salah Elfahal Elebaed, Hoyam, Kasbi, Mohammed, Nasri, Mohammed, and Bouzoubaa, Karim
Publisher:: International Journal of Computer Science Trends and Technology (IJCST)
Type:: text and corpus
Subject:: Controlled Natural Language, Arabic CNL, ACL, Arabic Corpus, and TEI
Language:: Arabic
Description:: This corpus contains the sentences representing the Arabic Controlled Language (ACL): 551 sentences taken from four textbooks and websites dedicated to teaching Arabic to children: a) First grade book, Republic of Sudan (كتاب الصف الاول جمهورية السودان), b) Al Jazeera Educational Site (موقع الجزيرة التعليمي), c) Bella Preparatory School Girls Forum (منتدى مدرسة بيلا الاعدادية بنات), and d) Albahr website (موقع انا البحر). The sentences conform to 52 ACL rules, an average of 10.6 sentences per rule. All sentences were parsed with the Farasa syntactic parser, and the correctness of the parses was verified manually by expert linguists. The corpus consists of a header and a body. The header contains metadata describing the corpus, such as its name, authors and sources, while the body contains the rules. Each rule has a code, a structure and all the sentences that conform to it. For each sentence, we store an id, the vowelled and unvowelled text, and the result of parsing with Farasa.
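The header/body layout described above can be pictured as a small data model. The sketch below is a hypothetical Python rendering: the class and field names are assumptions chosen to mirror the description, not the element names used in the distributed file, and parse results are kept as opaque strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sentence:
    sentence_id: str
    vowelled: str    # text with diacritics (vowel marks)
    unvowelled: str  # the same text without diacritics
    parse: str       # Farasa parser output, stored as an opaque string here


@dataclass
class Rule:
    code: str
    structure: str
    sentences: list[Sentence] = field(default_factory=list)


@dataclass
class Header:
    name: str
    authors: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)


@dataclass
class ACLCorpus:
    header: Header
    rules: list[Rule] = field(default_factory=list)

    def sentence_count(self) -> int:
        # Total number of sentences across all rules.
        return sum(len(rule.sentences) for rule in self.rules)

    def average_sentences_per_rule(self) -> float:
        # For the released corpus this would be 551 / 52, i.e. about 10.6.
        return self.sentence_count() / len(self.rules)
```

Keeping sentences nested under their rule mirrors the description, where each rule carries all the sentences that conform to it, so per-rule statistics fall out directly.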
Rights:: Not specified

128. Arabic characters lexicon

Creator:: Namly, Driss
Publisher:: Ibtikarat team
Type:: text, lexicon, and lexicalConceptualResource
Subject:: alphabets
Language:: Arabic
Description:: An XML-based file containing all Arabic characters (letters, vowels and punctuation marks). Each character is given a description, its display forms (isolated, and at the beginning, middle and end of a word), an encoding (Unicode; others could be added later) and two transliterations (Buckwalter and wiki).
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

129. Arabic Enclitics Lexicon

Creator:: Loukili, Taoufik
Publisher:: Ibtikarat team
Type:: text, lexicon, and lexicalConceptualResource
Subject:: Enclitics
Language:: Arabic
Description:: An XML-based file containing all Arabic enclitics
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

130. Arabic Morphological evaluation corpus

Creator:: Jaafar, Younes
Publisher:: Ibtikarat team
Type:: text, wordList, and lexicalConceptualResource
Subject:: morphological analysis and benchmarking corpus
Language:: Arabic
Description:: An annotated corpus for benchmarking and evaluating Arabic morphological analyzers. It consists of 100 words with all their possible analyses, annotated with morphological information such as stem, pattern, root and lemma.
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

131. Arabic Particles Lexicon

Creator:: Namly, Driss
Publisher:: Ibtikarat team
Type:: text, lexicon, and lexicalConceptualResource
Subject:: particles
Language:: Arabic
Description:: An XML-based file containing Arabic particles
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

132. Arabic Phonetic Rules

Creator:: Mustafa, Ebtihal and Bouzoubaa, Karim
Publisher:: languages journal
Type:: text, other, and lexicalConceptualResource
Subject:: Arabic and phonetic rules
Language:: Arabic
Description:: This XML file describes the Arabic phonetic constraints (rules) derived from an analysis of the lexicons Taj Alarous, Al Ain, Lisan Al Arab, Alwassit and Almoassir. These rules apply to Arabic roots and are grouped into categories, each covering one type of constraint: the first category states that a root must not consist of three identical letters; the second states that a root must not begin with two identical letters; the third lists letters that must not occur in the same root, regardless of their order; and the fourth lists letters that may not occur together in a certain order in a root. ISLRN: 190-535-098-473-3
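As an illustration of how the four categories could be applied to a candidate root, here is a minimal Python sketch. It assumes the letter lists of categories 3 and 4 are supplied by the caller as Python sets (their actual contents live in the XML file and are not reproduced here), and it assumes that a forbidden pair counts wherever the two letters occur in the root, adjacent or not. The function name is hypothetical, and the usage example uses Latin placeholder letters.

```python
from __future__ import annotations


def violated_categories(root: str,
                        unordered_pairs: set[frozenset[str]],
                        ordered_pairs: set[tuple[str, str]]) -> list[int]:
    """Return the constraint categories (1-4) that a triliteral root violates.

    unordered_pairs: letter pairs banned in any order (category 3).
    ordered_pairs: (earlier, later) letter pairs banned in that order (category 4).
    Both sets are hypothetical stand-ins for the lists in the XML file.
    """
    if len(root) != 3:
        raise ValueError("expected a triliteral root")
    first, second, third = root
    found = []
    if first == second == third:
        found.append(1)  # category 1: three identical letters
    if first == second:
        found.append(2)  # category 2: root begins with two identical letters
    # Assumption: check every pair of positions i < j, adjacent or not.
    pairs = [(root[i], root[j]) for i in range(3) for j in range(i + 1, 3)]
    if any(frozenset(pair) in unordered_pairs for pair in pairs):
        found.append(3)
    if any(pair in ordered_pairs for pair in pairs):
        found.append(4)
    return found
```

For example, with `ordered_pairs={("t", "b")}` the placeholder root `"ktb"` is flagged under category 4, while `"kbt"` passes, since there `b` precedes `t`.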
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

133. Arabic Proclitics Lexicon

Creator:: Loukili, Taoufik
Publisher:: Ibtikarat team
Type:: text, lexicon, and lexicalConceptualResource
Subject:: proclitics
Language:: Arabic
Description:: An XML-based file containing all Arabic proclitics
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

134. Arabic Special verbs Lexicon

Creator:: Namly, Driss
Publisher:: Ibtikarat team
Type:: text, lexicon, and lexicalConceptualResource
Subject:: particles
Language:: Arabic
Description:: An XML-based file containing Arabic stop words that follow verb syntax
Rights:: Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB

135. Arabic Triliteral roots Lexicon

Creator:: Mustafa, Ebtihal and Bouzoubaa, Karim
Publisher:: MDPI Languages journal
Type:: text, lexicon, and lexicalConceptualResource
Subject:: Arabic language, Arabic roots, lexicons, phonetic system, bigram frequencies, and root weights
Language:: Arabic
Description:: This XML file is a lexicon containing all 21,952 (28 × 28 × 28) Arabic triliteral combinations (roots). The file is split into three parts: the first contains the phonetic constraints that must be taken into account in the formation of Arabic roots (for details, see all_phonetic_rules.xml at http://arabic.emi.ac.ma/alelm/?q=Resources); the second lists the lexicons used to create this lexicon (see the lexicons tag); and the third contains the roots. ISLRN: 813-907-570-946-2
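The figure of 21,952 is every ordered choice of three letters, repetitions allowed, from a 28-letter alphabet. The sketch below enumerates that space and applies the first two phonetic categories from the Arabic Phonetic Rules record (no three identical letters; no two identical initial letters). The 28-letter inventory used here (the standard alphabet, with alif) and the function names are assumptions; the lexicon's own letter set may be defined differently.

```python
from __future__ import annotations

from itertools import product

# Assumption: the 28 letters of the standard Arabic alphabet, alif first.
# The lexicon itself may define its letter inventory differently.
LETTERS = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"


def all_triliteral_combinations(letters: str = LETTERS) -> list[str]:
    """Every ordered choice of three letters, repetitions allowed."""
    return ["".join(combo) for combo in product(letters, repeat=3)]


def passes_categories_1_and_2(root: str) -> bool:
    """Reject roots whose first two letters are identical.

    This covers category 2 directly and category 1 implicitly, because a
    root of three identical letters also begins with two identical letters.
    """
    return root[0] != root[1]
```

Filtering with `passes_categories_1_and_2` leaves 28 × 27 × 28 = 21,168 combinations; the remaining categories 3 and 4 depend on the letter lists stored in the rules file.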
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

136. Arabic WordNet ontology

Creator:: Abouenour, Lahcen, Bouzoubaa, Karim, and Rosso, Paulo
Publisher:: Wordnet
Type:: text, wordnet, and lexicalConceptualResource
Subject:: WordNet
Language:: Arabic
Description:: This improved version extends the original Arabic WordNet (http://globalwordnet.org/arabic-wordnet/awn-browser/) with new verbs and nouns, including broken plurals, a plural form specific to Arabic.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

137. Araucaria

Publisher:: School of Computing, University of Dundee
Type:: toolService
Subject:: argument analyzer
Description:: Araucaria is a software tool for analysing arguments. It helps users reconstruct and diagram an argument through a simple point-and-click interface. The software also supports argumentation schemes and provides a user-customisable set of schemes with which to analyse arguments. It is written in Java and released under the GNU General Public License.
Rights:: Not specified

138. Arborest

Type:: corpus
Language:: Estonian
Description:: 149 sentences, VISL tagset
Rights:: Not specified

139. Arno Kraus (writer)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností, Places::Praha::Nové Město::Školská::pavlač domu, and People::Kraus Arno (1895-1974)
Language:: No linguistic content
Description:: Writer Arno Kraus on Bohumil Veselý's balcony.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

140. Arno Nauman (painter)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: doffing a hat, Galerie osobností, Places::Praha::Nové Město::Školská::pavlač domu, and People::Nauman Arno (1887-1959)
Language:: No linguistic content
Description:: Painter Arno Nauman on Bohumil Veselý's balcony.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

141. Arnold Jirásek (medical doctor)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: laboratory interior, laboratory test tubes, camera, Galerie osobností, and People::Jirásek Arnold (1887-1960)
Language:: No linguistic content
Description:: Professor and medical doctor Arnold Jirásek at work at the 1st Surgical Clinic of the General University Hospital in Prague (VFN).
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

142. Artificial Treebank with Ellipsis

Creator:: Droganova, Kira, Zeman, Daniel, Kanerva, Jenna, and Ginter, Filip
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: universal dependencies, ellipsis, and gapping
Language:: English, Czech, Finnish, Russian, and Slovak
Description:: An artificially created treebank of elliptical constructions (gapping) in the annotation style of Universal Dependencies. The data are taken from the UD 2.1 release and from large web corpora parsed by two parsers. The input data are filtered to identify sentences where gapping could be applied; these sentences are then transformed by omitting one or more words, yielding sentences with gapping. Details are given in Droganova et al.: Parse Me if You Can: Artificial Treebanks for Parsing Experiments on Elliptical Constructions, LREC 2018, Miyazaki, Japan.
Rights:: Licence Universal Dependencies v2.1, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.1, and PUB

143. Arts and Humanities Data Service Literature, Languages and Linguistics

Type:: corpus
Language:: English
Description:: Electronic texts, corpora, lexicons and other resources
Rights:: Not specified

144. Artuš Černík (film journalist, poet)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností and People::Černík Artuš (1900-1953)
Language:: No linguistic content
Description:: Film journalist and poet Artuš Černík on Bohumil Veselý's balcony.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

145. Aspect-Term Annotated Customer Reviews in Czech

Creator:: Fiala, Ondřej
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: sentiment analysis, opinion target, and customer review
Language:: Czech
Description:: This dataset contains user product reviews that are publicly available on the website of an established Czech online electronics shop. Each review lists the negative and positive aspects of the product, a format that pushes customers to rate its important characteristics. We selected 2,000 positive and negative segments from these reviews and manually tagged their targets. Additionally, we selected the 200 longest reviews and annotated them in the same way. The targets were either aspects of the evaluated product or general attributes (e.g. price, ease of use).
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

146. Assigning lemmas and part-of-speech to wordform lists

Type:: toolService
Language:: Slovenian
Description:: online service
Rights:: Not specified

147. ATCC: Pronunciation lexicon and n-gram counts for ASR module

Creator:: Šmídl, Luboš
Publisher:: University of West Bohemia, Department of Cybernetics
Type:: text, lexicalConceptualResource, and other
Subject:: pronunciation lexicon, n-gram counts, and language model
Language:: English
Description:: The corpus contains a pronunciation lexicon and n-gram counts (unigrams, bigrams and trigrams) that can be used to build a language model for the air traffic control communication domain. It can be used together with the Air Traffic Control Communication corpus (http://hdl.handle.net/11858/00-097C-0000-0001-CCA1-0). Funding: Technology Agency of the Czech Republic, project No. TA01030476.
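To show how unigram, bigram and trigram counts feed a language model, the following sketch counts n-grams over tokenized utterances and derives unsmoothed maximum-likelihood conditional probabilities from them. The sentence-boundary tokens `<s>`/`</s>`, whitespace tokenization, the function names and the example phrases are assumptions for illustration; they do not describe the format of the distributed count files, and a practical ASR language model would also apply smoothing.

```python
from __future__ import annotations

from collections import Counter


def ngram_counts(sentences: list[str], max_n: int = 3) -> dict[int, Counter]:
    """Count all 1..max_n-grams; <s> and </s> mark sentence boundaries."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def mle_probability(counts: dict[int, Counter], ngram: tuple[str, ...]) -> float:
    """Unsmoothed estimate of P(last word | preceding words) from the counts."""
    n = len(ngram)
    if n == 1:
        # Unigram probability relative to all counted tokens (boundaries included).
        return counts[1][ngram] / sum(counts[1].values())
    history = counts[n - 1][ngram[:-1]]
    return counts[n][ngram] / history if history else 0.0
```

With the two hypothetical utterances "climb flight level one zero zero" and "descend flight level eight zero", the trigram ("flight", "level", "one") gets probability 1/2, because "flight level" occurs twice and is followed by "one" once.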
Rights:: Attribution-NonCommercial 3.0 Unported (CC BY-NC 3.0), http://creativecommons.org/licenses/by-nc/3.0/, and PUB

148. Atlas of Place Names

Publisher:: The Research Institute for the Languages of Finland
Type:: toolService
Language:: Finnish
Description:: The digital atlas illustrates the distribution of 234 common Finnish place-name elements based on data in the Names Archive.
Rights:: Not specified

150. Audio Recordings Archive

Publisher:: The Research Institute for the Languages of Finland
Type:: corpus
Language:: Finnish
Description:: The Audio Recordings Archive (Suomen kielen nauhoitearkisto) holds over 23,000 hours of recordings collected since 1959, providing authentic samples of Finnish dialects, languages related to Finnish, and other world languages. The collection additionally includes samples of Finnish dialects spoken in Sweden, Norway, Ingria, the United States and Australia. Digitisation of the audio bank was undertaken in 1999. Over half of its content has been digitised, totalling about 13,000 hours of recordings.
Rights:: Not specified

151. AudioPSP 24.01: Audio recordings of proceedings of the Chamber of Deputies of the Parliament of the Czech Republic

Creator:: Kopp, Matyáš
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: audio and corpus
Subject:: Parliament of the Czech Republic
Language:: Czech
Description:: This record contains audio recordings of the proceedings of the Chamber of Deputies of the Parliament of the Czech Republic. The recordings were obtained from the official websites of the Chamber of Deputies and are included in their original format without further processing. They cover all audio files available from 2013-11-25 to 2023-07-26. The audio files are packed by year (2013-2023) and quarter (Q1-Q4) into tar archives named audioPSP-YYYY-QN.tar. In addition, there are two TSV files: audioPSP-meta.quarterArchive.tsv contains metadata about the archives, and audioPSP-meta.audioFile.tsv contains metadata about the individual audio files.
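The archive naming scheme can be derived from a recording date. This sketch assumes that `QN` stands for the letter Q followed by the quarter number 1-4 and that every quarter in the covered period has its own archive; both are readings of the description above, not checked against the actual file listing. The function names are hypothetical.

```python
from __future__ import annotations

from datetime import date


def _quarter(day: date) -> int:
    # Months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4.
    return (day.month - 1) // 3 + 1


def archive_name(day: date) -> str:
    """Name of the (assumed) quarterly tar archive holding a recording."""
    return f"audioPSP-{day.year}-Q{_quarter(day)}.tar"


def archives_between(start: date, end: date) -> list[str]:
    """All quarterly archive names from start's quarter to end's quarter."""
    year, quarter = start.year, _quarter(start)
    last = (end.year, _quarter(end))
    names = []
    while (year, quarter) <= last:
        names.append(f"audioPSP-{year}-Q{quarter}.tar")
        # Advance one quarter, rolling over into the next year after Q4.
        year, quarter = (year + 1, 1) if quarter == 4 else (year, quarter + 1)
    return names
```

Under these assumptions, the covered period 2013-11-25 to 2023-07-26 maps to 40 archives, from audioPSP-2013-Q4.tar to audioPSP-2023-Q3.tar.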
Rights:: Public Domain Dedication (CC Zero), http://creativecommons.org/publicdomain/zero/1.0/, and PUB

152. Augustin Berger (dancer, choreographer)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností
Language:: No linguistic content
Description:: Dancer and choreographer Augustin Berger with an unidentified woman on Bohumil Veselý's balcony.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

153. Augustin Jirouch (owner of a mobile cinema)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: ceremonial gathering, inscription '30 years in the service of the cinematograph', Galerie osobností, People::Baarová Lída (1914-2000), People::Borský Vladimír (1904-1962), People::Jirouchová (neuvedeno-), and People::Jirouch Augustin (1884-1954)
Language:: No linguistic content
Description:: Augustin Jirouch, the owner of a mobile cinema, with his wife on Bohumil Veselý's balcony. The couple attending an event called '38 years in the service of the cinematograph' on 5 April 1941. Screening of Paličova dcera (The Incendiary's Daughter, dir. Vladimír Borský, 1941) at the Šibřina Cinema attended by Vladimír Borský and Lída Baarová.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

154. Augustin Vilém Ludvík (film professional)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností and People::Ludvík Augustin Vilém (1890-1945)
Language:: No linguistic content
Description:: Family shots of film professional Augustin Vilém Ludvík. Ludvík with his wife at Lucerna Palace in Prague in 1926. Ludvík with his wife and his daughter Eva by the Sandberk Water Reservoir in 1927. Ludvík in his later years with an unidentified man.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

155. Augustin Vondřich (cyclist)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností, Places::Praha::Nové Město::Školská::pavlač domu, and People::Vondřich Augustin (1880-1958)
Language:: No linguistic content
Description:: Cyclist Augustin Vondřich on Bohumil Veselý's balcony.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

156. Automatic Paraphrases of Czech Reference Sentences for WMT11, 13 and 14

Creator:: Barančíková, Petra and Tamchyna, Aleš
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: machine translation, automatic evaluation, and paraphrasing
Language:: Czech
Description:: This dataset contains automatic paraphrases of the official Czech reference translations for the Workshop on Statistical Machine Translation shared task. The data cover the years 2011, 2013 and 2014. For each sentence, at most 10,000 paraphrases were included (randomly selected from the full set). The dataset is intended to improve the automatic evaluation of machine translation outputs. If you use this work, please cite the following paper: Tamchyna Aleš, Barančíková Petra: Automatic and Manual Paraphrases for MT Evaluation. In Proceedings of LREC, 2016.
Rights:: Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB

157. Automatically generated spelling correction corpus for Czech (Czech-SEC-AG)

Creator:: Hajič, Jan, Náplava, Jakub, and Straka, Milan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text and corpus
Subject:: spelling correction and natural language correction
Language:: Czech
Description:: Automatically generated spelling correction corpus for Czech (Czesl-SEC-AG) is a corpus containing text with automatically generated spelling errors. The errors are generated with a character error model that specifies the probabilities of character substitution, insertion and deletion, of swapping two adjacent characters, and of changing character case. The clean text into which the spelling errors were introduced is PDT3.0 (http://hdl.handle.net/11858/00-097C-0000-0023-1AAF-3), and the original train/dev/test sentence split of the PDT3.0 corpus is preserved in this dataset. Besides the data with artificial spelling errors, we also publish the texts from which the character error model was created: the original manual transcript of the audiobook Švejk and its version corrected by the authors of Korektor (http://ufal.mff.cuni.cz/korektor). Like the CzeSL Grammatical Error Correction Dataset (CzeSL-GEC: http://hdl.handle.net/11234/1-2143), these data are processed into four sets according to the difficulty of the errors present.
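A character error model of the kind described can be sketched as follows. The probabilities, the alphabet used for substitutions and insertions, the names `CharErrorModel` and `corrupt`, and the choice to make deletion, swapping and substitution mutually exclusive for each character are all hypothetical; the dataset's actual model was estimated from the Švejk transcripts and is not reproduced here.

```python
import random
from dataclasses import dataclass


@dataclass
class CharErrorModel:
    # Hypothetical per-character probabilities (not the dataset's values).
    p_sub: float = 0.0   # replace the character with a random one
    p_ins: float = 0.0   # insert a random character after it
    p_del: float = 0.0   # drop the character
    p_swap: float = 0.0  # swap it with the following character
    p_case: float = 0.0  # flip its case
    alphabet: str = "abcdefghijklmnopqrstuvwxyzáčďéěíňóřšťúůýž"

    def __post_init__(self) -> None:
        # Deletion, swap and substitution share one random draw.
        if self.p_del + self.p_swap + self.p_sub > 1.0:
            raise ValueError("p_del + p_swap + p_sub must not exceed 1")


def corrupt(text: str, model: CharErrorModel, rng: random.Random) -> str:
    """Return a copy of text with randomly generated spelling errors."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        r = rng.random()
        if r < model.p_del:
            i += 1  # deletion: skip this character entirely
            continue
        if r < model.p_del + model.p_swap:
            if i + 1 < len(text):
                out.extend((text[i + 1], ch))  # transpose the adjacent pair
                i += 2
                continue
            # Last character: nothing to swap with, so keep it unchanged.
        elif r < model.p_del + model.p_swap + model.p_sub:
            ch = rng.choice(model.alphabet)
        if rng.random() < model.p_case:
            ch = ch.swapcase()
        out.append(ch)
        if rng.random() < model.p_ins:
            out.append(rng.choice(model.alphabet))
        i += 1
    return "".join(out)
```

For example, `corrupt("Dobrý den", CharErrorModel(p_sub=0.02, p_swap=0.01), random.Random(0))` produces a reproducible noisy copy; passing a seeded `random.Random` keeps corpus generation repeatable.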
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

158. Awetí corpus

Publisher:: Freie Universität Berlin
Type:: corpus
Description:: Documentation of the Awetí project (DoBeS project)
Rights:: Code of conduct

159. BABEL Estonian Database

Publisher:: Institute of Cybernetics at Tallinn University of Technology
Type:: corpus
Language:: Estonian
Description:: The database consists of three sets: Many Talker Set, 30 males and 30 females, each reading 50 numbers, 1-2 connected passages, 1 block of "filler" sentences and 1 block of syllables; Few Talker Set, 4 males and 4 females, each reading 50 numbers, 10 connected passages, 1 block of "filler" sentences and 2-3 blocks of syllables; Very Few Talker Set, 1 male and 1 female, each reading 2 blocks of 50 numbers, 40 connected passages, 4 blocks of "filler" sentences and 9 blocks of syllables. In total, about 12 hours of speech.
Rights:: Not specified

160. Balaxan Corpus of Kurmanji

Creator:: Rahimi, Adel
Publisher:: Imam Khomeini International University
Type:: audio and corpus
Subject:: speech corpus and corpus
Language:: Northern Kurdish
Description:: Balaxan is the first speech corpus of Kurmanji Kurdish, with 58 utterances by Kurmanji speakers. The utterances are divided into four categories according to their sentence structure: declarative, imperative, interrogative and exclamatory. The corpus has subtitles in both Kurmanji (Latin alphabet) and English.
Rights:: Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB

161. Banco de neologismos 2004-2007

Publisher:: Instituto Cervantes and Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: lexicalConceptualResource
Subject:: neologisms database
Language:: Catalan
Description:: Repository of neologisms (15,375 entries)
Rights:: Not specified

162. Barbora Markéta Eliášová (writer, explorer)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností, Places::Praha::Nové Město::Školská::pavlač domu, People::Eliášová Barbora Markéta (1874-1957), and People::Plíhalová Eugenie (neuvedeno-)
Language:: No linguistic content
Description:: Writer and explorer Barbora Markéta Eliášová with nurse Eugenie Plíhalová on the balcony of the Teachers' Boarding House in Prague-Pankrác.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

163. Base de synonymes CRISCO

Type:: lexicalConceptualResource
Language:: French
Description:: 49,000 entries, RDB
Rights:: Not specified

164. Basic vocabulary on the Human Genome

Publisher:: Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
Type:: lexicalConceptualResource
Language:: Catalan, English, French, Galician, Italian, Portuguese, and Spanish
Description:: A vocabulary, produced through cooperation among groups of the REALITER network, that collects the basic terminology most commonly used in texts on genomics. It contains equivalents in English, Peninsular and Latin American Spanish, French, Italian, Galician, Portuguese and Catalan.
Rights:: Not specified

165. Bavaria's Dialects Online

Creator:: Raaf, Manuel
Publisher:: Bayerische Akademie der Wissenschaften and Bavarian Academy of Sciences and Humanities
Type:: text, machineReadableDictionary, and lexicalConceptualResource
Subject:: dictionary, web dictionary, Dialektologie, dialect variation, language variation, Dialectology, dialectology, Bavarian, Bavaria, Swabian, Frankish, Franconian Language, and spoken language
Language:: German, Bavarian, Swabian, and Frankish
Description:: Bavaria's Dialects Online (BDO) is the digital language information system of the three projects "Bavarian Dictionary", "Franconian Dictionary", and "Dialectological Information System of Bavarian Swabia". The database brings together the results of dialect research and presents dictionary articles as well as research data in a freely accessible online tool. BDO is aimed not only at scholars but also at members of the public interested in the language. It collects the vocabulary of all Bavarian dialects in one place and makes it accessible, presenting the richness of Bavaria's dialects side by side. With the new database, users will be able to compare the dialect vocabulary of Old Bavaria, Franconia and Swabia. Authentic dialect evidence illustrates the dialect words in their variety of meanings and regional distribution, and shows their use in idioms, proverbs and more. BDO offers a new perspective on the vocabulary of the dialects of all parts of the state of Bavaria.
Rights:: Not specified

167. Bedřich Antonín Wiedermann (composer)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: organ, organ playing, musical instrument organ, Galerie osobností, People::Wiedermann Bedřich Antonín (1883-1951), and People::Wiedermannová Františka (1896-1985)
Language:: No linguistic content
Description:: Composer Bedřich Antonín Wiedermann playing the organ. Wiedermann with his wife Františka in front of their house.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

168. Bedřich Hrozný (orientalist)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností and People::Hrozný Bedřich (1879-1952)
Language:: No linguistic content
Description:: Orientalist Bedřich Hrozný in front of his villa in Prague-Ořechovka.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

169. Bedřich Karen (actor)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: film Neznámé matky excerpt, Galerie osobností, People::Karen Bedřich (1887-1964), People::Iblová Anna (1893-1954), People::Hlavatý František (1873-1952), and Neznámé matky
Language:: No linguistic content
Description:: Actor Bedřich Karen with his colleagues Anna Iblová and Vladimír Hlavatý in Neznámé matky (Unknown Mothers, dir. František Hlavatý, 1921).
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

170. Bedřich Kočí (publisher)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností, Places::Praha::Hodkovička::v zátiší::vila Bedřicha Kočího, and People::Kočí Bedřich (1869-1955)
Language:: No linguistic content
Description:: Publisher Bedřich Kočí in front of his villa in Prague-Hodkovičky.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

171. Bedřich Lak (comedian, acrobat and illusionist)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností, Places::Praha::Nové Město::Školská::pavlač domu, and People::Lak Bedřich (1896-1971)
Language:: No linguistic content
Description:: Comedian, acrobat and illusionist Bedřich (Beda) Lak on Bohumil Veselý's balcony.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

172. Bedřich Miroslav Böhnel (writer)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností, Places::Praha::Nové Město::Školská::pavlač domu, People::Böhnel Miroslav Bedřich (1886-1962), and People::Böhnelová (neuvedeno-)
Language:: No linguistic content
Description:: Writer Miroslav Bedřich Böhnel with his wife on Bohumil Veselý's balcony.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

173. Bedřich Plaške (opera singer)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností, Places::Praha::Nové Město::Vodičkova, Places::Praha::Nové Město::Školská::pavlač domu, and People::Plaške Bedřich (1875-1952)
Language:: No linguistic content
Description:: Opera singer Bedřich Plaške with his wife and daughter on Vodičkova Street and on Bohumil Veselý's balcony.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

174. Bedřich Polanecký (film professional)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností and People::Polanecký Bedřich (neuvedeno-)
Language:: No linguistic content
Description:: Bedřich Polanecký working at a film lab.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

175. Bedřich Slavík (literary historian)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností, Places::Praha::Nové Město::Školská::pavlač domu, and People::Slavík Bedřich (1911-1979)
Language:: No linguistic content
Description:: Literary historian Bedřich Slavík on Bohumil Veselý's balcony.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

176. Bedřich Spurný (concert director)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností, People::Spurný Bedřich (1875-1933), and Velcí hudebníci a zpěváci
Language:: No linguistic content
Description:: A close-up of concert director Bedřich Spurný in archival footage from a newsreel segment of the documentary Velcí hudebníci a zpěváci (Great Musicians and Singers, 1931), which is now considered lost.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

177. Bedřich Voldán (violin teacher)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: violin, violin playing, musical instrument violin, Galerie osobností, Places::Praha::Nové Město::Školská::pavlač domu, and People::Voldán Bedřich (1892-1978)
Language:: No linguistic content
Description:: Violin teacher Bedřich Voldán with an unidentified woman on Bohumil Veselý's balcony.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

178. Bedřiška Seidlová (actor)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: film Hudba srdcí excerpt, Galerie osobností, People::Seidlová Bedřiška (1914-1995), People::Průcha Jaroslav (1898-1963), People::Veverka Ludvík (1892-1947), and Hudba srdcí
Language:: No linguistic content
Description:: Actress Bedřiška Seidlová with her colleague Jaroslav Průcha in the film Hudba srdcí (The Music of the Heart, dir. Svatopluk Innemann, 1934).
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

179. Běla Friedländerová (physical education promoter, member of the

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: bench, Galerie osobností, and People::Pilc Václav (1891-1958)
Language:: No linguistic content
Description:: Běla Friedländerová, a physical education promoter and member of the national swimming team, with her husband, architect Václav Pilc, on a bench.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

180. Bengali Visual Genome 1.0

Creator:: Sen, Arghyadeep, Parida, Shantipriya, Kotwal, Ketan, Panda, Subhadarshi, Bojar, Ondřej, and Dash, Satya Ranjan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: image and corpus
Subject:: multi-modal, neural machine translation, image captioning, Bengali captioning, and English-Bengali Multimodal Corpus
Language:: English and Bengali
Description::
Data
----
Bengali Visual Genome (BVG for short) 1.0 has similar goals to Hindi Visual Genome (HVG) 1.1: to support the Bengali language. Bengali Visual Genome 1.0 is a multimodal dataset in Bengali for machine translation and image captioning: it consists of text and images suitable for English-to-Bengali multimodal machine translation tasks and multimodal research. We follow the same selection of short English segments (captions) and associated images from Visual Genome as HVG 1.1. For BVG, we manually translated these captions from English to Bengali, taking the associated images into account. The manual translation was performed by native Bengali speakers without reference to any machine translation system.
The training set contains 29K segments. A further 1K and 1.6K segments are provided in the development and test sets, respectively, which follow the same (random) sampling from the original Hindi Visual Genome. A third test set, called the "challenge test set", consists of 1.4K segments. The challenge test set was created for the WAT2019 multimodal task by searching for (particularly) ambiguous English words based on embedding similarity and manually selecting those where the image helps to resolve the ambiguity. However, the surrounding words in the sentence often also include sufficient cues to identify the correct meaning of the ambiguous word.
Dataset Formats
---------------
The multimodal dataset contains both text and images. The text parts of the dataset (train and test sets) are simple tab-delimited plain text files. All the text files have seven columns as follows:
Column1 - image_id
Column2 - X
Column3 - Y
Column4 - Width
Column5 - Height
Column6 - English Text
Column7 - Bengali Text
The image part contains the full images, with the corresponding image_id as the file name. The X, Y, Width and Height columns indicate the rectangular region in the image described by the caption.
Data Statistics
---------------
The statistics of the current release are given below.
Parallel Corpus Statistics
--------------------------
Dataset         Segments  English Words  Bengali Words
--------------  --------  -------------  -------------
Train              28930         143115         113978
Dev                  998           4922           3936
Test                1595           7853           6408
Challenge Test      1400           8186           6657
--------------  --------  -------------  -------------
Total              32923         164076         130979
The word counts are approximate, prior to tokenization.
Citation
--------
If you use this corpus, please cite the following paper:
@inproceedings{hindi-visual-genome:2022,
  title = "{Bengali Visual Genome: A Multimodal Dataset for Machine Translation and Image Captioning}",
  author = {Sen, Arghyadeep and Parida, Shantipriya and Kotwal, Ketan and Panda, Subhadarshi and Bojar, Ond{\v{r}}ej and Dash, Satya Ranjan},
  editor = {Satapathy, Suresh Chandra and Peer, Peter and Tang, Jinshan and Bhateja, Vikrant and Ghosh, Anumoy},
  booktitle = {Intelligent Data Engineering and Analytics},
  publisher = {Springer Nature Singapore},
  address = {Singapore},
  pages = {63--70},
  isbn = {978-981-16-6624-7},
  doi = {10.1007/978-981-16-6624-7_7},
}
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
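The seven-column text format of Bengali Visual Genome can be read with a few lines of Python. The sketch below is an illustration under stated assumptions, not a loader distributed with the corpus: it assumes UTF-8 files with no header row, integer X, Y, Width and Height values, and X/Y giving the top-left corner of the captioned region. The names `Segment`, `read_segments` and `region_box` are hypothetical.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Segment:
    image_id: str
    x: int       # assumption: left edge of the captioned region
    y: int       # assumption: top edge of the captioned region
    width: int
    height: int
    english: str
    bengali: str


def read_segments(lines: Iterable[str]) -> List[Segment]:
    """Parse seven-column, tab-delimited BVG-style lines into Segment records."""
    # QUOTE_NONE: quotation marks inside captions are kept as literal characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    segments = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        if len(row) != 7:
            raise ValueError(f"expected 7 columns, got {len(row)}: {row!r}")
        image_id, x, y, width, height, english, bengali = row
        segments.append(
            Segment(image_id, int(x), int(y), int(width), int(height), english, bengali)
        )
    return segments


def region_box(seg: Segment) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the region the caption describes."""
    return (seg.x, seg.y, seg.x + seg.width, seg.y + seg.height)
```

A file would be loaded with something like `read_segments(open(path, encoding="utf-8", newline=""))`, where `path` is a placeholder for wherever the release is stored; opening with `newline=""` follows the `csv` module's recommendation for files passed to `csv.reader`.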

181. Beno Blachut (opera singer)

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností, Places::Praha::Nové Město::Školská::pavlač domu, and People::Blachut Beno (1913-1985)
Language:: No linguistic content
Description:: Opera singer Beno Blachut on Bohumil Veselý's balcony.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

183. Bibliografie zur deutschen Grammatik (BDG)

Publisher:: Institut für Deutsche Sprache
Type:: toolService
Language:: German
Description:: Online bibliography (bibliographic database)
Rights:: Not specified

184. Bibliotheca Augustana / Bibliotheca Germanica

Publisher:: Hochschule Augsburg
Type:: corpus
Subject:: German studies
Language:: German
Description:: Chronology of German literature: Old High German, Middle High German, Early New High German and New High German literature
Rights:: Not specified

185. Bilder-Conversations-Lexikon

Type:: lexicalConceptualResource
Subject:: German studies
Language:: German
Description:: Digital edition of the first edition of the "Bilder-Conversations-Lexikon für das deutsche Volk" (1837-1841), described in its preface as a "Handbuch zur Verbreitung gemeinnütziger Kenntnisse und zur Unterhaltung" (handbook for the dissemination of useful knowledge and for entertainment); contains numerous illustrations and maps
Rights:: Not specified

187. BitPar

Creator:: Schmid, Helmut
Publisher:: University of Stuttgart
Type:: toolService
Subject:: parser
Description:: Statistical parser for probabilistic context-free grammars
Rights:: Not specified
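BitPar is listed here as a statistical parser. Assuming a probabilistic context-free grammar (PCFG), the model family BitPar works with, the sketch below illustrates what such a parser computes using textbook Viterbi CKY parsing over a toy grammar in Chomsky normal form: for each span it keeps the most probable analysis per nonterminal, then reads off the best tree. It is not BitPar's algorithm, input format or interface; the grammar encoding (a dict of lexical rules and a list of binary rules) and the function names are assumptions made for this example.

```python
import math
from collections import defaultdict


def viterbi_cky(words, lexical, binary, start="S"):
    """Most probable parse of `words` under a PCFG in Chomsky normal form.

    lexical: dict word -> list of (nonterminal, prob), prob > 0
    binary:  list of (parent, left_child, right_child, prob), prob > 0
    Returns (probability, tree) or None if `start` does not span the input.
    """
    n = len(words)
    # chart[(i, j)][nt] = (log-prob, backpointer) for the span words[i:j]
    chart = defaultdict(dict)
    for i, w in enumerate(words):
        for nt, p in lexical.get(w, []):
            chart[(i, i + 1)][nt] = (math.log(p), w)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[(i, j)]
            for k in range(i + 1, j):  # split point
                left, right = chart[(i, k)], chart[(k, j)]
                for parent, lc, rc, p in binary:
                    if lc in left and rc in right:
                        score = math.log(p) + left[lc][0] + right[rc][0]
                        # Viterbi: keep only the best analysis per nonterminal
                        if parent not in cell or score > cell[parent][0]:
                            cell[parent] = (score, (k, lc, rc))
    if start not in chart[(0, n)]:
        return None
    return math.exp(chart[(0, n)][start][0]), build_tree(chart, 0, n, start)


def build_tree(chart, i, j, nt):
    """Follow backpointers to rebuild the best tree as nested tuples."""
    _, back = chart[(i, j)][nt]
    if isinstance(back, str):  # lexical entry: (nonterminal, word)
        return (nt, back)
    k, lc, rc = back
    return (nt, build_tree(chart, i, k, lc), build_tree(chart, k, j, rc))
```

Scores are summed in log space to avoid numeric underflow on long sentences; unary rules, unknown-word handling and parse forests are left out to keep the sketch short.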

188. Blahoslav Šeplavý (Director of the Office of the Czech Academy

Creator:: Veselý, Bohumil
Publisher:: Národní filmový archiv
Type:: video and clip
Subject:: Galerie osobností and People::Šeplavý Blahoslav (1897-1960)
Language:: No linguistic content
Description:: Blahoslav Šeplavý, the Director of the Office of the Czech Academy of Sciences, and his wife in the garden of the family villa.
Rights:: http://creativecommons.org/licenses/by-nc-nd/4.0/, PUB, and Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

189. Bilingual Language Acquisition Julka Corpus

Publisher:: Max Planck Institute for Psycholinguistics
Type:: corpus
Language:: German and Polish
Description:: Bilingual (German and Polish) language acquisition corpus
Rights:: Not specified

190. BNF Converter

Publisher:: Språkbanken, Dept. of Swedish Language, Göteborg University
Type:: toolService
Subject:: compiler construction and grammar
Description:: The BNF Converter is a compiler construction tool generating a compiler front-end from a Labelled BNF grammar.
Rights:: Not specified
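In a Labelled BNF grammar each rule carries a label, and that label names the abstract-syntax constructor the generated front-end builds for the rule. The hypothetical sketch below shows only that label-to-constructor mapping for simple rules of the form `Label. Category ::= items ;`: quoted items are terminals, bare names are argument categories, trailing digits such as `Exp1` are treated as precedence levels of `Exp`, and the wildcard label `_` marks a coercion. Treating lines that start with `--` as comments is also an assumption here. It is not BNFC's implementation or output, and it ignores list categories, pragmas and token definitions.

```python
import re
from typing import Dict, List, Tuple

# A simple labelled rule: Label. Category ::= items ;
RULE = re.compile(r"^\s*(\w+)\s*\.\s*(\w+)\s*::=\s*(.*?)\s*;\s*$")
# An item is either a quoted terminal or a bare category name.
ITEM = re.compile(r'"[^"]*"|\w+')


def base_category(name: str) -> str:
    """Drop a trailing precedence index, so Exp1 and Exp2 map to Exp."""
    return re.sub(r"\d+$", "", name)


def constructors(grammar: str) -> Dict[str, List[Tuple[str, List[str]]]]:
    """Map each abstract category to its (label, argument categories) pairs."""
    result: Dict[str, List[Tuple[str, List[str]]]] = {}
    for line in grammar.splitlines():
        line = line.strip()
        if not line or line.startswith("--"):  # blank line or assumed line comment
            continue
        match = RULE.match(line)
        if match is None:
            raise ValueError(f"unsupported rule in this sketch: {line!r}")
        label, category, rhs = match.groups()
        if label == "_":  # wildcard label: a coercion, not a new constructor
            continue
        # Terminals only shape the concrete syntax; categories become arguments.
        args = [base_category(item) for item in ITEM.findall(rhs) if not item.startswith('"')]
        result.setdefault(base_category(category), []).append((label, args))
    return result
```

For a toy calculator grammar with the rules `EAdd. Exp ::= Exp "+" Exp1 ;` and `EInt. Exp1 ::= Integer ;`, the sketch yields one abstract category `Exp` with two constructors, `EAdd` taking two `Exp` arguments and `EInt` taking an `Integer`.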

191. Bochumer Mittelhochdeutsch-Korpus

Publisher:: Ruhr-Universität Bochum
Type:: corpus
Subject:: German studies
Language:: German
Description:: Middle High German verse, prose texts and charters
Rights:: Not specified