Contributor: Ministerstvo školství, mládeže a tělovýchovy České republiky@@LM2018101@@LINDAT/CLARIAH-CZ: Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy@@nationalFunds@@ / Creator: Kyjánek, Lukáš / Rights: PUB / Type: lexicalConceptualResource

Start Over Contributor Ministerstvo školství, mládeže a tělovýchovy České republiky@@LM2018101@@LINDAT/CLARIAH-CZ: Digitální výzkumná infrastruktura pro jazykové technologie, umění a humanitní vědy@@nationalFunds@@ Creator Kyjánek, Lukáš Rights PUB Type lexicalConceptualResource

1. Package of word embeddings of Czech from a large corpus

Creator:: Kyjánek, Lukáš and Bonami, Olivier
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, computationalLexicon, and lexicalConceptualResource
Subject:: word embeddings, word vectors, large corpus, word2vec, skipgram, and cbow
Language:: Czech
Description:: This package comprises eight models of Czech word embeddings trained by applying word2vec (Mikolov et al. 2013) to the currently most extensive corpus of Czech, namely SYN v9 (Křen et al. 2022). The minimum frequency threshold for including a word in the model was 10 occurrences in the corpus. The original lemmatisation and tagging included in the corpus were used for disambiguation. In the case of word embeddings of word forms, units comprise word forms and their tag from a positional tagset (cf. https://wiki.korpus.cz/doku.php/en:pojmy:tag) separated by '>', e.g., kočka>NNFS1-----A----. The published package provides models trained on both tokens and lemmas. In addition, the models combine training algorithms (CBOW and Skipgram) and dimensions of the resulting vectors (100 or 500), while the training window and negative sampling remained the same during the training. The package also includes files with frequencies of word forms (vocab-frequencies.forms) and lemmas (vocab-frequencies.lemmas).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

2. Semantic annotation of noun/verb conversion in Czech

Creator:: Ševčíková, Magda, Kyjánek, Lukáš, Hledíková, Hana, and Staňková, Anna
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: other, text, and lexicalConceptualResource
Subject:: conversion, semantic, noun, verb, word formation, and Czech
Language:: Czech
Description:: The item contains a list of 2,058 noun/verb conversion pairs along with related formations (word-formation paradigms) provided with linguistic features, including semantic categories that characterize semantic relations between the noun and the verb in each conversion pair. Semantic categories were assigned manually by two human annotators based on a set of sentences containing the noun and the verb from individual conversion pairs. In addition to the list of paradigms, the item contains a set of 739 files (a separate file for each conversion pair) annotated by the annotators in parallel and a set of 2,058 files containing the final annotation, which is included in the list of paradigms.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), PUB, and http://creativecommons.org/licenses/by-nc-sa/4.0/

3. Universal Derivations v0.5

Creator:: Kyjánek, Lukáš, Žabokrtský, Zdeněk, Vidra, Jonáš, and Ševčíková, Magda
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: universal derivations, uder, word-formation, derivation, derivational morphology, and lexical network
Language:: Czech, English, Estonian, Finnish, French, German, Latin, Persian, Polish, Portuguese, and Spanish
Description:: Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains eleven harmonized resources covering eleven different languages.
Rights:: Universal Derivations v0.5 License Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UDer-0.5, and PUB

4. Universal Derivations v1.0

Creator:: Kyjánek, Lukáš, Žabokrtský, Zdeněk, Vidra, Jonáš, and Ševčíková, Magda
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: universal derivations, uder, word-formation, derivation, derivational morphology, lexical network, and harmonization
Language:: Czech, English, Estonian, Finnish, German, French, Latin, Persian, Polish, Portuguese, Spanish, Catalan, Turkish, Scottish Gaelic, Russian, Swedish, Serbo-Croatian, Italian, Dutch, and Croatian
Description:: Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains twenty-seven harmonized resources covering twenty different languages.
Rights:: Universal Derivations v1.0 License Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UDer-1.0, and PUB

5. Universal Derivations v1.1

Creator:: Kyjánek, Lukáš, Žabokrtský, Zdeněk, Vidra, Jonáš, and Ševčíková, Magda
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: lexicon, text, and lexicalConceptualResource
Subject:: universal derivations, uder, word-formation, derivation, derivational morphology, lexical network, and harmonization
Language:: Czech, English, Estonian, Finnish, German, French, Latin, Persian, Polish, Portuguese, Spanish, Catalan, Turkish, Scottish Gaelic, Russian, Swedish, Serbo-Croatian, Italian, Dutch, Croatian, and Slovenian
Description:: Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains thirty-one harmonized resources covering twenty-one different languages.
Rights:: Universal Derivations v1.1 License Agreement, PUB, and https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UDer-1.1

6. Universal Segmentations 1.0 (UniSegments 1.0)

Creator:: Žabokrtský, Zdeněk, Bafna, Nyati, Bodnár, Jan, Kyjánek, Lukáš, Svoboda, Emil, Ševčíková, Magda, Vidra, Jonáš, Angle, Sachi, Ansari, Ebrahim, Arkhangelskiy, Timofey, Batsuren, Khuyagbaatar, Bella, Gábor, Bertinetto, Pier Marco, Bonami, Olivier, Celata, Chiara, Daniel, Michael, Fedorenko, Alexei, Filko, Matea, Giunchiglia, Fausto, Haghdoost, Hamid, Hathout, Nabil, Khomchenkova, Irina, Khurshudyan, Victoria, Levonian, Dmitri, Litta, Eleonora, Medvedeva, Maria, Muralikrishna, S. N., Namer, Fiammetta, Nikravesh, Mahshid, Padó, Sebastian, Passarotti, Marco, Plungian, Vladimir, Polyakov, Alexey, Potapov, Mihail, Pruthwik, Mishra, Rao B, Ashwath, Rubakov, Sergei, Samar, Husain, Sharma, Dipti Misra, Šnajder, Jan, Šojat, Krešimir, Štefanec, Vanja, Talamo, Luigi, Tribout, Delphine, Vodolazsky, Daniil, Vydrin, Arseniy, Zakirova, Aigul, and Zeller, Britta
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: universal segmentations, morphological segmentation, word segmentation, segmentation, morphology, morphemes, morphological dictionary, unisegments, morph, and multilingual
Language:: Czech, Catalan, German, English, Persian, Finnish, French, Serbo-Croatian, Croatian, Hungarian, Italian, Komi-Zyrian, Latin, Moksha, Mari (Russia), Mongolian, Erzya, Polish, Portuguese, Russian, Spanish, Swedish, Tajik, Udmurt, Armenian, Bengali, Hindi, Malayalam, Marathi, and Kannada
Description:: Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations harmonised into a cross-linguistically consistent annotation scheme for many languages. The annotation scheme consists of simple tab-separated columns that stores a word and its morphological segmentations, including pieces of information about the word and the segmented units, e.g., part-of-speech categories, type of morphs/morphemes etc. The current public version of the collection contains 38 harmonised segmentation datasets covering 30 different languages.
Rights:: Universal Segmentations 1.0 License Terms, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-unisegs-1.0, and PUB

1. Package of word embeddings of Czech from a large corpus

2. Semantic annotation of noun/verb conversion in Czech

3. Universal Derivations v0.5

4. Universal Derivations v1.0

5. Universal Derivations v1.1

6. Universal Segmentations 1.0 (UniSegments 1.0)

Limit your search

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Search

Search Constraints

Search Results

Limit your search

Contributor

Creator

Show values starting with

Language

Show values starting with

Publisher

Rights

Show values starting with

Subject

Show values starting with

Type

Original context has metadata only

Harvested from