Creator: Kyjánek, Lukáš - LINDAT/CLARIAH-CZ Catalog Search Results

Start Over Creator Kyjánek, Lukáš

1. DeriNet 1.6 (2018-09-24)

Creator:: Vidra, Jonáš, Kyjánek, Lukáš, Ševčíková, Magda, Žabokrtský, Zdeněk, Kalužová, Adéla, Dohnalová, Šárka, and Hudeček, Vojtěch
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, wordnet, and lexicalConceptualResource
Subject:: DeriNet, derivation, derivational morphology, lexical network, and MorfFlex
Language:: Czech
Description:: DeriNet is a lexical network which models derivational relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent derivational relations between a derived word and its base word. The present version, DeriNet 1.6, contains 1,027,832 lexemes (sampled from the MorfFlex dictionary) connected by 803,404 derivational links. Furthermore, starting with version 1.5, DeriNet contains annotations related to compounding (compound words are distinguished by a special mark in their part-of-speech labels). Compared to version 1.5, version 1.6 was expanded by extracting potential links from dictionaries available under suitable licences, such as Wiktionary, and by enlarging the number of marked compounds.
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

2. DeriNet 2.0

Creator:: Vidra, Jonáš, Žabokrtský, Zdeněk, Kyjánek, Lukáš, Ševčíková, Magda, and Dohnalová, Šárka
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, wordnet, and lexicalConceptualResource
Subject:: DeriNet, derivation, derivational morphology, lexical network, and MorfFlex
Language:: Czech
Description:: DeriNet is a lexical network which models derivational relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent derivational or compositional relations between a derived word and its base word / words. The present version, DeriNet 2.0, contains 1,027,665 lexemes (sampled from the MorfFlex dictionary) connected by 808682 derivational and 600 compositional links. Compared to previous versions, version 2.0 uses a new format and contains new types of annotations: compounding, annotation of several morphological and other categories of lexemes, identification of root morphs of 244,198 lexemes, semantic labelling of 151,005 relations using five labels and identification of 13 fictitious lexemes.
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), http://creativecommons.org/licenses/by-nc-sa/3.0/, and PUB

3. DeriNet 2.1

Creator:: Vidra, Jonáš, Žabokrtský, Zdeněk, Kyjánek, Lukáš, Ševčíková, Magda, Dohnalová, Šárka, Svoboda, Emil, and Bodnár, Jan
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: wordnet, text, and lexicalConceptualResource
Subject:: DeriNet, derivation, derivational morphology, lexical network, and MorfFlex
Language:: Czech
Description:: DeriNet is a lexical network which models derivational relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent word-formational relations between a derived word and its base word / words. The present version, DeriNet 2.1, contains 1,039,012 lexemes (sampled from the MorfFlex CZ 2.0 dictionary) connected by 782,814 derivational, 50,533 orthographic variant, 1,952 compounding, 295 univerbation and 144 conversion relations. Compared to the previous version, version 2.1 contains annotations of orthographic variants, full automatically generated annotation of affix morpheme boundaries (in addition to the roots annotated in 2.0), 202 affixoid lexemes serving as bases for compounding, annotation of corpus frequency of lexemes, annotation of verbal conjugation classes and a pilot annotation of univerbation. The set of part-of-speech tags was converted to Universal POS from the Universal Dependencies project.
Rights:: Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0), PUB, and http://creativecommons.org/licenses/by-nc-sa/3.0/

4. Package of word embeddings of Czech from a large corpus

Creator:: Kyjánek, Lukáš and Bonami, Olivier
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, computationalLexicon, and lexicalConceptualResource
Subject:: word embeddings, word vectors, large corpus, word2vec, skipgram, and cbow
Language:: Czech
Description:: This package comprises eight models of Czech word embeddings trained by applying word2vec (Mikolov et al. 2013) to the currently most extensive corpus of Czech, namely SYN v9 (Křen et al. 2022). The minimum frequency threshold for including a word in the model was 10 occurrences in the corpus. The original lemmatisation and tagging included in the corpus were used for disambiguation. In the case of word embeddings of word forms, units comprise word forms and their tag from a positional tagset (cf. https://wiki.korpus.cz/doku.php/en:pojmy:tag) separated by '>', e.g., kočka>NNFS1-----A----. The published package provides models trained on both tokens and lemmas. In addition, the models combine training algorithms (CBOW and Skipgram) and dimensions of the resulting vectors (100 or 500), while the training window and negative sampling remained the same during the training. The package also includes files with frequencies of word forms (vocab-frequencies.forms) and lemmas (vocab-frequencies.lemmas).
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB

5. Semantic annotation of noun/verb conversion in Czech

Creator:: Ševčíková, Magda, Kyjánek, Lukáš, Hledíková, Hana, and Staňková, Anna
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: other, text, and lexicalConceptualResource
Subject:: conversion, semantic, noun, verb, word formation, and Czech
Language:: Czech
Description:: The item contains a list of 2,058 noun/verb conversion pairs along with related formations (word-formation paradigms) provided with linguistic features, including semantic categories that characterize semantic relations between the noun and the verb in each conversion pair. Semantic categories were assigned manually by two human annotators based on a set of sentences containing the noun and the verb from individual conversion pairs. In addition to the list of paradigms, the item contains a set of 739 files (a separate file for each conversion pair) annotated by the annotators in parallel and a set of 2,058 files containing the final annotation, which is included in the list of paradigms.
Rights:: Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), PUB, and http://creativecommons.org/licenses/by-nc-sa/4.0/

6. Universal Derivations v0.5

Creator:: Kyjánek, Lukáš, Žabokrtský, Zdeněk, Vidra, Jonáš, and Ševčíková, Magda
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: universal derivations, uder, word-formation, derivation, derivational morphology, and lexical network
Language:: Czech, English, Estonian, Finnish, French, German, Latin, Persian, Polish, Portuguese, and Spanish
Description:: Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains eleven harmonized resources covering eleven different languages.
Rights:: Universal Derivations v0.5 License Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UDer-0.5, and PUB

7. Universal Derivations v1.0

Creator:: Kyjánek, Lukáš, Žabokrtský, Zdeněk, Vidra, Jonáš, and Ševčíková, Magda
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: universal derivations, uder, word-formation, derivation, derivational morphology, lexical network, and harmonization
Language:: Czech, English, Estonian, Finnish, German, French, Latin, Persian, Polish, Portuguese, Spanish, Catalan, Turkish, Scottish Gaelic, Russian, Swedish, Serbo-Croatian, Italian, Dutch, and Croatian
Description:: Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains twenty-seven harmonized resources covering twenty different languages.
Rights:: Universal Derivations v1.0 License Agreement, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UDer-1.0, and PUB

8. Universal Derivations v1.1

Creator:: Kyjánek, Lukáš, Žabokrtský, Zdeněk, Vidra, Jonáš, and Ševčíková, Magda
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: lexicon, text, and lexicalConceptualResource
Subject:: universal derivations, uder, word-formation, derivation, derivational morphology, lexical network, and harmonization
Language:: Czech, English, Estonian, Finnish, German, French, Latin, Persian, Polish, Portuguese, Spanish, Catalan, Turkish, Scottish Gaelic, Russian, Swedish, Serbo-Croatian, Italian, Dutch, Croatian, and Slovenian
Description:: Universal Derivations (UDer) is a collection of harmonized lexical networks capturing word-formation, especially derivational relations, in a cross-linguistically consistent annotation scheme for many languages. The annotation scheme is based on a rooted tree data structure, in which nodes correspond to lexemes, while edges represent derivational relations or compounding. The current version of the UDer collection contains thirty-one harmonized resources covering twenty-one different languages.
Rights:: Universal Derivations v1.1 License Agreement, PUB, and https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UDer-1.1

9. Universal Segmentations 1.0 (UniSegments 1.0)

Creator:: Žabokrtský, Zdeněk, Bafna, Nyati, Bodnár, Jan, Kyjánek, Lukáš, Svoboda, Emil, Ševčíková, Magda, Vidra, Jonáš, Angle, Sachi, Ansari, Ebrahim, Arkhangelskiy, Timofey, Batsuren, Khuyagbaatar, Bella, Gábor, Bertinetto, Pier Marco, Bonami, Olivier, Celata, Chiara, Daniel, Michael, Fedorenko, Alexei, Filko, Matea, Giunchiglia, Fausto, Haghdoost, Hamid, Hathout, Nabil, Khomchenkova, Irina, Khurshudyan, Victoria, Levonian, Dmitri, Litta, Eleonora, Medvedeva, Maria, Muralikrishna, S. N., Namer, Fiammetta, Nikravesh, Mahshid, Padó, Sebastian, Passarotti, Marco, Plungian, Vladimir, Polyakov, Alexey, Potapov, Mihail, Pruthwik, Mishra, Rao B, Ashwath, Rubakov, Sergei, Samar, Husain, Sharma, Dipti Misra, Šnajder, Jan, Šojat, Krešimir, Štefanec, Vanja, Talamo, Luigi, Tribout, Delphine, Vodolazsky, Daniil, Vydrin, Arseniy, Zakirova, Aigul, and Zeller, Britta
Publisher:: Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:: text, lexicon, and lexicalConceptualResource
Subject:: universal segmentations, morphological segmentation, word segmentation, segmentation, morphology, morphemes, morphological dictionary, unisegments, morph, and multilingual
Language:: Czech, Catalan, German, English, Persian, Finnish, French, Serbo-Croatian, Croatian, Hungarian, Italian, Komi-Zyrian, Latin, Moksha, Mari (Russia), Mongolian, Erzya, Polish, Portuguese, Russian, Spanish, Swedish, Tajik, Udmurt, Armenian, Bengali, Hindi, Malayalam, Marathi, and Kannada
Description:: Universal Segmentations (UniSegments) is a collection of lexical resources capturing morphological segmentations harmonised into a cross-linguistically consistent annotation scheme for many languages. The annotation scheme consists of simple tab-separated columns that stores a word and its morphological segmentations, including pieces of information about the word and the segmented units, e.g., part-of-speech categories, type of morphs/morphemes etc. The current public version of the collection contains 38 harmonised segmentation datasets covering 30 different languages.
Rights:: Universal Segmentations 1.0 License Terms, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-unisegs-1.0, and PUB

1. DeriNet 1.6 (2018-09-24)

2. DeriNet 2.0

3. DeriNet 2.1

4. Package of word embeddings of Czech from a large corpus

5. Semantic annotation of noun/verb conversion in Czech

6. Universal Derivations v0.5

7. Universal Derivations v1.0

8. Universal Derivations v1.1

9. Universal Segmentations 1.0 (UniSegments 1.0)

Limit your search

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Show values starting with

Search

Search Constraints

Search Results

Limit your search

Contributor

Show values starting with

Creator

Show values starting with

Language

Show values starting with

Publisher

Rights

Show values starting with

Subject

Show values starting with

Type

Original context has metadata only

Harvested from