Search Results
Creator:
Zeman, Daniel , Potthast, Martin , Duthoo, Elie , Mesnard, Olivier , Rybak, Piotr , Wróblewska, Alina , Che, Wanxiang , Liu, Yijia , Wang, Yuxuan , Zheng, Bo , Liu, Ting , Li, Zuchao , He, Shexia , Zhang, Zhuosheng , Zhao, Hai , Wu, Yingting , Tong, Jia-Jun , Nguyen, Dat Quoc , Verspoor, Karin , Wan, Hui , Naseem, Tahira , Lee, Young-Suk , Castelli, Vittorio , Ballesteros, Miguel , Hershcovich, Daniel , Abend, Omri , Rappoport, Ari , Smith, Aaron , Bohnet, Bernd , de Lhoneux, Miryam , Nivre, Joakim , Shao, Yan , Stymne, Sara , Kırnap, Ömer , Dayanık, Erenay , Yuret, Deniz , Kanerva, Jenna , Ginter, Filip , Miekka, Niko , Leino, Akseli , Salakoski, Tapio , Lim, KyungTae , Park, Cheoneum , Lee, Changki , Poibeau, Thierry , Bhat, Riyaz Ahmad , Bhat, Irshad , Bangalore, Srinivas , Qi, Peng , Dozat, Timothy , Zhang, Yuhao , Manning, Christopher , Boroș, Tiberiu , Dumitrescu, Stefan Daniel , Burtica, Ruxandra , Arakelyan, Gor , Hambardzumyan, Karen , Khachatrian, Hrant , Rosa, Rudolf , Mareček, David , Straka, Milan , Seker, Amit , More, Amir , Tsarfaty, Reut , Önder, Berkay Furkan , Gümeli, Can , Jawahar, Ganesh , Muller, Benjamin , Fethi, Amal , Martin, Louis , Villemonte de la Clergerie, Eric , Sagot, Benoît , Seddah, Djamé , Özateş, Şaziye Betül , Özgür, Arzucan , Gungor, Tunga , Öztürk, Balkız , Ji, Tao , Liu, Yufang , Wang, Yijun , Wu, Yuanbin , Lan, Man , Chen, Danlu , Lin, Mengxiao , Hu, Zhifeng , and Qiu, Xipeng
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
parsed data , conllu , and universal dependencies
Language:
Afrikaans , Arabic , Breton , Bulgarian , Russia Buriat , Catalan , Czech , Church Slavic , Danish , German , Modern Greek (1453-) , English , Estonian , Basque , Faroese , Persian , Finnish , French , Old French (842-ca. 1400) , Irish , Galician , Gothic , Ancient Greek (to 1453) , Hebrew , Hindi , Croatian , Upper Sorbian , Hungarian , Armenian , Indonesian , Italian , Japanese , Kazakh , Northern Kurdish , Korean , Latin , Latvian , Dutch , Norwegian , Nigerian Pidgin , Polish , Portuguese , Romanian , Russian , Slovak , Slovenian , Northern Sami , Spanish , Serbian , Swedish , Thai , Turkish , Uighur , Ukrainian , Urdu , Vietnamese , and Chinese
Description:
Test data parsed by systems submitted to the CoNLL 2018 UD parsing shared task.
Rights:
Licence Universal Dependencies v2.2 , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.2 , and PUB
Creator:
Náplava, Jakub , Straka, Milan , Hajič, Jan , and Straňák, Pavel
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
diacritical marks generation and natural language correction
Language:
Czech , Vietnamese , Romanian , Polish , Slovak , Spanish , Croatian , Irish , Latvian , Hungarian , French , and Turkish
Description:
Corpus of texts in 12 languages. For each language, we provide one training, one development, and one test set drawn from Wikipedia articles. In addition, each language's dataset contains a substantially larger training set collected from general Web texts. All sets are disjoint, except that the Wikipedia and Web training sets may contain similar sentences. The data are segmented into sentences, which are further tokenized into words.
All data in the corpus contain diacritics. To strip them, use the Python script diacritization_stripping.py contained in the attached stripping_diacritics.zip. The script has two modes; we generally recommend the method called uninames, which behaves better for some languages.
The code for training the recurrent neural network model for diacritics restoration is located at https://github.com/arahusky/diacritics_restoration.
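As a rough illustration of why a name-based stripping mode can behave better for some languages, the sketch below contrasts two common approaches. It is not the code of diacritization_stripping.py: the function names are hypothetical, and the assumption that the uninames mode works from Unicode character names is a guess based on the mode's name only.

```python
import unicodedata


def strip_by_decomposition(text: str) -> str:
    """Strip diacritics via NFD decomposition, dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_by_uninames(text: str) -> str:
    """Strip diacritics via Unicode character names (assumed technique).

    A name such as 'LATIN SMALL LETTER D WITH STROKE' is cut at ' WITH '
    and the remaining base name is looked up again.
    """
    out = []
    for ch in text:
        name = unicodedata.name(ch, "")
        base_name, sep, _ = name.partition(" WITH ")
        if sep:
            try:
                ch = unicodedata.lookup(base_name)
            except KeyError:
                pass  # no base character with that name; keep the original
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    sample = "Łódź, đak, žluťoučký"
    print(strip_by_decomposition(sample))
    print(strip_by_uninames(sample))
```

Letters such as Croatian đ or Polish ł have no canonical decomposition, so the decomposition approach leaves them unchanged, while the name-based lookup maps them to d and l.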
Rights:
Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) , http://creativecommons.org/licenses/by-nc-sa/4.0/ , and PUB
Creator:
Štedimlija, Savić Marković
Type:
text and monograph
Subject:
History of states and territories on the Balkan Peninsula , history of states , Montenegro , political history, politicians , and overview treatments of world history (chronological)
Language:
Croatian
Rights:
unknown
Publisher:
University of Zagreb, Faculty of Humanities and Social Sciences
Format:
application/octet-stream
Type:
corpus
Language:
Croatian
Description:
Manually tagged dependency treebank, analytical layer according to the PDT formalism adapted for Croatian
Rights:
Not specified
Publisher:
University of Zagreb, Faculty of Humanities and Social Sciences
Format:
application/octet-stream
Type:
lexicalConceptualResource
Language:
Croatian
Description:
38,573 lemmas, plain text; database file
Rights:
Not specified
Publisher:
University of Zagreb, Faculty of Humanities and Social Sciences
Type:
toolService
Language:
Croatian
Description:
Online service for lemmatization and full POS or MSD tagging of Croatian texts.
Rights:
Not specified
Publisher:
University of Zagreb, Faculty of Humanities and Social Sciences
Type:
lexicalConceptualResource
Language:
Croatian
Description:
110,000+ lemmas; 3,900,000+ word forms; MULTEXT-East lexicon format
Rights:
Not specified
Publisher:
University of Zagreb, Faculty of Humanities and Social Sciences
Type:
corpus
Language:
Croatian
Description:
This is the reference corpus of standard Croatian. In its 3.0 version, which is accessible via noSketch Engine, it has 216.8 million tokens. In terms of annotation, the corpus is tokenised, lemmatised and tagged for MSDs (morphosyntactic descriptions).
Rights:
Not specified
Publisher:
University of Zagreb, Faculty of Humanities and Social Sciences
Type:
corpus
Language:
Croatian and English
Description:
written; domain-specific (newspaper); synchronic; bilingual; parallel; unidirectional; XML; S-alignment
Rights:
Not specified
Creator:
Karpatský, Dušan
Type:
text and bibliography
Subject:
Czech literature (works about it) , Croatian literature , translations , Croatian-Czech relations , subject bibliographies , and subject and thematic bibliographies, journal indexes
Language:
Croatian
Rights:
unknown