Search Results
Creator:
Mareček, David , Yu, Zhiwei , Zeman, Daniel , and Žabokrtský, Zdeněk
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
part of speech , tagging , semi-supervised , and cross-language
Language:
Belarusian , Bosnian , Bulgarian , Czech , Serbo-Croatian , Croatian , Upper Sorbian , Macedonian , Polish , Russian , Slovak , Slovenian , Serbian , Ukrainian , Latvian , Lithuanian , Afrikaans , Danish , German , English , Faroese , Western Frisian , Swiss German , Icelandic , Limburgan , Luxembourgish , Low German , Dutch , Norwegian Nynorsk , Norwegian , Scots , Swedish , Yiddish , Aragonese , Asturian , Catalan , French , Galician , Haitian , Italian , Latin , Lombard , Neapolitan , Piemontese , Portuguese , Romanian , Spanish , Venetian , Walloon , Breton , Welsh , Scottish Gaelic , Irish , Modern Greek (1453-) , Armenian , Albanian , Dimli (individual language) , Persian , Gilaki , Kurdish , Tajik , Bengali , Bishnupriya , Gujarati , Fiji Hindi , Hindi , Marathi , Nepali (macrolanguage) , Urdu , Amharic , Arabic , Egyptian Arabic , Hebrew , Estonian , Finnish , Hungarian , Basque , Georgian , Chuvash , Azerbaijani , Turkish , Uzbek , Kazakh , Tatar , Yakut , Korean , Mongolian , Telugu , Kannada , Malayalam , Tamil , Newari , Vietnamese , Indonesian , Javanese , Malagasy , Maori , Malay (macrolanguage) , Pampanga , Sundanese , Tagalog , Waray (Philippines) , Swahili (macrolanguage) , Esperanto , Ido , Interlingua (International Auxiliary Language Association) , and Volapük
Description:
Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).
Changes in version 1.1:
1. Universal Dependencies tagset instead of the older and smaller Google Universal POS tagset.
2. SVM classifier trained on Universal Dependencies 1.2 instead of HamleDT 2.0.
3. Balto-Slavic, Germanic and Romance languages were tagged by a classifier trained only on the respective group of languages. Other languages were tagged by a classifier trained on all available languages. The "c7" combination from version 1.0 is no longer used.
Rights:
Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) , http://creativecommons.org/licenses/by-sa/4.0/ , and PUB
Creator:
Johann Christoph Schambogen , Sauern, Antonius Franciscus a , Oldřich Adolf Vratislav ze Šternberka , Leopold , and Jezuitská tiskárna (Praha, Česko)
Publisher:
Tiskárna jezuitská
Format:
print and [16] ff ; 4°
Type:
model:monograph and TEXT
Subject:
17th century , law , and 094
Language:
Latin , Modern Greek (1453-) , and German
Description:
BCBT41687
Rights:
http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
Creator:
Carolides z Karlsperka, Daniel
Publisher:
Carolides z Karlsperka, Daniel
Format:
print and [12] ff ; 4°
Type:
model:monograph and TEXT
Subject:
Dvorský, Tobiáš , 17th century , poetry , epithalamia , marriages , and 094
Language:
Latin and Modern Greek (1453-)
Description:
BCBT39027
Rights:
http://creativecommons.org/licenses/by-nc-sa/4.0/ and policy:public
Creator:
Matěj Pardubský , Mikuláš Albert z Kaménka , Daniel Basilius , Jan Campanus Vodňanský , Prosdokonymus, Jan , Mikuláš Troilus , Žabonius z Vyšetína, Jakub , and Schultis de Felsdorf, Georgius
Publisher:
Pardubský, Matěj
Format:
print and [8] ff ; 4°
Type:
model:monograph and TEXT
Subject:
17th century , poetry , congratulatory poems , and 094
Language:
Latin and Modern Greek (1453-)
Description:
BCBT39702
Rights:
http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
Type:
model:monograph and TEXT
Language:
Latin and Modern Greek (1453-)
Rights:
http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
Creator:
Zeman, Daniel , Mareček, David , Mašek, Jan , Popel, Martin , Ramasamy, Loganathan , Rosa, Rudolf , Štěpánek, Jan , and Žabokrtský, Zdeněk
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
treebank , Stanford dependencies , Prague dependencies , harmonization , common annotation style , and Interset
Language:
Arabic , Bulgarian , Bengali , Catalan , Czech , Danish , German , Modern Greek (1453-) , English , Spanish , Estonian , Basque , Persian , Finnish , Ancient Greek (to 1453) , Hindi , Hungarian , Italian , Japanese , Latin , Dutch , Portuguese , Romanian , Russian , Slovak , Slovenian , Swedish , Tamil , Telugu , and Turkish
Description:
HamleDT 2.0 is a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a treebank annotation style that has recently become popular. We use the newest basic Universal Stanford Dependencies, without added language-specific subtypes.
Rights:
HamleDT 2.0 Licence Agreement , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-hamledt-2.0 , and ACA
Creator:
Zeman, Daniel , Mareček, David , Mašek, Jan , Popel, Martin , Ramasamy, Loganathan , Rosa, Rudolf , Štěpánek, Jan , and Žabokrtský, Zdeněk
Publisher:
Charles University
Type:
text and corpus
Subject:
annotated corpus , morphology , syntax , dependency , treebank , harmonized annotation , and common annotation style
Language:
Arabic , Basque , Bengali , Bulgarian , Catalan , Croatian , Czech , Danish , Dutch , English , Estonian , Finnish , French , German , Modern Greek (1453-) , Ancient Greek (to 1453) , Hebrew , Hindi , Hungarian , Indonesian , Irish , Italian , Japanese , Latin , Persian , Polish , Portuguese , Romanian , Russian , Slovak , Slovenian , Spanish , Swedish , Tamil , Telugu , and Turkish
Description:
HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. This version uses Universal Dependencies as the common annotation style.
Update (November 2017): for a current collection of harmonized dependency treebanks, we recommend using Universal Dependencies (UD). All of the corpora that are distributed in full in HamleDT are also part of the UD project; only some corpora from the Patch group (where HamleDT provides only the harmonizing scripts but not the full corpus data) are available in HamleDT but not in UD.
Rights:
HamleDT 3.0 License Terms , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-hamledt-3.0 , and PUB
Creator:
Calligaris, Luigi
Type:
model:monograph and TEXT
Language:
French , Arabic , Latin , Italian , Spanish , Portuguese , German , English , and Modern Greek (1453-)
Rights:
http://creativecommons.org/publicdomain/mark/1.0/ and policy:public
Creator:
Sessius, Pavel
Publisher:
Sessius, Pavel
Format:
print and [4] ff ; 4°
Type:
model:monograph and TEXT
Subject:
17th century , poetry , nuptialia , congratulatory poems , humanism , Zárybnický ze Zárybenska, Samuel , and 094
Language:
Latin and Modern Greek (1453-)
Description:
PRAGÆ, Imprimebat PAULUS SESSIUS. [Prague, printed by Paulus Sessius] and BCBT38843
Rights:
http://creativecommons.org/licenses/by-nc-sa/4.0/ and policy:public
Creator:
Schumannová, Anna
Publisher:
Schumannová, Anna
Format:
print and [4] ff ; 4°
Type:
model:monograph and TEXT
Subject:
Benedictus, Nikodém , -1630 , 16th century , poetry , nuptialia , epithalamia , and 094
Language:
Latin and Modern Greek (1453-)
Description:
Národní knihovna ČR Praha CZ 10 J 93 adl. 36 and BCBT36484
Rights:
http://creativecommons.org/publicdomain/mark/1.0/ and policy:public