Creator:
Mareček, David , Yu, Zhiwei , Zeman, Daniel , and Žabokrtský, Zdeněk
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
part of speech , tagging , semi-supervised , and cross-language
Language:
Belarusian , Bosnian , Bulgarian , Czech , Serbo-Croatian , Croatian , Upper Sorbian , Macedonian , Polish , Russian , Slovak , Slovenian , Serbian , Ukrainian , Latvian , Lithuanian , Afrikaans , Danish , German , English , Faroese , Western Frisian , Swiss German , Icelandic , Limburgan , Luxembourgish , Low German , Dutch , Norwegian Nynorsk , Norwegian , Scots , Swedish , Yiddish , Aragonese , Asturian , Catalan , French , Galician , Haitian , Italian , Latin , Lombard , Neapolitan , Piemontese , Portuguese , Romanian , Spanish , Venetian , Walloon , Breton , Welsh , Scottish Gaelic , Irish , Modern Greek (1453-) , Armenian , Albanian , Dimli (individual language) , Persian , Gilaki , Kurdish , Tajik , Bengali , Bishnupriya , Gujarati , Fiji Hindi , Hindi , Marathi , Nepali (macrolanguage) , Urdu , Amharic , Arabic , Egyptian Arabic , Hebrew , Estonian , Finnish , Hungarian , Basque , Georgian , Chuvash , Azerbaijani , Turkish , Uzbek , Kazakh , Tatar , Yakut , Korean , Mongolian , Telugu , Kannada , Malayalam , Tamil , Newari , Vietnamese , Indonesian , Javanese , Malagasy , Maori , Malay (macrolanguage) , Pampanga , Sundanese , Tagalog , Waray (Philippines) , Swahili (macrolanguage) , Esperanto , Ido , Interlingua (International Auxiliary Language Association) , and Volapük
Description:
Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).
Rights:
Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) , http://creativecommons.org/licenses/by-sa/4.0/ , and PUB
Creator:
Mareček, David , Yu, Zhiwei , Zeman, Daniel , and Žabokrtský, Zdeněk
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
part of speech , tagging , semi-supervised , and cross-language
Language:
Belarusian , Bosnian , Bulgarian , Czech , Serbo-Croatian , Croatian , Upper Sorbian , Macedonian , Polish , Russian , Slovak , Slovenian , Serbian , Ukrainian , Latvian , Lithuanian , Afrikaans , Danish , German , English , Faroese , Western Frisian , Swiss German , Icelandic , Limburgan , Luxembourgish , Low German , Dutch , Norwegian Nynorsk , Norwegian , Scots , Swedish , Yiddish , Aragonese , Asturian , Catalan , French , Galician , Haitian , Italian , Latin , Lombard , Neapolitan , Piemontese , Portuguese , Romanian , Spanish , Venetian , Walloon , Breton , Welsh , Scottish Gaelic , Irish , Modern Greek (1453-) , Armenian , Albanian , Dimli (individual language) , Persian , Gilaki , Kurdish , Tajik , Bengali , Bishnupriya , Gujarati , Fiji Hindi , Hindi , Marathi , Nepali (macrolanguage) , Urdu , Amharic , Arabic , Egyptian Arabic , Hebrew , Estonian , Finnish , Hungarian , Basque , Georgian , Chuvash , Azerbaijani , Turkish , Uzbek , Kazakh , Tatar , Yakut , Korean , Mongolian , Telugu , Kannada , Malayalam , Tamil , Newari , Vietnamese , Indonesian , Javanese , Malagasy , Maori , Malay (macrolanguage) , Pampanga , Sundanese , Tagalog , Waray (Philippines) , Swahili (macrolanguage) , Esperanto , Ido , Interlingua (International Auxiliary Language Association) , and Volapük
Description:
Texts in 107 languages from the W2C corpus (http://hdl.handle.net/11858/00-097C-0000-0022-6133-9), first 1,000,000 tokens per language, tagged by the delexicalized tagger described in Yu et al. (2016, LREC, Portorož, Slovenia).
Changes in version 1.1:
1. Universal Dependencies tagset instead of the older and smaller Google Universal POS tagset.
2. SVM classifier trained on Universal Dependencies 1.2 instead of HamleDT 2.0.
3. Balto-Slavic languages, Germanic languages and Romance languages were tagged by a classifier trained only on the respective group of languages. Other languages were tagged by a classifier trained on all available languages. The "c7" combination from version 1.0 is no longer used.
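The selection rule in item 3 amounts to a simple lookup. The sketch below illustrates it and is not the authors' code. The group names, the fallback label `all`, and the ISO 639-1 codes are assumptions. Each group lists only a few unambiguous member languages, not the full assignment used for the corpus.

```python
# Hypothetical sketch of the version 1.1 model-selection rule: a language in
# one of three groups is tagged by that group's classifier; any other language
# falls back to a classifier trained on all available languages.
# Membership is an illustrative subset (ISO 639-1 codes), not the real mapping.
LANGUAGE_GROUPS = {
    "balto-slavic": {"cs", "sk", "pl", "ru", "uk", "lt", "lv"},
    "germanic": {"de", "en", "nl", "sv", "da"},
    "romance": {"fr", "es", "it", "pt", "ro", "ca"},
}

# Hypothetical label for the fallback model trained on all languages.
ALL_LANGUAGES = "all"


def select_classifier(lang_code: str) -> str:
    """Return the training group whose classifier would tag lang_code."""
    for group, members in LANGUAGE_GROUPS.items():
        if lang_code in members:
            return group
    return ALL_LANGUAGES


if __name__ == "__main__":
    for code in ("cs", "de", "es", "fi", "ko"):
        print(code, "->", select_classifier(code))
```

For example, Finnish (`fi`) and Korean (`ko`) belong to none of the three groups, so both fall back to the all-languages classifier.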
Rights:
Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) , http://creativecommons.org/licenses/by-sa/4.0/ , and PUB
Creator:
Jan Patočka
Publisher:
Neue Zeitschrift für systematische Theologie und Religionsphilosophie 15 (Berlin 1973), no. 3, pp. 291–303. Article, in German.
Type:
Text
Subject:
1973 , 1977/12 , 1978/7 , 1979/16 , 1979/31 , 1985/1 , 1986/2 , 1987/31 , 1990/4 , 1992/11 , 2004/1 , 2004/2 , 2004/7-8 , AS/UF-2 , AS/UF-4 , cs , de , fr , pl , SS-4/UČ-I , SS-5/UČ-II , and Article (German)
Language:
German , Czech , French , and Polish
Rights:
open access and Rights holder: Archiv Jana Patočky, z.s.
Creator:
Jan Patočka
Publisher:
Ed. I. Chvatík and J. Polívka. pp. 197–212. [Transcript of a tape recording of a private lecture given on 11 April 1975.] Second printing in: Souvislosti 1 (1990), no. 1, pp. 9–17. Third printing in: Péče o duši III (SS-3/PD-III), Prague 2002, pp. 355–371 (see 2002/1).
Type:
Text
Subject:
1977/5 , 1977/7 , 1988 , 1990/6 , 1998/3 , 1999/1 , 2002/1 , 2007/18 , 2007/7 , cs , de , en , es , fr , fulltext , it , pl , transcript of tape recording , and SS-3/PD-III
Language:
English , French , Italian , German , Polish , Spanish , and Czech
Rights:
open access and Rights holder: Archiv Jana Patočky, z.s.
Type:
text and collected volumes
Subject:
Genealogy. Heraldry. Nobility. Flags , courts , residences , and Czech (Czechoslovak) collected volumes and collective monographs
Language:
Czech , English , French , German , Latin , and Polish
Description:
Contributions from the 2nd colloquium, held 18–19 October 2007 and organized by the Institute of History of the Academy of Sciences of the Czech Republic in cooperation with the Prague City Archives and the Institute of Czech History, Faculty of Arts, Charles University
Rights:
unknown
Creator:
Jan Patočka
Publisher:
pp. 132–159. Article. [Dedicated to F. Fajfr on his 80th birthday (1972) and to B. Komárková on her 70th birthday (1973).]
Type:
Text
Subject:
1975 , 1979/25 , 1981/6 , 1981/7 , 1988/28 , 1988/31 , 1988/32 , 1988/33 , 1988/34 , 1994/7 , 1996/4 , 1996/7 , 1998/3 , 1999/8 , 2001/9 , 2002/21 , 2006/1 , 2007/1 , 2008/3 , be , bg , cs , de , en , es , fr , fulltext , hu , I/1979 , it , lt , no , pl , ru , SS-3/PD-III , sv , and uk
Language:
Czech , English , Bulgarian , French , Italian , Lithuanian , Hungarian , German , Norwegian , Polish , Russian , Belarusian , Spanish , Swedish , and Ukrainian
Rights:
open access and Rights holder: Archiv Jana Patočky, z.s.
Creator:
Pecina, Pavel and Saleh, Shadi
Publisher:
Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Type:
text and corpus
Subject:
cross-lingual information retrieval and machine translation
Language:
English , Czech , French , German , Hungarian , Polish , Spanish , and Swedish
Description:
This package contains an extended version of the test collection used in the CLEF eHealth Information Retrieval tasks in 2013–2015. Compared to the original version, it provides complete query translations into Czech, French, German, Hungarian, Polish, Spanish and Swedish, as well as additional relevance assessments.
Rights:
Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) , http://creativecommons.org/licenses/by-nc/4.0/ , and PUB
Creator:
Zeman, Daniel , Mareček, David , Mašek, Jan , Popel, Martin , Ramasamy, Loganathan , Rosa, Rudolf , Štěpánek, Jan , and Žabokrtský, Zdeněk
Publisher:
Charles University
Type:
text and corpus
Subject:
annotated corpus , morphology , syntax , dependency , treebank , harmonized annotation , and common annotation style
Language:
Arabic , Basque , Bengali , Bulgarian , Catalan , Croatian , Czech , Danish , Dutch , English , Estonian , Finnish , French , German , Modern Greek (1453-) , Ancient Greek (to 1453) , Hebrew , Hindi , Hungarian , Indonesian , Irish , Italian , Japanese , Latin , Persian , Polish , Portuguese , Romanian , Russian , Slovak , Slovenian , Spanish , Swedish , Tamil , Telugu , and Turkish
Description:
HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. This version uses Universal Dependencies as the common annotation style.
Update (November 2017): for a current collection of harmonized dependency treebanks, we recommend using Universal Dependencies (UD). All of the corpora that HamleDT distributes in full are also part of the UD project. Only some corpora from the Patch group (for which HamleDT provides the harmonizing scripts but not the full corpus data) are available in HamleDT but not in UD.
Rights:
HamleDT 3.0 License Terms , https://lindat.mff.cuni.cz/repository/xmlui/page/licence-hamledt-3.0 , and PUB
Creator:
Jan Patočka
Publisher:
M. Heidegger, Rozhovory, Prague (samizdat) 1984, pp. 3–9, Expedice series, vol. 168. Article. [The Czech original has not survived; see 1974/2.]
Type:
Text
Subject:
1974/2 , 1984 , 1999/1 , cs , de , en , fr , pl , samizdat , and article
Language:
English , French , German , Polish , and Czech
Rights:
open access and Rights holder: Archiv Jana Patočky, z.s.