Search Results
42. Lingua::Interset 2.026
- Creator:
- Zeman, Daniel
- Publisher:
- Charles University, Faculty of Mathematics and Physics
- Type:
- tool and toolService
- Subject:
- morphology, part of speech, conversion, and tagset
- Language:
- Arabic, Bulgarian, Bengali, Catalan, Czech, Danish, German, Modern Greek (1453-), English, Spanish, Estonian, Basque, Persian, Finnish, Ancient Greek (to 1453), Hebrew, Hindi, Croatian, Japanese, Multiple languages, and Portuguese
- Description:
- Lingua::Interset is a universal morphosyntactic feature set to which all tagsets of all corpora/languages can be mapped. Version 2.026 covers 37 different tagsets of 21 languages. Limited support for the older drivers for other languages (which are not included in this package but are available for download elsewhere) is also available; these will be fully ported to Interset 2 in the future. Interset is implemented as Perl libraries. It is also available via CPAN.
- Rights:
- Artistic License (Perl) 1.0, http://opensource.org/licenses/Artistic-Perl-1.0, and PUB
43. Mercedes
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Language:
- Catalan
- Description:
- A tool for contrasting terminological vocabularies with textual corpora. It allows checking the presence and location of reference vocabularies in textual corpora.
- Rights:
- Not specified
44. Multilingual Central Repository
- Publisher:
- Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP)
- Type:
- lexicalConceptualResource
- Subject:
- lexical database
- Language:
- Basque, Catalan, English, Galician, and Spanish
- Description:
- Multilingual lexical database that follows the model proposed by the EuroWordNet project. The MCR integrates into the same EuroWordNet framework wordnets from five different languages (together with four English WordNet versions). It also integrates WordNet Domains and new versions of the Base Concepts and Top Concept Ontology. Overall, it contains 1,642,389 semantic relations between synsets, most of them acquired by automatic means. Information contained: semantics, synonyms, antonyms, definition, equivalents, example of use, morphology.
- Rights:
- Not specified
45. Neologismos económicos en las lenguas románicas a través de la prensa
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- lexicalConceptualResource
- Subject:
- terminology database
- Language:
- Catalan, French, Galician, Italian, Portuguese, Romanian, and Spanish
- Description:
- Multilingual terminological resource containing 3,875 entries from the Economics, Finance and Banking domains.
- Rights:
- Not specified
46. OmegaWiki
- Publisher:
- Universität Bamberg, World Language Documentation Centre
- Format:
- application/octet-stream
- Type:
- lexicalConceptualResource
- Language:
- Afrikaans, Arabic, Basque, Bulgarian, Catalan, Chinese, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, Modern Greek (1453-), Hebrew, Hungarian, Icelandic, Indonesian, Interlingua (International Auxiliary Language Association), Irish, Italian, Japanese, Khmer, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swedish, Turkish, Ukrainian, and Welsh
- Rights:
- GFDL or CC and http://www.omegawiki.org/Licensing
47. PALIC
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Language:
- Catalan, French, Portuguese, and Spanish
- Description:
- A package of tools for processing the Corpus Tècnic in Catalan and Spanish. It includes a preprocessor, a PoS tagger and a linguistic disambiguator.
- Rights:
- Not specified
48. Plaintext Wikipedia dump 2018
- Creator:
- Rosa, Rudolf
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- Wikipedia, text corpora, and monolingual corpus
- Language:
- Abkhazian, Achinese, Adyghe, Afrikaans, Akan, Tosk Albanian, Amharic, Old English (ca. 450-1100), Arabic, Official Aramaic (700-300 BCE), Aragonese, Egyptian Arabic, Assamese, Asturian, Atikamekw, Avaric, Aymara, South Azerbaijani, Azerbaijani, Bashkir, Bambara, Bavarian, Central Bikol, Belarusian, Bengali, Bislama, Banjar, Tibetan, Bosnian, Bishnupriya, Breton, Buginese, Bulgarian, Russia Buriat, Catalan, Min Dong Chinese, Cebuano, Czech, Chamorro, Chechen, Cherokee, Church Slavic, Chuvash, Cheyenne, Central Kurdish, Cornish, Corsican, Cree, Crimean Tatar, Kashubian, Welsh, Danish, German, Dinka, Dimli (individual language), Dhivehi, Lower Sorbian, Dzongkha, Modern Greek (1453-), English, Esperanto, Estonian, Basque, Ewe, Extremaduran, Faroese, Persian, Fijian, Finnish, French, Arpitan, Northern Frisian, Western Frisian, Fulah, Friulian, Gagauz, Gan Chinese, Scottish Gaelic, Irish, Galician, Gilaki, Manx, Goan Konkani, Gothic, Guarani, Gujarati, Hakka Chinese, Haitian, Hausa, Hawaiian, Serbo-Croatian, Hebrew, Herero, Fiji Hindi, Hindi, Hiri Motu, Croatian, Upper Sorbian, Hungarian, Armenian, Igbo, Ido, Inuktitut, Interlingue, Iloko, Interlingua (International Auxiliary Language Association), Indonesian, Inupiaq, Icelandic, Italian, Jamaican Creole English, Javanese, Lojban, Japanese, Kara-Kalpak, Kabyle, Kalaallisut, Kannada, Kashmiri, Georgian, Kanuri, Kazakh, Kabardian, Kabiyè, Khmer, Kikuyu, Kinyarwanda, Kirghiz, Komi-Permyak, Komi, Kongo, Korean, Karachay-Balkar, Kölsch, Kurdish, Ladino, Lao, Latin, Latvian, Lak, Lezghian, Ligurian, Limburgan, Lingala, Lithuanian, Lombard, Northern Luri, Latgalian, Luxembourgish, Ganda, Literary Chinese, Marshallese, Maithili, Malayalam, Marathi, Moksha, Eastern Mari, Minangkabau, Macedonian, Malagasy, Maltese, Mongolian, Maori, Western Mari, Malay (macrolanguage), Creek, Mirandese, Burmese, Erzya, Mazanderani, Min Nan Chinese, Neapolitan, Nauru, Navajo, Ndonga, Low German, Nepali (macrolanguage), Newari, Dutch, Norwegian Nynorsk, Norwegian, Novial, Pedi, Nyanja, Occitan (post 1500), Livvi, Oriya (macrolanguage), Oromo, Ossetian, Pangasinan, Pampanga, Panjabi, Papiamento, Picard, Pennsylvania German, Pfaelzisch, Pitcairn-Norfolk, Pali, Piemontese, Western Panjabi, Pontic, Polish, Portuguese, Pushto, Quechua, Vlax Romani, Romansh, Romanian, Rusyn, Rundi, Macedo-Romanian, Russian, Sango, Yakut, Sanskrit, Sicilian, Scots, Samogitian, Sinhala, Slovak, Slovenian, Northern Sami, Samoan, Shona, Sindhi, Somali, Southern Sotho, Spanish, Albanian, Sardinian, Sranan Tongo, Serbian, Swati, Saterfriesisch, Sundanese, Swahili (macrolanguage), Swedish, Silesian, Tahitian, Tamil, Tatar, Tulu, Telugu, Tama (Colombia), Tetum, Tajik, Tagalog, Thai, Tigrinya, Tonga (Tonga Islands), Tok Pisin, Tswana, Tsonga, Turkmen, Tumbuka, Turkish, Twi, Tuvinian, Udmurt, Uighur, Ukrainian, Urdu, Uzbek, Venetian, Venda, Veps, Vietnamese, Vlaams, Volapük, Võro, Waray (Philippines), Walloon, Wolof, Wu Chinese, Kalmyk, Xhosa, Mingrelian, Yiddish, Yoruba, Yue Chinese, Zeeuws, Zhuang, Chinese, Zulu, and Dotyali
- Description:
- Wikipedia plain text data obtained from Wikipedia dumps with WikiExtractor in February 2018. The data come from all Wikipedias for which dumps could be downloaded at [https://dumps.wikimedia.org/]. This amounts to 297 Wikipedias, usually corresponding to individual languages and identified by their ISO codes. Several special Wikipedias are included, most notably "simple" (Simple English Wikipedia) and "incubator" (tiny hatching Wikipedias in various languages). For a list of all the Wikipedias, see [https://meta.wikimedia.org/wiki/List_of_Wikipedias]. The script that can be used to get a new version of the data is included, but note that Wikipedia limits the download speed when many dumps are downloaded, so it takes a few days to download all of them (one or a few can be downloaded quickly). Also, the format of the dumps changes from time to time, so the script will probably stop working eventually. The WikiExtractor tool [http://medialab.di.unipi.it/wiki/Wikipedia_Extractor] used to extract text from the Wikipedia dumps is not mine; I only modified it slightly to produce plaintext outputs [https://github.com/ptakopysk/wikiextractor].
- Rights:
- Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), http://creativecommons.org/licenses/by-sa/3.0/, and PUB
49. SOLC
- Publisher:
- Institut Universitari de Lingüística Aplicada, Universitat Pompeu Fabra
- Type:
- toolService
- Language:
- Catalan
- Description:
- An orthologic server for Catalan: a query system for the orthologic dictionary that supports searches using dialectal and pragmatic variables.
- Rights:
- Not specified
50. SVMTool
- Publisher:
- Centro de Tecnologías y Aplicaciones del Lenguaje y del Habla (TALP)
- Type:
- toolService
- Language:
- Catalan, English, and Spanish
- Description:
- Generator of sequential taggers based on Support Vector Machines.
- Rights:
- Not specified