Search Results
2. Artificial Treebank with Ellipsis
- Creator:
- Droganova, Kira, Zeman, Daniel, Kanerva, Jenna, and Ginter, Filip
- Publisher:
- Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
- Type:
- text and corpus
- Subject:
- universal dependencies, ellipsis, and gapping
- Language:
- English, Czech, Finnish, Russian, and Slovak
- Description:
- Artificially created treebank of elliptical constructions (gapping), in the annotation style of Universal Dependencies. The data are taken from the UD 2.1 release and from large web corpora parsed by two parsers. The input data are filtered to identify sentences where gapping could be applied; those sentences are then transformed by omitting one or more words, yielding sentences with gapping. Details are given in Droganova et al.: Parse Me if You Can: Artificial Treebanks for Parsing Experiments on Elliptical Constructions, LREC 2018, Miyazaki, Japan.
- Rights:
- Licence Universal Dependencies v2.1, https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-2.1, and PUB
3. C4Corpus (CC BY-NC part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family. It was extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial 4.0 International (CC BY-NC 4.0), http://creativecommons.org/licenses/by-nc/4.0/, and PUB
4. C4Corpus (CC BY-NC-ND part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family. It was extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0), http://creativecommons.org/licenses/by-nc-nd/4.0/, and PUB
5. C4Corpus (CC BY-NC-SA part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family. It was extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), http://creativecommons.org/licenses/by-nc-sa/4.0/, and PUB
6. C4Corpus (CC BY-ND part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malayalam, Macedonian, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family. It was extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-NoDerivatives 4.0 International (CC BY-ND 4.0), http://creativecommons.org/licenses/by-nd/4.0/, and PUB
7. C4Corpus (CC BY-SA part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family. It was extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0), http://creativecommons.org/licenses/by-sa/4.0/, and PUB
8. C4Corpus (CC-BY part)
- Creator:
- Gurevych, Iryna, Habernal, Ivan, and Zayed, Omnia
- Publisher:
- Technische Universität Darmstadt
- Type:
- text and corpus
- Subject:
- CommonCrawl, Creative Commons, Web corpus, and Amazon Web Services
- Language:
- Afrikaans, Arabic, Bengali, Bulgarian, Czech, Danish, German, Modern Greek (1453-), English, Estonian, Persian, Finnish, French, Gujarati, Hebrew, Hindi, Croatian, Hungarian, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malayalam, Marathi, Macedonian, Nepali (macrolanguage), Dutch, Norwegian, Panjabi, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Somali, Spanish, Albanian, Swahili (macrolanguage), Swedish, Tamil, Telugu, Tagalog, Thai, Turkish, Ukrainian, Undetermined, Urdu, Vietnamese, and Chinese
- Description:
- A large web corpus (over 10 billion tokens) in 50+ languages, licensed under the Creative Commons license family. It was extracted from CommonCrawl, the largest publicly available general Web crawl to date, with about 2 billion crawled URLs.
- Rights:
- Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, and PUB
9. Československá zahraniční politika v roce 1943.
- Type:
- text, documents, and editions
- Subject:
- International relations, world politics, foreign policy, international relations, government in exile, Second World War (1939-1945), second (anti-fascist) resistance, Czechoslovakia 1938-1945, and foreign policy, international relations
- Language:
- Czech, English, French, Polish, Russian, and Slovak
- Description:
- Authentic documents revealing the political and diplomatic relations of the Czechoslovak political representation with the great powers and other states from the beginning of August to the end of December 1943.
- Rights:
- unknown
10. Christianity and the development of culture :
- Publisher:
- Vydavateľstvo Prešovskej univerzity
- Type:
- conference proceedings
- Subject:
- Christianity. Christian church in general. Ecclesiology, Christianity, culture, Cyrillo-Methodian tradition, cultural tradition, foreign periodicals and proceedings, survey treatments of the history of the Czech lands (chronological), and church and religious history
- Language:
- English, Czech, Russian, Slovak, and Ukrainian
- Rights:
- unknown