Search Results
Creator:
Suchomel, Vít and Rychlý, Pavel
Publisher:
Masaryk University, NLP Centre
Type:
text and corpus
Subject:
Amharic, text corpus, Web corpus, under-resourced language, corpus annotation, morphological tagger
Language:
Amharic
Description:
Amharic web corpus. Crawled by SpiderLing in August 2013, October 2015, and January 2016. Encoded in UTF-8, cleaned, and deduplicated. Tagged by TreeTagger trained on the Amharic WIC corpus.
Rights:
NLP Centre Web Corpus License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC, ACA
Creator:
Rychlý, Pavel
Publisher:
Masaryk University, NLP Centre
Type:
text and corpus
Subject:
text corpora, Ethiopian languages, web corpora, under-resourced languages, Amharic
Language:
Amharic
Description:
Substantially cleaned version of the existing morphologically annotated WIC corpus.
Rights:
Creative Commons - Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/, PUB
Creator:
Suchomel, Vít and Rychlý, Pavel
Publisher:
Masaryk University, NLP Centre
Type:
text and corpus
Subject:
text corpora, Ethiopian languages, Oromo, Web corpus, under-resourced language
Language:
Oromo
Description:
Oromo web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, and deduplicated.
Rights:
NLP Centre Web Corpus License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC, ACA
Creator:
Suchomel, Vít and Rychlý, Pavel
Publisher:
Masaryk University, NLP Centre
Type:
text and corpus
Subject:
text corpora, Ethiopian languages, web corpora, under-resourced languages, Somali
Language:
Somali
Description:
Somali web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, and deduplicated.
Rights:
NLP Centre Web Corpus License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC, ACA
Creator:
Suchomel, Vít and Rychlý, Pavel
Publisher:
Masaryk University, NLP Centre
Type:
text and corpus
Subject:
text corpora, Ethiopian languages, web corpora, under-resourced languages, Tigrinya, Tigrigna
Language:
Tigrinya
Description:
Tigrinya web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, and deduplicated.
Rights:
NLP Centre Web Corpus License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC, ACA