Search Results
Creator:
Medveď, Marek and Suchomel, Vít
Publisher:
Masaryk University, NLP Centre
Type:
text and corpus
Subject:
Web corpus
Language:
Indonesian
Description:
Indonesian web corpus crawled in 2010. Encoded in UTF-8, cleaned, deduplicated, and tagged by MorphInd.
Rights:
NLP Centre Web Corpus License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC, and ACA
Creator:
Medveď, Marek and Suchomel, Vít
Publisher:
Natural Language Processing Centre, Faculty of Informatics, Masaryk University
Type:
text and corpus
Subject:
corpus, lemmatization, and PoS tagging
Language:
Indonesian
Description:
Indonesian text corpus from the web. Crawled by SpiderLing in 2017 and filtered by JusText and Onion (see http://corpus.tools/ for details). Tagged and lemmatized by MorphInd (http://septinalarasati.com/morphind/).
Rights:
NLP Centre Web Corpus License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC, and ACA