1. Indonesian web corpus (idWac)
- Creator:
- Medveď, Marek and Suchomel, Vít
- Publisher:
- Natural Language Processing Centre, Faculty of Informatics, Masaryk University
- Type:
- text and corpus
- Subject:
- corpus, lemmatization, and PoS tagging
- Language:
- Indonesian
- Description:
- Indonesian text corpus compiled from the web. Crawled with SpiderLing in 2017. Boilerplate was removed with JusText and duplicate content with Onion (see http://corpus.tools/ for details). Tagged and lemmatized with MorphInd (http://septinalarasati.com/morphind/).
- Rights:
- NLP Centre Web Corpus License, https://lindat.mff.cuni.cz/repository/xmlui/page/license-NLPC-WeC, and ACA