The tokenizer covers all languages that use the Latin-1, Latin-2, Latin-3, and Cyrillic tables of Unicode, and it can be extended to cover other Unicode tables if necessary. It is implemented as a cascaded regular grammar in CLaRK. It recognizes over 60 token categories and can easily be adapted to new ones.
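The idea of classifying tokens by Unicode table can be sketched as follows. This is an illustrative sketch, not the CLaRK grammar itself: the category names and the handful of character ranges below are assumptions chosen for demonstration, whereas the actual tokenizer distinguishes over 60 categories.

```python
import re

# Hypothetical token categories keyed to Unicode ranges (a small sample,
# not the 60+ categories of the real tokenizer).
TOKEN_PATTERNS = [
    ("LATIN_WORD", r"[A-Za-z\u00C0-\u024F]+"),  # Basic Latin + Latin-1/2/3 letters
    ("CYRILLIC_WORD", r"[\u0400-\u04FF]+"),     # Cyrillic block
    ("NUMBER", r"\d+"),
    ("SPACE", r"\s+"),
    ("PUNCT", r"[^\w\s]"),                      # anything else that is not word/space
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))

def tokenize(text):
    """Return (category, token) pairs in order of appearance."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]
```

Extending the tokenizer to a new Unicode table then amounts to adding one more (name, range) pair, which mirrors the adaptability claimed above.

```python
tokenize("Hello, мир 42")
# → [("LATIN_WORD", "Hello"), ("PUNCT", ","), ("SPACE", " "),
#    ("CYRILLIC_WORD", "мир"), ("SPACE", " "), ("NUMBER", "42")]
```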