Software for corpus linguists and text/data mining enthusiasts. The CorpusExplorer combines over 45 interactive visualizations under a user-friendly interface. Routine tasks such as text acquisition, cleaning or tagging are completely automated. The simple interface supports the use in university teaching and leads users/students to fast and substantial results. The CorpusExplorer is open for many standards (XML, CSV, JSON, R, etc.) and also offers its own software development kit (SDK).
Source code available at https://github.com/notesjor/corpusexplorer2.0
MBT is a memory-based tagger-generator and tagger in one. The tagger-generator part can generate a sequence tagger on the basis of a training set of tagged sequences; the tagger part can tag new sequences. MBT can, for instance, be used to generate part-of-speech taggers or chunkers for natural language processing.
UDPipe 2 is a POS tagger, lemmatizer and dependency parser.
Compared to UDPipe 1:
- UDPipe 2 is Python-only and tested only in Linux,
- UDPipe 2 is meant as a research tool, not as a user-friendly UDPipe 1 replacement,
- UDPipe 2 achieves much better performance, but requires a GPU for reasonable performance,
- UDPipe 2 does not perform tokenization by itself – it uses UDPipe 1 for that.
UDPipe 2 is available in the udpipe-2 branch of the UDPipe repository at https://github.com/ufal/udpipe/tree/udpipe-2. It is a free software under Mozilla Public License 2.0 (http://www.mozilla.org/MPL/2.0/) and the models are free for non-commercial use and distributed under CC BY-NC-SA (http://creativecommons.org/licenses/by-nc-sa/4.0/) license, although for some models the original data used to create the model may impose additional licensing conditions.
UDPipe 2 is also available as a REST service running at https://lindat.mff.cuni.cz/services/udpipe. If you like, you can use the https://github.com/ufal/udpipe/blob/udpipe-2/udpipe2_client.py script to interact with it.