This package contains the data used in the IWPT 2020 shared task: training, development and test (evaluation) datasets. The data is based on a subset of Universal Dependencies release 2.5 (http://hdl.handle.net/11234/1-3105), but some treebanks contain additional enhanced annotations. Moreover, not all of these additions became part of Universal Dependencies release 2.6 (http://hdl.handle.net/11234/1-3226), which makes the shared task data unique and worth a separate release to enable later comparison with new parsing algorithms. The package also contains a number of Perl and Python scripts that were used to process the data during preparation and during the shared task. Finally, the package includes the official primary submission of each team participating in the shared task.
This package contains the data used in the IWPT 2021 shared task: training, development and test (evaluation) datasets. The data is based on a subset of Universal Dependencies release 2.7 (http://hdl.handle.net/11234/1-3424), but some treebanks contain additional enhanced annotations. Moreover, not all of these additions became part of Universal Dependencies release 2.8 (http://hdl.handle.net/11234/1-3687), which makes the shared task data unique and worth a separate release to enable later comparison with new parsing algorithms. The package also contains a number of Perl and Python scripts that were used to process the data during preparation and during the shared task. Finally, the package includes the official primary submission of each team participating in the shared task.
Latvian fairytales and legends collected by the Latvian folklorist Pēteris Šmits and published in 15 volumes between 1927 and 1938. It is the largest published collection of Latvian folktales and legends.
The digital library aims to digitise the collections of the National Library of Latvia and other similar organisations and to make them accessible on the Internet. Its creation lays the foundation for uniform principles of processing and storing the digitised materials and of ensuring access to them.
An HMM-based tagger for Latvian texts. The tagger uses information from the SemTi-Kamols morphological analyser; the tagset is derived from the MULTEXT-East project.