Collection of orthographically transcribed, audio-recorded speech, mainly from East Anglia and the South-West, with a smaller collection from Lancashire. The recordings were made in the 1970s and 1980s by Finnish postgraduates.
The Helsinki Finite-State Transducer (HFST) software is intended for the implementation of morphological analysers and other tools based on weighted and unweighted finite-state transducer technology. The feasibility of the HFST toolkit has been demonstrated by full-fledged open source implementations of Finnish, Swedish, English, French and Northern Sámi lexicons.
Omorfi is a free and open source project containing various tools and data for handling Finnish texts in a linguistically motivated manner. The main components of this repository are:
1) a lexical database containing hundreds of thousands of words (see lexical statistics),
2) a collection of scripts to convert the lexical database into formats used by upstream NLP tools (see lexical processing),
3) an autotools setup to build and install (or package, or deploy) the scripts, the database, and simple APIs / convenience processing tools, and
4) a collection of relatively simple APIs for a selection of languages and scripts to apply the NLP tools and access the database.