Seven French L2 corpora. Digital sound files and related transcripts formatted using CHILDES software. The database currently contains over 4000 files (sound files, transcripts and morphosyntactically tagged transcripts). .
The ultimate aim of the project is to compile a representative historical corpus of written German for the years 1650-1800. The complete GerManC corpus will contain 2000 word samples from nine genres
70K words, Non-validated sentence segmentation. Non-validated POS tagging, Manual annotation of syntactic dependencies and dependency labels, Manual annotation of semantic roles, Manual annotation of events based on a shallow domain specific ontology (only for a 31K words subset of GDT)
Collection of orthographically transcribed audio recorded speech, mainly from East Anglia and the South-West, with a minor collection from Lancashire. The recordings were made in the 1970s and the 1980s by Finnish postgraduates.