One million words of written and spoken English from Great Britain. Transcriptions aligned with digitised speech recordings. POS-tagged and parsed. Part of the International Corpus of English project. Custom-made search software: ICE-CUP.
One million words of spoken and written English from the UK. POS-tagged and parsed. Digitised speech recordings aligned with the text. Part of the International Corpus of English (ICE).
Text preprocessing. This service requires the input to be a plain-text file (.txt) encoded in UTF-8.
It performs three tasks: (i) segmentation of the text into smaller structural units (titles, paragraphs, sentences, etc.); (ii) detection of entities not found in dictionaries (numbers, abbreviations, URLs, emails, proper nouns, etc.); and (iii) grouping of sequences of two or more words into a single block (dates, phrases, proper nouns, etc.).
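The three tasks can be illustrated with a minimal Python sketch. It is not the service's implementation: the function names (`load_plain_text`, `segment`, `tokenize`), the regular expressions, and the English-only date pattern are all assumptions made for illustration, and the sentence splitter is deliberately naive (it would wrongly split after an abbreviation such as "Dr.").

```python
import re
from pathlib import Path

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Hypothetical patterns, tried in order: multiword dates first, so that
# "3 May 2021" is kept as one block instead of three separate tokens.
TOKEN_RE = re.compile(
    rf"(?P<date>\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b)"
    r"|(?P<url>https?://\S*[\w/])"
    r"|(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
    r"|(?P<number>\d+(?:[.,]\d+)*)"
    r"|(?P<word>\w+)"
    r"|(?P<punct>[^\w\s])"
)


def load_plain_text(path):
    """Read a .txt file as UTF-8 (assumed input check, not the service's)."""
    p = Path(path)
    if p.suffix.lower() != ".txt":
        raise ValueError(f"expected a .txt file, got {p.name!r}")
    # read_text raises UnicodeDecodeError if the file is not valid UTF-8.
    return p.read_text(encoding="utf-8")


def segment(text):
    """Task (i): split into paragraphs on blank lines, then into sentences."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Naive rule: a sentence ends at . ! or ? followed by space and a capital.
    return [re.split(r"(?<=[.!?])\s+(?=[A-Z])", p) for p in paragraphs]


def tokenize(sentence):
    """Tasks (ii) and (iii): label non-dictionary entities and keep
    multiword dates together as a single (label, text) block."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(sentence)]
```

Calling `segment` on the loaded text and then `tokenize` on each sentence yields a nested structure of paragraphs, sentences, and labelled tokens. A real preprocessor would also need abbreviation lists and proper-noun detection, which this sketch leaves out.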