A dataset intended for fully trainable natural language generation (NLG) systems in task-oriented spoken dialogue systems (SDS), covering the English public transport information domain. Each data instance (a pair of source meaning representation and target natural language paraphrase to be generated) is accompanied by its preceding context (the user utterance).
Taking the form of the previous user utterance into account when generating the system response allows NLG systems trained on this dataset to entrain (adapt) to the preceding utterance, i.e., to reuse its wording and syntactic structure. This is expected to improve the perceived naturalness of the output, and may even lead to a higher task success rate.
Crowdsourcing was used to obtain both the natural user utterances that serve as context and the natural system responses that serve as generation targets.
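As a rough illustration of the instance layout described above, the following Python sketch models one record as a context, a meaning representation, and a target response, and lists the words the response reuses from the context as a crude stand-in for lexical entrainment. The class name, field names, meaning representation syntax, and example values are all hypothetical and invented for illustration; they are not taken from the dataset or its file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One hypothetical data instance (field names are assumptions)."""
    context: str                 # preceding user utterance
    meaning_representation: str  # source MR; this syntax is invented
    response: str                # target natural language response


def tokens(text: str) -> set:
    """Lowercase the text and strip basic punctuation from each word."""
    return {w.strip(".,?!").lower() for w in text.split()} - {""}


def shared_words(instance: Instance) -> set:
    """Words the response reuses from the context utterance."""
    return tokens(instance.context) & tokens(instance.response)


# Invented example values, not drawn from the dataset.
example = Instance(
    context="Is there a later connection?",
    meaning_representation="inform_no_match(alternative=next)",
    response="I'm sorry, there is no later connection.",
)
reused = shared_words(example)
```

Word overlap captures only the lexical side of entrainment; reuse of syntactic structure would need a parser or similar tooling and is not attempted here.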
Painter Alfons Mucha at Zbiroh Chateau working on The Battle of Grünwald from his Slav Epic cycle. Mucha in his studio working on a design for the windows of St. Vitus Cathedral. Mucha in the garden of his villa in Prague-Bubeneč. Mucha with his wife Marie (née Chytilová), son Jiří, and daughter Jaroslava. Mucha with painters Max Švabinský and Alois Kalvoda.
Ornithologist Alfréd Hořice with his collection of stuffed birds in a fragmented segment from Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel) 1945, issue no. 19.
Cold water swimmer Alfréd Nikodém as the oldest participant in a swimming race in the Vltava River in a segment from Československý zvukový týdeník Aktualita (Czechoslovak Aktualita Sound Newsreel) 1942, issue no. 36. Nikodém with a bouquet of flowers by Svatopluk Čech Bridge.
The database will contain an etymological lexicon of Saami languages complete with detailed source citations. The database will be open to the public in November 2006 and will be updated regularly.