English–Czech parallel corpus of song lyrics, aligned section by section. The songs are sourced from musical films.
The dataset is provided in JSON format with the following structure:
{
  "language": {
    "song_id": {
      "section_id": [list of lines in the section]
    }
  }
}
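
As a minimal sketch of how the nested structure above might be traversed, the Python snippet below pairs up sections that share the same song and section identifiers in two languages. The language keys `"en"` and `"cs"`, the identifiers `song_1`, `verse_1` and `chorus`, and the sample lines are placeholders invented for illustration; the actual keys used in the dataset may differ. The helper name `aligned_sections` is likewise hypothetical.

```python
import json
from typing import Dict, Iterator, List, Tuple

# language -> song_id -> section_id -> list of lines
Corpus = Dict[str, Dict[str, Dict[str, List[str]]]]


def aligned_sections(
    corpus: Corpus, src: str, tgt: str
) -> Iterator[Tuple[str, str, List[str], List[str]]]:
    """Yield (song_id, section_id, src_lines, tgt_lines) for every
    section that exists in both languages."""
    src_songs = corpus[src]
    tgt_songs = corpus[tgt]
    for song_id, src_sections in src_songs.items():
        # A song missing on the target side yields no pairs.
        tgt_sections = tgt_songs.get(song_id, {})
        for section_id, src_lines in src_sections.items():
            if section_id in tgt_sections:
                yield song_id, section_id, src_lines, tgt_sections[section_id]


# Placeholder data in the documented shape (not taken from the dataset).
# The language keys "en" and "cs" are assumptions.
sample_json = """
{
  "en": {
    "song_1": {
      "verse_1": ["first English line", "second English line"],
      "chorus": ["English chorus line"]
    }
  },
  "cs": {
    "song_1": {
      "verse_1": ["first Czech line", "second Czech line"]
    }
  }
}
"""

sample: Corpus = json.loads(sample_json)
pairs = list(aligned_sections(sample, "en", "cs"))

for song_id, section_id, en_lines, cs_lines in pairs:
    print(song_id, section_id, len(en_lines), len(cs_lines))
```

To read the real file instead of the inline sample, `json.load` on an opened file handle (with `encoding="utf-8"`) would produce the same nested dictionaries. Sections present in only one language, such as `chorus` in the sample, are skipped rather than raising an error.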