Welcome to the Simple4All Tundra Corpus
This is an ongoing project that aims to collect a large number of speech resources in multiple languages and make them freely available to the speech processing community. The first version of the Tundra corpus is a collection of 14 audiobooks in 14 languages: Bulgarian, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Polish, Portuguese, Romanian, Russian and Spanish. The sources for the speech and text data of each audiobook are listed below.
To download the segmented and aligned data, please go to the Download section. You can also download a 1-hour subset for each language by following this LINK, or listen to synthetic speech samples built from the corpus in the Demo section.

Language | Code | Title | Author | Speech URL | Text URL | Speaker Gender | Total duration [hours] | Aligned duration [hours] |
---|---|---|---|---|---|---|---|---|
Bulgarian | BG | Zhetvariat | Yordan Yovkov | LINK | LINK | M | 6.1 | 4.1 |
Danish | DA | Grimms eventyr I udvalg | Grimm Brothers | LINK | LINK | M | 2.1 | 0.7 |
Dutch | NL | Anna Karenina | Leo Tolstoy | LINK | LINK | M | 6.5 | 4.5 |
English | EN | Living Alone | Stella Benson | LINK | LINK | F | 4.5 | 2.3 |
Finnish | FN | Rautatie | Juhani Aho | LINK | LINK | F | 3.1 | 2.5 |
French | FR | Candide | Voltaire | LINK | LINK | M | 4.0 | 2.1 |
German | DE | Das Bildnis des Dorian Gray | Oscar Wilde | LINK | LINK | M | 9.5 | 7.9 |
Hungarian | HU | Egri csillagok | Geza Gardonyi | LINK | LINK | F | 8.5 | 5.0 |
Italian | IT | Galatea | Anton Giulio Barrili | LINK | LINK | M | 6.5 | 5.3 |
Polish | PL | Siedem wybranych opowiadan | Wladyslaw Orkan | LINK | LINK | F | 3.1 | 2.6 |
Portuguese | PR | Senhora | Jose de Alencar | LINK | LINK | F | 9.2 | 5.2 |
Romanian* | RO | Mara | Ioan Slavici | LINK | LINK | F | 11.1 | 6.5 |
Russian | RU | Ucheniye Khrista | Leo Tolstoy | LINK | LINK | M | 2.1 | 1.3 |
Spanish** | ES | Don Quijote de la Mancha | Miguel de Cervantes | LINK | LINK | M | 12.1 | 8.0 |
** Only the first 35 chapters from the first part were used for alignment.
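
As a quick way to read the two duration columns, the minimal sketch below sums them and computes the share of each audiobook that was aligned. The numbers are copied from the table above; the names `DURATIONS` and `summarize` are illustrative only and are not part of the corpus distribution.

```python
# Durations in hours, copied from the table: language code -> (total, aligned).
# The codes are the ones used in the table's "Code" column.
DURATIONS = {
    "BG": (6.1, 4.1),
    "DA": (2.1, 0.7),
    "NL": (6.5, 4.5),
    "EN": (4.5, 2.3),
    "FN": (3.1, 2.5),
    "FR": (4.0, 2.1),
    "DE": (9.5, 7.9),
    "HU": (8.5, 5.0),
    "IT": (6.5, 5.3),
    "PL": (3.1, 2.6),
    "PR": (9.2, 5.2),
    "RO": (11.1, 6.5),
    "RU": (2.1, 1.3),
    "ES": (12.1, 8.0),
}


def summarize(durations):
    """Return (total hours, aligned hours, per-language aligned fraction)."""
    total = sum(t for t, _ in durations.values())
    aligned = sum(a for _, a in durations.values())
    ratios = {code: a / t for code, (t, a) in durations.items()}
    # Round the sums to one decimal, matching the precision of the table.
    return round(total, 1), round(aligned, 1), ratios


total_hours, aligned_hours, ratios = summarize(DURATIONS)
print(f"total: {total_hours} h, aligned: {aligned_hours} h "
      f"({aligned_hours / total_hours:.0%} of the audio)")

# List languages from the lowest to the highest aligned fraction.
for code, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{code}: {ratio:.0%} aligned")
```

With the figures in the table, this gives 88.4 hours of audio in total, of which 58.0 hours are aligned.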

This work is licensed under a Creative Commons Attribution 3.0 Unported License. The underlying audio and text are subject to their source licenses, so please check the links before using the data.