1 hour subsets and synthetic samples
The following subsets of the Tundra corpus were used to develop the synthetic voices presented in the paper:
O. Watts, A. Stan, R. Clark, Y. Mamiya, M. Giurgiu, J. Yamagishi, S. King, Unsupervised and lightly-supervised learning for rapid construction of TTS systems in multiple languages from 'found' data: evaluation and analysis, In Proc. SSW8, Barcelona, Spain, August 2013
You can download the 1 hour data sets and the corresponding synthetic samples for each language from the links in the table below.
As described in the paper, these sets consist of approximately 1 hour of neutrally-spoken data per language, selected in a semi-automatic manner from the larger corpora roughly corresponding to those available HERE.
Note that the utterance names in these training subsets do not match the utterance names in the full corpora as distributed, because the subsets were produced with a slightly earlier version of the same alignment tools. The synthetic samples, however, do exactly match the texts and natural speech samples in the handmadeTest/ subdirectories of the main corpus distribution.
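Because the synthetic samples correspond to the natural speech in handmadeTest/, one way to line them up for side-by-side listening is to pair files by name. The following is only a minimal sketch: it assumes that a synthetic sample and its natural counterpart share the same file name stem and a .wav extension, which this page does not state, and the function name and directory paths are hypothetical. Check the README for the actual archive layout.

```python
from pathlib import Path


def pair_by_stem(synth_dir, natural_dir, ext=".wav"):
    """Pair files that share a file name stem across two directories.

    Assumption (not confirmed by the corpus documentation): a synthetic
    sample and its natural counterpart have the same stem and extension.
    """
    synth = {p.stem: p for p in Path(synth_dir).glob(f"*{ext}")}
    natural = {p.stem: p for p in Path(natural_dir).glob(f"*{ext}")}
    # Keep only the stems present in both directories, in sorted order.
    shared = sorted(synth.keys() & natural.keys())
    return [(synth[stem], natural[stem]) for stem in shared]
```

For example, `pair_by_stem("synthetic/EN", "EN/handmadeTest")` (both paths are placeholders) would return a list of (synthetic, natural) path pairs. Files present in only one directory are silently skipped, so comparing the length of the result with the number of files in each directory is a quick check on whether the naming assumption holds.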
Please refer to the README file prior to downloading the corpus. You will find a detailed description of the archives, as well as the license info.
|Language||Code||Title||Author||Segmented audio and text||Synthetic samples|
|Bulgarian||BG||Zhetvariat||Yordan Yovkov||DOWNLOAD [238MB]||DOWNLOAD|
|Danish||DA||Grimms eventyr I udvalg||Grimm Brothers||DOWNLOAD [165MB]||DOWNLOAD|
|Dutch||NL||Anna Karenina||Leo Tolstoy||DOWNLOAD [229MB]||DOWNLOAD|
|English||EN||Living Alone||Stella Benson||DOWNLOAD [241MB]||DOWNLOAD|
|Finnish||FI||Rautatie||Juhani Aho||DOWNLOAD [234MB]||DOWNLOAD|
|German||DE||Das Bildnis des Dorian Gray||Oscar Wilde||DOWNLOAD [222MB]||DOWNLOAD|
|Hungarian||HU||Egri csillagok||Geza Gardonyi||DOWNLOAD [232MB]||DOWNLOAD|
|Italian||IT||Galatea||Anton Giulio Barrili||DOWNLOAD [241MB]||DOWNLOAD|
|Polish||PL||Siedem wybranych opowiadan||Wladyslaw Orkan||DOWNLOAD [229MB]||DOWNLOAD|
|Portuguese||PT||Senhora||Jose de Alencar||DOWNLOAD [217MB]||DOWNLOAD|
|Romanian*||RM||Mara||Ioan Slavici||DOWNLOAD [233MB]||DOWNLOAD|
|Russian||RU||Ucheniye Khrista||Leo Tolstoy||DOWNLOAD [244MB]||DOWNLOAD|
|Spanish**||ES||Don Quijote de la Mancha||Miguel de Cervantes||-||DOWNLOAD|
**Only the first 35 chapters of the first part were used for alignment. The data cannot be redistributed, so please download the files from the original source listed on the About page.
This work is licensed under a Creative Commons Attribution 3.0 Unported License. The underlying audio and text are subject to their source licenses, so please check the links before using the data.