1-hour subsets and synthetic samples

The following subsets of the Tundra corpus were used to develop the synthetic voices presented in the paper:

O. Watts, A. Stan, R. Clark, Y. Mamiya, M. Giurgiu, J. Yamagishi, S. King, Unsupervised and lightly-supervised learning for rapid construction of TTS systems in multiple languages from 'found' data: evaluation and analysis, In Proc. SSW8, Barcelona, Spain, August 2013

You can download the 1-hour data set and the corresponding synthetic samples for each language by clicking on the links in the following table.

As described in the paper, these sets consist of approximately 1 hour of neutrally-spoken data per language, selected in a semi-automatic manner from the larger corpora roughly corresponding to those available HERE.

Note that the utterances in these training subsets do not match the names of utterances in the full corpora as distributed, because they result from an application of a slightly earlier version of the same alignment tools. The synthetic samples, however, do exactly match the texts and natural speech samples in the handmadeTest/ subdirectories of the main corpus distribution.

Please refer to the README file before downloading the corpus. It contains a detailed description of the archives, as well as the license information.

| Language | Code | Title | Author | Segmented audio and text | Synthetic samples |
|---|---|---|---|---|---|
| Bulgarian | BG | Zhetvariat | Yordan Yovkov | DOWNLOAD [238MB] | DOWNLOAD |
| Danish | DA | Grimms eventyr I udvalg | Grimm Brothers | DOWNLOAD [165MB] | DOWNLOAD |
| Dutch | NL | Anna Karenina | Leo Tolstoy | DOWNLOAD [229MB] | DOWNLOAD |
| English | EN | Living Alone | Stella Benson | DOWNLOAD [241MB] | DOWNLOAD |
| Finnish | FI | Rautatie | Juhani Aho | DOWNLOAD [234MB] | DOWNLOAD |
| French | FR | Candide | Voltaire | DOWNLOAD [237MB] | DOWNLOAD |
| German | DE | Das Bildnis des Dorian Gray | Oscar Wilde | DOWNLOAD [222MB] | DOWNLOAD |
| Hungarian | HU | Egri csillagok | Geza Gardonyi | DOWNLOAD [232MB] | DOWNLOAD |
| Italian | IT | Galatea | Anton Giulio Barrili | DOWNLOAD [241MB] | DOWNLOAD |
| Polish | PL | Siedem wybranyc opowiadan | Wladyslaw Orkan | DOWNLOAD [229MB] | DOWNLOAD |
| Portuguese | PT | Senhora | Jose de Alencar | DOWNLOAD [217MB] | DOWNLOAD |
| Romanian* | RM | Mara | Ioan Slavici | DOWNLOAD [233MB] | DOWNLOAD |
| Russian | RU | Ucheniye Khrista | Leo Tolstoy | DOWNLOAD [244MB] | DOWNLOAD |
| Spanish** | ES | Don Quijote de la Mancha | Miguel de Cervantes | - | DOWNLOAD |

*The Romanian data can only be used for non-commercial purposes.
**Only the first 35 chapters of the first part were used for alignment. The data cannot be redistributed, so please download the files from the original source listed on the About page.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License. The underlying audio and text are subject to their source licenses, so please check the links before using the data.