Because the words are taken from a database of speech fragments, it’s very
difficult to modify the voice, so adding things like intonation and
emphasis is almost impossible. This is why robotic voices often sound
monotonous and decidedly different from human speech.
WaveNet, however, overcomes this problem by using its neural network
models to build an audio signal from the ground up, one sample at a
time.
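The idea of building a signal one sample at a time can be sketched as a simple autoregressive loop, where each new sample is predicted from everything generated so far. This is an illustrative toy, not DeepMind’s actual architecture: the hypothetical `predict_next` function stands in for the trained neural network.

```python
def predict_next(history):
    """Hypothetical stand-in for the trained network's next-sample prediction.

    Toy rule (an assumption for illustration): echo the previous sample
    with slight decay, producing a fading signal.
    """
    return 0.99 * history[-1] if history else 1.0

def generate(num_samples):
    samples = []
    for _ in range(num_samples):
        # Each new sample is conditioned on all samples generated so far --
        # this sample-by-sample loop is the core autoregressive idea.
        samples.append(predict_next(samples))
    return samples

audio = generate(100)
```

In a real system the prediction step is a deep neural network and the loop runs at tens of thousands of samples per second, which is what makes this approach computationally demanding.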
During training, the DeepMind team gave WaveNet real waveforms recorded
from human speakers to learn from. Using a type of AI called a neural
network, the program learns from these recordings in much the same way
a human brain does.