High-fidelity speech synthesis with WaveNet

During training, the student network starts off in a random state. It is fed random white noise as an input and is tasked with producing a continuous audio waveform as output. The generated waveform is then fed to the trained WaveNet model, which scores each sample, giving the student a signal to understand how far away it is from the teacher network’s desired output. Over time, the student network can be tuned – via backpropagation – to learn what sounds it should produce. Put another way, both the teacher and the student output a probability distribution for the value of each audio sample, and the goal of the training is to minimise the KL divergence between the teacher’s distribution and the student’s distribution.
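
The gradient flow here is worth making concrete. Below is a minimal sketch of one distillation step in PyTorch; it is not DeepMind's implementation. The student and teacher are tiny linear stand-ins (the real student is an inverse-autoregressive flow and the real teacher a full trained WaveNet), and both output a simple Gaussian per sample, but the loop follows the description above: sample a waveform from the student, score it with the frozen teacher, and minimise a Monte-Carlo estimate of KL(student || teacher).

```python
import torch
import torch.nn.functional as F
from torch.distributions import Normal

torch.manual_seed(0)

# Tiny stand-ins for the two networks (hypothetical, for illustration only):
# each maps one input value per timestep to the (mean, log-scale) of a
# Gaussian over the corresponding output audio sample.
T = 1600                                   # samples in one training clip
student = torch.nn.Linear(1, 2)            # stands in for the flow-based student
teacher = torch.nn.Linear(1, 2)            # stands in for the trained WaveNet
for p in teacher.parameters():
    p.requires_grad_(False)                # teacher weights stay frozen

# 1. Feed random white noise to the student.
noise = torch.randn(1, T, 1)
s_mean, s_log_scale = student(noise).unbind(dim=-1)
student_dist = Normal(s_mean, s_log_scale.exp())

# 2. Draw the student's waveform with the reparameterisation trick so the
#    sampled audio stays differentiable w.r.t. the student's weights.
waveform = student_dist.rsample()          # shape (1, T)

# 3. The frozen teacher scores each sample given the previous one, a
#    one-step stand-in for WaveNet's autoregressive conditioning.
prev = F.pad(waveform, (1, 0))[:, :-1]
t_mean, t_log_scale = teacher(prev.unsqueeze(-1)).unbind(dim=-1)
teacher_dist = Normal(t_mean, t_log_scale.exp())

# 4. Monte-Carlo estimate of KL(student || teacher): the student's own
#    log-probability of its samples minus the teacher's score for them.
loss = (student_dist.log_prob(waveform) - teacher_dist.log_prob(waveform)).mean()
loss.backward()                            # backprop tunes only the student
```

Because the teacher receives the whole generated waveform at once, it can score every sample in parallel, which is what makes this distillation step efficient even though the teacher itself is autoregressive.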

The training method has parallels to the set-up for generative adversarial networks (GANs), with the student playing the role of generator and the teacher as the discriminator. However, unlike GANs, the student’s aim is not to “fool” the teacher but to cooperate and try to match the teacher’s performance.

Although the training technique works well, we also need to add a few extra loss functions to guide the student towards the desired behaviour. Specifically, we add a perceptual loss to avoid bad pronunciations, a contrastive loss to further reduce the noise, and a power loss to help match the energy of human speech. Without the power loss, for example, the trained model whispers rather than speaking aloud.
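
Of the three, the power loss is the easiest to sketch. One plausible formulation (our assumption for illustration, not DeepMind's released code) is to compare the average energy per frequency band of the generated audio against real speech via a short-time Fourier transform:

```python
import torch

def power_loss(generated: torch.Tensor, reference: torch.Tensor,
               n_fft: int = 512, hop: int = 128) -> torch.Tensor:
    """Hedged sketch: penalise mismatched average power per frequency band
    so the student speaks at the same energy as the reference rather than
    whispering. n_fft and hop are illustrative choices, not the paper's."""
    window = torch.hann_window(n_fft)

    def avg_band_power(x: torch.Tensor) -> torch.Tensor:
        spec = torch.stft(x, n_fft, hop_length=hop, window=window,
                          return_complex=True)
        return spec.abs().pow(2).mean(dim=-1)   # mean power over time frames

    return (avg_band_power(generated) - avg_band_power(reference)).pow(2).mean()

# Usage: both inputs are batches of raw waveforms of the same length.
loss = power_loss(torch.randn(4, 16000), torch.randn(4, 16000))
```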

Adding all of these together allowed us to train the parallel WaveNet to achieve the same quality of speech as the original WaveNet, as shown by the mean opinion scores (MOS) – a 1-5 scale that measures how natural-sounding the speech is according to tests with human listeners. Note that even human speech is rated at just 4.667 on the MOS scale.
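
For reference, a MOS is simply the arithmetic mean of listeners' 1-5 naturalness ratings; the short sketch below, with made-up ratings, shows the computation together with the kind of confidence interval usually reported alongside MOS results.

```python
import statistics

# Hypothetical listener ratings on the 1-5 naturalness scale.
ratings = [4, 5, 4, 5, 5, 4, 3, 5, 4, 5]

mos = statistics.mean(ratings)
# 95% confidence interval under a normal approximation (illustrative).
half_width = 1.96 * statistics.stdev(ratings) / len(ratings) ** 0.5
print(f"MOS = {mos:.2f} ± {half_width:.2f}")
```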

Source: https://deepmind.com/blog/article/high-fidelity-speech-synthesis-wavenet
