"Human perception is also highly sensitive to discontinuities and irregularities in periodic waveforms. Figure 1 shows that when the stride of the frames does not exactly equal a waveform’s periodicity, the alignment (phase) of the two precesses over time. This condition is assured as at any time there are typically many different frequencies in a given signal. This is a challenge for a synthesis network, as it must learn all the appropriate frequency and phase combinations and activate them in just the right combination to produce a coherent waveform. This phase precession is exactly the same phenomena observed with a short-time Fourier transform (STFT), which is composed of strided filterbanks just like convolutional networks. Phase precession also occurs in situations where filterbanks overlap (window or kernel size < stride)."
69.2 KB. 意见：11