Language identification based on joint decision of nonlinear spectrograms
-
Abstract
The gray-scale logarithmic speech spectrogram stretches the fundamental-frequency region excessively, which limits the identification rate for short speech segments. To address this problem, a language identification method based on the joint decision of nonlinear speech spectrograms is proposed. First, the logarithmic power spectrum is extracted with energy normalization, and the nonlinear speech spectrogram is obtained by mapping the frequency scale nonlinearly according to human auditory perception. Then, the nonlinear speech spectrogram is split into equal intervals according to word-association characteristics, and a joint decision layer is added at the back end of the ResNet network. Finally, the language of the speech is output. The experimental results show that the proposed method effectively mitigates the shortcomings of the gray-scale logarithmic speech spectrogram and achieves higher recognition performance than the conventional speech spectrogram and the improved features. The best results are obtained for speech samples with a segment length of 1.0 s, where the recognition rate reaches 94.25% on the broadcast audio data set and 98.94% on the VoxForge public corpus.
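The front-end steps summarized above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used in this work: the abstract does not specify the nonlinear frequency mapping, so the mel scale is assumed here as one common perceptual scale; the frame length, hop size, number of bins, and segment count are hypothetical values; and the joint decision is approximated by averaging per-segment class probabilities, whereas the paper uses a layer attached to a ResNet. All function names are hypothetical.

```python
import numpy as np

def log_power_spectrogram(signal, frame_len=400, hop=160, n_fft=512):
    """Energy-normalized log power spectrogram, shape (n_frames, n_fft // 2 + 1)."""
    # Normalize to unit RMS energy (assumed form of "energy normalization").
    signal = signal / (np.sqrt(np.mean(signal ** 2)) + 1e-10)
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    return 10.0 * np.log10(power + 1e-10)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def warp_frequency_axis(log_spec, sample_rate, n_bins=128):
    """Resample the frequency axis at points equally spaced on the mel scale
    (assumption: the paper's perceptual mapping may differ)."""
    n_freq = log_spec.shape[1]
    linear_hz = np.linspace(0.0, sample_rate / 2.0, n_freq)
    target_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_bins))
    return np.stack([np.interp(target_hz, linear_hz, row) for row in log_spec])

def split_equal(spec, n_segments):
    """Split a spectrogram into equal-length segments along time,
    dropping any leftover frames at the end."""
    usable = (spec.shape[0] // n_segments) * n_segments
    return np.split(spec[:usable], n_segments, axis=0)

def joint_decision(segment_probs):
    """Average per-segment class probabilities and return (label, mean_probs).
    A simple stand-in for the joint decision layer."""
    mean_probs = np.mean(np.asarray(segment_probs), axis=0)
    return int(np.argmax(mean_probs)), mean_probs
```

In use, a 1.0 s utterance would pass through `log_power_spectrogram` and `warp_frequency_axis`, be cut with `split_equal`, and each segment would be scored by a classifier (a ResNet in the paper) before `joint_decision` combines the per-segment scores into one language label.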