LI Chang-jiang, HU Yan. Research of Phoneme Recognition Based on Recurrent Neural Network[J]. Microelectronics & Computer, 2017, 34(8): 47-51.

Research of Phoneme Recognition Based on Recurrent Neural Network

Abstract: Hybrid HMM-RNN models, which combine a hidden Markov model (HMM) with a recurrent neural network (RNN), have been very successful in speech recognition. However, training with an HMM requires a label for every frame, so the speech must be pre-aligned during data preparation. In addition, when the speech signal is divided into frames, adjacent frames overlap by 1/3 to 1/2 of their length; because RNN computation is already context-dependent, this overlap lengthens the training time of the whole system. To address these problems, this paper combines connectionist temporal classification (CTC), instead of an HMM, with the RNN, and removes the overlap between adjacent frames during framing. Phoneme recognition experiments on the TIMIT speech dataset show that the CTC-BLSTM model achieves a higher phoneme recognition rate than the hybrid HMM-BLSTM model, and that CTC-BLSTM with frame overlap removed trains substantially more efficiently while keeping roughly the same recognition rate.
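The two ideas in the abstract can be illustrated with a minimal sketch; it is not the authors' code. The function names `split_frames` and `ctc_collapse`, the 400-sample frame length, the hop sizes, and the toy label IDs are all assumptions chosen for illustration, since the abstract does not give the actual framing parameters. The first function shows how removing the overlap between adjacent frames reduces the number of frames the RNN has to process. The second shows the standard CTC collapsing rule (merge repeated labels, then drop blanks), which is what lets a CTC-trained network map frame-level outputs to a phoneme sequence without a pre-computed frame-level alignment.

```python
from itertools import groupby


def split_frames(samples, frame_len, hop):
    """Split a sequence into frames of frame_len samples, advancing by hop samples.

    hop < frame_len gives overlapping frames; hop == frame_len gives none.
    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def ctc_collapse(path, blank=0):
    """Collapse a frame-level CTC path: merge repeated labels, then drop blanks."""
    return [label for label, _ in groupby(path) if label != blank]


# Stand-in signal: 1600 samples (hypothetical; e.g. 100 ms at 16 kHz).
signal = list(range(1600))

# Assumed frame length of 400 samples with 1/2 overlap versus no overlap.
overlapped = split_frames(signal, frame_len=400, hop=200)
disjoint = split_frames(signal, frame_len=400, hop=400)
print(len(overlapped), len(disjoint))

# Toy frame-level output: 0 is the blank; 1 and 2 are hypothetical phoneme IDs.
# The blank between the two runs of 1 keeps them as separate phonemes.
print(ctc_collapse([0, 1, 1, 0, 1, 2, 2, 0]))
```

With these assumed settings the non-overlapping split yields fewer frames for the same signal, which is the source of the training-time saving the abstract describes; the actual saving depends on the frame length and overlap used in the paper.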

     
