Spiking timing dependent plasticity algorithm with mixed reward-modulated signals

Chen Yunxiang (陈运享); Feng Ren (冯忍); Chen Yunhua (陈云华)

doi:10.19304/J.ISSN1000-7180.2022.0108

Abstract

In recent years, physiologically grounded Spike-Timing-Dependent Plasticity (STDP) rules have been increasingly applied in spiking neural networks (SNNs). The R-STDP (reward-modulated STDP) learning algorithm, which combines the STDP rule with a reward/punishment mechanism, is effective at improving SNN performance. However, when R-STDP is used to train multilayer SNNs such as spiking deep convolutional networks, the feedback signal acts only on the last layer, and the intermediate layers receive no useful reward/punishment signal. To address this, and inspired by the unsupervised nature of the autoencoder, this paper proposes MR-STDP (Mix Reward-modulated STDP), an algorithm with mixed reward/punishment signals. A reconstruction layer is added to the intermediate layers to build a reward/punishment signal factor model based on a convolutional autoencoder, and a guiding factor signal for adjusting the intermediate-layer weights is obtained by comparing the spike firing times of neurons in the convolutional layer and the reconstruction layer. The guiding factor is a similarity measure between the spike trains fired by neurons at the same positions in the input layer and the reconstruction layer of the interlayer autoencoder; combining it with R-STDP gives the intermediate layers a weight guiding signal. Experiments on the MNIST and COVID-19 CT datasets show that the proposed method achieves higher accuracy than R-STDP and greatly improves the learning efficiency of the intermediate layers.
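The abstract describes the mixed-signal mechanism only at a high level. The Python sketch below illustrates one possible reading of it under assumptions that the source does not state: first-spike coding (each neuron fires at most once, and a silent neuron has spike time `INF`); a simplified R-STDP rule with a soft bound w(1 - w), in which punishment inverts the STDP window (a form used in earlier first-spike R-STDP work, not necessarily the rule used in this paper); and a hypothetical guiding factor defined as the mean of exp(-|Δt|/τ) over same-position neuron pairs in the autoencoder input and reconstruction layers, thresholded into a reward or a punishment for the intermediate layer. All function names, learning rates, `tau`, and `threshold` are illustrative choices, not values from the paper.

```python
import math

INF = math.inf  # first-spike time of a neuron that stays silent in the window

# Hypothetical learning rates (a_plus > 0, a_minus < 0), not taken from the paper.
REWARD_LR = (0.004, -0.003)   # causal -> LTP, anti-causal -> LTD
PUNISH_LR = (0.0005, -0.004)  # applied in reverse: causal -> LTD, anti-causal -> LTP


def r_stdp_dw(w, t_pre, t_post, rewarded):
    """Weight change for one synapse under an assumed first-spike R-STDP rule.

    The w * (1 - w) factor softly bounds weights to [0, 1]."""
    soft = w * (1.0 - w)
    causal = t_pre <= t_post  # presynaptic spike no later than postsynaptic spike
    if rewarded:
        a_plus, a_minus = REWARD_LR
        return (a_plus if causal else a_minus) * soft
    a_plus, a_minus = PUNISH_LR
    # Punishment inverts the STDP window: causal pairs are depressed.
    return (a_minus if causal else a_plus) * soft


def spike_similarity(t_in, t_rec, tau=2.0):
    """Hypothetical similarity in [0, 1] between the first-spike times of
    same-position neurons in the autoencoder input and reconstruction layers."""
    if len(t_in) != len(t_rec) or not t_in:
        raise ValueError("spike maps must be non-empty and equally sized")
    total = 0.0
    for a, b in zip(t_in, t_rec):
        if a == INF and b == INF:
            total += 1.0  # both silent: treated as agreement
        elif a == INF or b == INF:
            total += 0.0  # only one fired: treated as disagreement
        else:
            total += math.exp(-abs(a - b) / tau)
    return total / len(t_in)


def update_layer(weights, pre_times, post_times, rewarded):
    """Apply R-STDP in place to weights[i][j] (pre neuron i -> post neuron j).

    Only synapses onto postsynaptic neurons that fired are updated."""
    for j, t_post in enumerate(post_times):
        if t_post == INF:
            continue
        for i, t_pre in enumerate(pre_times):
            dw = r_stdp_dw(weights[i][j], t_pre, t_post, rewarded)
            weights[i][j] = min(1.0, max(0.0, weights[i][j] + dw))


def mixed_update(hidden_w, out_w, x_times, h_times, rec_times, y_times,
                 predicted, label, threshold=0.5):
    """One training step with two signal sources (hypothetical wiring):
    output layer gets a global reward from classification correctness,
    hidden layer gets a local reward from reconstruction similarity."""
    update_layer(out_w, h_times, y_times, rewarded=(predicted == label))
    similarity = spike_similarity(x_times, rec_times)
    update_layer(hidden_w, x_times, h_times, rewarded=similarity >= threshold)
    return similarity


# Usage on a toy network: 4 input neurons, 2 hidden neurons, 1 output neuron.
x = [1, 3, INF, 5]      # autoencoder input-layer first-spike times
rec = [1, 4, INF, INF]  # reconstruction-layer first-spike times
h = [2, INF]            # hidden (convolutional) layer first-spike times
y = [3]                 # output-layer first-spike time
hidden_w = [[0.5, 0.5] for _ in x]
out_w = [[0.5] for _ in h]
sim = mixed_update(hidden_w, out_w, x, h, rec, y, predicted=0, label=0)
print(f"similarity = {sim:.4f}")
```

Thresholding the similarity into a binary reward keeps the intermediate-layer update identical in form to the output-layer R-STDP update; scaling the learning rates continuously by the similarity would be an equally plausible reading of "guiding factor".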
