Citation: LIANG Wentong, ZHU Yanhui, ZHAN Fei, ZHANG Xu, OUYANG Kang, KONG Lingwei, HUANG Yalin. Semi-supervised medical event extraction based on pseudo-label confidence selection[J]. Microelectronics & Computer, 2022, 39(1): 71-79. DOI: 10.19304/J.ISSN1000-7180.2021.0448

Semi-supervised medical event extraction based on pseudo-label confidence selection

  • Abstract: Medical event extraction is an important foundation for constructing medical knowledge graphs. To address the scarcity of labeled data in the medical domain, we build a joint medical event extraction model based on a Transformer encoder, BiLSTM, and an attention mechanism, and propose a pseudo-label confidence selection algorithm for choosing high-confidence data. First, the joint extraction model is trained and used to predict unlabeled data, producing pseudo-labeled data. Second, high-confidence pseudo-labeled data are selected by computing the pseudo-label consensus probability P and are added to the original data to retrain the joint extraction model. Finally, the updated model extracts tumor primary site, lesion size, and metastatic site events from electronic medical records, and majority voting yields the final extraction results. Experiments on the corpus of the medical event extraction task for Chinese electronic medical records from the 2020 China Conference on Knowledge Graph and Semantic Computing (CCKS2020) show that the proposed method achieves good medical event extraction results.
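The selection and voting steps described in the abstract can be illustrated with a minimal sketch. The abstract does not define the consensus probability P, so this example assumes that several model variants each predict a label for every unlabeled sample and that P is the share of predictions agreeing with the majority label. The function names, the `threshold` value, and the sample data are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def consensus(predictions):
    """Return (majority_label, P) for one sample.

    Assumption: P is the fraction of predictions that agree with the
    most common label. The paper's exact definition of P may differ.
    """
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)


def select_pseudo_labels(samples, model_predictions, threshold=0.8):
    """Keep samples whose majority pseudo-label has P >= threshold.

    samples: list of unlabeled texts.
    model_predictions: one list of predicted labels per model,
        each aligned with `samples`.
    threshold: hypothetical cut-off for "high confidence".
    """
    selected = []
    for i, text in enumerate(samples):
        per_model = [preds[i] for preds in model_predictions]
        label, p = consensus(per_model)
        if p >= threshold:
            selected.append((text, label))
    return selected


# Illustrative data: three models predict a primary site for two records.
samples = ["record A", "record B"]
model_predictions = [
    ["lung", "liver"],
    ["lung", "lung"],
    ["lung", "liver"],
]
high_confidence = select_pseudo_labels(samples, model_predictions)
```

Here `consensus` doubles as the majority vote for final predictions: the returned label is the voted result, and P indicates how strongly the models agree. The selected pairs would then be merged with the labeled data before retraining.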

     
