Citation: SHENG Jia-gen, HU Yu-qing. Grey Relational Analysis Based Feature Importance Evaluation and Its Application[J]. Microelectronics & Computer, 2012, 29(11): 166-171.

Grey Relational Analysis Based Feature Importance Evaluation and Its Application

  • Abstract: Grey relational analysis (GRA) measures the similarity between a reference sample and the samples compared against it, so it is widely used in clustering and classification algorithms, especially when sample information is incomplete and the sample size is small. Its time domain is usually horizontal: the sequence being analyzed is the set of feature values of each individual sample. This paper takes a different view and switches the time domain to the vertical direction, so that the sequence is the set of values taken by all samples under a single feature. The outcome of a ranking problem, or the class labels of a classification problem, serves as the reference vector. GRA is then carried out between each feature's column and the reference vector, and the resulting grey relational grade is used as the importance of that feature, which gives a basis for importance-based feature selection. The method is broadly applicable to dimensionality reduction when features are numerous but samples are scarce, and the feature importance parameters it produces are easy to interpret. To validate the evaluation method, GRA-based feature importance experiments were carried out on track-and-field decathlon data and on the IRIS data set. The results are consistent with prior knowledge and confirm the practical value of the method.
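The column-wise procedure described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the formula, so the code assumes the commonly used Deng grey relational coefficient with distinguishing coefficient `rho = 0.5` and min-max normalization of every column and of the reference vector. The function name `grey_relational_importance` and the toy data are hypothetical.

```python
import numpy as np


def grey_relational_importance(X, y, rho=0.5):
    """Grey relational grade between each feature column of X and reference y.

    Assumptions (not stated in the abstract): min-max normalization and
    Deng's grey relational coefficient with distinguishing coefficient rho.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def minmax(a):
        # Scale to [0, 1] along the sample axis; constant columns map to 0.
        span = a.max(axis=0) - a.min(axis=0)
        return (a - a.min(axis=0)) / np.where(span == 0, 1.0, span)

    Xn = minmax(X)
    yn = minmax(y)

    # Absolute difference between every feature column and the reference.
    delta = np.abs(Xn - yn[:, None])        # shape (n_samples, n_features)
    dmin, dmax = delta.min(), delta.max()   # global extremes over all columns
    if dmax == 0:
        # Every column coincides with the reference after normalization.
        return np.ones(X.shape[1])

    coeff = (dmin + rho * dmax) / (delta + rho * dmax)
    return coeff.mean(axis=0)               # one grade per feature


# Toy data: column 0 follows the reference exactly, column 1 does not.
X = np.array([[1.0, 5.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 2.0]])
y = np.array([10.0, 20.0, 30.0, 40.0])

grades = grey_relational_importance(X, y)
ranking = np.argsort(-grades)  # feature indices, most important first
```

Under these assumptions a feature whose column follows the reference vector closely receives a grade near 1, and features can be kept or dropped according to `ranking`. A feature that is inversely related to the reference would score low with this normalization and would need to be flipped beforehand; the abstract does not say how the paper treats that case.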

     
