A review of feature selection methods
-
Abstract: In the era of big data, feature selection is a necessary step in data preprocessing. As a data dimensionality reduction technique, its main purpose is to select from the original data the relevant features that are most useful to the learning algorithm, thereby reducing the dimensionality of the data and the difficulty of the learning task and improving the efficiency of the model. Research on feature selection algorithms has achieved substantial results, but it still faces major challenges, among which the curse of dimensionality is the most prominent one for feature selection and classification. This paper first introduces the basic architecture of a feature selection algorithm, describing in turn its four stages: subset generation, subset evaluation, the stopping criterion, and result validation. It then reviews the research methods and achievements in the field of feature selection, classifying the methods by evaluation strategy, search strategy, and supervision information, comparing these traditional methods, and pointing out their strengths and weaknesses. Finally, it summarizes feature selection and discusses promising directions for future research.
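The four-stage architecture named in the abstract can be summarized in a short sketch. The following Python/NumPy code is only an illustration under stated assumptions: `evaluate`, `max_features`, and `min_gain` are hypothetical placeholders (for example, `evaluate` could be the cross-validated accuracy of a classifier trained on the candidate subset); it is not the method of any particular paper surveyed here.

```python
import numpy as np

def feature_selection(X, y, evaluate, max_features=10, min_gain=1e-3):
    """Generic four-stage feature selection loop (illustrative sketch).

    X        : (n_samples, n_features) data matrix
    y        : target labels
    evaluate : hypothetical callback that scores a candidate subset,
               e.g. cross-validated accuracy on X[:, subset]
    """
    n_features = X.shape[1]
    selected, best_score = [], -np.inf

    while len(selected) < max_features:                 # stage 3: stopping criterion
        # stage 1: subset generation -- extend the current subset by one feature
        candidates = [selected + [j] for j in range(n_features) if j not in selected]
        if not candidates:
            break
        # stage 2: subset evaluation -- score every candidate with the criterion
        scores = [evaluate(X[:, subset], y) for subset in candidates]
        if max(scores) - best_score < min_gain:         # stage 3: stopping criterion
            break
        best_score = max(scores)
        selected = candidates[int(np.argmax(scores))]

    # stage 4: result validation happens outside this loop, e.g. by retraining
    # the final model on the selected features and checking held-out data.
    return selected, best_score
```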
-
Key words:
- machine learning
- feature selection
- evaluation strategy
- search strategy
- supervision information
-
Table 1. Comparison of methods based on evaluation strategies

| Feature selection method | Filter methods | Wrapper methods | Embedded methods |
| --- | --- | --- | --- |
| Efficiency | Relatively high | Relatively high with sequential search; lower with random search | Relatively high |
| Applicable scenarios | Suitable for large-scale datasets | Not suitable for high-dimensional datasets | Can handle high-dimensional datasets |
| Advantages | Independent of any specific classifier, so the algorithm is highly general and has low complexity; the selected subset has low redundancy | The selected subset usually achieves better classification performance; relatively efficient | The selected subset performs well; relatively high efficiency |
| Disadvantages | The evaluation criterion is independent of the learning algorithm, so the selected subset is usually less accurate in classification than with wrapper methods | Weak generality; very high computational complexity | Depends on the specific learning algorithm and may overfit |
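To make the three evaluation strategies in Table 1 concrete, the sketch below shows one common representative of each, assuming scikit-learn is available; the synthetic data and all parameter values (`k=10`, `C=0.1`, etc.) are arbitrary choices for illustration, not values taken from the surveyed work.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import (SelectKBest, mutual_info_classif,
                                       SequentialFeatureSelector, SelectFromModel)
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=50, n_informative=8,
                           random_state=0)

# Filter: rank features by mutual information with the label, independently
# of any classifier, and keep the top 10.
X_filter = SelectKBest(score_func=mutual_info_classif, k=10).fit_transform(X, y)

# Wrapper: sequential forward search that scores each candidate subset by
# cross-validating a k-NN classifier -- usually accurate but expensive.
sfs = SequentialFeatureSelector(KNeighborsClassifier(), n_features_to_select=10,
                                direction='forward', cv=3)
X_wrapper = sfs.fit_transform(X, y)

# Embedded: L1-regularized logistic regression drives irrelevant coefficients
# to zero while the model is trained, so selection falls out of the fit.
l1_model = LogisticRegression(penalty='l1', solver='liblinear', C=0.1)
X_embedded = SelectFromModel(l1_model).fit_transform(X, y)

print(X_filter.shape, X_wrapper.shape, X_embedded.shape)
```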
Table 2. Comparison of three search strategy methods
| Search strategy | Global search | Random search | Sequential search |
| --- | --- | --- | --- |
| Efficiency | Low | Relatively high | High |
| Applicable scenarios | Only low-dimensional feature sets | Generally applicable | Generally applicable |
| Advantages | Can reach the global optimum | Lower time complexity than global search; can find near-optimal solutions better than sequential search | Lowest time complexity |
| Disadvantages | Only practical on low-dimensional datasets | Higher time complexity than sequential search | The obtained subset is only locally optimal |
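The trade-off in Table 2 can be seen in a compact sketch of the three search strategies, written here in plain Python under the assumption that `score(subset)` is some subset-evaluation criterion supplied by the caller (a hypothetical callback, not a specific method from the literature).

```python
import random
from itertools import combinations

def global_search(n_features, k, score):
    """Exhaustive search: guaranteed optimal, but C(n, k) evaluations --
    only viable for low-dimensional feature sets."""
    return max(combinations(range(n_features), k), key=score)

def random_search(n_features, k, score, n_iter=200, seed=0):
    """Random search: samples subsets; often near-optimal at a cost that
    does not explode with dimensionality."""
    rng = random.Random(seed)
    candidates = [tuple(sorted(rng.sample(range(n_features), k)))
                  for _ in range(n_iter)]
    return max(candidates, key=score)

def sequential_search(n_features, k, score):
    """Sequential forward search: greedily adds one feature at a time;
    cheapest, but the result may only be locally optimal."""
    selected = []
    while len(selected) < k:
        best = max((j for j in range(n_features) if j not in selected),
                   key=lambda j: score(tuple(selected + [j])))
        selected.append(best)
    return tuple(selected)
```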
Table 3. Comparison of feature selection methods based on different supervision information
| Feature selection method | Supervised | Semi-supervised | Unsupervised |
| --- | --- | --- | --- |
| Advantages | Does not depend on the distribution of the sample space | Needs only a small number of labelled samples, which effectively reduces the labelling cost | Uses no label information at all, so the computational cost drops sharply |
| Disadvantages | Requires a large number of labelled samples; the features of a new application cannot be predicted in advance; high computational cost; cannot handle unknown data | Performs poorly on noisy data | High ambiguity of the training samples |
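The distinction drawn in Table 3 amounts to how much label information the scoring criterion consumes. The sketch below (assuming scikit-learn and NumPy; synthetic data and arbitrary thresholds, used only for illustration) scores features with a supervised criterion that needs every label and with an unsupervised criterion that ignores labels; the semi-supervised variant shown is a crude illustration that applies the supervised score to the small labelled slice only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, VarianceThreshold

X, y = make_classification(n_samples=300, n_features=30, n_informative=5,
                           random_state=0)

# Supervised: mutual information between each feature and the label --
# requires a fully labelled sample set.
mi_scores = mutual_info_classif(X, y, random_state=0)
supervised_top = np.argsort(mi_scores)[::-1][:5]

# Unsupervised: keep features whose variance exceeds a threshold -- uses no
# label information, so it is cheap but blind to class structure.
vt = VarianceThreshold(threshold=0.5).fit(X)
unsupervised_keep = np.flatnonzero(vt.get_support())

# Semi-supervised (illustrative only): when just the first 30 samples carry
# labels, the supervised score is computed on that small labelled slice,
# which is the labelling-cost saving the table refers to.
labelled = slice(0, 30)
semi_scores = mutual_info_classif(X[labelled], y[labelled], random_state=0)
semi_top = np.argsort(semi_scores)[::-1][:5]

print(supervised_top, unsupervised_keep[:5], semi_top)
```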