Multi-predicate Constrained Frequent Itemset Mining Based on a Hash Table

Abstract: To address the problem of mining, from a transaction database, the frequent itemsets associated with specified customer attributes, an approach based on dimension (predicate) constraints is proposed. The pattern-growth method (FP-Growth) is adopted, but the tree differs from the conventional FP-tree: the frequency count stored at each item node is not a single number but a vector whose entries are the counts of that item under every possible predicate constraint. A hash table maps the node vectors and the level at which each item sits in the tree, so frequent itemsets can be mined under any combination of constraints without scanning the database again. Simulation experiments show that the algorithm is complete, and a comparison with the approach that first filters the records and then mines them shows that it is more efficient.
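The idea of replacing each node count with a vector of per-predicate counts can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Node`, `build_tree`, and `support`, the sample records, and the use of a plain `dict` as the hash table (here mapping each item to the tree nodes that hold it) are all assumptions made for illustration, and the recursive pattern-growth step is omitted. The sketch only shows that, once the tree is built, the support of an itemset under any choice of predicate values can be read from the stored vectors without going back to the records.

```python
from collections import Counter, defaultdict


class Node:
    """Prefix-tree node whose count is a vector keyed by predicate value."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.children = {}
        self.counts = Counter()  # predicate value -> number of transactions


def build_tree(records):
    """Build the tree from a list of (predicate_value, items) pairs."""
    # First pass: global item frequencies fix the insertion order.
    freq = Counter(item for _, items in records for item in set(items))
    ordered_items = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    rank = {item: i for i, (item, _) in enumerate(ordered_items)}

    root = Node(None, None)
    header = defaultdict(list)  # hash table (assumed layout): item -> nodes
    # Second pass: insert each transaction, counting per predicate value.
    for pred, items in records:
        node = root
        for item in sorted(set(items), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.counts[pred] += 1
            node = child
    return header, rank


def support(header, rank, itemset, allowed):
    """Support of itemset among transactions whose predicate is in allowed."""
    *prefix, last = sorted(itemset, key=rank.__getitem__)
    total = 0
    for node in header[last]:
        # The path above the node must contain every other item of the set.
        needed = set(prefix)
        ancestor = node.parent
        while ancestor is not None and needed:
            needed.discard(ancestor.item)
            ancestor = ancestor.parent
        if not needed:
            total += sum(node.counts[p] for p in allowed)
    return total


# Hypothetical records: (gender, age group) is the customer predicate.
records = [
    (("F", "young"), ["milk", "bread"]),
    (("F", "young"), ["milk", "bread", "eggs"]),
    (("M", "young"), ["milk", "eggs"]),
    (("M", "old"), ["bread", "eggs"]),
    (("F", "old"), ["milk", "bread"]),
]
header, rank = build_tree(records)

female = [("F", "young"), ("F", "old")]
young = [("F", "young"), ("M", "young")]
print(support(header, rank, {"milk", "bread"}, female))  # 3
print(support(header, rank, {"milk", "eggs"}, young))    # 2
```

The two queries use different constraint combinations but share one tree, which is the property the abstract relies on; a filter-then-mine approach would instead rebuild its structure for each constraint.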