Abstract:
This paper presents an architecture for a cluster monitoring system with self-healing ability. The architecture enables the monitoring system to detect and eliminate faults in a cluster system automatically and promptly. This greatly strengthens the intelligence and active control capability of the monitoring system itself, and thereby improves the reliability of the cluster system. A cluster monitoring system developed with the proposed architecture has been applied to a Linux-based cluster storage system, where it has proven effective.