Software defect prediction


Predicting where defects are in software is the holy grail of most software testers. Finding and fixing defects is estimated to cost billions of pounds per year. Any automated help that reliably predicts where faults are, and so focuses testers' efforts, will therefore have a significant impact on the cost of producing and maintaining software.

Defect prediction research has been ongoing for many years. It first used regression techniques and, more recently, machine learning algorithms to predict where defects are. This work has provided some insight into where defects can be found; however, it does not appear to have been taken up by practitioners. One reason may be the difficulty of choosing and building defect prediction models.

Software defect prediction research at the University of Hertfordshire has developed greatly over the last five years. We have used Support Vector Machines, with some success, to build models that predict where defects are. Our recent work has focused on analysing models produced by other researchers and on improving the algorithms and protocols used to build defect prediction models.

Our collaborative work with other universities has produced significant analyses in the area of defect prediction. It has also prompted an important discussion about the quality of the data used in defect prediction and about how defect prediction models should be built.
This work has applications in software testing and in machine learning in general. We have given practitioners practical advice on building defect prediction models through articles in IEEE Software's Voice of Evidence and in other leading journals. We have also taken part in discussions with both researchers and practitioners at open workshops.