A Pruning of Random Forests: a diversity-based heuristic measure to simplify a random forest ensemble
Abstract
Random forests are among the most successful ensemble methods. They are fast, robust to noise, and resistant to overfitting. Moreover, they lend themselves to explanation and visualization. In this paper, we propose to simplify a random forest using an entropy function that measures the diversity of the trees in the forest. The function is used as the selection criterion in two search strategies: sequential forward selection (SFS) and a search based on genetic algorithms (GA). The proposed methods are applied to datasets from the UCI Repository. The results are encouraging: the pruned ensembles are smaller, and their performance is similar to, or in some cases even exceeds, that of the initial forest. Moreover, a comparison of the two methods shows that in most cases SFS yields smaller ensembles than GA, whereas GA gives better success rates in the majority of cases.
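The abstract does not give the entropy function itself, so the following is only a minimal sketch of how diversity-driven forward selection might look, not the authors' method. It assumes the entropy measure of Kuncheva and Whitaker as a stand-in for the paper's criterion, and it assumes a precomputed matrix `correct` in which `correct[t][i]` is true when tree `t` classifies validation sample `i` correctly. The names `entropy_diversity` and `sfs_prune`, the seeding by individual accuracy, and the fixed target size are all hypothetical choices made for illustration.

```python
from math import ceil


def entropy_diversity(correct):
    """Entropy diversity E in [0, 1] (Kuncheva-Whitaker form, assumed here).

    correct[t][i] is True when tree t classifies sample i correctly.
    E is 0 when all trees agree on every sample and 1 when the trees
    split as evenly as possible on every sample.
    """
    n_trees = len(correct)
    if n_trees < 2:
        return 0.0  # diversity is not defined for a single tree
    n_samples = len(correct[0])
    denom = n_trees - ceil(n_trees / 2)
    total = 0
    for i in range(n_samples):
        hits = sum(1 for t in range(n_trees) if correct[t][i])
        total += min(hits, n_trees - hits)
    return total / (n_samples * denom)


def sfs_prune(correct, target_size):
    """Greedy forward selection of tree indices (illustrative only).

    Assumption: the search is seeded with the most accurate tree, then
    repeatedly adds the tree that maximizes the diversity of the subset.
    """
    if not 1 <= target_size <= len(correct):
        raise ValueError("target_size must be between 1 and the forest size")
    remaining = list(range(len(correct)))
    seed = max(remaining, key=lambda t: sum(correct[t]))
    selected = [seed]
    remaining.remove(seed)
    while len(selected) < target_size:
        best = max(
            remaining,
            key=lambda t: entropy_diversity([correct[s] for s in selected + [t]]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy forest: trees 0 and 1 are identical, tree 2 errs on different samples.
toy = [
    [True, True, True, False],
    [True, True, True, False],
    [False, False, True, True],
]
print(sfs_prune(toy, 2))  # -> [0, 2]
```

In the toy example the duplicate tree adds no diversity, so the greedy step prefers the tree that errs on different samples. A genetic-algorithm variant would instead encode each candidate sub-forest as a bit string and use a criterion of this kind as the fitness function.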
Article Details
Upon receipt of accepted manuscripts, authors will be invited to complete a copyright license to publish the paper. At least the corresponding author must send the signed copyright form for publication. It is a condition of publication that authors grant an exclusive license to the INFOCOMP Journal of Computer Science. This ensures that requests from third parties to reproduce articles are handled efficiently and consistently, and it also allows the article to be disseminated as widely as possible. In assigning the copyright license, authors may use their own material in other publications, provided that the INFOCOMP Journal of Computer Science is acknowledged as the original place of publication.