Project description
This project addresses the design of efficient machine learning methods for multi-class scenarios
that suffer from an uneven distribution of training samples across classes. Supervised learning methods are
typically designed to work with reasonably balanced data sets, but many real-world applications have to cope
with imbalanced data sets. A data set is said to be imbalanced when several classes are under-represented
(minority classes) in comparison with others (majority classes).
Learning from imbalanced data is among the contemporary challenges in machine learning, and multi-class
imbalance stands out as the most difficult scenario. In binary imbalanced learning the relationship between
the classes is easy to define: one class is the majority and the other is the minority. In multi-class
scenarios this is no longer obvious, as the relationships between classes may vary, and a single class can
simultaneously be a minority with respect to some classes and a majority with respect to others. Therefore,
canonical methods designed for binary cases cannot be directly applied in such scenarios.
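The dual role of a class can be illustrated with a minimal sketch. The class names, the sample counts, and the helper `pairwise_roles` are hypothetical and serve only as an illustration; they are not part of the project's methodology. The sketch compares class sizes pairwise and reports the role of each class relative to every other class.

```python
from collections import Counter
from itertools import permutations


def pairwise_roles(labels):
    """Return the role of class a relative to class b for every ordered pair (a, b)."""
    counts = Counter(labels)
    roles = {}
    for a, b in permutations(counts, 2):
        if counts[a] < counts[b]:
            roles[(a, b)] = "minority"
        elif counts[a] > counts[b]:
            roles[(a, b)] = "majority"
        else:
            roles[(a, b)] = "balanced"
    return roles


# Hypothetical label distribution: A is large, B is medium, C is small.
labels = ["A"] * 500 + ["B"] * 50 + ["C"] * 5
roles = pairwise_roles(labels)

print(roles[("B", "A")])  # role of B relative to A
print(roles[("B", "C")])  # role of B relative to C
```

With these assumed counts, class B is a minority with respect to A and a majority with respect to C, so a binary minority/majority split cannot describe it.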
In this project we hypothesise that it is possible to design efficient multi-class methods for such
compound imbalance problems that can process all classes at once. The project aims at exploring three main
directions in multi-class imbalanced learning:
The presented literature survey allows us to conclude that there is a need to develop novel methodologies for
handling multi-class imbalanced problems and for exploring the characteristics of examples within class
structures. This project aims at filling this gap by conducting a general investigation into how to analyse
multi-class imbalanced problems and by designing novel data preprocessing and classification algorithms
dedicated to this area.