KSSK: Methods for multi-class classification of imbalanced data

Project leader: prof. dr hab. inż. Michał Woźniak
Duration: 2016-07-22 to 2020-01-21
Status: inactive project

Project description

This project addresses the design of efficient machine learning methods for multi-class problems in which
training samples are unevenly distributed among classes. Supervised learning methods are typically
designed to work with reasonably balanced data sets, but many real-world applications have to deal with
imbalanced ones. A data set is said to be imbalanced when some classes are under-represented (minority classes) in
comparison with others (majority classes).

Learning from imbalanced data is among the contemporary challenges in machine learning, and multi-class
imbalance stands out as the most difficult scenario. In binary imbalanced learning the relationship between
classes is easy to define: one class is the majority and the other is the minority. In
multi-class scenarios this is no longer obvious, as the relationships between classes may vary and one class can
be at the same time a minority and a majority with respect to different classes. Therefore, canonical methods
designed for binary cases cannot be directly applied in such scenarios.
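
The dual role of a class can be made concrete with a small sketch. It assumes that class size alone decides the role; the helper name `pairwise_roles` and the label vector are hypothetical and serve only as an illustration.

```python
from collections import Counter
from itertools import permutations


def pairwise_roles(labels):
    """For every ordered pair of classes (a, b), report whether a is the
    minority or the majority with respect to b, based on class sizes."""
    counts = Counter(labels)
    roles = {}
    for a, b in permutations(counts, 2):
        if counts[a] < counts[b]:
            roles[(a, b)] = "minority"
        elif counts[a] > counts[b]:
            roles[(a, b)] = "majority"
        else:
            roles[(a, b)] = "balanced"
    return roles


# Hypothetical label vector: 500 x "A", 50 x "B", 5 x "C".
labels = ["A"] * 500 + ["B"] * 50 + ["C"] * 5
counts = Counter(labels)
roles = pairwise_roles(labels)

# Ratio between the largest and the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())
print(imbalance_ratio)        # 100.0

# Class "B" is a minority with respect to "A" ...
print(roles[("B", "A")])      # minority
# ... and at the same time a majority with respect to "C".
print(roles[("B", "C")])      # majority
```

A binary method needs a single minority/majority split, which the class "B" in this example does not have.
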

In this project we hypothesise that it is possible to design efficient multi-class methods for such
compound imbalance problems that process all classes at once. The project explores three main
directions in multi-class imbalanced learning:

how to analyse the structure of classes and identify the difficult examples,
how to design pre-processing methods for imbalanced data (such as under- and oversampling) specifically for multi-class problems,
how to train efficient classifiers and ensemble learners with balanced performance on all classes.

We plan to identify general rules for designing efficient methods for learning from multi-class imbalanced data, to propose novel algorithms for this task, and to develop dedicated software packages that can be used in this area of research.
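
As a minimal sketch of the three directions listed above, the following code uses simple baseline techniques that are assumptions for illustration and not the algorithms developed in the project: a nearest-neighbour share as a difficulty indicator, random oversampling of all classes at once, and macro-averaged recall as a measure of balanced performance. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def same_class_neighbour_share(X, y, k=5):
    """For each example, return the fraction of its k nearest neighbours
    (Euclidean distance) that share its class label. Low values flag
    examples lying in regions dominated by other classes."""
    shares = []
    for i, xi in enumerate(X):
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        same = sum(1 for j in neighbours if y[j] == y[i])
        shares.append(same / len(neighbours))
    return shares


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples of every class until each class
    is as large as the largest one. All classes are handled in one pass."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def macro_recall(y_true, y_pred):
    """Average of per-class recalls, so every class counts equally
    regardless of its size."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / total[c] for c in total) / len(total)


# Hypothetical 2-D toy data: 6 x class 0, 3 x class 1, 2 x class 2.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.0), (0.0, 0.2),
     (1.0, 1.0), (1.1, 1.0), (0.05, 0.05),
     (2.0, 2.0), (2.1, 2.0)]
y = [0] * 6 + [1] * 3 + [2] * 2

# Direction 1: the class-1 example at (0.05, 0.05) sits inside class 0,
# so none of its 3 nearest neighbours share its label.
shares = same_class_neighbour_share(X, y, k=3)
print(shares[8])

# Direction 2: after oversampling, every class has 6 examples.
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))

# Direction 3: always predicting the largest class gives accuracy 6/11,
# but a macro-averaged recall of only one third.
always_zero = [0] * len(y)
print(macro_recall(y, always_zero))
```

Random oversampling only duplicates existing examples, so it is best read as a baseline. The neighbour share could guide which examples a more selective multi-class method should replicate or remove.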

The literature survey allows us to conclude that there is a need to develop novel methodologies for
handling multi-class imbalanced problems and for exploring the characteristics of examples within class
structures. This project aims to fill this gap by conducting a general investigation into how to analyse multi-class
imbalanced problems and how to design novel data pre-processing and classification algorithms dedicated to this
area.