Slide 1: Learning Classifier Systems for Class Imbalance Problems
Ester Bernadó-Mansilla Research Group in Intelligent Systems Enginyeria i Arquitectura La Salle Universitat Ramon Llull Barcelona, Spain
Slide 2: Aim
Enhance the applicability of LCSs to knowledge discovery from datasets:
• Classification problems
• Real-world domains
Learning Classifier Systems for Class Imbalance Problems
Ester Bernadó-Mansilla
Slide 3: Framework
Dataset → LCS → model + estimated performance
• Representativity of the target concept • Geometrical complexity • Class imbalance • Noise
• Evolutionary pressures • Interpretability • Domain of applicability
Slide 4: Class Imbalance
Class imbalance occurs when one class is represented by a small number of examples compared to the other class(es). Usually the class that describes the circumscribed concept (the positive class) is the minority class. Where does it arise?
• Rare medical diagnoses
• Fraud detection
• Oil spills in satellite images
Slide 5: Class Imbalance and Classifiers
Is there a bias towards the majority class? Probably, because most classifier schemes are trained to minimize the global error. As a result:
• They accurately classify the examples of the majority class
• They tend to misclassify the examples of the minority class, which are often those representing the target concept
Slide 6: Measures of Performance
Confusion matrix:
                 Actual A               Actual B
Prediction A     true positive (TP)     false positive (FP)
Prediction B     false negative (FN)    true negative (TN)
Accuracy = (TP+TN)/(TP+FN+FP+TN)
TN rate = TN/(TN+FP)
TP rate = TP/(TP+FN)
ROC curves
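As a quick check of why accuracy alone hides the bias, the following minimal Python sketch applies the formulas above to a hypothetical classifier that always predicts the majority class, using 15 positive and 150 negative examples (the class sizes of Dataset 1 in the next slide). The function name `rates` is illustrative.

```python
def rates(tp, fn, fp, tn):
    """Accuracy, TP rate and TN rate computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    tp_rate = tp / (tp + fn)  # fraction of the positive (minority) class recovered
    tn_rate = tn / (tn + fp)  # fraction of the negative (majority) class recovered
    return accuracy, tp_rate, tn_rate


# Always predicting the majority class: every positive example becomes a FN.
acc, tpr, tnr = rates(tp=0, fn=15, fp=0, tn=150)
print(f"accuracy={acc:.3f}  TP rate={tpr:.2f}  TN rate={tnr:.2f}")
```

Accuracy stays above 90% while the TP rate is zero, which is the bias towards the majority class described in the previous slide.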
Slide 7: The Higher the Class Imbalance, the Higher the Bias?
Dataset 1: concept 15, counterpart 150, ratio 10:1
Dataset 2: concept 15, counterpart 45, ratio 3:1
Slide 8: XCS
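Figure: XCS learns from the environment (the dataset). The environment provides an input; XCS's set of rules is searched by genetic algorithms and updated by reinforcement learning; XCS outputs a class and receives a reward.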
Slide 9: Our Approach with XCS
• Bounding XCS’s parameters for unbalanced datasets
• Online identification of small disjuncts
• Adaptation of parameters for the discovery of small disjuncts
Slide 10: XCS’s Behavior in Unbalanced Datasets
Unbalanced 11-multiplexer problem
Figure panels: ir=16:1, ir=32:1, ir=64:1
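The unbalanced multiplexer can be sketched as follows. This is a minimal Python illustration, not the authors' generator: it assumes the standard 11-multiplexer (3 address bits selecting one of 8 data bits), takes class 1 as the minority class, and obtains the imbalance ratio by accepting minority examples with probability 1/ir. The function names are hypothetical.

```python
import random


def multiplexer11(bits):
    """11-bit multiplexer: the first 3 bits address one of the 8 data bits."""
    address = bits[0] * 4 + bits[1] * 2 + bits[2]
    return bits[3 + address]


def sample_unbalanced_mux(ir, rng):
    """Draw one (input, class) pair. Assumption: class 1 is the minority class
    and each class-1 input is kept with probability 1/ir."""
    while True:
        bits = [rng.randint(0, 1) for _ in range(11)]
        label = multiplexer11(bits)
        if label == 0 or rng.random() < 1.0 / ir:
            return bits, label


rng = random.Random(0)
labels = [sample_unbalanced_mux(16, rng)[1] for _ in range(17_000)]
print("observed majority:minority ratio", labels.count(0) / max(1, labels.count(1)))
```

Since both classes are equally likely in the balanced multiplexer, rejecting minority inputs this way yields a class ratio close to ir.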
Slide 11: XCS’s Population
Most numerous rules, ir=128:1
Classifier       Prediction   Error   Fitness   Numerosity
###########:0    1000         0.12    0.98      385
###########:1    1.2×10⁻⁴     0.074   0.98      366
Overgeneral classifiers:
• Estimated prediction: 992.24 and 7.75
• Estimated error: 15.38
• High fitness
• Too high numerosity
Test examples are classified as belonging to the majority class
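The estimated values listed above follow from the prediction and error expressions given for a classifier's error in the next slide. The sketch below is a minimal Python illustration that assumes rewards R_max = 1000 and R_min = 0 and computes those expressions for the two fully general rules; the function name is hypothetical.

```python
def overgeneral_estimates(ir, r_max=1000.0, r_min=0.0):
    """Expected prediction and error of ###########:0 (majority class)
    and ###########:1 (minority class) for imbalance ratio ir."""
    results = {}
    for rule, p_c in (("###########:0", ir / (ir + 1)),
                      ("###########:1", 1 / (ir + 1))):
        # p_c: probability that the class advocated by the rule is correct
        prediction = p_c * r_max + (1 - p_c) * r_min
        error = abs(prediction - r_max) * p_c + abs(prediction - r_min) * (1 - p_c)
        results[rule] = (prediction, error)
    return results


for rule, (pred, err) in overgeneral_estimates(128).items():
    print(f"{rule}  prediction={pred:.2f}  error={err:.2f}")
```

For ir = 128 this gives predictions of about 992.25 and 7.75 (the slide truncates the first to 992.24) and an error of about 15.38 for both rules. That error is well above ε0 = 1, so in theory both rules should be flagged as inaccurate, unlike the much smaller errors XCS reports in the table.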
Slide 12: How Imbalance Affects XCS
• Classifier’s error
• Stability of prediction and error estimates
• Occurrence-based reproduction
Slide 13: Classifier’s Error in Unbalanced Datasets
Will an overgeneral classifier be detected as inaccurate if the imbalance ratio is high?
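Bound for an inaccurate classifier. Given the estimated prediction and error of a classifier cl,
$P = P_c(cl)\,R_{max} + (1 - P_c(cl))\,R_{min}$
$\varepsilon = |P - R_{max}|\,P_c(cl) + |P - R_{min}|\,(1 - P_c(cl))$
the classifier is detected as inaccurate when $\varepsilon \geq \varepsilon_0$. With $R_{min} = 0$, we derive:
$-\varepsilon_0\,p^2 + 2p\,(R_{max} - \varepsilon_0) - \varepsilon_0 \geq 0$
where p is the ratio between the numbers of examples of the two classes covered by the classifier.
For $R_{max} = 1000$ and $\varepsilon_0 = 1$, we get the maximum imbalance ratio:
$ir_{max} = 1998$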
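The bound can be checked numerically. This minimal sketch assumes R_min = 0 and P_c(cl) = p/(1+p) for the rule advocating the majority class; under these assumptions ε = 2·R_max·p/(1+p)², and ε ≥ ε0 rearranges to the quadratic inequality above, whose upper root is the maximum imbalance ratio. The function name is hypothetical.

```python
import math


def max_imbalance_ratio(r_max, eps0):
    """Upper root of -eps0*p^2 + 2p(r_max - eps0) - eps0 = 0."""
    a, b, c = -eps0, 2 * (r_max - eps0), -eps0
    disc = b * b - 4 * a * c
    # a < 0, so the inequality holds between the roots; this is the larger one
    return (-b - math.sqrt(disc)) / (2 * a)


print(round(max_imbalance_ratio(1000, 1)))  # prints 1998
```

Beyond this ratio, the expected error of an overgeneral rule falls below ε0 and XCS can no longer tell it apart from an accurate one.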
Slide 14: Prediction and Error Estimates and Learning Rate
Figure: prediction and error estimates of ###########:0 with ir=128:1, for β=0.002 and β=0.2.
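The effect of β can be reproduced with a minimal Python sketch of the Widrow-Hoff (delta-rule) updates behind the prediction and error estimates. Assumptions: a deterministic stream in which every 129th example belongs to the minority class (ir = 128), rewards 1000 and 0, and no MAM phase or fitness update; the function name is hypothetical.

```python
def track_estimates(beta, ir, cycles, r_max=1000.0, r_min=0.0):
    """Return (prediction, error) after every update of ###########:0 when it
    sees `ir` majority examples (reward r_max) followed by one minority
    example (reward r_min), repeated `cycles` times."""
    p, eps = r_max / 2, 0.0  # arbitrary initial estimates
    history = []
    for _ in range(cycles):
        for reward in [r_max] * ir + [r_min]:
            # delta-rule updates with learning rate beta; the error uses
            # the prediction held before this step
            eps += beta * (abs(reward - p) - eps)
            p += beta * (reward - p)
            history.append((p, eps))
    return history


IR = 128
for beta in (0.2, 0.002):
    last_cycle = track_estimates(beta, IR, cycles=100)[-(IR + 1):]
    preds = [p for p, _ in last_cycle]
    errs = [e for _, e in last_cycle]
    print(f"beta={beta}: prediction in [{min(preds):.1f}, {max(preds):.1f}], "
          f"error in [{min(errs):.2f}, {max(errs):.2f}]")
```

With β = 0.2 the estimates mostly reflect the last few examples: the prediction drops to about 800 right after a minority example and the error decays to nearly zero before the next one, so the rule looks accurate most of the time. With β = 0.002 both estimates stay close to the theoretical values of about 992 and 15.4.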
Slide 15: Occurrence-based Reproduction
Probability of occurrence (pocc), given ir = maj/min:
Classifier        poccB    poccI
###########:0     1/2      1/2
###########:1     1/2      1/2
0000#######:0     1/32     $2\,p_{occB}\,\frac{ir}{ir+1}$
0001#######:1     1/32     $2\,p_{occB}\,\frac{1}{ir+1}$
Figure: probability of occurrence vs. imbalance ratio (0 to 256) for 00001######:1, 00000######:0, ###########:0 and ###########:1.
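The occurrence probabilities can be checked with exact fractions. This sketch reads the imbalanced entries as p_occI = 2·p_occB·ir/(ir+1) for the majority-class niche 0000#######:0 and 2·p_occB/(ir+1) for the minority-class niche 0001#######:1, which gives back p_occB when ir = 1. It also assumes random action selection, so fully general rules keep p_occ = 1/2. The function name is hypothetical.

```python
from fractions import Fraction

# Fully general rules: assuming the action is chosen at random,
# each occurs with probability 1/2 whatever the imbalance ratio.
P_OCC_GENERAL = Fraction(1, 2)


def p_occ_imbalanced(p_occ_balanced, ir, niche_is_majority):
    """2 * p_occB * ir/(ir+1) for a majority-class niche,
    2 * p_occB * 1/(ir+1) for a minority-class niche."""
    class_share = Fraction(ir, ir + 1) if niche_is_majority else Fraction(1, ir + 1)
    # the factor 2 removes the 1/2 class share of the balanced problem
    return 2 * p_occ_balanced * class_share


for ir in (1, 16, 128):
    maj = p_occ_imbalanced(Fraction(1, 32), ir, niche_is_majority=True)    # 0000#######:0
    mino = p_occ_imbalanced(Fraction(1, 32), ir, niche_is_majority=False)  # 0001#######:1
    print(f"ir={ir}: 0000#######:0 -> {maj}, 0001#######:1 -> {mino}")
```

As ir grows, the minority niche occurs exactly ir times less often than the majority niche, while the fully general rules keep occurring with probability 1/2.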
Slide 16: Occurrence-based Reproduction
Probability of reproduction (pGA):
$p_{GA} = \frac{1}{T_{GA}}$, where $T_{GA} \approx \theta_{GA}$ if $T_{occ} < \theta_{GA}$, and $T_{GA} \approx T_{occ}$ otherwise
With θGA=20:
###########:0: $T_{GA} \approx \theta_{GA}$
0000#######:0: $T_{GA} \approx T_{occ}$ (assuming non-overlapping niches)
Figure: timelines of occurrences (Tocc) and GA events (θGA) for both classifiers.
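The reproduction rule can be sketched in a few lines of Python. The sketch reads the slide as p_GA = 1/T_GA, with T_GA ≈ θGA when T_occ < θGA and T_GA ≈ T_occ otherwise, and it assumes T_occ = 1/p_occ. The occurrence probability 1/(16(ir+1)) used for the minority niche 0001#######:1 is itself an assumption, derived for the 11-multiplexer with minority examples accepted with probability 1/ir. The function name is hypothetical.

```python
def reproduction_probability(p_occ, theta_ga=20):
    """p_GA = 1 / T_GA, with T_GA ~ theta_GA if T_occ < theta_GA, else T_occ.
    T_occ is taken as the mean time between occurrences, 1 / p_occ."""
    t_occ = 1.0 / p_occ
    t_ga = theta_ga if t_occ < theta_ga else t_occ
    return 1.0 / t_ga


ir = 128
overgeneral = reproduction_probability(1 / 2)                    # ###########:1
minority_niche = reproduction_probability(1 / (16 * (ir + 1)))  # 0001#######:1
print(f"overgeneral: {overgeneral:.4f}, minority niche: {minority_niche:.6f}, "
      f"ratio: {overgeneral / minority_niche:.0f}")
```

With ir = 128 the overgeneral rule ###########:1 reaches the GA about 100 times more often than the accurate minority-niche rule, which is the imbalance in reproduction opportunities that θGA is meant to counterbalance.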
Slide 17: Guidelines for Parameter Tuning
Rmax and ε0 determine the threshold between negligible noise and a genuine imbalance ratio. β determines the size of the moving window; the window should be large enough to include examples from both classes:
$\beta = k\,\frac{f_{min}}{f_{maj}}$
θGA can counterbalance the reproduction opportunities of the most frequent (majority) and least frequent (minority) niches:
$\theta_{GA} = k'\,\frac{1}{f_{min}}$
where f_min and f_maj are the frequencies of the minority and majority niches.
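The two guidelines translate directly into code. The constants k and k′ are not specified here, so the usage below sets them to 1 purely as placeholders, and it approximates the niche frequencies by the class frequencies 1/(ir+1) and ir/(ir+1), an assumption that the next slide questions. The function name is hypothetical.

```python
def tuned_parameters(f_min, f_maj, k, k_prime):
    """beta = k * f_min / f_maj and theta_GA = k' / f_min."""
    beta = k * f_min / f_maj    # window ~ 1/beta grows with f_maj / f_min
    theta_ga = k_prime / f_min  # rarer minority niches -> larger GA threshold
    return beta, theta_ga


for ir in (16, 64, 256):
    # assumption: niche frequencies approximated by class frequencies
    f_min, f_maj = 1 / (ir + 1), ir / (ir + 1)
    beta, theta_ga = tuned_parameters(f_min, f_maj, k=1.0, k_prime=1.0)  # placeholder k, k'
    print(f"ir={ir}: beta={beta:.4f}, theta_GA={theta_ga:.0f}")
```

With these placeholders β shrinks as 1/ir and θGA grows as ir+1: a longer averaging window, and fewer reproduction opportunities for the frequently occurring niches as the imbalance increases.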
Slide 18: XCS with Parameters Tuning
Figure: XCS with standard settings vs. XCS with parameter tuning; panels for ir=16:1, ir=32:1, ir=64:1, ir=64:1 and ir=256:1.
Slide 19: XCS Tuning for Real-world Datasets
How can we estimate the niche frequency?
Estimate it from the ratio of majority-class to minority-class instances. Problem:
• This may not be related to the distribution of niches in the feature space
Alternatively, approach it as a small disjuncts problem
Slide 20: Online Identification of Small Disjuncts
We search for regions that promote overgeneral classifiers. Estimate ir_cl from the classifier’s experience on each class:
$ir_{cl} = \frac{exp_{max}}{exp_{min}}$
Adapt β and θGA according to ir_cl.
Example: $ir_{cl} = 20 / 4$
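Online estimation of ir_cl only needs per-class experience counters in each classifier. The sketch below is hypothetical: the class and function names are illustrative, and the adaptation rule β = k/ir_cl, θGA = k′(1 + ir_cl) is an assumption obtained by plugging f_min/f_maj ≈ 1/ir_cl and f_min ≈ 1/(1 + ir_cl) into the guidelines of slide 17, not a rule stated here.

```python
import math


class ClassExperience:
    """Hypothetical per-classifier counters of matched examples of each class."""

    def __init__(self, classes=(0, 1)):
        self.exp = {c: 0 for c in classes}

    def observe(self, true_class):
        # called whenever the classifier matches a training example
        self.exp[true_class] += 1

    def ir_cl(self):
        """ir_cl = exp_max / exp_min; infinite until every class has been seen."""
        exp_min, exp_max = min(self.exp.values()), max(self.exp.values())
        return math.inf if exp_min == 0 else exp_max / exp_min


def adapted_parameters(ir_cl, k, k_prime):
    """Assumed mapping: beta = k / ir_cl, theta_GA = k' * (1 + ir_cl)."""
    if math.isinf(ir_cl):
        return None  # keep the current settings until both classes have been seen
    return k / ir_cl, k_prime * (1 + ir_cl)


tracker = ClassExperience()
for label in [0] * 20 + [1] * 4:
    tracker.observe(label)
print(tracker.ir_cl())  # 5.0
print(adapted_parameters(tracker.ir_cl(), k=1.0, k_prime=1.0))  # placeholder constants
```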
Slide 21: Online Parameter Adaptation
Figure panel: ir=256:1
Slide 22: What about UCS?
UCS, a supervised version of XCS:
• Needs less exploration
• Avoids XCS’s fitness dilemma
• Is more robust to parameter settings
However, overgeneral classifiers also tend to take over the population:
• Their probability of occurrence depends on the imbalance ratio
• This is partially mitigated by fitness sharing
Slide 23: What about UCS?
Figure panels: ir=256:1, ir=512:1
Slide 24: Are LCSs more sensitive to class imbalance than other classifier schemes?
TP rate (mean ± standard deviation)
Dataset   C4.5               SMO                XCS
Bal2c1    0,00% ± 0,00%      0,00% ± 0,00%      0,00% ± 0,00%
Bal2c2    81,65% ± 6,83%     93,72% ± 4,64%     81,96% ± 6,00%
Bal2c3    81,90% ± 6,04%     93,77% ± 5,59%     83,99% ± 6,88%
bpa       42,95% ± 14,09%    0,00% ± 0,00%      61,38% ± 9,10%
gls2c1    80,00% ± 42,16%    0,00% ± 0,00%      50,00% ± 52,70%
gls2c2    35,00% ± 47,43%    15,00% ± 33,75%    55,00% ± 49,72%
gls2c3    30,00% ± 42,16%    0,00% ± 0,00%      5,00% ± 15,81%
gls2c4    75,00% ± 32,63%    81,67% ± 25,40%    81,67% ± 25,40%
gls2c5    77,14% ± 16,77%    10,00% ± 9,64%     84,29% ± 14,21%
gls2c6    59,82% ± 15,13%    0,00% ± 0,00%      81,79% ± 13,95%
h-s       75,83% ± 13,29%    80,00% ± 7,03%     80,00% ± 9,78%
pim       55,37% ± 13,27%    53,38% ± 6,42%     55,93% ± 9,75%
tao       95,23% ± 2,14%     84,11% ± 6,17%     92,58% ± 5,72%
thy2c1    90,00% ± 16,10%    76,67% ± 22,50%    90,00% ± 16,10%
thy2c2    94,17% ± 12,45%    54,17% ± 24,92%    90,83% ± 14,93%
thy2c3    90,95% ± 10,34%    33,81% ± 21,35%    90,71% ± 8,05%
wav2c1    75,74% ± 4,06%     88,51% ± 3,20%     87,24% ± 3,43%
wav2c2    72,34% ± 3,89%     84,57% ± 4,05%     78,72% ± 2,57%
wav2c3    77,64% ± 2,38%     89,97% ± 3,48%     87,86% ± 3,65%
wbdc      92,95% ± 3,42%     95,42% ± 5,36%     95,83% ± 5,89%
wdbc      92,47% ± 5,09%     94,81% ± 2,71%     93,83% ± 6,37%
wine2c1   89,00% ± 16,63%    100,00% ± 0,00%    100,00% ± 0,00%
wine2c2   95,00% ± 8,05%     98,33% ± 5,27%     98,33% ± 5,27%
wine2c3   90,18% ± 11,70%    97,14% ± 6,02%     98,57% ± 4,52%
wpbc      41,00% ± 12,87%    9,50% ± 17,07%     30,50% ± 24,99%
Slide 25: How can we Minimize the Effects of Small Disjuncts?
Resampling the dataset:
Classical methods:
• Random oversampling
• Random undersampling
Heuristic methods:
• Tomek links
• CNN
• One-sided selection
• SMOTE
Cluster-based oversampling:
• Addresses small disjuncts
• Assumes that clustering will find the small disjuncts and match the classifier’s approximation
• Could XCS benefit from the online identification of small disjuncts?
Cost-sensitive classifiers
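The classical methods and SMOTE can be sketched briefly in Python. This is an illustrative simplification, not a reference implementation: the SMOTE-style function picks a random minority example for each synthetic point instead of generating a fixed number per example, and Tomek links, CNN, one-sided selection and cluster-based oversampling are not covered. All function names are hypothetical.

```python
import random


def random_oversample(minority, majority, rng):
    """Add random copies of minority examples until both classes have the same size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return minority + extra, majority


def random_undersample(minority, majority, rng):
    """Keep a random subset of the majority class as large as the minority class."""
    return minority, rng.sample(majority, len(minority))


def smote_like(minority, n_new, k, rng):
    """SMOTE-style synthesis: each new point lies at a random position on the
    segment between a minority example and one of its k nearest minority neighbours."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: dist2(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


rng = random.Random(1)
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority = [(rng.uniform(2, 5), rng.uniform(2, 5)) for _ in range(30)]
over_min, _ = random_oversample(minority, majority, rng)
_, under_maj = random_undersample(minority, majority, rng)
new_points = smote_like(minority, n_new=5, k=2, rng=rng)
print(len(over_min), len(under_maj), new_points[:2])
```

Random oversampling only repeats existing minority points, whereas the SMOTE-style step creates new points inside the region spanned by the minority class.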
Slide 26: Domains of Applicability
• Should we use some counterbalancing scheme?
• Which learning scheme should we use?
• Is there a combination of counterbalancing scheme + learner that beats all others?
• How can we detect the presence of small disjuncts?
• Are there other complexity factors mixed up with the small disjuncts problem?
Slide 27: Domains of Applicability
Dataset → Dataset characterization → Prediction → Suggested approach (resampling / classifier / resampling + classifier)
Learn it! Where are LCSs placed?
Type of dataset:
• Geometrical distribution of classes
• Possible presence of small disjuncts
• Other complexity factors
Slide 28: Future Directions
• Potential benefit of XCS for discovering small disjuncts… and learning from them online
• Further analyze UCS
• How do LCSs perform w.r.t. other classifiers on unbalanced datasets?
• Measures for identifying small disjuncts… and other possible complexity factors
• What is noise and what is a small disjunct?
• In which cases is an LCS applicable?
Slide 29: Learning Classifier Systems for Class Imbalance Problems
Ester Bernadó-Mansilla Research Group in Intelligent Systems Enginyeria i Arquitectura La Salle Universitat Ramon Llull Barcelona, Spain