Pitfalls in Benchmarking Data Stream
Classification and How to Avoid Them
Albert Bifet1, Jesse Read2, Indrė Žliobaitė3
Bernhard Pfahringer4, Geoff Holmes4
1Yahoo! Research Barcelona
2Universidad Carlos III, Madrid, Spain
3Aalto University and Helsinki Institute for Information Technology (HIIT), Finland
4University of Waikato, Hamilton, New Zealand
ECML-PKDD 2013, 25 September 2013
Data Streams
Sequence is potentially infinite
High amount of data: sublinear space
High speed of arrival: sublinear time per example
Once an element from a data stream has been processed
it is discarded or archived
Big Data & Real Time
1. Motivation
Electricity Dataset
Popular benchmark for testing adaptive classifiers
Collected from the Australian New South Wales Electricity
Market.
Contains 45,312 instances which record electricity prices
at 30 minute intervals.
The class label identifies the change of the price (UP or
DOWN) relative to a moving average of the last 24 hours.
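The labelling rule above can be sketched as follows. This is only an illustration, not the procedure used to build the dataset: the function name `label_price_changes`, the default window of 48 half-hour readings (24 hours), and the handling of ties and of the first readings are all assumptions.

```python
from collections import deque


def label_price_changes(prices, window=48):
    """Label each price UP or DOWN relative to the mean of the previous
    `window` prices (48 half-hour readings = 24 hours, an assumption)."""
    recent = deque(maxlen=window)  # sliding window of past prices
    labels = []
    for price in prices:
        if len(recent) == window:
            average = sum(recent) / window
            # Assumption: a price equal to the average counts as DOWN.
            labels.append("UP" if price > average else "DOWN")
        else:
            labels.append(None)  # not enough history yet
        recent.append(price)
    return labels
```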
Electricity Dataset, Accuracy
[Figure: accuracy (%) vs. time (instances, 0 to 4·10^4); curves: VFDT, Majority Class, Naive Bayes]
Electricity Dataset, Accuracy
[Figure: accuracy (%) vs. time (instances, 0 to 4·10^4); curves: Magic Classifier, VFDT, Majority Class, Naive Bayes]
Electricity Dataset, Kappa Statistic
[Figure: Kappa statistic (%) vs. time (instances, 0 to 4·10^4); curves: VFDT, Naive Bayes]
Electricity Dataset, Kappa Statistic
[Figure: Kappa statistic (%) vs. time (instances, 0 to 4·10^4); curves: Magic Classifier, VFDT, Naive Bayes]
Electricity Dataset, Accuracy
Algorithm name      Acc. (%)    Algorithm name      Acc. (%)
DDM                 89.6*       Local detection     80.4
Learn++.CDS         88.5        Perceptron          79.1
KNN-SPRT            88.0        AUE2                77.3
GRI                 88.0        ADWIN               76.6
FISH3               86.2        EAE                 76.6
EDDM-IB1            85.7        Prop. method        76.1
Magic classifier    85.3        Cont. λ-perc.       74.1
ASHT                84.8        CALDS               72.5
bagADWIN            82.8        TA-SVM              68.9
DWM-NB              80.8
* tested on a subset
2. Problem
No-Change classifier: Weather classifier
Prediction for tomorrow: the same as
today
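A minimal sketch of the no-change idea, evaluated in test-then-train order. The class name, the `predict`/`learn` interface, and the helper `prequential_accuracy` are illustrative assumptions, not the API of any particular stream-mining library.

```python
class NoChangeClassifier:
    """Predicts that the next label equals the last label seen."""

    def __init__(self, default=None):
        self.last_label = default  # prediction before any label is seen

    def predict(self, x=None):
        return self.last_label  # the attributes x are ignored

    def learn(self, x, y):
        self.last_label = y


def prequential_accuracy(model, stream):
    """Test-then-train: predict each instance first, then learn from it."""
    correct = 0
    total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        model.learn(x, y)
        total += 1
    return correct / total if total else 0.0
```

On a stream whose labels rarely flip, this baseline is correct on every instance except the first and those immediately after a change.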
Electricity Dataset, Accuracy
[Figure: accuracy (%) vs. time (instances, 0 to 4·10^4); curves: No-Change, VFDT, Majority Class, Naive Bayes]
Electricity Dataset, Kappa Statistic
[Figure: Kappa statistic (%) vs. time (instances, 0 to 4·10^4); curves: No-Change, VFDT, Naive Bayes]
Characteristics of the Electricity Dataset
[Figure: class prior (%) vs. time (instances, 0.5·10^4 to 4.5·10^4)]
Characteristics of the Electricity Dataset
[Figure: autocorrelation (0 to 1) vs. lag (instances, 20 to 200)]
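One way to measure this kind of temporal dependence is the sample autocorrelation of the label sequence at a given lag. The sketch below assumes the labels are encoded as 0/1 numbers; the function name is illustrative, and this is not necessarily the exact estimator behind the plot.

```python
def autocorrelation(labels, lag):
    """Sample autocorrelation of a numeric (e.g. 0/1) label sequence."""
    n = len(labels)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(labels)")
    mean = sum(labels) / n
    variance = sum((v - mean) ** 2 for v in labels)
    if variance == 0:
        return 0.0  # constant sequence: treated as 0 here by convention
    covariance = sum(
        (labels[i] - mean) * (labels[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance
```

Values near 1 at small lags mean that consecutive labels tend to repeat, which is the situation in which a no-change baseline does well.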
3. Proposal
New Evaluation for Stream Classifiers
Kappa Statistic
p0: classifier’s prequential accuracy
pc: probability that a chance classifier makes a correct
prediction.
κ statistic:
\[ \kappa = \frac{p_0 - p_c}{1 - p_c} \]
κ = 1 if the classifier is always correct
κ = 0 if the predictions coincide with the correct ones as
often as those of the chance classifier
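The κ statistic can be computed from a sequence of true labels and predictions as sketched below. The slide does not spell out the chance classifier; the sketch assumes the usual Cohen's kappa definition, in which the chance classifier draws labels with the same class frequencies as the evaluated classifier's predictions. The function name is illustrative.

```python
from collections import Counter


def kappa(y_true, y_pred):
    """Kappa statistic: (p0 - pc) / (1 - pc)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    # p0: observed accuracy of the classifier.
    p0 = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # pc: accuracy of a chance classifier that predicts each class with the
    # same frequency as the evaluated classifier (assumption, see lead-in).
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    pc = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if pc == 1.0:
        return 0.0  # undefined case; 0 is returned here by convention
    return (p0 - pc) / (1 - pc)
```

With balanced classes, a classifier that always predicts the same class has 50% accuracy but κ = 0, which is why κ is more informative than raw accuracy.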
New Evaluation for Stream Classifiers
Kappa Plus Statistic
p0: classifier’s prequential accuracy
pe: no-change classifier’s prequential accuracy
κ+ statistic:
\[ \kappa^{+} = \frac{p_0 - p_e}{1 - p_e} \]
κ+ = 1 if the classifier is always correct
κ+ = 0 if the predictions coincide with the correct ones as
often as those of the no-change classifier
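A minimal sketch of κ+ from a label sequence and the classifier's predictions. Both accuracies are measured from the second instance onward, because the no-change classifier has no previous label for the first one; that choice, like the function name, is an assumption made for illustration.

```python
def kappa_plus(y_true, y_pred):
    """Kappa Plus statistic: (p0 - pe) / (1 - pe)."""
    n = len(y_true)
    if n < 2 or n != len(y_pred):
        raise ValueError("need two sequences of equal length >= 2")
    # p0: accuracy of the evaluated classifier (first instance skipped).
    p0 = sum(t == p for t, p in zip(y_true[1:], y_pred[1:])) / (n - 1)
    # pe: accuracy of the no-change classifier, i.e. how often a label
    # equals the label immediately before it.
    pe = sum(prev == cur for prev, cur in zip(y_true, y_true[1:])) / (n - 1)
    if pe == 1.0:
        return 0.0  # labels never change; 0 is returned here by convention
    return (p0 - pe) / (1 - pe)
```

Unlike accuracy, κ+ becomes negative as soon as the classifier is less accurate than the no-change baseline, and it has no lower bound of −1: with pe close to 1, a small accuracy gap is divided by a tiny 1 − pe.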
Electricity Market Dataset Accuracy
[Figure: accuracy (%, 60 to 100) vs. time (instances, 0 to 4·10^4); curves: No-Change, HAT, Lev. Bagging]
Electricity Market Dataset κ
[Figure: Kappa statistic (%) vs. time (instances, 0 to 4·10^4); curves: No-Change, HAT, Lev. Bagging]
Electricity Market Dataset κ+
[Figure: Kappa Plus statistic (%, −300 to 100) vs. time (instances, 0 to 4·10^4); curves: No-Change, HAT, Lev. Bagging]
SWT: Temporally Augmented Classifier
SWT: meta strategy that builds meta instances by augmenting
the original input attributes with the values of recent class
labels from the past
\[ \Pr[\text{class is } c] \equiv h(x_t, c_{t-\ell}, \ldots, c_{t-1}) \]
for the t-th test instance, where ℓ is the size of the sliding
window over the most recent true labels.
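The augmentation can be sketched as a thin wrapper around any incremental learner. Everything below is illustrative: the `predict`/`learn` interface, the `padding` value used before the window has filled, and the toy `MemorizingLearner` are assumptions, not the authors' implementation.

```python
from collections import deque


class MemorizingLearner:
    """Toy base learner (hypothetical): remembers the label of each instance."""

    def __init__(self):
        self.table = {}

    def predict(self, x):
        return self.table.get(x)

    def learn(self, x, y):
        self.table[x] = y


class SWTWrapper:
    """Appends the last `window_size` true labels to every instance."""

    def __init__(self, base, window_size, padding=None):
        self.base = base
        # Oldest label first, newest last; padded until enough labels arrive.
        self.recent_labels = deque([padding] * window_size, maxlen=window_size)

    def _augment(self, x):
        return tuple(x) + tuple(self.recent_labels)

    def predict(self, x):
        return self.base.predict(self._augment(x))

    def learn(self, x, y):
        # Train on the augmented instance, then slide the label window.
        self.base.learn(self._augment(x), y)
        self.recent_labels.append(y)
```

Note that the wrapper needs the true label of instance t−1 before it predicts instance t, so it only applies when labels arrive without delay.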
Electricity Market Dataset κ+
[Figure: Kappa Plus statistic (%, −300 to 100) vs. time (instances, 0 to 4·10^4); curves: No-Change, SWT HAT, SWT Lev. Bagging]
Forest Cover Type Dataset
[Figure, six panels, each vs. time (instances, 0 to 4·10^5):
 accuracy (%), Kappa statistic (%) and Kappa Plus statistic (%) for No-Change, HAT, Lev. Bagging;
 accuracy (%), Kappa statistic (%) and Kappa Plus statistic (%) for No-Change, SWT HAT, SWT Lev. Bagging]
Conclusions
Temporal dependence in data stream mining
new κ+ measure
a wrapper classifier SWT
Thanks!