Supervised Machine Learning Approaches for
Log-Based Anomaly Detection: A Case Study on the
Spirit Dataset
Bekkouche Mohammed, Meski Melissa, Khodja Yousra, Benslimane Sidi Mohammed
LabRI-SBA Laboratory, École Supérieure en Informatique, Sidi Bel
Abbes 22000, Algeria
TACC 2025
November 20-22, 2025
Structure
1 Introduction
2 Related Work
3 Dataset: Spirit
4 Methodology
5 Results and Analysis
6 Conclusion
Introduction
Anomaly detection is essential for building secure and reliable
computer systems.
Increasing system complexity ⇒ higher risk of bugs and vulnerabilities.
Failures may lead to user dissatisfaction or substantial financial losses.
Logs are valuable resources:
Capture system events and states during runtime.
Provide insights into operational behavior.
Enable automated anomaly detection.
Manual log inspection is impractical ⇒ automated, ML-based
approaches are required.
Introduction
Machine learning for log anomaly detection:
Unsupervised methods: work without labels, but commonly prone to
false positives.
Supervised methods: higher accuracy when labeled data is available.
Spirit dataset:
Real system logs with ground-truth anomaly labels.
Underexplored in supervised learning studies.
Our contribution: Evaluation of four supervised models (SVM, DT,
RF, XGBoost) on five dataset versions with TF-IDF and Word2Vec.
Important findings:
Shorter windows improve detection.
Tree-based models perform best with TF-IDF; SVM with Word2Vec.
Related Work
Machine learning methods for log anomaly detection are commonly
divided into:
Supervised methods:
Require labeled training data.
Typically use classification algorithms (Decision Trees, SVMs, Random
Forests, XGBoost).
Recent advances include CNN-based classifiers and Transformer-based
approaches.
Unsupervised methods:
Do not require labeled data.
Detect anomalies as rare or unusual patterns in feature space.
Include PCA, Isolation Forest, Autoencoders, LSTMs (e.g., DeepLog,
LogAnomaly), and Transformer-based models.
Related Work
Semi-supervised approaches:
Use a small amount of labeled data to enhance anomaly detection.
Strategies include semi-supervised classifiers, integrating known
anomalies, or pseudo-labeling.
Datasets:
HDFS and Thunderbird are widely studied.
The Spirit dataset is significantly less explored, especially with
supervised learning.
Our focus:
Provide a detailed study of the Spirit dataset using supervised machine
learning.
Evaluate the effectiveness of traditional models in detecting abnormal
log sequences.
Dataset: Spirit
Subset statistics: 5,000,000 log messages, 2,880 log events (templates).

Grouping     # Log sequences   Avg. seq. length   Training (80%): # seq. (# anomalous)   Testing (20%): # seq. (# anomalous)
1 hour       1,173             4,262.57           937 (761, 81.22%)                      236 (191, 80.93%)
30 minutes   2,345             2,132.20           1,875 (1,437, 76.64%)                  470 (360, 76.60%)
15 minutes   4,690             1,066.10           3,751 (2,747, 73.23%)                  939 (687, 73.16%)
5 minutes    14,068            355.42             11,253 (7,901, 70.21%)                 2,815 (1,976, 70.20%)
1 minute     70,327            71.10              56,261 (38,486, 68.41%)                14,066 (9,622, 68.41%)
The Spirit supercomputing system at Sandia National Laboratories (1,028
processors, 1,024 GB memory).
Original dataset: over 172 million log messages, each labeled as normal or
anomalous.
Subset used:
1 GB of continuous log lines (first 5M entries).
764,891 anomalies ⇒ anomaly ratio ≈ 15.3%.
Event templates: 2,880.
Sequence generation: fixed time windows (1m, 5m, 15m, 30m, 1h).
Shorter windows ⇒ more sequences, shorter length.
Labeling rule: sequence is anomalous if it contains ≥ 1 anomalous message.
Dataset split: 80% training, 20% testing (uniform distribution of
normal/anomalous).
Methodology
Log-Based Anomaly Detection System
1. Log Parsing: extract structured events from raw logs (e.g., using Drain or Spell parsers).
2. Feature Extraction / Engineering: transform parsed logs into numerical vectors (e.g., TF-IDF, Word2Vec embeddings, frequency counts).
3. Model Training: train supervised learning models using labeled data (e.g., SVM, Decision Tree, Random Forest).
4. Anomaly Detection: apply the trained model to detect abnormal patterns (e.g., binary classification: normal vs. anomaly).
Figure: Process of log-based anomaly detection using supervised machine learning
models
Methodology
Log-Based Anomaly Detection System
Log parsing:
Transforms unstructured log messages into a structured format.
Produces an event template (constant part) + parameters (variable
parts).
Example:
Raw: sendmail[17795]: j0170NVv017795: from=root,
size=117, class=0, nrcpts=1,
msgid=<200501010700.j0170NVv017795@sn209>,
relay=#2#@localhost
Template: <*> <*> from=root, <*> class=0, nrcpts=1, <*>
relay=#2#@localhost
In our work: a pre-parsed version of the Spirit dataset is used, with
fields such as:
Timestamp, Event ID, Event Template, Anomaly Label
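To make the masking idea concrete, here is a toy Python sketch (an illustration only: real parsers such as Drain cluster messages rather than applying fixed regex rules, so the resulting template differs from the dataset's actual templates):

```python
import re

def mask_message(msg: str) -> str:
    """Toy template extraction: replace variable-looking tokens with <*>."""
    msg = re.sub(r"msgid=<[^>]+>", "<*>", msg)    # message identifiers
    msg = re.sub(r"\b\w*\d[\w.]*\b", "<*>", msg)  # any token containing a digit
    return msg

raw = ("sendmail[17795]: j0170NVv017795: from=root, size=117, class=0, nrcpts=1, "
       "msgid=<200501010700.j0170NVv017795@sn209>, relay=#2#@localhost")
print(mask_message(raw))
# sendmail[<*>]: <*>: from=root, size=<*>, class=<*>, nrcpts=<*>, <*>, relay=#<*>#@localhost
```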
Methodology
Log-Based Anomaly Detection System
Feature Engineering:
Goal: transform log entries into numerical representations for
machine learning.
Logs are grouped into sequences using timestamps:
Each sequence captures a snapshot of system behavior.
This step is crucial: quality of features strongly impacts anomaly
detection performance.
In the Spirit dataset:
No explicit identifiers (e.g., session IDs).
Grouping performed using timestamp-based strategies.
Strategies:
Fixed window partitioning.
Sliding window partitioning (defined by window size + step size).
Our work: adopt the fixed time window approach with
non-overlapping intervals.
Methodology
Log-Based Anomaly Detection System
Feature Engineering:
Algorithm 1: Time Window Log Grouping
Input: log_dataset — list of log entries with timestamp, event ID, event template, and anomaly label (True = anomalous); time_window — fixed time interval (e.g., 15 minutes)
Output: log_sequences — list of log sequences grouped by time window
1. Sort log_dataset by timestamp in ascending order; log_sequences ← [ ]
2. current_sequence ← [ ]; current_label ← False   // False means the sequence is normal
3. window_start ← timestamp of first log entry; window_end ← window_start + time_window
4. foreach log_entry in log_dataset do
5.     if log_entry.timestamp ≥ window_end then   // close the current window, open a new one
6.         Append (current_sequence, current_label) to log_sequences
7.         current_sequence ← [ ]; current_label ← False
8.         window_start ← log_entry.timestamp; window_end ← window_start + time_window
9.     Append log_entry.event_ID (or event_template) to current_sequence
10.    current_label ← current_label ∨ log_entry.anomaly_label   // anomalous if ≥ 1 anomalous entry
11. if current_sequence is not empty then append (current_sequence, current_label) to log_sequences
12. return log_sequences
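A minimal Python sketch of this grouping procedure is shown below. It assumes each parsed entry is a dict with 'timestamp' (datetime), 'event_id', and 'is_anomalous' fields; these names are illustrative, not the dataset's actual column names.

```python
from datetime import timedelta

def group_by_time_window(log_entries, window):
    """Group chronologically ordered log entries into fixed, non-overlapping
    time windows; a sequence is labeled anomalous if it contains at least
    one anomalous entry."""
    entries = sorted(log_entries, key=lambda e: e["timestamp"])
    sequences = []
    if not entries:
        return sequences

    current = {"events": [], "is_anomalous": False}
    window_end = entries[0]["timestamp"] + window

    for entry in entries:
        if entry["timestamp"] >= window_end:
            # Close the current window and open a new one anchored at this entry.
            sequences.append(current)
            current = {"events": [], "is_anomalous": False}
            window_end = entry["timestamp"] + window
        current["events"].append(entry["event_id"])
        current["is_anomalous"] = current["is_anomalous"] or entry["is_anomalous"]

    sequences.append(current)  # the last window is never empty here
    return sequences

# Example: the 15-minute version of the dataset.
# sequences = group_by_time_window(parsed_logs, timedelta(minutes=15))
```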
Methodology
Log-Based Anomaly Detection System
Feature Engineering:
Each log sequence can be represented using:
Event Templates
Event IDs (one ID per template, 2,880 in Spirit dataset)
The representation choice depends on the feature extraction
technique.
Techniques used in this study:
TF-IDF: sequences represented with event IDs.
Word2Vec: sequences represented with event templates.
Methodology
Log-Based Anomaly Detection System
Feature Engineering: TF-IDF
Log sequences must be converted into numerical vectors to be used
with ML models.
TF-IDF measures the importance of an event ID within a sequence relative to its occurrence across all sequences.
Each sequence is transformed into a sparse vector of size 2,880 (the number of unique event IDs).
Formula:
TF-IDF(e_i, s_j) = TF(e_i, s_j) × log(N / n_i)
N: total number of log sequences
n_i: number of sequences containing e_i
TF(e_i, s_j): frequency of e_i in s_j
Output: a 2,880-dimensional sparse vector per sequence.
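As an illustration, a sequence-level TF-IDF matrix can be built with scikit-learn (a sketch, not the authors' exact pipeline; note that TfidfVectorizer applies a smoothed IDF by default, a slight variant of the formula above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Each sequence becomes a "document" of space-separated event IDs,
# reusing the sequences produced by the grouping sketch above.
docs = [" ".join(str(e) for e in seq["events"]) for seq in sequences]
y = [int(seq["is_anomalous"]) for seq in sequences]   # 1 = anomalous, 0 = normal

# One dimension per distinct event ID (at most 2,880 for this Spirit subset).
vectorizer = TfidfVectorizer(token_pattern=r"\S+")
X_tfidf = vectorizer.fit_transform(docs)   # sparse matrix: n_sequences x n_event_ids
```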
Methodology
Log-Based Anomaly Detection System
Feature Engineering: Word2Vec
Word2Vec learns dense vector embeddings for event templates based
on their context in sequences.
Trained on all log sequences, treating each event template as a
“word”.
Learns semantic similarities: events appearing in similar contexts get
similar vectors.
Representation:
Each template → 500-dimensional embedding.
Sequence → average of its templates’ embeddings.
Output: a 500-dimensional dense vector encoding semantic
patterns.
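A possible implementation with gensim is sketched below; hyperparameters other than the 500-dimensional embedding size are assumptions, and 'templates' is an illustrative field name holding each sequence's list of event templates.

```python
import numpy as np
from gensim.models import Word2Vec

corpus = [seq["templates"] for seq in sequences]   # one list of templates per sequence

# Treat each event template as a single "word" and learn 500-dimensional embeddings.
w2v = Word2Vec(sentences=corpus, vector_size=500, window=5, min_count=1, workers=4)

def sequence_vector(templates, model):
    """Average the embeddings of a sequence's templates (zero vector if none are known)."""
    vecs = [model.wv[t] for t in templates if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

X_w2v = np.vstack([sequence_vector(t, w2v) for t in corpus])   # dense: n_sequences x 500
```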
Methodology
Log-Based Anomaly Detection System
Classifiers
Based on the labeled feature vectors, supervised models are trained to
distinguish between normal and anomalous log sequences.
Models used:
SVM – boundary-based, finds optimal hyperplane, effective on dense
data (e.g., Word2Vec).
DT – rule-based, interpretable, works well on sparse data (e.g.,
TF-IDF).
RF – ensemble of decision trees, improves generalization and
robustness.
XGBoost – gradient-boosted decision trees, sequential correction of
errors, efficient and scalable.
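For illustration, the four classifiers can be instantiated as follows (default or lightly chosen hyperparameters; the slides do not report the exact settings used):

```python
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

models = {
    "SVM": SVC(kernel="rbf"),                       # boundary-based, suits dense Word2Vec vectors
    "DT":  DecisionTreeClassifier(random_state=0),  # interpretable rules, handles sparse TF-IDF
    "RF":  RandomForestClassifier(n_estimators=100, random_state=0),
    "XB":  XGBClassifier(n_estimators=100, eval_metric="logloss"),
}
```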
Methodology
Log-Based Anomaly Detection System
Training and Evaluation
Datasets (Spirit logs) are split into:
80% training set
20% test set
A uniform (stratified) split is applied so that the proportion of anomalies
remains consistent across both sets.
Models are trained on training data and evaluated on test data.
Evaluation metrics:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-Score = Harmonic mean of Precision and Recall
Performance is compared across:
Feature extraction methods (TF-IDF vs. Word2Vec).
Different dataset versions (various time window sizes).
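Putting the pieces together, a stratified split and per-model evaluation might look like the sketch below (continuing from the sketches above; X_tfidf, y, and models are assumed to be defined there):

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score

# Stratified 80/20 split keeps the anomaly ratio consistent across both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X_tfidf, y, test_size=0.2, stratify=y, random_state=42)

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"{name}: P={precision_score(y_test, y_pred):.3f} "
          f"R={recall_score(y_test, y_pred):.3f} "
          f"F1={f1_score(y_test, y_pred):.3f}")
```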
Results and Analysis
Table: Precision, Recall, and F1-Score for the SVM, DT, RF, and XGBoost (XB) supervised
learning approaches in detecting anomalies across different versions of the Spirit
log dataset, using TF-IDF and Word2Vec for feature extraction. These versions
are generated using a fixed-size window grouping strategy with varying time
window lengths.
All values are reported as TF-IDF / Word2Vec.

Model  Window       Precision       Recall          F1-Score
SVM    1 hour       0.724 / 0.699   0.353 / 0.356   0.472 / 0.472
SVM    30 minutes   0.742 / 0.721   0.587 / 0.515   0.655 / 0.601
SVM    15 minutes   0.743 / 0.753   0.742 / 0.697   0.742 / 0.724
SVM    5 minutes    0.746 / 0.754   0.861 / 0.862   0.800 / 0.804
SVM    1 minute     0.782 / 0.986   0.959 / 0.948   0.860 / 0.966
DT     1 hour       0.841 / 0.716   0.227 / 0.390   0.334 / 0.505
DT     30 minutes   0.894 / 0.713   0.470 / 0.509   0.605 / 0.594
DT     15 minutes   1.000 / 0.708   0.622 / 0.612   0.767 / 0.656
DT     5 minutes    1.000 / 0.751   0.820 / 0.849   0.901 / 0.797
DT     1 minute     1.000 / 0.948   0.947 / 0.935   0.973 / 0.941
RF     1 hour       0.733 / 0.714   0.387 / 0.393   0.507 / 0.507
RF     30 minutes   0.782 / 0.736   0.578 / 0.572   0.665 / 0.644
RF     15 minutes   0.851 / 0.736   0.707 / 0.706   0.773 / 0.721
RF     5 minutes    0.843 / 0.751   0.870 / 0.857   0.857 / 0.801
RF     1 minute     0.851 / 0.961   0.965 / 0.934   0.904 / 0.947
XB     1 hour       0.691 / 0.678   0.340 / 0.309   0.456 / 0.424
XB     30 minutes   0.736 / 0.746   0.550 / 0.581   0.630 / 0.653
XB     15 minutes   1.000 / 0.744   0.620 / 0.741   0.765 / 0.743
XB     5 minutes    1.000 / 0.755   0.820 / 0.864   0.901 / 0.806
XB     1 minute     1.000 / 0.858   0.947 / 0.948   0.973 / 0.901
Results and Analysis
Comparison of SVM, DT, RF, and XB.
Evaluation across:
Five fixed time windows: 1h, 30m, 15m, 5m, 1m
Two feature extraction methods: TF-IDF, Word2Vec
Metrics used: Precision, Recall, F1-Score.
Important Observation:
Recall and F1-Score improve with shorter time windows.
Smaller windows → more sequences with fewer events.
Anomalies become easier to isolate and distinguish.
Results and Analysis
[Line plot: F1-Score (y-axis, 0.4–1.0) versus time window (x-axis: 1h, 30m, 15m, 5m, 1m), one curve per model/feature combination: SVM, DT, RF, and XB, each with TF-IDF and Word2Vec.]
Figure: F1-Score evolution of SVM, DT, RF, and XB across different time
windows using TF-IDF and Word2Vec.
Results and Analysis
SVM
Moderate overall performance.
Improves with shorter windows and Word2Vec.
Best F1-Score: 0.966 (1 min, Word2Vec).
Weak at longer windows (0.472 at 1h).
Decision Tree (DT)
Excels at short windows with TF-IDF.
Perfect precision (1.000) at 15m, 5m, and 1m (TF-IDF).
Best F1-Score: 0.973 (1 min, TF-IDF).
Struggles with recall at long windows (0.227 at 1h).
Results and Analysis
Random Forest (RF)
Strong and stable across all settings.
Best F1-Score: 0.947 (1 min, Word2Vec).
Ensemble nature improves robustness.
XGBoost (XB)
Competitive, especially with TF-IDF.
Best F1-Score: 0.973 (1 min, TF-IDF).
Slightly below RF on Word2Vec at 1m (0.901 vs 0.947).
General Insights
TF-IDF favors tree-based models (DT, RF, XB).
Word2Vec benefits SVM.
Shorter windows consistently improve performance.
Conclusion
Comparative evaluation on the Spirit dataset:
Models: SVM, Decision Tree (DT), Random Forest (RF), XGBoost
(XB).
Feature extraction: TF-IDF and Word2Vec.
Log segmentation with time-based fixed windows.
Experimental Results:
Shorter windows ⇒ improved detection accuracy.
Best performance at 1-minute window.
DT and XB (TF-IDF): F1-Score = 0.973.
SVM (Word2Vec): F1-Score = 0.966.
Conclusion
Important Findings:
Tree-based supervised models + structured features are highly
effective.
Adjusting log grouping parameters greatly enhances anomaly
detection.
Future Work:
Explore ensemble techniques (voting among RF, DT, XB).
Investigate hybrid features (TF-IDF + Word2Vec).
Apply Explainable AI to interpret model decisions.
Conclusion
Thank You
Questions?