System logs provide important insights into the behavior and reliability of computing systems. Detecting anomalies through log analysis is essential for identifying failures and security issues. However, manual inspection is time-consuming and error-prone, particularly in large-scale environments. Machine learning techniques can automate anomaly detection and improve its accuracy. While most existing approaches are unsupervised due to the scarcity of labeled data, supervised methods, which learn from both normal and abnormal examples, can achieve higher accuracy when labeled logs are available.
The Spirit dataset is a real-world labeled log dataset that enables the evaluation of supervised learning approaches. In this work, we present a comparative evaluation of four supervised models applied to the Spirit dataset: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and XGBoost (XB). We examine the effect of varying the size of the fixed time windows used to group log messages into sequences, and compare two feature extraction techniques: TF-IDF (Term Frequency–Inverse Document Frequency) and Word2Vec.
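The time-based fixed-window grouping step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the log entries and the `group_by_window` helper are hypothetical, and in the actual pipeline each window's messages would subsequently be vectorized with TF-IDF or Word2Vec before being fed to the classifiers.

```python
from collections import defaultdict

def group_by_window(logs, window_seconds):
    """Group (timestamp, message) log entries into fixed time windows.

    Each window becomes one sequence of messages; a sequence is labeled
    anomalous if it contains at least one anomalous log entry.
    """
    windows = defaultdict(list)
    for ts, msg in logs:
        # Integer division assigns each entry to its window index.
        windows[int(ts // window_seconds)].append(msg)
    return [windows[k] for k in sorted(windows)]

# Hypothetical log entries: (unix timestamp in seconds, message)
logs = [
    (0,   "kernel: boot ok"),
    (30,  "sshd: session opened"),
    (65,  "kernel: disk error"),
    (70,  "kernel: disk error"),
    (150, "sshd: session closed"),
]

# Smaller windows yield more, shorter sequences for the models to learn from.
one_min = group_by_window(logs, 60)    # 1-minute windows -> 3 sequences
five_min = group_by_window(logs, 300)  # 5-minute window  -> 1 sequence
print(len(one_min), len(five_min))
```

Shrinking the window from 5 minutes to 1 minute triples the number of sequences in this toy example, which mirrors why smaller windows give the classifiers more training examples and finer-grained localization of anomalous behavior.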
Results show that smaller window sizes improve detection performance by producing more sequences and enabling better identification of anomalous behavior. Tree-based models (DT, RF, XB) generally achieve better performance than SVM, especially when using TF-IDF features. SVM performs well with Word2Vec when short time windows are used. The 1-minute window configuration yields the best results across all models and feature types. These findings demonstrate the effectiveness of supervised models for log-based anomaly detection and highlight the importance of selecting appropriate log grouping strategies and feature representations.