The document provides a comprehensive overview of anomaly detection, explaining its definition, types of anomalies, and various detection methods such as supervised, semi-supervised, and unsupervised techniques. It highlights the importance of understanding data context, types of input data, and relationships between instances for effective anomaly detection in applications like credit card fraud and medical data. Additionally, it discusses the use of neural networks, statistical methods, and graph-based techniques for identifying different kinds of anomalies.
Introduction by Ken Graham; overview of anomaly detection definitions, importance, methods, and applications.
Defining anomalies with various applications including fraud and intrusion detection; challenges in defining normal regions and handling imbalanced data.
Discussion on types of data (binary, categorical, continuous) and the three major anomaly types: point, contextual, and collective.
Three types of anomaly detection methods: supervised, semi-supervised, unsupervised, along with applications in credit card, cellphone fraud, and healthcare.
Overview of classification and various methods including RNNs, LSTMs, autoencoders, k-nearest neighbors, and statistical methods for anomaly detection.
Methods for detecting contextual and collective anomalies, emphasizing the importance of context and relationships among data instances.
Final note on understanding problem context before choosing anomaly detection methods.
What we’ll cover
• What is Anomaly Detection?
• What’s an anomaly?
• Detecting Anomalies
• Methods and Applications
What is Anomaly Detection?
• Trying to find patterns in data that differ from what is expected.
• Some applications:
◦ credit card fraud
◦ insurance fraud
◦ image processing
◦ intrusion detection (cybersecurity)
◦ text analysis
◦ sensor networks
◦ insider threats
◦ industrial damage
Detecting Anomalies
So, how would we detect some of these? Let’s take a naive approach.
1. Define a “normal” region.
2. Observations not in the “normal” region are anomalies.
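A minimal sketch of this naive approach, assuming one-dimensional numeric data. Defining the “normal” region as the training mean plus or minus k standard deviations is a choice made here for illustration, not something the approach prescribes; the function names and sample readings are hypothetical.

```python
from statistics import mean, stdev

def fit_normal_region(train, k=3.0):
    """Step 1: define the "normal" region as mean +/- k standard deviations of the training data."""
    mu, sigma = mean(train), stdev(train)
    return mu - k * sigma, mu + k * sigma

def is_anomaly(x, region):
    """Step 2: any observation outside the normal region is labeled an anomaly."""
    low, high = region
    return not (low <= x <= high)

train = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.0, 9.7]  # hypothetical readings
region = fit_normal_region(train)
print([is_anomaly(x, region) for x in [10.0, 12.5, 9.9]])  # [False, True, False]
```

The region is fixed once it is fitted, which is exactly where the problems on the next slide show up: the boundary (here, the choice of k) is arbitrary, and it does not adapt when “normal” drifts over time.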
Will this work?
• The boundary is hard to define
• Definitions change over time
• Definitions are domain-dependent
• Labeled training data is hard to find
• Training data is often heavily imbalanced
Types of Data
• Data is a collection of data instances
• Each data instance has a set of attributes
• Attributes can be of different types
• binary
• categorical
• continuous
• The attributes help determine the detection method.
• The relationship between data instances is important.
• Most existing anomaly detection techniques don’t assume any particular relationship between the data instances, so we have to identify the relationships ourselves.
Types of input data
• Sequential
• time-series, sequences of symbols
• Spatial
• each data instance is related to its neighbors
• images, vehicular traffic
• Graph
• data instances are nodes in a graph or network
Three Types of Anomalies
• 😃 There are only three.
• 😔 No, that doesn’t make it any easier to detect them.
• Point anomaly
• Contextual anomaly
• Collective anomaly
Point Anomaly
• Generally a single data instance.
• Anomalous compared to the entirety of the data
• Most research focuses on point anomalies
• Can occur in any dataset
Contextual Anomaly
• Anomalous in relation to a specific context
• Context comes from how the data is structured
• Context has to be specified as part of the problem formulation
• Each data instance can be defined using two sets of attributes:
• contextual: determines the context (e.g., latitude/longitude or time)
• behavioral: non-contextual characteristics of an instance
• Anomalous behavior is determined by the behavioral attributes within a specific context
• A data instance might be a contextual anomaly in a given context, but a data instance with identical behavioral attributes could be considered normal in a different context.
• Contextual anomalies are most commonly studied in time-series data. Example:
• Average monthly temperature of an area over the last few years.
• 35 degrees F in winter might be normal
• 35 degrees F in summer in the same place is anomalous
• Another example: credit card fraud
• Contextual attribute: time of purchase.
• A user’s average weekly shopping bill is $100, except during Christmas week, when it reaches $1000.
• A new purchase of $1000 in July would be considered a contextual anomaly, since it’s unusual for July.
• The same amount spent during Christmas week would be considered normal.
Collective Anomaly
• A group of data instances is anomalous
• The individual instances need not be anomalies by themselves
• Again, the relationship between the data matters
• Point or collective problem + context = contextual problem
Three Types of Anomaly Detection Methods
• Supervised
• Use labeled training data to build a predictive model
• Imbalanced data (many normal, few anomalies)
• Semi-Supervised
• Only need normal data
• Model learns what normal data looks like; instances that don’t fit are flagged
• Unsupervised (no labeled data)
Credit Card Fraud
Data used
• user ID
• amount spent
• time between consecutive card usage
Credit card companies have complete, labeled data and user profiles
Kinds of anomalies
• point anomalies in transaction records
◦ high payments
◦ items never before purchased by the user
◦ high rate of purchase
• contextual anomalies
◦ User defines the context
▪ Each credit card user is profiled based on card usage history.
▪ Each new transaction is compared to the user’s profile and flagged if it doesn’t match
◦ Location defines the context
▪ Detects anomalies among transactions at a specific geographic location.
Cellphone Fraud
Data used
• Call data records (CDRs)
• CDR = vector of features
◦ continuous (e.g., CALL-DURATION)
◦ discrete (e.g., CALLING-CITY)
Kinds of anomalies
• point anomalies from aggregated CDR data
◦ aggregated by time, user, or area
◦ high volume of calls
◦ calls made to unlikely destinations
Insider Trading
Data used
• Option trading data
• Stock trading data
• News
• Data is time-series or otherwise temporally sequenced.
Medical
• Patient records
◦ Electronic Health Records (EHRs)
◦ demographics, medical history, medication and allergies, immunization status, laboratory test results, radiology images, vital signs, personal statistics like age and weight, and billing information
◦ Electrocardiograms (ECG) and Electroencephalograms (EEG)
• Temporal and/or spatial data
Types of anomalies
• point anomalies
◦ e.g., abnormal patient condition, instrumentation errors, recording errors
• contextual
◦ Disease outbreaks can be contextual anomalies (e.g., geo-temporal pattern of viral infections)
• collective
• False negatives can cost $$$ and lives
• A colleague (David Gilmore) said:
• "Precision saves money, recall saves lives."
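The quote maps onto two standard metrics. Precision is the fraction of flagged instances that are real anomalies (every false positive costs an investigation, hence money); recall is the fraction of real anomalies that get flagged (every false negative is a missed anomaly, which in a medical setting can cost lives). A small sketch with hypothetical counts:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts: 40 anomalies caught, 10 false alarms, 20 anomalies missed.
p, r = precision_recall(tp=40, fp=10, fn=20)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.67
```

Lowering a detector’s threshold usually trades precision for recall, so which number matters more depends on whether a false alarm or a miss is more expensive.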
Classification
• Train a model from labeled data (supervised)
• Use the model to classify other data
• Many different ways to do this
◦ SVMs, PGMs (probabilistic graphical models), rule-based methods
◦ Neural nets have shown much promise
▪ LSTMs learn features across a sequence
▪ Autoencoders reconstruct the data; the reconstruction error tells you whether the data is anomalous
Recurrent Neural Nets and LSTMs
Now we’ll look at a method or two for time-series data.
• The method needs to learn patterns present in the sequence
• Sequences can have patterns of unknown length
• Recurrent neural networks (RNNs)[1][2] let you model sequences of data
• Detect deviations from normalcy
• Steps
◦ Train the NN to predict several time steps into the future
◦ Each point in the sequence has several corresponding predicted values made at different points in the past, resulting in multiple error values.
◦ Compute the error distribution (e.g., fit a Gaussian to the errors on normal data)
• More generally, to detect anomalies in a time series
◦Anomalous if prediction error is larger than expected
◦Can pick an error threshold, e.g. 2 std. dev. from the mean
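A simplified sketch of the thresholding step. It assumes a single one-step-ahead prediction per time step (rather than several predictions made at different points in the past), uses absolute error, and replaces the trained RNN/LSTM with a hypothetical naive forecaster (“next value equals current value”) so the example stays self-contained. The threshold is the 2-standard-deviation rule from the slide.

```python
from statistics import mean, stdev

def fit_error_threshold(errors, n_std=2.0):
    """Model the prediction errors on normal data; threshold = mean + n_std standard deviations."""
    return mean(errors) + n_std * stdev(errors)

def flag_anomalies(actual, predicted, threshold):
    """A time step is anomalous if its absolute prediction error exceeds the threshold."""
    return [abs(a - p) > threshold for a, p in zip(actual, predicted)]

def naive_forecast(series):
    """Hypothetical stand-in for a trained RNN/LSTM: predict that the next value equals the current one."""
    return series[:-1]

normal = [1.0, 1.1, 1.0, 1.2, 1.1, 1.0, 1.1, 1.2, 1.1, 1.0]
normal_errors = [abs(a - p) for a, p in zip(normal[1:], naive_forecast(normal))]
threshold = fit_error_threshold(normal_errors)

test = [1.0, 1.1, 3.0, 1.1, 1.0]
print(flag_anomalies(test[1:], naive_forecast(test), threshold))  # [False, True, True, False]
```

The naive forecaster also flags the step right after the spike, because its own forecast was thrown off by the spike; a better predictor reduces this kind of echo.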
Autoencoders
• Train the autoencoder.
• If the data is sequential, you can incorporate RNNs or LSTMs.
• Use the model to reconstruct the input.
• If the reconstruction error is above some threshold, label the input as an anomaly
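A minimal NumPy sketch of reconstruction-error detection, assuming a tiny autoencoder with one tanh hidden unit trained by plain full-batch gradient descent on synthetic 2-D “normal” data lying near the line y = x. The network size, learning rate, epoch count, and the mean + 3·std threshold are arbitrary illustrative choices; a real application would use a deep-learning framework and, for sequential data, RNN/LSTM encoder and decoder layers.

```python
import numpy as np

def train_autoencoder(X, hidden=1, lr=0.1, epochs=2000, seed=0):
    """Train x -> tanh(x W1 + b1) W2 + b2 to reproduce its input (gradient descent on squared error)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # encoder output (the bottleneck)
        X_hat = H @ W2 + b2                 # decoder output (the reconstruction)
        G = 2.0 * (X_hat - X) / n           # gradient of the mean squared reconstruction error
        dH = (G @ W2.T) * (1.0 - H ** 2)    # backpropagate through tanh (uses W2 before its update)
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1, W2, b2

def reconstruction_error(X, params):
    """Squared reconstruction error for each row of X."""
    W1, b1, W2, b2 = params
    X_hat = np.tanh(X @ W1 + b1) @ W2 + b2
    return ((X - X_hat) ** 2).sum(axis=1)

# Hypothetical "normal" data: 2-D points close to the line y = x.
rng = np.random.default_rng(1)
t = rng.uniform(-1.0, 1.0, size=200)
X_train = np.column_stack([t, t]) + rng.normal(scale=0.01, size=(200, 2))
params = train_autoencoder(X_train)

train_err = reconstruction_error(X_train, params)
threshold = train_err.mean() + 3 * train_err.std()   # arbitrary threshold rule
X_test = np.array([[0.5, 0.5], [0.8, -0.8]])          # on the line, off the line
test_err = reconstruction_error(X_test, params)
print(test_err > threshold)
```

Because the one-unit bottleneck can only reproduce points along the direction it learned from normal data, the off-line point cannot be reconstructed well and its error stands out.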
Nearest-Neighbor Methods
Assumption
• Normal data points lie close together, while anomalies are far from their neighbors
Two Methods
1. Anomaly score is the distance to the kth nearest neighbor.
2. Anomaly score is based on the density of each point’s neighborhood (lower density = more anomalous)
• The distance metric affects computational complexity
• Easy to adapt to a different problem domain: just define the distance metric
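A brute-force sketch of method 1 (distance to the kth nearest neighbor), with the distance metric passed in as a parameter to reflect the “just define the distance metric” point. Euclidean distance as the default, k = 2, and the sample points are assumptions for illustration; the O(n²) loop is fine for small data but not for large datasets.

```python
import math

def knn_scores(points, k=2, distance=math.dist):
    """Method 1: anomaly score = distance to the k-th nearest neighbor (brute force, O(n^2))."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(distance(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = knn_scores(points, k=2)
print(scores.index(max(scores)))  # 4, the isolated point (5, 5)
```

For categorical attributes, a different metric can be swapped in, e.g. a Hamming distance: `knn_scores(rows, k=1, distance=lambda a, b: sum(x != y for x, y in zip(a, b)))`.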
Statistical Methods
• Assumption
• Normal data lies in high-probability regions, anomalies in low-probability regions
• Parametric and non-parametric methods
Parametric
• Assumes normal data is distributed according to a parametric distribution
• Anomaly score is the inverse of the PDF at the instance
• Or, use a hypothesis test; the test statistic can serve as the anomaly score
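A sketch of both scoring options, assuming the normal data is a one-dimensional Gaussian fitted by maximum likelihood (`statistics.pstdev` gives the MLE of the standard deviation). Instead of computing 1/PDF directly, it uses −log PDF, which ranks instances in the same order but does not overflow for points far from the mean. The data values are hypothetical.

```python
import math
from statistics import mean, pstdev

def fit_gaussian(train):
    """Maximum likelihood estimates of a Gaussian's mean and standard deviation."""
    return mean(train), pstdev(train)

def neg_log_pdf_score(x, mu, sigma):
    """-log PDF(x): same ranking as 1/PDF(x), without overflowing for far-away points."""
    z = (x - mu) / sigma
    return 0.5 * z * z + math.log(sigma * math.sqrt(2 * math.pi))

def z_statistic(x, mu, sigma):
    """Hypothesis-test-style score: distance from the mean in standard deviations."""
    return abs(x - mu) / sigma

mu, sigma = fit_gaussian([10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.0, 9.7])  # hypothetical data
for x in (10.1, 11.0):
    print(x, round(neg_log_pdf_score(x, mu, sigma), 2), round(z_statistic(x, mu, sigma), 2))
```

Flagging points whose z-statistic exceeds a cutoff (3 is a common choice) is the simplest form of the hypothesis-test approach; Grubbs’ test refines it by accounting for the sample size.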
Examples:
• Gaussian models => maximum likelihood estimation (MLE), Grubbs’ test and variants
• Regression models => ARIMA, ARMA
• Mixtures of models
◦ Assume each data point has probability p of being an anomaly
◦ N = PDF of normal data
◦ A = PDF of anomalies (assumed to be uniform)
◦ D = PDF of all the data = pA + (1-p)N
◦ Start with all points in N
◦ Anomaly score comes from how much the likelihood of the data changes if you move the point from N to A.
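A single-pass sketch of the mixture idea, assuming one-dimensional data, a Gaussian N fitted by MLE to whatever points are currently in the normal set, A uniform over the observed data range, and a hypothetical p = 0.05. Each point is scored by the change in log-likelihood when it alone is moved from N to A. A fuller version would move points permanently when their score exceeds a threshold and repeat; that loop is not shown.

```python
import math
from statistics import mean, pstdev

def log_likelihood(normal, anomalous, p, low, high):
    """Log-likelihood when points in `normal` come from N and points in `anomalous` from A."""
    mu, sigma = mean(normal), pstdev(normal)      # N: Gaussian fitted (MLE) to the normal set
    ll = len(normal) * math.log(1 - p)
    ll += sum(-0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
              for x in normal)
    ll += len(anomalous) * (math.log(p) - math.log(high - low))  # A: uniform over [low, high]
    return ll

def mixture_scores(data, p=0.05):
    """Score each point by the change in log-likelihood when it alone is moved from N to A."""
    low, high = min(data), max(data)
    base = log_likelihood(data, [], p, low, high)
    return [log_likelihood(data[:i] + data[i + 1:], [x], p, low, high) - base
            for i, x in enumerate(data)]

data = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.0, 9.7, 14.0]  # hypothetical
scores = mixture_scores(data)
print(scores.index(max(scores)))  # 8, the value 14.0
```

Moving 14.0 to A shrinks the fitted spread of N dramatically, which raises the likelihood of everything left in N; moving a typical point only costs likelihood, so its score is negative.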
Non-parametric
• Histogram models
◦ Does the test instance fall into an existing bin?
◦ Or, determine the score from the frequency of the bin in which it lands
• Kernel methods (e.g., kernel density estimation) estimate the data PDF; instances are then scored as in parametric methods
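A sketch of a histogram model covering both bullets, assuming fixed-width bins on one-dimensional data. The score is 1 minus the relative frequency of the instance’s bin, so an instance that falls into an empty bin (no existing bin fits it) gets the maximum score of 1.0. The bin width and training values are hypothetical, and in practice the bin width strongly affects the results.

```python
from collections import Counter

def fit_histogram(train, bin_width):
    """Count training instances per fixed-width bin (bin index = floor(x / bin_width))."""
    return Counter(int(x // bin_width) for x in train), len(train)

def histogram_score(x, hist, bin_width):
    """1 - relative frequency of x's bin: an empty bin scores 1.0, crowded bins score low."""
    counts, n = hist
    return 1.0 - counts[int(x // bin_width)] / n

hist = fit_histogram([1.2, 1.5, 1.7, 2.1, 2.4, 2.2, 2.8, 3.1], bin_width=1.0)  # hypothetical
print([histogram_score(x, hist, 1.0) for x in (2.5, 3.5, 7.0)])  # [0.5, 0.875, 1.0]
```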
Spectral Methods
Assumption
• “Data can be embedded into a lower dimensional subspace in which normal instances and anomalies appear significantly different.” - Anomaly Detection: A Survey
Main idea:
Find a subspace where the anomalies are easy to see and project the data onto it.
Methods
• Unsupervised or semi-supervised
• PCA
◦ Project data onto the low-variance principal components; anomalies will have large projections onto them
◦ For graphs, apply PCA to the graph’s adjacency matrix at different points in time; changes in the principal components indicate anomalies
• Large errors in the Compact Matrix Decomposition (CMD) of a graph’s adjacency matrix indicate an anomalous graph
• PCA can be expensive
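A NumPy sketch of the first PCA bullet, assuming 2-D training data that varies mostly along one direction. The single highest-variance component is treated as the “normal” subspace and the rest as low-variance components (`n_major = 1` is an assumption); the score is the squared length of a point’s projection onto the low-variance components. The graph and CMD variants are not shown.

```python
import numpy as np

def fit_pca(X):
    """Center the data; rows of Vt are principal directions, sorted by decreasing variance."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt

def minor_component_score(X, mu, Vt, n_major=1):
    """Squared length of each point's projection onto the low-variance components."""
    minor = Vt[n_major:]                 # low-variance principal directions
    proj = (X - mu) @ minor.T
    return (proj ** 2).sum(axis=1)

# Hypothetical normal data: points close to the line y = 2x.
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X_train = np.column_stack([t, 2 * t]) + rng.normal(scale=0.05, size=(100, 2))
mu, Vt = fit_pca(X_train)

X_test = np.array([[1.0, 2.0], [1.0, -2.0]])   # along the main direction, across it
print(minor_component_score(X_test, mu, Vt))
```

The SVD of the full data matrix is where the “PCA can be expensive” cost comes from on large or high-dimensional data.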
Contextual Anomalies
Contextual attributes are key
• sequential: position in the sequence is the context
◦ time-series
◦ event data (timestamped)
▪ inter-arrival times between events can be uneven
• spatial: location is the context
• graphs: the edges between data instances (the nodes) are the context
• profiles: the user defines the context, as in credit card fraud
Contextual Methods
• Convert to a point anomaly problem
• 1. identify a context for a data instance
• 2. compute an anomaly score within the context with a point anomaly method
• When breaking data into contexts is hard (time-series and sequences), use the structure of the data instead
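A sketch of the two-step conversion, using the earlier winter/summer temperature example. It assumes the context (season) is given explicitly with each record, and uses a per-context z-score as the point anomaly method; the historical temperatures are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def fit_context_stats(history):
    """Step 1: group historical (context, value) pairs by context; fit a mean/std per context."""
    groups = defaultdict(list)
    for context, value in history:
        groups[context].append(value)
    return {c: (mean(v), pstdev(v)) for c, v in groups.items()}

def contextual_score(context, value, stats):
    """Step 2: apply a point anomaly method (here a z-score) within the instance's context."""
    mu, sigma = stats[context]
    return abs(value - mu) / sigma

history = ([("winter", t) for t in (30, 34, 36, 33, 35, 32)]
           + [("summer", t) for t in (75, 80, 78, 82, 77, 79)])   # hypothetical degrees F
stats = fit_context_stats(history)
print(round(contextual_score("winter", 35, stats), 2),
      round(contextual_score("summer", 35, stats), 2))
```

The same behavioral value (35 degrees F) scores low in the winter context and very high in the summer context, which is exactly what makes it a contextual rather than a point anomaly.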
• time-series
◦regression, RNNs
• sequences
◦ Use events occurring before a particular time to predict the event occurring at that time.
◦ If the prediction doesn't match the actual event, the event is labeled rare.
◦ Use Finite State Automata (FSA) and Hidden Markov Models (HMMs) to compute conditional probabilities for events in the sequence based on previous events.
◦ Model the event sequence as a Poisson process
• graphs
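A sketch of the conditional-probability idea using a first-order Markov chain, a simpler stand-in for the FSA/HMM approaches (an HMM would add hidden states, not shown). It assumes symbolic events, probabilities estimated by counting transitions in normal training sequences, and a hypothetical cutoff of 0.05 below which an event is labeled rare.

```python
from collections import Counter, defaultdict

def fit_markov(sequences):
    """Estimate first-order transition probabilities P(next event | previous event)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, ctr in counts.items():
        total = sum(ctr.values())
        model[prev] = {nxt: c / total for nxt, c in ctr.items()}
    return model

def rare_events(seq, model, min_prob=0.05):
    """Positions whose event is unlikely given the previous event (unseen transitions have probability 0)."""
    return [i for i in range(1, len(seq))
            if model.get(seq[i - 1], {}).get(seq[i], 0.0) < min_prob]

# Hypothetical event logs, one symbol per event (e.g., A = login, B = read, C = write, D = logout).
model = fit_markov(["ABCD", "ABCD", "ABD", "ABCD"])
print(rare_events("ACBD", model))  # [1, 2]
```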
Collective Anomalies
• Hardest to detect, because the anomaly lies in the collective behavior of a group rather than in any single instance.
• Relationship between data points is important
◦Sequential => find an anomalous subsequence
▪ lots of research here b/c lots of time-series and
event sequence data in the wild
◦Spatial => find an anomalous subregion
▪ image/video processing
◦Graph => find an anomalous subgraph
◦The task is to find an anomalous subset
Detecting Collective Sequential Anomalies
Reduce to a point anomaly problem:
• transform subsequences and then use a point anomaly method
• FSA, Markov Models, HMMs, CRFs for symbols
Neural nets would be powerful here
• RNNs + LSTMs + Autoencoders: could use a sequence-to-sequence model on the subsequences and compute the reconstruction error
• For every example we’ve looked at that used FSA or HMMs, you could use neural nets instead
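A sketch of the reduction to a point anomaly problem: the series is transformed into overlapping subsequences (sliding windows), and each window is scored with a nearest-neighbor point method, its distance to the nearest non-overlapping window (overlapping windows are excluded because they would trivially match). The window length w = 4 and the signal are hypothetical.

```python
import math

def subsequences(series, w):
    """Transform the series into all overlapping windows of length w."""
    return [tuple(series[i:i + w]) for i in range(len(series) - w + 1)]

def subsequence_scores(series, w):
    """Point method on the transformed data: distance from each window to its
    nearest non-overlapping window."""
    subs = subsequences(series, w)
    return [min(math.dist(s, t) for j, t in enumerate(subs) if abs(i - j) >= w)
            for i, s in enumerate(subs)]

# Hypothetical signal: an alternating 0/1 pattern with a flat stretch in the middle.
# Every individual 0 is normal; only the run of zeros is anomalous.
series = [0, 1] * 4 + [0, 0, 0, 0] + [0, 1] * 4
scores = subsequence_scores(series, w=4)
print(scores.index(max(scores)))  # 8, the window series[8:12] == [0, 0, 0, 0]
```

The series needs to be long enough that every window has at least one non-overlapping neighbor; otherwise `min` receives an empty sequence.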
Detecting Collective Spatial Anomalies
• Most work here has been on images
• Anomaly detection in videos would likely be a combination of techniques for spatial and sequential anomalies (collective or otherwise).
◦ Video = sequence of images + an audio stream
• Convolutional neural networks (CNNs) have been used for anomaly detection in images
◦ Fully Convolutional Neural Network for Fast Anomaly Detection in Crowded Scenes (2016): https://arxiv.org/abs/1609.00866
Most important thing…
• Understand your problem before picking a method.
• Just because a method is the most accurate doesn’t
automatically make it the best solution for your problem.