Other classification methods in data mining



Classification



Classification is the process of finding a model that
describes and distinguishes data classes or concepts,
for the purpose of being able to use the model to
predict the class of objects whose class label is
unknown.



• predicts categorical class labels (discrete or nominal)
• classifies data (constructs a model) based on the
training set and the values (class labels) in a
classifying attribute, and uses it in classifying new
data

Training Data -> Classification Algorithms -> Classifier (Model)

name   age      income   loan decision
Mike   young    low      risky
Mary   young    low      risky
Bill   midage   high     safe
Jim    midage   low      risky
Dave   senior   low      safe
Anne   senior   medium   safe

Classifier (Model):
IF age=youth THEN loan_deci=risky
IF income=high THEN loan_deci=safe
IF age=mid AND income=low THEN loan_deci=risky

Testing Data -> Classifier

name     age       income   loan_deci
Tom      senior    low      safe
Mariya   mid_age   low      risky
George   mid_age   high     safe
...

Unseen Data -> Classifier
(john, mid_age, low)   Loan deci?
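
The three IF-THEN rules of the classifier can be applied to the testing data and to the unseen tuple with a few lines of Python. This is only a minimal sketch: the function name classify, the first-match rule order, and the normalized age values ("youth", "mid_age", "senior") are assumptions made for illustration, since the slide spells the age values in several ways.

```python
# Rules from the slide, checked in order; each entry is (condition, label).
# The age spellings are normalized to "youth" / "mid_age" / "senior" (assumption).
RULES = [
    (lambda age, income: age == "youth", "risky"),
    (lambda age, income: income == "high", "safe"),
    (lambda age, income: age == "mid_age" and income == "low", "risky"),
]


def classify(age, income, default=None):
    """Return the label of the first rule that fires, else `default`."""
    for condition, label in RULES:
        if condition(age, income):
            return label
    return default


# Testing data from the slide: (name, age, income, known loan decision).
test_data = [
    ("Tom", "senior", "low", "safe"),
    ("Mariya", "mid_age", "low", "risky"),
    ("George", "mid_age", "high", "safe"),
]

for name, age, income, actual in test_data:
    print(name, "predicted:", classify(age, income), "actual:", actual)

# Unseen tuple (john, mid_age, low).
print("john ->", classify("mid_age", "low"))
```

None of the three rules covers a senior applicant with low income, so Tom falls through to the default; a complete rule set would need a rule for that case.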





Genetic Algorithms
Rough Set Approach
Fuzzy set Approach






Genetic algorithms are examples of
evolutionary computing methods and are
optimization-type algorithms.
Given a population of potential problem
solutions (individuals), evolutionary
computing expands this population with
new and potentially better solutions.





The basis for evolutionary computing
algorithms is biological evolution, where
over time evolution produces the best or
“fittest” individuals.
In data mining, genetic algorithms may
be used for clustering, prediction, and
even association rule mining.



Individual (chromosome):

• feasible solution in an optimization problem



Population
• Set of individuals
• Should be maintained in each generation





The representation (encoding) of an individual
is the most important starting point in
developing a genetic algorithm.
Each gene has its own meaning.
Based on this representation, we can
define
• fitness evaluation function,
• crossover operator,
• mutation operator.

The fitness function takes a
single chromosome as input
and returns a measure of the
goodness of the solution
represented by the chromosome.



In genetic algorithms, reproduction is defined
by precise algorithms that indicate how to
combine the given set of individuals to produce
new ones. These are called “crossover
algorithms”.



Given two individuals (parents) from a
population, the crossover technique generates
new individuals (offspring or children) by
switching subsequences of the parents' strings.



Single-point Crossover

Parents:    1 1 1 0 1 0 0 1 0 0 0
            0 0 0 0 1 0 1 0 1 0 1
Offspring:  1 1 1 0 1 0 1 0 1 0 1
            0 0 0 0 1 0 0 1 0 0 0

Two-point Crossover

Parents:    1 1 1 0 1 0 0 1 0 0 0
            0 0 0 0 1 0 1 0 1 0 1
Offspring:  1 1 0 0 1 0 1 1 0 0 0
            0 0 1 0 1 0 0 0 1 0 1

Uniform Crossover

Crossover template:  1 0 0 1 1 0 1 0 0 1 1
Parents:    1 1 1 0 1 0 0 1 0 0 0
            0 0 0 0 1 0 1 0 1 0 1
Offspring:  1 0 0 0 1 0 0 0 1 0 0
            0 1 1 0 1 0 1 1 0 0 1
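
All three crossover variants can be written as one mask-based operation, which makes the offspring in the examples easy to check. The sketch below is illustrative only: the function name crossover is hypothetical, and the single-point and two-point masks are assumptions chosen to reproduce the offspring shown; only the uniform crossover template comes from the slide.

```python
def crossover(parent1, parent2, mask):
    """Mask-based crossover on equal-length bit strings.

    Where the mask bit is '1' the first child copies parent1 and the
    second child copies parent2; where it is '0' the roles are swapped.
    """
    triples = list(zip(parent1, parent2, mask))
    child1 = "".join(a if m == "1" else b for a, b, m in triples)
    child2 = "".join(b if m == "1" else a for a, b, m in triples)
    return child1, child2


p1, p2 = "11101001000", "00001010101"

# Single-point: one cut after bit 5 (mask is an assumption).
single_point = crossover(p1, p2, "11111000000")
# Two-point: bits 3 to 7 form the exchanged segment (mask is an assumption).
two_point = crossover(p1, p2, "00111110000")
# Uniform: the crossover template given on the slide.
uniform = crossover(p1, p2, "10011010011")

print(single_point)
print(two_point)
print(uniform)
```

Expressing the cut points as a mask keeps one code path for all three operators; only the way the mask is generated differs.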




Mutation usually changes a single bit in a bit string.
This operator should be applied with very
low probability.

Before:  0 1 1 0 1
After:   0 1 1 1 1
Mutation point (random): 4th bit
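
A bit-flip mutation can be sketched as follows. The names flip_bit and mutate and the default rate of 0.01 are assumptions for illustration; the slide only states that the probability should be very low.

```python
import random


def flip_bit(bits, point):
    """Return a copy of `bits` with the bit at 0-based index `point` flipped."""
    return bits[:point] + [1 - bits[point]] + bits[point + 1:]


def mutate(bits, rate=0.01, rng=random):
    """Flip each bit independently with probability `rate` (assumed default)."""
    return [1 - b if rng.random() < rate else b for b in bits]


# The slide's example: the mutation point is the 4th bit (index 3).
print(flip_bit([0, 1, 1, 0, 1], 3))
print(mutate([0, 1, 1, 0, 1]))
```

With a low rate, most calls to mutate return the string unchanged, which matches the idea that mutation is a rare event.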

[Figure: one generation of a genetic algorithm, from old generation to new generation]

Old generation:
0 1 0 0 1
1 1 1 0 0
0 0 1 1 1
0 1 1 0 1
1 1 1 0 0
1 1 1 0 1

Individuals are probabilistically selected.
Crossover mates are probabilistically
selected based on their fitness value.

Crossover (crossover point randomly selected):
1 1 | 1 0 1   ->   1 1 | 0 0 1
0 1 | 0 0 1   ->   0 1 | 1 0 1

Mutation (mutation point random) is then applied
to produce the new generation.
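
One generation step (fitness-based selection, single-point crossover at a random point, then mutation) might look like the sketch below. It is a simplified illustration, not the procedure behind the figure: the fitness function (count of 1 bits), the mutation rate, and the function names are all assumptions, and every new individual is produced by crossover.

```python
import random


def fitness(individual):
    """Hypothetical fitness: the number of 1 bits (a toy objective)."""
    return sum(individual)


def next_generation(population, mutation_rate=0.01, rng=random):
    """Build a new population of the same size from `population`."""
    # Add 1 so an all-zero individual still has a nonzero selection weight.
    weights = [fitness(ind) + 1 for ind in population]
    new_population = []
    while len(new_population) < len(population):
        # Probabilistically select two mates, favouring fitter individuals.
        mother, father = rng.choices(population, weights=weights, k=2)
        # Single-point crossover at a randomly selected point.
        point = rng.randint(1, len(mother) - 1)
        for child in (mother[:point] + father[point:],
                      father[:point] + mother[point:]):
            # Flip each bit with a very low probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            new_population.append(child)
    return new_population[:len(population)]


old_generation = [
    [0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
]

new_generation = next_generation(old_generation, rng=random.Random(0))
for individual in new_generation:
    print(individual, "fitness:", fitness(individual))
```

Calling next_generation repeatedly on its own output gives the usual generational loop; a stopping rule, such as a fitness threshold or a generation limit, would have to be added.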





A rough set is a formal approximation of a
crisp set in terms of a pair of sets which give
the lower and the upper approximation of the
original set.
The tuple composed of the lower and upper
approximation is called a rough set.

For a given class C, the rough set definition is
approximated by two sets:
1. The lower approximation of C consists of
all of the data tuples that, based on the
knowledge of the attributes, are certain
to belong to C without ambiguity.
2. The upper approximation of C consists of
all of the data tuples that, based on the
knowledge of the attributes, cannot be
described as not belonging to C.
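
The two approximations can be computed by grouping tuples that are indistinguishable on the chosen attributes. The sketch below is a minimal illustration: the function name approximations is hypothetical, and the data reuses the loan example with income as the only attribute, which is an assumption made to produce a non-empty boundary region.

```python
from collections import defaultdict


def approximations(tuples, target):
    """Return (lower, upper) approximations of the class `target`.

    tuples: dict mapping a tuple id to its attribute values.
    target: set of ids that belong to class C.
    """
    # Tuples with identical attribute values are indistinguishable.
    blocks = defaultdict(set)
    for key, attrs in tuples.items():
        blocks[attrs].add(key)

    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # whole block lies inside C: certainly in C
            lower |= block
        if block & target:    # block overlaps C: cannot be ruled out
            upper |= block
    return lower, upper


# Loan example described by income only (illustrative assumption).
income = {
    "Mike": ("low",), "Mary": ("low",), "Bill": ("high",),
    "Jim": ("low",), "Dave": ("low",), "Anne": ("medium",),
}
safe = {"Bill", "Dave", "Anne"}

lower, upper = approximations(income, safe)
print("lower:", sorted(lower))
print("upper:", sorted(upper))
```

Dave is safe but shares his income value with three risky applicants, so the whole low-income group lands in the upper approximation only; the difference between the two sets is the boundary region.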

Rough set theory is one of the newer data mining
theories. It can be used for:
1. Classification, to discover structural relationships within
noisy data
2. Attribute subset selection
3. Reduction of data sets
4. Finding hidden data patterns
5. Generation of decision rules









Fuzzy logic uses truth values between 0.0 and 1.0 to
represent the degree of membership (for example, read
from a fuzzy membership graph)
Attribute values are converted to fuzzy values
• e.g., income is mapped into the discrete categories
{low, medium, high} with fuzzy values calculated
For a given new sample, more than one fuzzy value may
apply
Each applicable rule contributes a vote for
membership in the categories
Typically, the truth values for each predicted category
are summed, and these sums are combined
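
The steps above can be sketched in code: convert an income to fuzzy values, let each rule vote with its truth value, and sum the votes per category. Everything specific here is an assumption for illustration: the membership breakpoints (income in thousands), the three rules, and the function names are hypothetical.

```python
def income_memberships(income):
    """Hypothetical piecewise-linear memberships for income (in thousands)."""
    # low: 1 up to 20, falling to 0 at 40
    low = max(0.0, min(1.0, (40 - income) / 20))
    # medium: rises from 20 to 40, flat to 50, falls to 0 at 70
    medium = max(0.0, min((income - 20) / 20, 1.0, (70 - income) / 20))
    # high: 0 up to 50, rising to 1 at 70
    high = max(0.0, min(1.0, (income - 50) / 20))
    return {"low": low, "medium": medium, "high": high}


# Hypothetical fuzzy rules: (fuzzy income value, predicted category).
RULES = [("low", "risky"), ("medium", "safe"), ("high", "safe")]


def classify(income):
    """Sum the truth values voted for each category and pick the largest."""
    degrees = income_memberships(income)
    votes = {}
    for fuzzy_value, label in RULES:
        votes[label] = votes.get(label, 0.0) + degrees[fuzzy_value]
    return max(votes, key=votes.get), votes


print(income_memberships(30))
print(classify(25))
print(classify(55))
```

An income of 30 is partly low and partly medium at the same time, which is the case where more than one fuzzy value applies to a sample.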