SUPPORT VECTOR MACHINES
CONTENTS
* MACHINE LEARNING ALGORITHM
* CLASSIFICATION USING SVM
* KERNELS IN SVM
* ISSUES
* SVM REGRESSION
* IMPLEMENTATION
MACHINE LEARNING ALGORITHM
In general, a machine learning algorithm is used to predict outputs for future inputs based upon previously collected data.
Support vector machines (SVMs) are supervised machine learning algorithms applied to both regression and classification problems. SVMs are grounded in statistical learning theory, and their robustness and accuracy have made them among the most widely used algorithms in machine learning.
CLASSIFICATION USING SVM
• In this classification algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features) with the value of each feature being the value of a particular coordinate. Then we perform classification by finding the hyperplane that best differentiates the two classes.
• A hyperplane is the boundary that separates data belonging to different classes: a line in two dimensions, a plane in three, and in general an (n−1)-dimensional subspace.
[Figure: a hyperplane separating two classes of points]
CLASSIFICATION OF LINEAR DATA
Consider linearly separable data that can be represented graphically in just two dimensions. Such data can be classified easily using a hyperplane.
Plotting the data on a graph shows that:
• More than one hyperplane can divide the data.
• The quality of the classification depends upon which hyperplane is chosen.
CHOOSING THE BEST HYPER-PLANE
• According to the SVM algorithm, we find the points closest to the separating line from both classes. These points are called support vectors.
• Now we compute the distance between the line and the support vectors. This distance is called the margin.
• Our goal is to maximize the margin.
• The hyperplane for which the margin is maximum is the optimal hyperplane.
• Thus SVM tries to place the decision boundary so that the separation between the two classes is as wide as possible, as the sketch below illustrates.
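As a concrete illustration, here is a minimal sketch using scikit-learn (an assumption of this deck's Python setting; the toy dataset is made up). It fits a linear SVM and reads off the support vectors and the margin width 2/||w||, where w is the learned weight vector in standard SVM notation:

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two small clusters in 2-D.
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6)   # very large C approximates a hard margin
clf.fit(X, y)

print("Support vectors:\n", clf.support_vectors_)
w = clf.coef_[0]                    # weight vector of the separating hyperplane
print("Margin width:", 2 / np.linalg.norm(w))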
CLASSIFICATION OF NON-LINEAR DATA
Classifying linearly separable data is easy: construct a hyperplane. But non-linear data cannot be separated by drawing a straight line.
The solution is to map the data into another space in which it can be separated linearly.
Consider non-linearly separable data, such as one class lying inside a circle and the other outside it. Add one more dimension, a z-axis, with z = x² + y².
Now the data is clearly linearly separable. Let the line separating the data in the higher dimension be z = k, where k is a constant. Since z = x² + y², we get x² + y² = k, which is the equation of a circle. So we can project this linear separator in the higher dimension back into the original dimensions using this transformation.
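Here is a minimal sketch of that lifting on synthetic circular data (the dataset and radii are illustrative assumptions; scikit-learn is assumed): after adding the coordinate z = x² + y², a plain linear SVM separates the two rings perfectly.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 100)
radius = np.concatenate([rng.uniform(0, 1, 50),    # inner class
                         rng.uniform(2, 3, 50)])   # outer class
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
y = np.array([0] * 50 + [1] * 50)

# Lift into 3-D by adding the dimension z = x^2 + y^2.
Z = np.column_stack([X, (X ** 2).sum(axis=1)])

clf = SVC(kernel='linear').fit(Z, y)
print("Training accuracy after lifting:", clf.score(Z, y))  # 1.0 expected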
KERNELS IN SVM
Non-linear data can be classified by adding an extra dimension, but finding the correct transformation is not easy every time. For this purpose we use kernels.
Kernels are used because explicitly constructing a high-dimensional feature space is computationally costly.
DIFFERENT TYPES OF KERNEL FUNCTIONS
A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space:
K(xᵢ, xⱼ) = φ(xᵢ) · φ(xⱼ)
Now we only need to compute K(xᵢ, xⱼ); we never need to perform computations in the high-dimensional space explicitly. This is what is called the kernel trick.
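To see the kernel trick concretely, here is a small sketch (the vectors are arbitrary examples): for 2-D inputs, the kernel K(x, y) = (x · y)² equals the dot product of the explicit degree-2 feature maps φ(x) = (x₁², √2·x₁x₂, x₂²), so the mapping itself never has to be computed.

import numpy as np

def phi(v):
    # Explicit degree-2 feature map from 2-D input space into 3-D feature space.
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])

print(np.dot(phi(x), phi(z)))   # dot product in feature space: 25.0
print(np.dot(x, z) ** 2)        # kernel computed in input space: 25.0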
Some commonly used kernel functions are:
Polynomial kernel
Gaussian kernel
Sigmoid kernel
Linear Kernel:
The linear kernel is the simplest kernel function. It is given by the inner product plus an optional constant c: K(x, y) = x · y + c.
Polynomial Kernel:
The polynomial kernel is a non-stationary kernel: K(x, y) = (α x · y + c)^d, with slope α, constant c, and degree d. Polynomial kernels are well suited to problems where all the training data is normalized.
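For reference, here is how these kernels are exposed in scikit-learn's SVC (scikit-learn assumed; its coef0 parameter plays the role of the constant c):

from sklearn.svm import SVC

linear  = SVC(kernel='linear')                     # K(x, y) = x . y
poly    = SVC(kernel='poly', degree=3, coef0=1.0)  # K(x, y) = (gamma x . y + c)^d
rbf     = SVC(kernel='rbf', gamma=0.5)             # Gaussian: exp(-gamma ||x - y||^2)
sigmoid = SVC(kernel='sigmoid', coef0=0.0)         # tanh(gamma x . y + c)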
IMPORTANT KERNEL ISSUES
For most kernel functions we do not know the corresponding mapping function explicitly, so we do not know to which dimension the data was raised. So even though raising the data to a higher dimension increases the likelihood that it becomes separable, we cannot guarantee it. We will see a compromise solution for this problem shortly.
Secondly, a strong kernel, which lifts the data to an infinite-dimensional space, may sometimes lead to the severe problem of overfitting.
Symptoms of overfitting:
1. Low margin → poor classification performance.
2. Large number of support vectors → slow computation.
The biggest limitation of SVM lies in the choice of the kernel (the best choice of kernel for a given problem is still an open research question). A second limitation is speed and size, mostly in training: training on large sets is expensive, although the trained model typically keeps only a small number of support vectors, which minimizes the computational requirements during testing. The optimal design of multiclass SVM classifiers is also an active research area.
OVERFITTING PROBLEM
A well-known problem with machine learning methods is overtraining.
In statistics, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably".
In the classic illustration, a green line represents an overfitted model and a black line a regularized model. While the green line best follows the training data, it is too dependent on that data and is likely to have a higher error rate on new, unseen data compared to the black line.
Note also that data may be linearly separable, yet only with a low margin.
All these problems lead us to a compromise solution: the SOFT MARGIN.
Allowing softness in the margin (i.e. a low value of the cost parameter C) permits errors to be made while fitting the model to the training data.
Conversely, a hard margin results in a model that allows zero training errors.
Allowing some errors on the training set can be helpful, because it may produce a model that generalizes better to new datasets, as the sketch below suggests.
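A small sketch of this trade-off (scikit-learn assumed; the blob data is illustrative): the cost parameter C controls margin softness, and lowering it typically admits more support vectors along with some training errors.

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so a perfect separation is impossible.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1.0, 100.0):        # from soft towards hard margin
    clf = SVC(kernel='linear', C=C).fit(X, y)
    print(f"C={C}: {len(clf.support_vectors_)} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")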
SUPPORT VECTOR REGRESSION (SVR)
Regression tries to fit the data to a model in order to predict a continuous quantity.
The Support Vector Regression (SVR) process is very similar to classification, with minor differences: the output is a real number, predicted by substituting the input's attribute values into a function constructed from the relationships between the attributes of the already available data, as sketched below.
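A minimal SVR sketch, assuming scikit-learn (the noisy sine data is illustrative): the epsilon parameter defines a tube around the fitted function inside which training errors are ignored.

import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)   # 80 sample points
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)      # noisy sine target

reg = SVR(kernel='rbf', C=10.0, epsilon=0.1).fit(X, y)
print("Prediction at x = 2.0:", reg.predict([[2.0]])[0])  # close to sin(2) ~ 0.91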
IMPLEMENTING SVM IN PYTHON
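A minimal end-to-end sketch, assuming scikit-learn and its bundled iris dataset; the parameter choices (RBF kernel, C=1.0) are illustrative defaults rather than tuned values.

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load a standard multiclass dataset and hold out 30% for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit the classifier and evaluate on unseen data.
clf = SVC(kernel='rbf', C=1.0, gamma='scale')
clf.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))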
REAL TIME APPLICATIONS
The aim of using SVM is to correctly classify unseen data. SVMs have a number of applications in several fields.
Some common applications of SVM are:
• Face detection
• Text and hypertext categorization
• Classification of images
• Bioinformatics
• Protein fold and remote homology detection
• Handwriting recognition
• Generalized predictive control (GPC)
QUERIES?
THE END
MANASWINI MYSORE
15BD1A0531
