Demo Lecture for Assistant Professor Position
Branch: CSE
Topic: Machine Learning
Name: Rohit Kumar
Date: 29 June 2024
AI vs ML
Artificial Intelligence (AI): A broader field focused on creating
intelligent systems that can mimic human thinking and behavior.
Machine Learning (ML): A subset of AI that enables machines to
learn from data and improve their performance over time without
being explicitly programmed.
Introduction to Machine Learning
Machine learning, a subset of artificial intelligence, focuses primarily on the
creation of algorithms that enable a computer to learn independently from data
and previous experience. Arthur Samuel first used the term "machine learning"
in 1959.
Machine learning enables a machine to automatically learn from data, improve
its performance with experience, and make predictions, all without being
explicitly programmed.
Classification of Machine Learning
At a broad level, machine learning can be classified into three types:
 Supervised learning
 Unsupervised learning
 Reinforcement learning
Supervised Learning:
Training with Labeled Data: In supervised learning, the machine is trained using
labeled data, where each input has a known output. This helps the system learn to
predict the output for new data based on the patterns it learned from the training
data.
Model Building: The system uses the labeled data to build a model that
understands the relationship between inputs and outputs. After training, we test the
model with new sample data to check if it can accurately predict the output.
Objective: The goal of supervised learning is to map input data to the correct
output. This process is similar to a student learning under the guidance of a
teacher.
Example: Spam filtering is a common example of supervised learning, where the
system is trained to distinguish between spam and non-spam emails.
Categories/Techniques: Supervised learning algorithms can be divided into two
main categories: classification (predicting a discrete label; the output is binary or
categorical) and regression (predicting a continuous value; the output is a numeric
quantity, e.g., a salary of 30k).
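As a minimal sketch of the two categories (assuming scikit-learn is available; the data and model choices here are illustrative, not from the original slides):

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Classification: discrete output (spam = 1, not spam = 0) from two features.
X_cls = [[0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.9, 0.1]]
y_cls = [1, 0, 1, 0]
clf = KNeighborsClassifier(n_neighbors=3).fit(X_cls, y_cls)
print(clf.predict([[0.15, 0.85]]))   # -> a discrete class, here [1]

# Regression: continuous output (a numeric quantity such as a 30k salary).
X_reg = [[1], [2], [3], [4]]          # e.g. years of experience
y_reg = [20_000, 25_000, 30_000, 35_000]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[5]]))            # -> a continuous value, here [40000.]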
Unsupervised Learning:
No Supervision: In unsupervised learning, the machine learns without
any guidance or labeled data.
Training Data: The machine is given data that hasn't been labeled,
classified, or categorized. The algorithm must find patterns and
relationships in this data on its own.
Goal: The aim is to reorganize the input data into new features or group
similar objects together. There is no predetermined result; the machine
looks for useful insights from large datasets.
Examples include online fraud detection and grouping customers into different
segments based on their purchase history.
Categories: Unsupervised learning algorithms are mainly divided into
two types:
• Clustering: Grouping similar data points together.
• Association: Finding relationships between data items.
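As a minimal clustering sketch (assuming scikit-learn; the "purchase history" features, annual spend and visits per month, are illustrative assumptions):

from sklearn.cluster import KMeans

# Each row is one customer: [annual spend, visits per month].
customers = [[200, 1], [220, 2], [800, 8], [850, 9], [210, 1], [790, 7]]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # segment index per customer, e.g. [0 0 1 1 0 1]
print(kmeans.cluster_centers_)  # the learned centers of the two segments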
Reinforcement Learning:
Feedback-Based Learning: Reinforcement learning is a method where a
learning agent receives rewards for correct actions and penalties for
incorrect actions.
Automatic Learning: Using these feedback signals, the agent learns and
improves its behavior over time without explicit programming.
Interaction with Environment: The agent interacts with its environment,
exploring and learning from its actions.
Objective: The goal of reinforcement learning is for the agent to
maximize its cumulative reward by making optimal decisions.
Example: A robotic dog learning to navigate its environment by receiving
rewards for successful movements of its limbs exemplifies reinforcement
learning.
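As a minimal sketch of this reward-and-penalty loop, here is tabular Q-learning on a tiny five-cell corridor (a stand-in for the robotic dog's environment; the environment, rewards, and hyperparameters are illustrative assumptions):

import random

N_STATES, ACTIONS = 5, (-1, +1)        # the agent can step left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount, exploration rate

for episode in range(500):
    s = 0
    while s != N_STATES - 1:           # the rightmost cell is the goal
        # Explore occasionally; otherwise exploit the best known action.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0   # reward only at the goal
        # Q-learning update: nudge Q toward reward + discounted best future value.
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# After training, the greedy policy steps right (+1) in every non-goal state.
print({s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)})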
History of Machine Learning
Early Beginnings:
• 1950s: Alan Turing’s proposal of the Turing Test.
• 1956: Dartmouth Conference, where the term "Artificial Intelligence"
was coined.
• 1960s-70s: Development of early machine learning algorithms, such as the
perceptron.
Evolution:
• 1980s: Introduction of neural networks and backpropagation.
• 1990s: Rise of support vector machines (SVM) and ensemble methods.
• 2000s-present: Growth of big data and deep learning, advancements in
computational power, and development of powerful ML frameworks.
Some Important Machine Learning Algorithms
K-NN:
 The KNN algorithm can be used for regression as well as classification, but it
is mostly used for classification problems.
 K-Nearest Neighbors is one of the simplest machine learning algorithms, based
on the supervised learning technique.
 It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and performs an action
on it at classification time.
 Example: Suppose we have an image of a creature that looks similar to both a
cat and a dog, and we want to know whether it is a cat or a dog. We can use the
KNN algorithm for this identification, since it works on a similarity measure.
The KNN model finds the features of the new image most similar to the cat and
dog images and, based on those most similar features, places it in either the
cat or the dog category.
How does K-NN work?
Suppose we have a new data point and need to assign it to one of the
existing categories:
 First, we choose the number of neighbors; here we choose k = 5.
 Next, we calculate the Euclidean distance between the new point and the
existing data points. The Euclidean distance is the familiar straight-line
distance between two points from geometry: √[(x₂ − x₁)² + (y₂ − y₁)²]
 Computing the Euclidean distances gives us the nearest neighbors: three
nearest neighbors in category A and two in category B.
 Since 3 of the 5 nearest neighbors are from category A, the new data
point is assigned to category A.
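A from-scratch sketch of these steps (k = 5, Euclidean distance, majority vote; the 2-D points and labels are illustrative):

from collections import Counter
import math

train = [((1, 2), 'A'), ((2, 3), 'A'), ((3, 3), 'A'),
         ((6, 5), 'B'), ((7, 7), 'B'), ((8, 6), 'B')]

def knn_predict(x, k=5):
    # Steps 1-2: sort training points by Euclidean distance to the query.
    nearest = sorted(train, key=lambda p: math.dist(x, p[0]))
    # Step 3: keep the k nearest neighbors' labels.
    labels = [label for _, label in nearest[:k]]
    # Step 4: assign the majority category among them.
    return Counter(labels).most_common(1)[0][0]

print(knn_predict((2, 2)))   # 3 of the 5 nearest are 'A', so it prints 'A'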
Decision Tree:
A decision tree is a supervised learning technique that can be used for both
classification and regression problems, though it is mostly preferred for
classification. It is a tree-structured classifier in which internal nodes
represent the features of a dataset, branches represent the decision rules,
and each leaf node represents an outcome.
How does the Decision Tree algorithm work?
To predict the class of a given record, the algorithm starts at the root node of
the tree. It compares the value of the root attribute with the corresponding
attribute of the record and, based on the comparison, follows the matching
branch and jumps to the next node.
At the next node, the algorithm again compares the record's attribute value and
moves further down. The process continues until a leaf node is reached.
At each node, the best attribute to split on is found using an Attribute
Selection Measure (ASM). Two popular ASM techniques are:
• Information Gain
• Gini Index
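As a sketch of the two ASM measures named above, computed for one candidate split (the tiny labeled dataset is an illustrative assumption):

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

parent = ['yes', 'yes', 'yes', 'no', 'no', 'no']
left, right = ['yes', 'yes', 'yes', 'no'], ['no', 'no']   # one candidate split

# Information Gain = parent entropy - weighted average child entropy.
w = len(left) / len(parent)
info_gain = entropy(parent) - (w * entropy(left) + (1 - w) * entropy(right))
print(f"Information gain: {info_gain:.3f}")     # higher means a better split
print(f"Gini (parent):    {gini(parent):.3f}")  # lower impurity is better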
Random forest:
Random forest is a machine learning technique consisting of multiple decision trees,
collectively used for classification and regression tasks. Each tree independently classifies
new objects based on their attributes, and the final prediction is determined by a majority
vote among all trees. This approach, known as ensemble learning, leverages the diversity
of individual trees to improve overall model accuracy and robustness. Its ability
to handle both classification and regression makes random forest a widely adopted
method across many domains.
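A minimal random-forest sketch (assuming scikit-learn; the synthetic dataset is illustrative): the forest's prediction is a vote across its trees, and each individual tree can also be inspected.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(forest.predict(X[:3]))                 # majority vote across 100 trees
print(forest.estimators_[0].predict(X[:3]))  # one individual tree's vote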
Linear Regression:
Purpose: Predicts continuous real values by establishing a relationship between
independent and dependent variables, fitting a best-fit line (the regression
line) represented as Y = aX + b.
Components:
• Y: Dependent variable being predicted.
• a (slope): Determines the steepness of the line.
• X: Independent variable influencing the dependent variable.
• b (intercept): The point where the line crosses the Y-axis.
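A sketch of fitting the regression line Y = aX + b by least squares, using only NumPy (the (X, Y) points are illustrative):

import numpy as np

X = np.array([1, 2, 3, 4, 5], dtype=float)
Y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

a, b = np.polyfit(X, Y, deg=1)    # slope a and intercept b of the best-fit line
print(f"Y = {a:.2f}X + {b:.2f}")  # roughly Y = 1.99X + 0.09
print(a * 6 + b)                  # predict Y for a new X = 6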
Logistic Regression:
Purpose: Performs classification by estimating discrete values (such as 0/1,
yes/no) based on given independent variables. It produces output values between
0 and 1 using the sigmoid function.
Output Range: The predicted values lie within the range of 0 to 1, indicating the
probability of belonging to a particular class.
Functionality: Unlike linear regression, which predicts continuous outcomes,
logistic regression is tailored for binary classification tasks, determining the
probability of an event occurring based on input variables.
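A sketch of the sigmoid squashing any real-valued score into (0, 1), plus a scikit-learn fit (the hours-studied data is an illustrative assumption):

import math
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

print(sigmoid(-3), sigmoid(0), sigmoid(3))   # ~0.047, 0.5, ~0.953

X = [[1], [2], [3], [6], [7], [8]]           # e.g. hours studied
y = [0, 0, 0, 1, 1, 1]                       # fail / pass labels
model = LogisticRegression().fit(X, y)
print(model.predict_proba([[4.5]]))          # [P(fail), P(pass)], each in (0, 1)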
Support Vector Machine (SVM):
 SVM is a widely used supervised learning algorithm, primarily for classification
tasks. Its main objective is to find the optimal hyperplane that separates data
points into different classes in n-dimensional space.
 This hyperplane ensures that future data points can be correctly classified.
 SVM identifies support vectors, the critical points closest to the decision
boundary, which define the hyperplane.
 These vectors are pivotal to the algorithm's effectiveness: moving a support
vector moves the boundary, while moving any other point does not.
 SVM can be used for face detection, image classification, text
categorization, etc.
SVM Example
A Support Vector Machine works like a model trained to distinguish between cats
and dogs based on their features. Using images of cats and dogs, SVM learns to
draw a decision boundary between the two classes, guided by the support vectors,
the extreme cases crucial for defining this boundary. When presented with an
unfamiliar animal sharing characteristics of both, SVM uses these support
vectors to classify it.
Linear vs. Non-linear SVM
Linear SVM is applied to linearly separable data, where the classes can be
distinguished by a single straight line, and uses a linear SVM classifier.
Non-linear SVM handles data that cannot be separated by a straight line,
employing a non-linear (kernel-based) SVM classifier for more complex patterns
and boundaries, as sketched below.
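A sketch contrasting the two (assuming scikit-learn; make_moons generates data that no single straight line can separate):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

linear_svm = SVC(kernel='linear').fit(X, y)  # one straight-line boundary
rbf_svm = SVC(kernel='rbf').fit(X, y)        # curved boundary via a kernel

print('linear accuracy:', linear_svm.score(X, y))  # imperfect on moon-shaped data
print('rbf accuracy:', rbf_svm.score(X, y))        # close to 1.0
print('support vectors:', rbf_svm.support_vectors_.shape)  # boundary-defining points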
Naïve Bayes algorithm:
 It is a supervised learning algorithm based on Bayes' theorem and used for
solving classification problems.
 It is a probabilistic classifier: it predicts on the basis of the probability
that an object belongs to each class.
 Popular applications of the Naïve Bayes algorithm include spam filtering,
sentiment analysis, and article classification.
Bayes' Theorem:
Bayes' theorem, also known as Bayes' rule or Bayes' law, is used to determine
the probability of a hypothesis given prior knowledge. It depends on
conditional probability.
The formula for Bayes' theorem is:
P(A|B) = [P(B|A) × P(A)] / P(B)
Where,
P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the likelihood: the probability of the evidence given that hypothesis A is true.
P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the marginal probability: the probability of the evidence.
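A worked instance of the theorem in the slide's spam-filtering setting (all probabilities are illustrative assumptions):

# P(A|B) = P(B|A) * P(A) / P(B), with A = "email is spam", B = "contains 'free'".
p_spam = 0.3             # P(A): prior probability that an email is spam
p_word_given_spam = 0.8  # P(B|A): likelihood of seeing 'free' in spam
p_word_given_ham = 0.1   # likelihood of seeing 'free' in non-spam

# P(B): marginal probability of the evidence (law of total probability).
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# P(A|B): posterior probability of spam after observing the word.
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(f"P(spam | 'free') = {p_spam_given_word:.3f}")   # ~0.774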
Machine Learning Workflow for CVD (Cardiovascular Disease) Prediction
Applications of Machine Learning
Healthcare:
• Disease diagnosis
• Personalized treatment plans
• Predictive analytics for patient outcomes
Finance:
• Fraud detection
• Algorithmic trading
• Credit scoring
Retail:
• Customer segmentation
• Inventory management
• Personalized recommendations
Transportation:
• Autonomous vehicles
• Traffic prediction
• Route optimization
Manufacturing:
• Predictive maintenance
• Quality control
• Supply chain optimization
Entertainment:
• Content recommendation systems
• Personalized advertising
• Sentiment analysis
Agriculture:
• Crop yield prediction
• Precision farming
• Pest and disease detection