Python for ML
MOST BASIC LIBRARIES
Libraries And ML Scope
ML
Data Gathering
Data Cleaning
Exploring Data
Building Model
Visualization
Data Gathering
Beautiful Soup
• Is a Python library for pulling data
out of HTML and XML files. It works
with your favorite parser to provide
idiomatic ways of navigating,
searching, and modifying the parse
tree. It commonly saves
programmers hours or days of work.
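A minimal sketch of navigating and searching a parse tree. It assumes a small hard-coded HTML snippet in place of a downloaded page and uses Python's built-in "html.parser", so no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# Hard-coded HTML standing in for a downloaded page (made-up content)
html = """
<html><body>
  <h1>Products</h1>
  <ul>
    <li class="item"><a href="/p/1">Laptop</a></li>
    <li class="item"><a href="/p/2">Monitor</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")   # built-in parser, no extra dependency

title = soup.h1.get_text()                  # navigate by tag name
items = soup.select("li.item")              # search with a CSS selector
links = soup.find_all("a")                  # search the whole tree by tag
names = [a.get_text() for a in links]
hrefs = [a["href"] for a in links]          # read an attribute like a dict key

print(title, names, hrefs)
```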
Requests
• Is the de facto standard for making
HTTP requests in Python. It abstracts
the complexities of making requests
behind a beautiful, simple API so that
you can focus on interacting with
services and consuming data in your
application.
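A minimal sketch of the Requests API. To stay runnable offline, it only builds and prepares a GET request (so you can see how query parameters are encoded) and defines, but does not call, a helper that would send it. The URL and the `fetch_json` helper are hypothetical.

```python
import requests

# Build, but do not send, a GET request; the URL is a placeholder
req = requests.Request(
    "GET",
    "https://api.example.com/search",
    params={"q": "machine learning", "page": 2},   # URL-encoded for us
    headers={"Accept": "application/json"},
)
prepared = req.prepare()
print(prepared.method, prepared.url)


def fetch_json(url, params=None):
    """Hypothetical helper: send a GET request and return the decoded JSON body."""
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()   # raise HTTPError on 4xx/5xx status codes
    return response.json()
```

Setting an explicit `timeout` is a deliberate choice: without one, a call to an unresponsive server can block indefinitely.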
Pandas
• Is an open source, BSD-licensed
library providing high-performance,
easy-to-use data structures and data
analysis tools for
the Python programming language.
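In the gathering step, pandas is typically where raw data first becomes a table. The sketch below assumes two made-up sources, a list of scraped records and some CSV text standing in for a downloaded file, and combines them into one DataFrame.

```python
import io

import pandas as pd

# Records such as those scraped from a web page (made-up values)
records = [
    {"name": "Laptop", "price": 999.0},
    {"name": "Monitor", "price": 199.5},
]
scraped = pd.DataFrame(records)

# CSV text standing in for a downloaded file (made-up values)
csv_text = "name,price\nKeyboard,49.9\nMouse,19.9\n"
downloaded = pd.read_csv(io.StringIO(csv_text))

# Stack both sources into one table with a fresh 0..n-1 index
products = pd.concat([scraped, downloaded], ignore_index=True)
print(products)
```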
Data Cleaning
NumPy
• Is the fundamental package for scientific computing with
Python. It contains, among other things:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random
number capabilities
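A small sketch touching each listed capability on made-up numbers: an N-dimensional array, broadcasting to standardize columns, filling a missing value, solving a linear system, and seeded random numbers.

```python
import numpy as np

# 2-D array: 3 samples x 2 features (made-up values)
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Broadcasting: the per-column mean and std (shape (2,)) are stretched
# across all 3 rows, standardizing each column without a loop
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Cleaning: replace NaN with the mean of the non-missing values
data = np.array([1.0, np.nan, 3.0])
filled = np.where(np.isnan(data), np.nanmean(data), data)

# Linear algebra: solve A @ w = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
w = np.linalg.solve(A, b)

# Random numbers from a seeded generator, for reproducibility
rng = np.random.default_rng(seed=0)
noise = rng.normal(loc=0.0, scale=1.0, size=3)
```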
Pandas
• Is an open source, BSD-licensed library providing
high-performance, easy-to-use data structures and data analysis
tools for the Python programming language.
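A minimal cleaning sketch on a made-up messy table. It assumes typical problems (inconsistent capitalization, numbers stored as strings, a missing value, duplicate rows) and imputes the missing age with the median, which is one common choice among several.

```python
import pandas as pd

# Made-up messy data
raw = pd.DataFrame({
    "name": ["Ann", "bob", "Bob", "Cara", "Ann"],
    "age": ["34", "41", "41", None, "34"],
    "city": ["Oslo", "Rome", "Rome", "Lima", "Oslo"],
})

clean = (
    raw
    .assign(
        name=raw["name"].str.title(),                    # "bob" -> "Bob"
        age=pd.to_numeric(raw["age"], errors="coerce"),  # strings -> floats, None -> NaN
    )
    .drop_duplicates()                                   # drop fully repeated rows
    .reset_index(drop=True)
)

# Impute the remaining missing age with the median of the known ages
clean["age"] = clean["age"].fillna(clean["age"].median())
print(clean)
```

Normalizing the text column before `drop_duplicates()` matters: "bob" and "Bob" only count as duplicates once they are spelled the same way.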
Exploring Data
Seaborn
• Is a Python data visualization library
based on matplotlib. It provides a
high-level interface for drawing
attractive and informative statistical
graphics.
Matplotlib.pyplot
• Is a collection of command-style
functions that make matplotlib work
like MATLAB. Each pyplot function
makes some change to a figure: e.g.,
creates a figure, creates a plotting
area in a figure, plots some lines in a
plotting area, decorates the plot with
labels, etc.
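The steps listed above map one-to-one onto pyplot calls. This sketch uses made-up data and the "Agg" backend so it runs without a display; each call changes the current figure in turn.

```python
import matplotlib
matplotlib.use("Agg")                 # render off-screen

import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

plt.figure(figsize=(4, 3))            # creates a figure
plt.subplot(1, 1, 1)                  # creates a plotting area in the figure
plt.plot(x, y, label="y = x^2")       # plots a line in the plotting area
plt.xlabel("x")                       # decorates the plot with labels
plt.ylabel("y")
plt.title("A simple pyplot figure")
plt.legend()

ax = plt.gca()                        # the current Axes pyplot has been modifying
```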
Pandas
• Is an open source, BSD-licensed
library providing high-performance,
easy-to-use data structures and data
analysis tools for
the Python programming language.
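For exploration, pandas offers quick summaries before any plotting. A sketch on made-up sales records: summary statistics, a derived column, per-group totals, and category counts.

```python
import pandas as pd

# Made-up sales records
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south", "east"],
    "units": [10, 4, 6, 8, 5],
    "price": [2.0, 3.0, 2.0, 3.0, 4.0],
})
sales["revenue"] = sales["units"] * sales["price"]

summary = sales.describe()            # count, mean, std, min, quartiles, max
by_region = (
    sales.groupby("region")["revenue"]
    .sum()
    .sort_values(ascending=False)     # largest total first
)
counts = sales["region"].value_counts()

print(summary)
print(by_region)
```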
Building Model
SciKit-learn
• Is an open source machine learning library
that supports supervised and unsupervised
learning. It also provides various tools for
model fitting, data preprocessing, model
selection and evaluation, and many other
utilities.
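A minimal sketch of the fit/predict workflow on the iris dataset that ships with scikit-learn. It chains preprocessing and a supervised classifier in a pipeline, evaluates on a held-out split, and adds an unsupervised clustering step for contrast. The model choices are illustrative, not a recommendation.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Model selection: hold out 25% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Supervised: scaling (preprocessing) + logistic regression (model fitting)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))   # evaluation
print(f"test accuracy: {accuracy:.2f}")

# Unsupervised: group the same samples into 3 clusters without labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
```

Putting the scaler inside the pipeline means it is fit only on the training split, so no information from the test set leaks into preprocessing.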
Statsmodels
• Is a Python module that provides classes and
functions for the estimation of many different
statistical models, as well as for conducting
statistical tests, and statistical data exploration.
An extensive list of result statistics is
available for each estimator.
Visualization
Seaborn
• Is a Python data
visualization library based
on matplotlib. It provides a
high-level interface for
drawing attractive and
informative statistical
graphics.
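For presentation, a single seaborn call can turn a matrix into an annotated chart. This sketch assumes a made-up feature table and the "Agg" backend, and draws a correlation heatmap.

```python
import matplotlib
matplotlib.use("Agg")                  # render off-screen

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up feature table
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "weight": [50, 58, 69, 80, 88],
    "age": [30, 25, 40, 35, 28],
})

corr = df.corr()                       # pairwise Pearson correlations
ax = sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Feature correlations")
plt.tight_layout()
```

Fixing `vmin=-1` and `vmax=1` keeps the color scale tied to the full range of possible correlations rather than to whatever values happen to appear.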
Matplotlib.pyplot
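• Is a collection of
command-style functions
that make matplotlib work
like MATLAB.
Each pyplot function
makes some change to a
figure: e.g., creates a
figure, creates a plotting
area in a figure, plots
some lines in a plotting
area, decorates the plot
with labels, etc.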
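For finished figures, matplotlib's object-oriented style (explicit `fig` and `ax` objects) is often easier to manage than pyplot's implicit current figure. A sketch with made-up training-loss values: two side-by-side panels, saved as PNG to an in-memory buffer (a file path works the same way).

```python
import io

import matplotlib
matplotlib.use("Agg")                       # render off-screen

import matplotlib.pyplot as plt

epochs = [1, 2, 3, 4, 5]
train_loss = [0.9, 0.6, 0.45, 0.38, 0.35]   # made-up values
val_loss = [0.95, 0.7, 0.55, 0.52, 0.53]

fig, (ax_loss, ax_gap) = plt.subplots(1, 2, figsize=(8, 3))

ax_loss.plot(epochs, train_loss, marker="o", label="train")
ax_loss.plot(epochs, val_loss, marker="o", label="validation")
ax_loss.set(xlabel="epoch", ylabel="loss", title="Loss curves")
ax_loss.legend()

gap = [v - t for t, v in zip(train_loss, val_loss)]
ax_gap.bar(epochs, gap)
ax_gap.set(xlabel="epoch", ylabel="val - train", title="Generalization gap")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")           # e.g. fig.savefig("loss.png") on disk
plt.close(fig)                              # free the figure's memory
```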
Plotly
• Is a graphing library for
building interactive,
browser-based data
visualizations. Its Python
API works in Jupyter
notebooks as well as in
plain scripts.
Geoplotlib
• Is a toolbox for creating
maps and plotting
geographical data. You
can use it to create a
variety of map types, such as
choropleths, heatmaps,
and dot density maps.
