Introduction to Google Colaboratory
Yomna Hassan
Misr International University
Why this session?
This session introduces cloud-based IDEs such as Colab and
Jupyter. They make coding easier by minimizing the setup
needed to run standard AI and machine learning projects
in Python.
We focus on Python because it is one of the main languages
used for machine learning applications.
Survey - What’s your knowledge level?
1- Python
2- Machine Learning
3- Google Colab
Difference between Jupyter and Colab
Jupyter is the open source project on which Colab is based.
Colab allows you to use and share Jupyter notebooks with
others without having to download, install, or run anything.
Google Colab
In this session, we will focus on Google Colab. Colab comes
with many of the popular libraries you may need already
installed. In addition, you can access and run your code
from anywhere with an internet connection.
Google Colab also facilitates teamwork/sharing between
multiple programmers.
Colab Pros and Cons
Pros:
1. Fast, hassle-free GPU support
2. Can be accessed from anywhere
3. Runs in the cloud, so your own computer's resources stay free for other work
4. Notebooks can be saved as Jupyter (.ipynb) files
5. Most popular libraries are pre-installed
Colab Pros and Cons
Cons:
1. Uploading local files can be awkward
2. Files need to be downloaded, which may take time if your internet
connection is slow
Today’s Program
1- How to start a Google Colab notebook (the file is saved automatically to
your Google Drive)
2- Running basic code with a “local” file
3- Uploading files to Google Drive and connecting files to the code
4- Python libraries
5- Simple classification tasks:
- What is classification
- CSV data classification
- Image classification
New notebook
https://colab.research.google.com/
- Notebooks Location
- Playground mode
- External Files
- GitHub and Colab, e.g. ContinualAI
- Revisions, Show Diff
- Cells
- Sharing
- Comments
Colab Runtime
- What is runtime?
- GPU versus TPU
- Resources
Note:
Colab offers optional accelerated compute environments, including GPU and TPU. Executing code in a GPU or
TPU runtime does not automatically mean that the GPU or TPU is being utilized. To avoid hitting your GPU
usage limits, we recommend switching to a standard runtime if you are not utilizing the GPU. Choose
Runtime > Change Runtime Type and set Hardware Accelerator to None.
Colab is able to provide free resources in part by having dynamic usage limits that sometimes fluctuate,
and by not providing guaranteed or unlimited resources. This means that overall usage limits as well as
idle timeout periods, maximum VM lifetime, GPU types available, and other factors vary over time. Colab
does not publish these limits, in part because they can (and sometimes do) vary quickly.
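The note above distinguishes between running in a GPU runtime and actually using the GPU. As a first step, the sketch below only checks whether an NVIDIA GPU is attached to the current runtime by calling the `nvidia-smi` tool, which is assumed to be on the PATH only when a GPU driver is installed. The function name `visible_nvidia_gpus` is made up for this example, and the check does not tell you whether your code is using the GPU.

```python
import shutil
import subprocess


def visible_nvidia_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    # Assumption: nvidia-smi is only available when an NVIDIA driver is installed.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = visible_nvidia_gpus()
print("GPU attached:" if gpus else "No GPU visible (standard runtime).", *gpus)
```

If the list is empty while you expected a GPU, check Runtime > Change Runtime Type. If it is non-empty but your code never touches the GPU, switching back to a standard runtime saves your usage quota.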
Runtime Accelerators
● CPU: Central Processing Unit. The general-purpose processor that
manages all the functions of a computer.
● GPU: Graphics Processing Unit. Built to accelerate graphics, and widely
used to speed up the parallel computations in machine learning.
● TPU: Tensor Processing Unit. A custom-built ASIC from Google that
accelerates TensorFlow projects.
Python
Python is an interpreted high-level general-purpose
programming language.
Its language constructs as well as its object-oriented approach
aim to help programmers write clear, logical code for small
and large-scale projects.
Let’s start Coding!
- Basic line of code/ Printing (Input/Output consoles)
- Files Location
- Google Drive
- from google.colab import drive
  drive.mount('/content/gdrive')
Try:
with open('/content/gdrive/My Drive/foo.txt', 'w') as f:
    f.write('Hello Google Drive!')
!cat '/content/gdrive/My Drive/foo.txt'
(The path is quoted because "My Drive" contains a space.)
Note: Upload a zipped folder and extract it on Drive
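One possible way to extract an uploaded archive from a notebook cell is Python's standard `zipfile` module. This is a minimal sketch: the helper name `extract_zip`, the archive name `data.zip`, and the destination folder are hypothetical and should be replaced with your own paths.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Hypothetical paths: adjust to wherever the archive was uploaded.
ZIP_PATH = "/content/gdrive/My Drive/data.zip"
DEST_DIR = "/content/gdrive/My Drive/data"

# Only runs when the archive exists, i.e. after Drive has been mounted.
if Path(ZIP_PATH).exists():
    names = extract_zip(ZIP_PATH, DEST_DIR)
    print(f"Extracted {len(names)} entries to {DEST_DIR}")
```

Uploading one zipped folder is usually quicker than uploading many small files one by one, which is the reason for the note above.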
Classification
Type of Data
- Images/ Videos
- Tables / Structures
Datasets
UCI Machine Learning repository
Kaggle
Google Dataset Search
Visual Data
TensorFlow Datasets catalog
https://data.mendeley.com/
Python Libraries
NumPy: matrices and numbers
PIL: images
os: file system
scikit-learn: machine learning
TensorFlow: machine learning
Terminologies
Data Frame
Algorithms: k-NN, k-means, DT (decision tree)
Tuning
Features
Training
Testing
Accuracy
Epoch
Neural networks
KNN
The k-Nearest Neighbor (k-NN) classifier is one of the simplest machine learning and
image classification algorithms. It doesn’t actually “learn” anything. Instead, it relies
directly on the distance between feature vectors (which in our case are the raw
RGB pixel intensities of the images).
The k-NN algorithm classifies an unknown data point by finding the most common class
among its k closest examples. Each of the k closest data points casts a vote, and the
category with the most votes wins. Or, in plain English: “Tell me who your
neighbors are, and I’ll tell you who you are.”
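The voting rule described above can be sketched in a few lines of plain Python. The data here is made up: each "image" is reduced to a single RGB triple so that the distance computation stays readable. The names `euclidean` and `knn_predict` are illustrative, not part of any library.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Sort training indices by their distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    # Each of the k nearest neighbours casts one vote for its own label.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Made-up RGB colours standing in for image feature vectors.
train_X = [(250, 10, 10), (230, 40, 30), (200, 0, 20),
           (10, 20, 240), (30, 30, 220), (0, 60, 255)]
train_y = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_predict(train_X, train_y, (220, 30, 40), k=3))  # red
print(knn_predict(train_X, train_y, (20, 10, 200), k=3))  # blue
```

Note that there is no training step: the "model" is simply the stored training data, which is why k-NN is said not to learn anything.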
Choosing K Value - Bias-Variance Trade-off
● The bias error is an error from erroneous assumptions in the learning
algorithm. High bias can cause an algorithm to miss the relevant
relations between features and target outputs (underfitting).
● The variance is an error from sensitivity to small fluctuations in the
training set. High variance may result from an algorithm modeling the
random noise in the training data (overfitting).
Choosing K Value
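One common way to pick k is to try several values and compare accuracy on data held out from training. The sketch below does this on synthetic two-class data generated with a fixed seed. The data, the candidate k values, and the 80/40 split are all assumptions made for illustration. A very small k tends to follow noise in the training set (high variance), while a very large k smooths the decision boundary too much (high bias).

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k):
    """train is a list of (features, label) pairs; return the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, held_out, k):
    """Fraction of held-out points whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)


# Synthetic, overlapping two-class data (made up for illustration).
rng = random.Random(0)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(60)]
data += [((rng.gauss(1.5, 1), rng.gauss(1.5, 1)), "b") for _ in range(60)]
rng.shuffle(data)
train, validation = data[:80], data[80:]

# Odd values of k avoid tied votes when there are two classes.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5, 9, 15, 31)}
for k, score in scores.items():
    print(f"k={k:2d}  validation accuracy={score:.2f}")

best_k = max(scores, key=scores.get)
print("best k on this split:", best_k)
```

The best k depends on the dataset and on the split, so the value printed here should not be reused blindly for other data.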
Writing Code
#Import Libraries
#Import dataset
#Shape dataset into dataframe /struct
#Design model
#Fit data model
#Compute accuracy
--------
Extra Steps: pre and post processing
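The steps listed above can be sketched end to end with the standard library only. Everything specific here is an assumption for illustration: the inline CSV is made-up data, and a nearest-centroid classifier is swapped in as the model because it has an explicit fit step. The notebooks linked in the tasks below may use different libraries and models.

```python
# Import libraries
import csv
import io
import math
from collections import defaultdict

# Import dataset (a made-up inline CSV standing in for a real file)
RAW = """length,width,label
1.0,1.1,small
1.2,0.9,small
0.8,1.0,small
1.1,1.3,small
3.0,3.2,large
3.3,2.9,large
2.8,3.1,large
3.1,3.4,large
"""

# Shape dataset into a structure: a list of (features, label) pairs
rows = [((float(r["length"]), float(r["width"])), r["label"])
        for r in csv.DictReader(io.StringIO(RAW))]
train, test = rows[0::2], rows[1::2]  # alternate rows: a simple 50/50 split


# Design model: nearest-centroid classifier (one mean point per class)
class NearestCentroid:
    def fit(self, samples):
        grouped = defaultdict(list)
        for features, label in samples:
            grouped[label].append(features)
        # The centroid of a class is the column-wise mean of its points.
        self.centroids = {
            label: tuple(sum(col) / len(col) for col in zip(*points))
            for label, points in grouped.items()
        }
        return self

    def predict(self, features):
        return min(self.centroids,
                   key=lambda label: math.dist(self.centroids[label], features))


# Fit data model
model = NearestCentroid().fit(train)

# Compute accuracy on the held-out rows
correct = sum(model.predict(x) == y for x, y in test)
accuracy = correct / len(test)
print(f"accuracy: {accuracy:.2f}")
```

Pre-processing (for example scaling features) would go between shaping the data and fitting the model, and post-processing (for example inspecting misclassified rows) after computing accuracy.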
Task 1 - Heart Disease Classification
Dataset: https://www.kaggle.com/ronitf/heart-disease-uci
Code:
https://colab.research.google.com/drive/19XE4497Gh-XthU9YE12ucuOPw3jywVtL
Exercise
Dataset:
https://www.kaggle.com/uciml/iris/tasks?taskId=1732
Code???
Text Versus Images
Task 2 - Image Classification
Dataset:
https://www.kaggle.com/c/dogs-vs-cats
Code:
https://colab.research.google.com/drive/1ZdRYZIMtMcBelTn4lCzxaT9_ZD5Foe0s?usp=sharing
Calculate Accuracy
Downloading Data from Link Directly on Colab
!wget -cq https://s3.amazonaws.com/content.udacity-data.com/courses/nd188/flower_data.zip
!unzip -qq flower_data.zip
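The same download-and-unpack step can also be written in pure Python, which is handy when the code should run outside Colab, where the `!` shell syntax is not available. This is a sketch using only the standard library. The helper name `download_and_unpack` is hypothetical, and skipping the download when the archive already exists only roughly approximates what `wget -c` does.

```python
import shutil
import urllib.request
from pathlib import Path


def download_and_unpack(url, archive_path, dest_dir):
    """Download url to archive_path (skipped if already present), then unpack into dest_dir."""
    archive = Path(archive_path)
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    # unpack_archive picks the format from the file extension (.zip here).
    shutil.unpack_archive(str(archive), dest_dir)
    return sorted(p.name for p in Path(dest_dir).iterdir())


# URL taken from the slide; whether it is still reachable is not verified here.
URL = "https://s3.amazonaws.com/content.udacity-data.com/courses/nd188/flower_data.zip"

# Example call (commented out because it needs network access):
# print(download_and_unpack(URL, "flower_data.zip", "flower_data"))
```

The function returns the names of the top-level entries in the destination folder, which gives a quick check that the archive unpacked as expected.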
Running Colab Code Offline
Colaboratory lets you connect to a local runtime using Jupyter. This allows you to
execute code on your local hardware and have access to your local file system.
Instructions for setting up Jupyter for the local runtime are available here:
https://research.google.com/colaboratory/local-runtimes.html
Where to go from here?
Read papers related to your topic to identify the most suitable tools and methods to use.
If you want to extend your knowledge of machine learning with Python:
https://cognitiveclass.ai/courses/machine-learning-with-python
https://www.youtube.com/watch?v=ihK_YRMMHQM
https://www.analyticsvidhya.com/blog/2021/06/how-to-learn-mathematics-for-machine-learning-what-concepts-do-you-need-to-master-in-data-science/?fbclid=IwAR0gsww28F2sppZ95LVAmQFEG2N0ohhsDMigE3g7HTiNG5ly_QYk30PMsQ4
Questions
