6.S094: Deep Learning for Self-Driving Cars
Learning to Drive: Convolutional Neural Networks
and End-to-End Learning of the Full Driving Task
Lex Fridman (fridman@mit.edu)
Website: cars.mit.edu
January 2017
Administrative
• Website: cars.mit.edu
• Contact Email: deepcars@mit.edu
• Required:
• Create an account on the website.
• Follow the tutorial for each of the 2 projects.
• Recommended:
• Ask questions
• Win competition!
• Office hours: Friday, 5-7pm
(more info coming soon)
Schedule
DeepTraffic Leaderboard
Illustrative Case Study: Traffic Light Detection
DeepTesla: End-to-End Learning from Human and Autopilot Driving
(in ConvnetJS)
DeepTesla: End-to-End Learning from Human and Autopilot Driving
(in TensorFlow)
Computer Vision is Machine Learning
References: [81]
Types of machine learning: Supervised Learning, Unsupervised Learning,
Semi-Supervised Learning, Reinforcement Learning
(Figure: standard supervised learning pipeline)
Images are Numbers
References: [89]
• Regression: The output variable takes continuous values
• Classification: The output variable takes class labels
• Underneath it may still produce continuous values such as
probability of belonging to a particular class.
Computer Vision is Hard
References: [66, 69, 89]
Image Classification Pipeline
References: [81, 89]
Famous Computer Vision Datasets
References: [90, 91, 92, 93]
• MNIST: handwritten digits
• ImageNet: WordNet hierarchy
• CIFAR-10(0): tiny images
• Places: natural scenes
Let’s Build an Image Classifier for CIFAR-10
References: [89, 91]
Let’s Build an Image Classifier for CIFAR-10
References: [89, 91]
Accuracy
Random: 10%
Our image-diff (with L1): 38.6%
Our image-diff (with L2): 35.4%
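For concreteness, a minimal sketch of the image-diff classifier in Python (ours, not the slide's code): label a test image with the label of the training image at the smallest L1 pixel distance.

import numpy as np

def predict_image_diff(X_train, y_train, x_test):
    # X_train: (N, 3072) flattened CIFAR-10 training images; y_train: (N,) labels
    distances = np.sum(np.abs(X_train - x_test), axis=1)  # L1 distance to every training image
    return y_train[np.argmin(distances)]                  # label of the closest one
    # for L2, use instead: np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))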
K-Nearest Neighbors: Generalizing the Image-Diff Classifier
References: [89]
Tuning (hyper)parameters:
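A hedged sketch of the k-NN generalization, where k is the hyperparameter tuned on a held-out validation split (never on the test set):

import numpy as np

def predict_knn(X_train, y_train, x_test, k=7):
    distances = np.sum(np.abs(X_train - x_test), axis=1)  # L1 distance to all training images
    nearest = y_train[np.argsort(distances)[:k]]          # labels of the k closest images
    return np.bincount(nearest).argmax()                  # majority vote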
K-Nearest Neighbors: Generalizing the Image-Diff Classifier
References: [89, 94]
Accuracy
Random: 10%
Training and testing on the same data: 35.4%
7-Nearest Neighbors: ~30%
Human: ~94%
…
Convolutional Neural Networks: ~95%
Reminder: Weighing the Evidence
References: [78]
(Figure: evidence → decisions)
Reminder: Classify an Image of a Number
References: [80]
Input:
(28x28)
Network:
Reminder: “Learning” is Optimization of a Function
References: [63, 80]
Ground truth for “6”:
“Loss” function:
Convolutional Neural Networks
References: [95]
Regular neural network (fully connected):
Convolutional neural network:
Each layer takes a 3D volume and produces a 3D volume through some
smooth function that may or may not have parameters.
Convolutional Neural Networks: Layers
• INPUT [32x32x3] will hold the raw pixel values of the image, in this case an image of width 32, height 32, and
with three color channels R,G,B.
• CONV layer will compute the output of neurons that are connected to local regions in the input, each computing
a dot product between their weights and a small region they are connected to in the input volume. This may
result in a volume such as [32x32x12] if we decided to use 12 filters.
• RELU layer will apply an elementwise activation function, such as the max(0,x) thresholding at zero. This leaves
the size of the volume unchanged ([32x32x12]).
• POOL layer will perform a downsampling operation along the spatial dimensions (width, height), resulting in a
volume such as [16x16x12].
• FC (i.e. fully-connected) layer will compute the class scores, resulting in a volume of size [1x1x10], where each of
the 10 numbers corresponds to a class score, such as among the 10 categories of CIFAR-10. As with ordinary
Neural Networks, and as the name implies, each neuron in this layer will be connected to all the numbers in the
previous volume.
References: [95]
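To make the stack concrete, here is a sketch (illustrative hyperparameters, not the course's model) of the same INPUT → CONV → RELU → POOL → FC sequence, written in the TensorFlow style used later in this lecture:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 32, 32, 3])                    # INPUT [32x32x3]
W = tf.Variable(tf.truncated_normal([5, 5, 3, 12], stddev=0.1))      # 12 filters of size 5x5
b = tf.Variable(tf.constant(0.1, shape=[12]))
conv = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') + b  # CONV -> [32x32x12]
relu = tf.nn.relu(conv)                                              # RELU, size unchanged
pool = tf.nn.max_pool(relu, ksize=[1, 2, 2, 1],
                      strides=[1, 2, 2, 1], padding='SAME')          # POOL -> [16x16x12]
flat = tf.reshape(pool, [-1, 16 * 16 * 12])
W_fc = tf.Variable(tf.truncated_normal([16 * 16 * 12, 10], stddev=0.1))
b_fc = tf.Variable(tf.constant(0.1, shape=[10]))
scores = tf.matmul(flat, W_fc) + b_fc                                # FC -> 10 class scores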
Dealing with Images: Local Connectivity
Same neuron. Just more focused (narrow “receptive field”).
The parameters on each filter are spatially “shared”
(if a feature is useful in one place, it’s useful elsewhere)
References: [95]
ConvNets: Spatial Arrangement of Output Volume
• Depth: number of filters
• Stride: filter step size (when we “slide” it)
• Padding: zero-pad the input
References: [95]
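These three settings determine the output size; a small sketch of the standard arithmetic (W = input width, F = filter size, P = padding, S = stride):

def conv_output_size(W, F, P, S):
    # standard formula: (W - F + 2P) / S + 1, per spatial dimension
    assert (W - F + 2 * P) % S == 0, "filter does not tile the input evenly"
    return (W - F + 2 * P) // S + 1

print(conv_output_size(W=32, F=5, P=2, S=1))  # 32: padding preserves width
print(conv_output_size(W=32, F=2, P=0, S=2))  # 16: stride 2 halves width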
ConvNets: Pooling
References: [95]
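For concreteness, a minimal numeric sketch of 2x2 max pooling with stride 2, which halves each spatial dimension:

import numpy as np

a = np.array([[1, 3, 2, 1],
              [4, 6, 5, 7],
              [8, 2, 0, 1],
              [3, 4, 2, 9]], dtype=float)

pooled = a.reshape(2, 2, 2, 2).max(axis=(1, 3))  # max over each non-overlapping 2x2 block
print(pooled)  # [[6. 7.]
               #  [8. 9.]]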
Computer Vision:
Object Recognition / Classification
References: [4]
Computer Vision:
Segmentation
(Figure panels: Original, Ground Truth, FCN-8)
References: [96]
Computer Vision:
Object Detection
References: [97]
How Can Convolutional Neural Networks Help Us Drive?
Driving: The Numbers
(in United States, in 2014)
Miles:
• All drivers: 10,658 miles
(29.2 miles per day)
• Rural drivers: 12,264 miles
• Urban drivers: 9,709 miles
Fatalities:
• Fatal crashes: 29,989
• All fatalities: 32,675
• Car occupants: 12,507
• SUV occupants: 8,320
• Pedestrians: 4,884
• Motorcycle: 4,295
• Bicyclists: 720
• Large trucks: 587
Cars We Drive
Human at the Center of Automation:
The Way to Full Autonomy Includes the Human
(Spectrum from Fully Human Controlled to Fully Machine Controlled:
Ford F150 → Tesla Model S → Google Self-Driving Car)
Human at the Center of Automation:
The Way to Full Autonomy Includes the Human
• Emergency
• Automatic emergency braking (AEB)
• Warnings
• Lane departure warning (LDW)
• Forward collision warning (FCW)
• Blind spot detection
• Longitudinal
• Adaptive cruise control (ACC)
• Lateral
• Lane keep assist (LKA)
• Automatic steering
• Control and Planning
• Automatic lane change
• Automatic parking
Tesla Autopilot
Distracted Humans
• Injuries and fatalities:
3,179 people were killed and 431,000 were
injured in motor vehicle crashes involving
distracted drivers
(in 2014)
• Texts:
169.3 billion text messages were sent in the
US every month.
(as of December 2014)
• Eyes off road:
5 seconds is the average time your eyes are
off the road while texting. When traveling
at 55mph, that's enough time to cover the
length of a football field blindfolded.
What is distracted driving?
• Texting
• Using a smartphone
• Eating and drinking
• Talking to passengers
• Grooming
• Reading, including maps
• Using a navigation system
• Watching a video
• Adjusting a radio
4 D’s of Being Human:
Drunk, Drugged, Distracted, Drowsy
• Drunk Driving: In 2014, 31 percent of traffic fatalities
involved a drunk driver.
• Drugged Driving: 23% of night-time drivers tested positive
for illegal, prescription or over-the-counter medications.
• Distracted Driving: In 2014, 3,179 people (10 percent of
overall traffic fatalities) were killed in crashes involving
distracted drivers.
• Drowsy Driving: In 2014, nearly three percent of all traffic
fatalities involved a drowsy driver, and at least 846 people
were killed in crashes involving a drowsy driver.
In Context: Traffic Fatalities
Total miles driven in U.S. in 2014:
3,000,000,000,000 (3 trillion)
Fatalities: 32,675
(about 1 per 90 million miles)
Tesla Autopilot miles driven since October 2015:
300,000,000 (300 million)
(as of December 2016)
Fatalities: 1
We (increasingly) understand the first number (human driving).
We do not yet understand the second number (Autopilot driving).
We need A LOT of real-world semi-autonomous driving data!
Computer Vision + Machine Learning + Big Data = Understanding
The Data
Teslas instrumented: 17
Hours of data: 5,000+
Distance traveled: 70,000+ miles
Camera and Lens Selection
Fisheye: Capture full range of head, body
movement inside vehicle.
2.8-12mm Focal Length: “Zoom” on the face
without obstructing the driver’s view.
Logitech C920:
On-board H264 Compression
Case for C-Mount Lens:
Flexibility in lens selection
Semi-Autonomous Vehicle Components
External
1. Radar
2. Visible-light camera
3. LIDAR
4. Infrared camera
5. Stereo vision
6. GPS/IMU
7. CAN
8. Audio
Internal
1. Visible-light camera
2. Infrared camera
3. Audio
Self-Driving Car Tasks
• Localization and Mapping:
Where am I?
• Scene Understanding:
Where is everyone else?
• Movement Planning:
How do I get from A to B?
• Driver State:
What’s the driver up to?
Visual Odometry
• 6-DOF: six degrees of freedom of movement
• Changes in position:
• Forward/backward: surge
• Left/right: sway
• Up/down: heave
• Orientation:
• Pitch, Yaw, Roll
• Source:
• Monocular: I moved 1 unit
• Stereo: I moved 1 meter
• Mono = Stereo for far away objects
• PS: For tiny robots everything is “far away” relative to inter-camera
distance
SLAM: Simultaneous Localization and Mapping
What works: SIFT and optical flow
References: [98, 99]
Visual Odometry in Parts
• (Stereo) Undistortion, Rectification
• (Stereo) Disparity Map Computation
• Feature Detection (e.g., SIFT, FAST)
• Feature Tracking (e.g., KLT: Kanade-Lucas-Tomasi)
• Trajectory Estimation
• Use rigid parts of the scene (requires outlier/inlier detection)
• For mono, need more info* like camera orientation and height off
the ground
* Kitt, Bernd Manfred, et al. "Monocular visual odometry using a planar road model to solve scale ambiguity." (2011).
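As an illustrative sketch only (not the course pipeline; file names are hypothetical), the feature detection and KLT tracking steps with OpenCV:

import cv2

prev_gray = cv2.imread('frame0.png', 0)  # two consecutive grayscale frames (hypothetical files)
next_gray = cv2.imread('frame1.png', 0)

# feature detection (Shi-Tomasi corners here; FAST or SIFT are alternatives)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500, qualityLevel=0.01, minDistance=7)

# feature tracking with pyramidal Lucas-Kanade (KLT)
next_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)

tracked = next_pts[status.ravel() == 1]  # keep successful tracks for trajectory estimation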
End-to-End Visual Odometry
Konda, Kishore, and Roland Memisevic. "Learning visual odometry with a convolutional
network." International Conference on Computer Vision Theory and Applications. 2015.
Object Detection
• Past approaches: cascade classifiers (Haar-like features)
• Where deep learning can help:
recognition, classification, detection
Full Driving Scene Segmentation
Fully Convolutional Network implementation:
https://github.com/tkuanlun350/Tensorflow-SegNet
Road Texture and Condition from Audio
(with Recurrent Neural Networks)
Movement Planning
• Previous approaches: optimization-based control
• Where deep learning can help: reinforcement learning
Deep Reinforcement Learning implementation:
https://github.com/nivwusquorum/tensorflow-deepq
Self-Driving Car Tasks
• Localization:
Where am I?
• Object detection:
Where is everyone else?
• Movement planning:
How do I get from A to B?
• Driver state:
What’s the driver up to?
Driver State Detection:
A Multi-Resolutional View
Increasing level of detection resolution and difficulty:
Gaze Classification, Blink Rate, Blink Duration, Head Pose, Eye Pose,
Pupil Diameter, Micro Saccades, Body Pose, Blink Dynamics,
Micro Glances, Cognitive Load, Drowsiness
Gaze Region and Autopilot State
Driver Emotion
If Driving is a Conversation, this is End-to-End
Natural Language Generation
Turing Test:
Can a computer be mistaken for a
human more than 30% of the time?
1. Natural language processing to enable
it to communicate successfully
2. Knowledge representation to store
information provided before or during
the interrogation
3. Automated reasoning to use the stored
information to answer questions and to
draw new conclusions
Autonomous Driving: End-to-End
(Figure: “Magic Happens”)
Stairway to Automation
(Figure: Ford F150 → Tesla Model S → Google Self-Driving Car;
training dataset vs. testing dataset)
Autonomous Driving: End-to-End
• 9 layers
• 1 normalization layer
• 5 convolutional layers
• 3 fully connected layers
• 27 million connections
• 250 thousand parameters
End-to-End Driving with ConvnetJS
Tutorial on http://cars.mit.edu/deeptesla
End-to-End Steering
• By the end of this lecture, you’ll be able to train a model
that can steer a vehicle
• The input to our network will be a single image of the
forward roadway from a Tesla
• The output will be a steering wheel value between -20 and 20
Creating the Dataset
• We recorded and extracted 10 video clips of highway driving
from a Tesla
• The wheel value was extracted from the in-vehicle CAN
• We cropped/extracted a window from each video frame and
provided a CSV linking each window to a wheel value
Lighting and Road Conditions
ConvNetJS Overview
• ConvNetJS is a Javascript
implementation for using
and training neural
networks within the
browser
• It supports simple networks
with several different layer
types and training
algorithms
• Constructing and training a
network can be performed
in very few lines of code,
great for demonstrations
ConvNetJS – Neural Network Representation
• The network is represented
by a single Javascript object
which contains a list of
layers
• Each layer contains a plain
array of weights (w), the
activation/activation
gradients of the last
forward pass, as well as the
shape and layer type
Layer Types
• ConvNetJS implements several different layer types:
convolutional, pooling, fully-connected, local contrast
normalization, and loss layers
• There are three available output types: regression, softmax,
and SVM
ConvNetJS – Training Overview
• To train a network, you first must initialize a “Trainer” object:
var trainer = new convnetjs.SGDTrainer(net, {method: 'adadelta', batch_size: 1, l2_decay: 0.0001});
• There are three training algorithms available: SGD, Adadelta, and Adagrad
• Training is performed by manually calling trainer.train(input_volume, expected_output),
which returns an object containing timing and loss function information
DeepTesla Overview
Model Metrics
Network Designer
Training Interaction
Layer Visualization
Input Layer
Convolutional Layer Visualization
Video Visualization
Information Bar
Input Box
Barcodes
• 17-bit, sign-magnitude
• Encoded into the actual video
• 0 = black, 1 = white
• Frame on top, wheel on
bottom (divided by two)
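The decoding logic, sketched here in Python for clarity (the site does it in Javascript); treating the first bit as the sign is our reading of “sign-magnitude”:

def decode_barcode(bits):
    # bits: 17 values in {0, 1}, left to right (0 = black, 1 = white)
    sign = -1 if bits[0] else 1    # assumed: first bit is the sign
    magnitude = 0
    for b in bits[1:]:             # remaining 16 bits encode the magnitude
        magnitude = (magnitude << 1) | b
    return sign * magnitude

print(decode_barcode([0] * 15 + [1, 0]))  # 2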
Image Batches
• Each image loaded over the
network contains an entire
batch
• There is one image per row,
and 250 rows in total
• These images are
reassembled into volumes
upon download
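Sketched in Python for clarity (the site does this in Javascript; the row-band layout is our reading of the bullets above):

import numpy as np

def split_batch_image(batch_img, rows=250):
    # batch_img: (rows * h, w, 3) array; each horizontal band of height h is one example
    h = batch_img.shape[0] // rows
    return [batch_img[i * h:(i + 1) * h] for i in range(rows)]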
Training Explanation
• One web worker is used for loading examples
• Each batch of training images is one large image with each row as a
single training example
• After an image finishes loading asynchronously, it sends the training
examples to another worker
• One web worker is used for training the network
• It trains on each image and pushes the network/outputs to the
visualization worker
• One web worker is used for visualization
• For a specified training example interval, it blits the activation/gradient
output of each training example onto a canvas
• Each web worker behaves as a single thread, and we use
message passing to communicate state between the
workers
ConvNetJS Evaluation - Video Explanation
• The videos are encoded at 1280x820 in H264/MKV with 17-bit
sign-magnitude barcodes
• The main video frame is stored in the box (0, 1280, 0, 720)
• The frame barcode is in box (1144, 720, 1280, 770)
• The wheel value barcode is in box (1144, 770, 1280, 820)
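A sketch of cropping those regions, assuming (x1, y1, x2, y2) ordering for the barcode boxes and a 1280x820 frame:

import cv2

frame = cv2.imread('video_frame.png')      # one decoded 1280x820 frame (hypothetical file)
main = frame[0:720, 0:1280]                # main video frame
frame_barcode = frame[720:770, 1144:1280]  # frame-number barcode
wheel_barcode = frame[770:820, 1144:1280]  # wheel-value barcode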
ConvNetJS Evaluation - Creating the Video
• Each epoch is synchronized to 30 fps
• We extract the wheel value from the CAN data and
synchronize each message to a frame (both the frame and
the CAN message are timestamped)
• Using OpenCV, we process the data:
• Generate a barcode for the frame containing the wheel data
• Crop the image portion used for training
• Create single images containing batches of training data
• The epochs and associated data are copied to our web
server, which serves them to the browser
ConvNetJS Evaluation - Playing the Video
• To be able to use the video in the neural network, we need
to do some preprocessing
• First, we have a hidden video element and rely on modern
HTML5 video implementations
• When the user requests the video to play, we begin tracking
each redraw of the page
• With each redraw, we grab the currently rendered video
frame, extract the RGBA values and blit them to two
different canvases: one canvas, which the user sees, and
another canvas which is hidden and only contains a cropped
portion of the frame (the part we will use for the neural
network)
ConvNetJS Evaluation - Playing the Video
• Next, we read the image data from the hidden canvas and
shape it into a ConvNetJS volume
• For each image we first create a volume:
var image_vol = new convnetjs.Vol(x_size, y_size, depth, default_value);
• Next, we extract each pixel from the canvas and set the
equivalent voxel (volume pixel) to the value (skipping the
alpha value)
• We can also extract the expected steering value by parsing the
barcode (a 17-bit, sign-magnitude barcode, where white = 1,
black = 0)
ConvNetJS Evaluation - Forward Pass
• Now we can use our extracted volume in the forward pass
by calling net.forward(our_volume)
• The predicted value is stored in the output neuron:
var prediction = net.forward(vol);
var raw_regression_value = prediction.w[0];
• Because we min-max normalized our inputs while training
the network, we need to transform our outputs – this is just
the reverse of the transformation we performed on the input:
wheel_value = (raw_regression_value * total_wheel_range) + wheel_min
• We visualize the predicted and actual steering wheel values
and calculate the error
End-to-End Driving with TensorFlow
Available on http://github.com/lexfridman/deeptesla
Build the Model: Input and Output
def weight_variable(shape):
initial = tf.truncated_normal(shape, stddev=0.1)
return tf.Variable(initial)
def bias_variable(shape):
initial = tf.constant(0.1, shape=shape)
return tf.Variable(initial)
def conv2d(x, W, stride):
return tf.nn.conv2d(x, W, strides=[1, stride, stride, 1],
padding='VALID')
x = tf.placeholder(tf.float32, shape=[None, 66, 200, 3])
y_ = tf.placeholder(tf.float32, shape=[None, 1])
x_image = x
Build the Model: Convolutional Layers
#first convolutional layer
W_conv1 = weight_variable([5, 5, 3, 24])
b_conv1 = bias_variable([24])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1, 2) + b_conv1)
#second convolutional layer
W_conv2 = weight_variable([5, 5, 24, 36])
b_conv2 = bias_variable([36])
h_conv2 = tf.nn.relu(conv2d(h_conv1, W_conv2, 2) + b_conv2)
#third convolutional layer
W_conv3 = weight_variable([5, 5, 36, 48])
b_conv3 = bias_variable([48])
h_conv3 = tf.nn.relu(conv2d(h_conv2, W_conv3, 2) + b_conv3)
#fourth convolutional layer
W_conv4 = weight_variable([3, 3, 48, 64])
b_conv4 = bias_variable([64])
h_conv4 = tf.nn.relu(conv2d(h_conv3, W_conv4, 1) + b_conv4)
#fifth convolutional layer
W_conv5 = weight_variable([3, 3, 64, 64])
b_conv5 = bias_variable([64])
h_conv5 = tf.nn.relu(conv2d(h_conv4, W_conv5, 1) + b_conv5)
Build the Model: Fully Connected Layers
# fully connected layer 1
W_fc1 = weight_variable([1152, 1164])
b_fc1 = bias_variable([1164])
h_conv5_flat = tf.reshape(h_conv5, [-1, 1152])
h_fc1 = tf.nn.relu(tf.matmul(h_conv5_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
# fully connected layer 2
W_fc2 = weight_variable([1164, 100])
b_fc2 = bias_variable([100])
h_fc2 = tf.nn.relu(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
h_fc2_drop = tf.nn.dropout(h_fc2, keep_prob)
# fully connected layer 3
W_fc3 = weight_variable([100, 50])
b_fc3 = bias_variable([50])
h_fc3 = tf.nn.relu(tf.matmul(h_fc2_drop, W_fc3) + b_fc3)
h_fc3_drop = tf.nn.dropout(h_fc3, keep_prob)
# fully connected layer 4
W_fc4 = weight_variable([50, 10])
b_fc4 = bias_variable([10])
h_fc4 = tf.nn.relu(tf.matmul(h_fc3_drop, W_fc4) + b_fc4)
h_fc4_drop = tf.nn.dropout(h_fc4, keep_prob)
#Output
W_fc5 = weight_variable([10, 1])
b_fc5 = bias_variable([1])
y = tf.mul(tf.atan(tf.matmul(h_fc4_drop, W_fc5) + b_fc5), 2)
Train the Model
import os
import tensorflow as tf
import driving_data  # the repo's data loader
import model         # the repo's model definition

LOGDIR = './save'  # checkpoint directory (assumed)

sess = tf.InteractiveSession()
loss = tf.reduce_mean(tf.square(tf.sub(model.y_, model.y)))
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)
sess.run(tf.initialize_all_variables())
saver = tf.train.Saver()

for i in range(int(driving_data.num_images * 0.3)):
    xs, ys = driving_data.LoadTrainBatch(100)
    train_step.run(feed_dict={model.x: xs, model.y_: ys, model.keep_prob: 0.8})
    if i % 10 == 0:
        xs, ys = driving_data.LoadValBatch(100)
        print("step %d, val loss %g" % (i, loss.eval(feed_dict={
            model.x: xs, model.y_: ys, model.keep_prob: 1.0})))
    if i % 100 == 0:
        if not os.path.exists(LOGDIR):
            os.makedirs(LOGDIR)
        checkpoint_path = os.path.join(LOGDIR, "model.ckpt")
        filename = saver.save(sess, checkpoint_path)
        print("Model saved in file: %s" % filename)
Run the Model
import tensorflow as tf
import scipy.misc
import model
import cv2

sess = tf.InteractiveSession()
saver = tf.train.Saver()
saver.restore(sess, "save/model.ckpt")

img = cv2.imread('steering_wheel_image.jpg', 0)
rows, cols = img.shape
smoothed_angle = 0

cap = cv2.VideoCapture(0)
while cv2.waitKey(10) != ord('q'):
    ret, frame = cap.read()
    image = scipy.misc.imresize(frame, [66, 200]) / 255.0
    degrees = model.y.eval(feed_dict={model.x: [image], model.keep_prob: 1.0})[0][0] * 180 / scipy.pi
    cv2.imshow('frame', frame)
    # smooth the displayed angle so the rendered wheel turns gradually
    if degrees != smoothed_angle:  # guard against division by zero
        smoothed_angle += 0.2 * pow(abs(degrees - smoothed_angle), 2.0 / 3.0) * \
            (degrees - smoothed_angle) / abs(degrees - smoothed_angle)
    M = cv2.getRotationMatrix2D((cols / 2, rows / 2), -smoothed_angle, 1)
    dst = cv2.warpAffine(img, M, (cols, rows))
    cv2.imshow("steering wheel", dst)

cap.release()
cv2.destroyAllWindows()
Traffic Light Classification with TensorFlow
We will implement a simple traffic light classifier with 3 classes (red, green, yellow)
Parameters
• Max epochs: the number of times the neural network will see all training examples
• Input_img_x/y: the size we will use for inputs into the network
• Batch size: # of examples the neural network will see before making a gradient step
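A sketch of how these might be declared (names follow the bullets; the values are illustrative, not the tutorial's):

max_epochs = 100    # passes over the full training set
input_img_x = 32    # input width fed to the network
input_img_y = 32    # input height
batch_size = 50     # examples seen before each gradient step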
Helper Functions
We use some helper functions to make adding layers
easier/more consistent
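A sketch consistent with the helper functions used in the DeepTesla TensorFlow code above:

import tensorflow as tf

def weight_variable(shape):
    # small random initialization breaks symmetry between filters
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias_variable(shape):
    return tf.Variable(tf.constant(0.1, shape=shape))

def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')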
Model Input/Output
• We specify our input and output types in the same lines to
make sure they agree with our idea of the network
• Our input is an image of size 32x32x3 (RGB channels)
• Our output consists of 3 neurons, representing the
probability of each class
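A sketch of the placeholders this describes:

x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])  # 32x32 RGB input
y_ = tf.placeholder(tf.float32, shape=[None, 3])         # one-hot: red, green, yellow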
Convolutional Layer
• Here we specify our first convolutional layer using our helper
function
• W_conv1 – a 4D tensor representing the weights [filter_x,
filter_y, previous layer neurons, # of filters]
• b_conv1 – our simple addition variable
• h_conv1 – our actual layer/activation
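A sketch using the helpers above (the filter count of 32 is illustrative):

W_conv1 = weight_variable([5, 5, 3, 32])  # [filter_x, filter_y, previous layer neurons, # of filters]
b_conv1 = bias_variable([32])             # one bias per filter
h_conv1 = tf.nn.relu(conv2d(x, W_conv1) + b_conv1)  # the actual layer/activation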
Pooling Layer
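A sketch using the pooling helper defined above:

h_pool1 = max_pool_2x2(h_conv1)  # halves the spatial dimensions: 32x32 -> 16x16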
Flattening Pool Layer
We calculate the total number of neurons needed in our first fully-connected
layer by multiplying all the dimensions of the pool layer shape
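A sketch of that calculation, continuing the layer sizes assumed above:

flat_size = 16 * 16 * 32                             # all pool-layer dimensions multiplied
h_pool1_flat = tf.reshape(h_pool1, [-1, flat_size])  # keep the batch dimension, flatten the rest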
Output Layer
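A sketch, assuming a preceding fully-connected layer of 128 units (illustrative, not the tutorial's sizes):

h_fc1 = tf.nn.relu(tf.matmul(h_pool1_flat, weight_variable([flat_size, 128])) + bias_variable([128]))
W_out = weight_variable([128, 3])
b_out = bias_variable([3])
y = tf.matmul(h_fc1, W_out) + b_out  # unnormalized scores (logits) for red/green/yellow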
Loss and Optimizer
• Our loss function performs softmax and then computes
cross-entropy
• We use the AdamOptimizer and specify a learning rate
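A sketch with the period-appropriate (TF 0.x) ops, matching the DeepTesla code above:

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(y, y_))  # softmax + cross-entropy in one op (logits, labels)
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)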
Saver Object
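A minimal sketch:

saver = tf.train.Saver()  # checkpoints all trainable variables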
Loading Images
• Iterate over each image, resize to 32x32
• Create a one hot encoding of our class
• Shuffle the entire dataset
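A hedged sketch of that loading loop (paths_and_labels and the class list are hypothetical names):

import random
import cv2
import numpy as np

classes = ['red', 'green', 'yellow']

def load_dataset(paths_and_labels):
    data = []
    for path, label in paths_and_labels:
        img = cv2.resize(cv2.imread(path), (32, 32))  # resize to 32x32
        one_hot = np.zeros(len(classes))
        one_hot[classes.index(label)] = 1.0           # one-hot encoding of the class
        data.append((img, one_hot))
    random.shuffle(data)                              # shuffle the entire dataset
    return data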
Splitting the Dataset
• Split our data set into train and test
• We truncate our sets to a multiple of batch size (all batches
have to be the same size)
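A sketch, assuming the shuffled (image, one-hot) list from above and an illustrative 80/20 split:

split = int(len(data) * 0.8)
train, test = data[:split], data[split:]
train = train[:len(train) // batch_size * batch_size]  # truncate to a multiple of batch size
test = test[:len(test) // batch_size * batch_size]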
Training Loop
• Iterate over each batch and train on it
• (we assume training examples are a multiple of the batch size)
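A sketch of the loop, assuming the placeholders and train_step defined above (and an InteractiveSession, as in the DeepTesla code):

for epoch in range(max_epochs):
    for i in range(0, len(train), batch_size):
        batch = train[i:i + batch_size]
        xs = [img for img, _ in batch]
        ys = [label for _, label in batch]
        train_step.run(feed_dict={x: xs, y_: ys})  # one gradient step per batch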
Best Model
• We evaluate the loss on all of our training examples and test
examples
• If the validation loss is lower than the lowest loss, we save our
model
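A sketch of that check (best_loss starts at infinity; test_xs/test_ys and the checkpoint path are hypothetical names):

val_loss = cross_entropy.eval(feed_dict={x: test_xs, y_: test_ys})
if val_loss < best_loss:  # save only when validation loss improves
    best_loss = val_loss
    saver.save(sess, 'traffic_light_model.ckpt')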
Expected Output
References
All references cited in this presentation are listed in the
following Google Sheets file:
https://goo.gl/9Xhp2t
