Yousef Elbayoumi
January 20, 2022
Outline
• Executive Summary
• Introduction
• Methodology
• Results
• Conclusion
• Appendix
Executive Summary
• Summary of methodologies
• Data were collected in several ways
• Machine learning models were built
• Data visualizations were created
• Summary of all results
• The optimal model was identified
• Visualizations were useful for decision making
Introduction
• Project background and context
In this project I take the role of working at SpaceX and try to predict whether the Falcon 9 first stage will land successfully. It is important to know whether the rockets will land successfully, because a failed landing costs the company considerable resources.
• Problems that need answers
1. Which factors are behind landing failures?
2. Will the rockets land successfully?
3. How accurately can a successful landing be predicted?
Methodology
Executive Summary
• Data collection methodology:
• REST API and web scraping
• Perform data wrangling
• Data were transformed and one-hot encoded for later use in the machine learning models
• Perform exploratory data analysis (EDA) using visualization and SQL
• Visualization techniques such as scatter plots were used to discover patterns in the data
• Perform interactive visual analytics using Folium and Plotly Dash
• Folium maps and a Plotly Dash dashboard were built for this purpose
• Perform predictive analysis using classification models
• Classification machine learning models were built for this purpose
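The one-hot encoding step in the data wrangling bullet above can be sketched as follows. This is only a minimal illustration that assumes pandas is installed; the column names and values are hypothetical and are not taken from the project data.

```python
import pandas as pd

# Hypothetical sample rows; the real project columns may differ.
launches = pd.DataFrame(
    {
        "Orbit": ["LEO", "GTO", "ISS"],
        "LaunchSite": ["CCAFS SLC-40", "KSC LC-39A", "CCAFS SLC-40"],
        "PayloadMass": [500.0, 3500.0, 2200.0],
    }
)

# One-hot encode the categorical columns; numeric columns pass through unchanged.
features = pd.get_dummies(launches, columns=["Orbit", "LaunchSite"])

# Cast everything to float so the table can be fed to a classifier.
features = features.astype("float64")
```

Each category becomes its own 0/1 column (for example `Orbit_GTO`), so the models never treat the category names as ordered numbers.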
Data Collection
Data sets were collected using API calls. I collected rocket, launchpad, payloads, and cores data from https://api.spacexdata.com/v4.
1. Collect the data with an API call
2. Convert the JSON response to a data frame
3. Update columns and rows (pre-processing)
4. Filter the data to keep only Falcon 9 launches
5. Save the data to a CSV file named ‘dataset_part_1.csv’
Data Collection – SpaceX API
1. Collect the data with an API call
2. Convert the JSON response to a data frame
3. Update columns and rows (pre-processing)
4. Filter the data to keep only Falcon 9 launches
5. Save the data to a CSV file named ‘dataset_part_1.csv’
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/1.%20spacex-data-collection-api
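The parse, filter, and export steps above can be sketched with the Python standard library alone. This is a hedged illustration, not the notebook's code: the sample response and its field names (`flight_number`, `rocket_name`, `date_utc`) are simplified, hypothetical stand-ins for the richer records the real API returns, and no network call is made.

```python
import csv
import io
import json

# Hypothetical, simplified response body; the real API returns richer records.
SAMPLE_RESPONSE = """
[
  {"flight_number": 1, "rocket_name": "Falcon 1", "date_utc": "2006-03-24"},
  {"flight_number": 6, "rocket_name": "Falcon 9", "date_utc": "2010-06-04"},
  {"flight_number": 7, "rocket_name": "Falcon 9", "date_utc": "2010-12-08"}
]
"""


def keep_falcon9(records):
    """Return only the records whose rocket is a Falcon 9."""
    return [r for r in records if r["rocket_name"] == "Falcon 9"]


def to_csv_text(records):
    """Serialize a list of flat dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


launches = json.loads(SAMPLE_RESPONSE)  # step 2: parse the JSON
falcon9 = keep_falcon9(launches)        # step 4: keep only Falcon 9 launches
csv_text = to_csv_text(falcon9)         # step 5: text that would go into the CSV file
```

In the project itself the JSON would come from an HTTP request and the CSV text would be written to ‘dataset_part_1.csv’.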
Data Collection - Scraping
1. Create the BeautifulSoup object
2. Get the column names
3. Create the launch_dict
4. Convert to the final data frame
5. Save the data to a CSV file named ‘spacex_web_scraped.csv’
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/2.%20webscraping
Data Wrangling
1. Load the data set
2. Count the landing outcomes
3. Find the bad outcomes
4. Represent outcomes as 0 and 1
5. Determine the success outcome
6. Save the data to a CSV file named ‘dataset_part_2.csv’
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/3.%20spacex-data-wrangling
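A minimal sketch of how landing outcomes can be turned into a 0/1 class label. The outcome labels below are hypothetical; they are modelled on the "Success (ground pad)" / "Failure (drone ship)" style that appears in the SQL slides, and the labels in the actual data set may differ.

```python
from collections import Counter

# Hypothetical outcome labels, one per launch.
outcomes = [
    "Success (ground pad)",
    "Failure (drone ship)",
    "Success (drone ship)",
    "No attempt",
    "Success (drone ship)",
]

# Count how often each landing outcome occurs.
outcome_counts = Counter(outcomes)

# Treat every outcome that is not a success as a bad outcome.
bad_outcomes = {o for o in outcome_counts if not o.startswith("Success")}

# Encode each launch as 1 (landed) or 0 (did not land).
landing_class = [0 if o in bad_outcomes else 1 for o in outcomes]

# The mean of the 0/1 labels is the overall success rate.
success_rate = sum(landing_class) / len(landing_class)
```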
EDA with Data Visualization
• Categorical plot of Flight Number vs. Payload Mass (kg)
• Bar chart of the success rate of each orbit
• Scatter plot of Orbit vs. Flight Number
• Line chart of success rate by year
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/5.%20eda-dataviz
EDA with SQL
I used SQL queries to answer the following questions:
• Display the names of the unique launch sites in the space mission
• Display 5 records where the launch site begins with the string 'CCA'
• Display the total payload mass carried by boosters launched by NASA (CRS)
• Display the average payload mass carried by booster version F9 v1.1
• List the date when the first successful landing on a ground pad was achieved
• List the names of the boosters that landed successfully on a drone ship and carried a payload mass greater than 4000 but less than 6000
• List the total number of successful and failed mission outcomes
• List the names of the booster_versions that have carried the maximum payload mass, using a subquery
• List the failed landing_outcomes on a drone ship, with their booster versions and launch site names, for the year 2015
• Rank the count of landing outcomes (such as Failure (drone ship) or Success (ground pad)) between 2010-06-04 and 2017-03-20, in descending order
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/4.%20eda-sql
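A few of these questions can be sketched against an in-memory SQLite database. This is an illustration under stated assumptions: the table name `launches`, its column names, and every row below are made up for the example, and the project's own database and schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE launches (
           date TEXT, booster_version TEXT, launch_site TEXT,
           payload_mass_kg INTEGER, customer TEXT, landing_outcome TEXT)"""
)
# Made-up rows for illustration only; they are not the project's data.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2012-10-08", "F9 v1.0", "CCAFS LC-40", 500, "NASA (CRS)", "No attempt"),
        ("2014-04-18", "F9 v1.1", "CCAFS LC-40", 2300, "NASA (CRS)", "Controlled (ocean)"),
        ("2015-12-22", "F9 FT", "CCAFS LC-40", 2000, "Other", "Success (ground pad)"),
        ("2017-01-14", "F9 FT", "VAFB SLC-4E", 9600, "Other", "Success (drone ship)"),
    ],
)

# Unique launch sites.
sites = conn.execute(
    "SELECT DISTINCT launch_site FROM launches ORDER BY launch_site"
).fetchall()

# Up to 5 records whose launch site begins with 'CCA'.
cca_rows = conn.execute(
    "SELECT * FROM launches WHERE launch_site LIKE 'CCA%' LIMIT 5"
).fetchall()

# Total payload mass for one customer.
(nasa_total,) = conn.execute(
    "SELECT SUM(payload_mass_kg) FROM launches WHERE customer = 'NASA (CRS)'"
).fetchone()

# Date of the first successful ground pad landing.
(first_ground_pad,) = conn.execute(
    "SELECT MIN(date) FROM launches WHERE landing_outcome = 'Success (ground pad)'"
).fetchone()
```

Storing dates as ISO `YYYY-MM-DD` text is what lets `MIN(date)` return the earliest date in this sketch.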
Build an Interactive Map with Folium
• folium.Marker() was used to place markers on the maps.
• folium.Circle() was used to draw circles around the marker locations.
• folium.Icon() was used to set the icon shown for a marker.
• folium.PolyLine() was used to draw a polyline between points.
• folium.plugins.AntPath() was used to draw an animated line between points.
• MarkerCluster() was used to simplify maps that contain several markers with identical coordinates.
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/6.%20launch-site-location
Build a Dashboard with Plotly Dash
• Dash and its HTML components were used because almost everything in the dashboard, such as graphs, tables, and dropdowns, depends on them.
• Pandas was used to simplify the work by loading the data into a data frame.
• Plotly was used to plot the graphs.
• A pie chart and a scatter chart were used for plotting.
• A RangeSlider was used to select the payload mass range.
• A Dropdown was used to select the launch site.
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/7.%20dashboard
Predictive Analysis (Classification)
1. Building the model
2. Evaluating the model
3. Finding the optimal model
• Create a column for the class
• Standardize the data
• Split the data into train and test sets
• Build a GridSearchCV model and fit the data
• Find the best hyperparameters for the models
• Find the model with the highest accuracy
• Confirm the optimal model
• Calculate the accuracies
• Calculate the confusion matrices
• Plot the results
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project/tree/main/8.%20machine-learning-prediction
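The standardize, split, and grid-search steps above can be sketched with scikit-learn. This is a rough sketch under assumptions, not the notebook's code: the data are synthetic, and the hyperparameter grid, test size, and random seeds are hypothetical choices made only for the example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 90 rows, 4 numeric features, binary class label.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Standardize the features, then hold out 20% of the rows for testing.
X_scaled = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=2
)

# Search a small, hypothetical hyperparameter grid with 5-fold cross-validation.
param_grid = {"max_depth": [2, 4, 6], "criterion": ["gini", "entropy"]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_             # best hyperparameters found
cv_accuracy = search.best_score_              # mean cross-validated accuracy
test_accuracy = search.score(X_test, y_test)  # accuracy on the held-out set
```

The sketch follows the order listed on the slide (standardize, then split). Fitting the scaler on the training rows only is a common alternative that keeps test-set statistics out of training.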
Results
• Exploratory data analysis results
• Interactive analytics demo in screenshots
• Predictive analysis results
Flight Number vs. Launch Site
As the flight number increases, the success rate at the launch sites increases as well.
Payload vs. Launch Site
As the payload mass increases, the success rate at the launch sites increases as well.
Success Rate vs. Orbit Type
ES-L1, GEO, HEO, and SSO have a success rate of 100%.
SO has a success rate of 0%.
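A per-orbit success rate like the one in this bar chart can be computed by grouping the 0/1 landing class by orbit. The sketch below uses a handful of hypothetical (orbit, class) pairs rather than the project data.

```python
from collections import defaultdict

# Hypothetical (orbit, landing class) pairs; 1 = landed, 0 = did not land.
launches = [
    ("GTO", 0), ("GTO", 1),
    ("ISS", 1), ("ISS", 1), ("ISS", 0),
    ("SSO", 1),
    ("SO", 0),
]

totals = defaultdict(int)
successes = defaultdict(int)
for orbit, landed in launches:
    totals[orbit] += 1
    successes[orbit] += landed

# Success rate per orbit = successful landings / launches to that orbit.
success_rate = {orbit: successes[orbit] / totals[orbit] for orbit in totals}
```

An orbit with a single launch can only score 0% or 100%, so extreme rates are worth reading together with the number of launches behind them.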
Flight Number vs. Orbit Type
It is hard to draw conclusions from this plot, but there appears to be no clear relationship between flight number and success for GTO.
Payload vs. Orbit Type
The first thing to notice is how payload mass between 2000 and 3000 kg affects the ISS orbit.
Similarly, payload mass between 3000 and 7000 kg affects the GTO orbit.
Launch Success Yearly Trend
Since 2013, the success rate has increased substantially. It dropped slightly in 2018 but later rose higher than before.
All Launch Site Names
We can get the unique values by using “DISTINCT”.
Launch Site Names Begin with 'CCA'
We can limit the result to 5 rows by using “LIMIT”.
Total Payload Mass
We can get the sum of all values by using “SUM”.
Average Payload Mass by F9 v1.1
We can get the average of all values by using “AVG”.
First Successful Ground Landing Date
We can get the date of the first successful landing by using “MIN”, because the first date is the minimum date.
Successful Drone Ship Landing with Payload between 4000 and 6000
The query kept only payload masses between 4000 and 6000 and required the landing outcome to be a success on a drone ship.
Total Number of Successful and Failure Mission Outcomes
We can count the successful missions by using “COUNT” with LIKE “Success%”.
We can count the failed missions by using “COUNT” with LIKE “Failure%”.
Boosters Carried Maximum Payload
We can get the maximum payload mass by using “MAX”.
2015 Launch Records
We can get the months by using month(DATE), and in the WHERE clause we set the year to “2015”.
Rank Landing Outcomes Between 2010-06-04 and 2017-03-20
By using “ORDER BY” we can sort the values in descending order, and with “COUNT” we can count the outcomes as we did previously.
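The ranking query can be sketched in SQLite as a `COUNT` grouped by outcome and sorted in descending order. The table name, column names, and rows below are made up for the example and do not reproduce the project's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE launches (date TEXT, landing_outcome TEXT)")
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO launches VALUES (?, ?)",
    [
        ("2010-06-04", "Failure (parachute)"),
        ("2015-01-10", "Failure (drone ship)"),
        ("2015-04-14", "Failure (drone ship)"),
        ("2015-12-22", "Success (ground pad)"),
        ("2016-04-08", "Success (drone ship)"),
        ("2016-05-06", "Success (drone ship)"),
        ("2016-05-27", "Success (drone ship)"),
        ("2018-02-06", "Success (ground pad)"),  # outside the date window
    ],
)

# Count each outcome inside the date window and rank by count, highest first.
# The outcome name is a secondary sort key so that ties come out in a stable order.
ranking = conn.execute(
    """SELECT landing_outcome, COUNT(*) AS n
       FROM launches
       WHERE date BETWEEN '2010-06-04' AND '2017-03-20'
       GROUP BY landing_outcome
       ORDER BY n DESC, landing_outcome"""
).fetchall()
```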
All Launch Sites’ Location Markers
All the launch sites are in the USA, in Florida and California.
Color-labeled Launch Outcomes
Green means success.
Red means failure.
Launch Sites to its Proximities
Distances from the launch sites to their proximities were calculated; the sites were not far from railway tracks.
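One common way to compute such a distance from two latitude/longitude pairs is the haversine formula. The deck does not say which formula was used, so the formula choice is an assumption, and the coordinates below are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical coordinates for a launch site and a nearby railway point.
site = (28.5623, -80.5774)
railway = (28.5721, -80.5853)
distance_km = haversine_km(*site, *railway)
```

The resulting distance can then be shown on the map, for example as a label next to a line drawn between the two points.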
Launch Success Count
KSC LC-39A has the largest share of successful launches with 41.7%.
CCAFS LC-40 comes next with 29.2%.
Finally, VAFB SLC-4E and CCAFS SLC-40 have 16.7% and 12.5% respectively.
Launch Site with Highest Score
KSC LC-39A has the highest score, 76.9%, for the payload range of 2000 kg – 10000 kg, and the FT booster version has the highest score.
Payload vs. Launch Outcome
Payload 0 kg – 5000 kg (first half)
Payload 6000 kg – 10000 kg (second half)
Classification Accuracy
Decision Tree has the highest accuracy at almost 0.89, followed by the remaining models, which all have nearly the same accuracy of about 0.84.
Confusion Matrix
• Sensitivity = 1.00, formula: TPR = TP / (TP + FN)
• Specificity = 0.50, formula: SPC = TN / (FP + TN)
• Precision = 0.80, formula: PPV = TP / (TP + FP)
• Accuracy = 0.83, formula: ACC = (TP + TN) / (P + N), where P = TP + FN and N = FP + TN
• F1 Score = 0.89, formula: F1 = 2TP / (2TP + FP + FN)
• False Positive Rate = 0.50, formula: FPR = FP / (FP + TN)
• False Discovery Rate = 0.20, formula: FDR = FP / (FP + TP)
Matrix cells: True Positive (TP), False Positive (FP), False Negative (FN), True Negative (TN)
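The formulas above can be checked with a short calculation. The cell counts used here are hypothetical: the deck does not state them, and they were chosen only because they reproduce the reported ratios.

```python
# Hypothetical cell counts; the actual counts are not stated in the deck.
TP, FP, FN, TN = 12, 3, 0, 3

sensitivity = TP / (TP + FN)                 # true positive rate
specificity = TN / (FP + TN)                 # true negative rate
precision = TP / (TP + FP)                   # positive predictive value
accuracy = (TP + TN) / (TP + FP + FN + TN)   # P + N equals the total sample count
f1_score = 2 * TP / (2 * TP + FP + FN)
false_positive_rate = FP / (FP + TN)
false_discovery_rate = FP / (FP + TP)
```

With these counts every misclassification is a false positive, which matches a sensitivity of 1.00 alongside a specificity of 0.50.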
Conclusions
• We found that the site with the highest score was KSC LC-39A
• Payloads from 0 kg to 5000 kg were more diverse than payloads from 6000 kg to 10000 kg
• Decision Tree was the optimal model, with an accuracy of almost 0.89
• We calculated the distances from the launch sites to their proximities
Appendix
All code can be found on my GitHub.
GitHub repo: https://github.com/yousefhosam-b/IBM_SpaceX_Capstone_Project
IBM SpaceX Capstone Project
