Data Science Bootcamp, Day 1
Presented By: Chetan Khatri, Volunteer Teaching Assistant, Data Science Lab.
Guidance By: Prof. Devji D. Chhanga, University of Kachchh
Agenda
An Introduction to Data Science with an Industrial Perspective
An Introduction to Distributed Systems
CAP Theorem
Collections Framework in Java
Querying in Distributed MySQL
Example
● Facebook CEO Mark Zuckerberg wants to analyze data. He wants to know: how many daily users? How many daily messages?
● Assume that Facebook's data centers are located in California, Germany, Japan, Bangalore, and Kenya.
● Assume that Facebook uses MySQL as its data storage RDBMS. How can he get these numbers?
● SQL ???
Querying in Distributed MySQL (contd.)
Let's think! Start by writing the query for a single node.
Mark wants an analytics chart that shows ratios of customer behaviour, such as how many users are churning (leaving) his platform, which includes Facebook + WhatsApp.
Assume the table structures are as below.
user_master
    user_id (PK)
    created_on (DATE)
    last_updated_on (DATE)
transaction_master
    trans_id (PK)
    user_id (FK)
    timespan (DATE)
Querying in Distributed MySQL (contd.)
1) Daily Users
Desired Output:
Date          Daily Users
12-05-2016    32,00,000
13-05-2016    21,00,854
14-05-2016    22,54,246
15-05-2016    32,51,230
Query:
SELECT last_updated_on AS `Date`, COUNT(user_id) AS `Daily Users`
FROM user_master
GROUP BY last_updated_on
ORDER BY last_updated_on;
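The single-node query above returns one row per date for one data center only. A minimal sketch of how the per-node results might be combined into one global answer is shown below. It assumes each user is stored on exactly one node, so summing per-date counts does not double count anyone. The class name, the merge method, and the sample numbers are hypothetical and only serve as illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DailyUsersMerge {

    // Sums the per-date counts returned by each node into one global result.
    // TreeMap keeps the keys sorted, mirroring ORDER BY last_updated_on.
    static Map<String, Long> merge(List<Map<String, Long>> perNodeResults) {
        Map<String, Long> total = new TreeMap<>();
        for (Map<String, Long> nodeResult : perNodeResults) {
            for (Map.Entry<String, Long> row : nodeResult.entrySet()) {
                total.merge(row.getKey(), row.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical outputs of the single-node query on two data centers.
        // Dates are written as yyyy-MM-dd so that string order equals date order.
        Map<String, Long> california = new LinkedHashMap<>();
        california.put("2016-05-12", 1200L);
        california.put("2016-05-13", 900L);

        Map<String, Long> bangalore = new LinkedHashMap<>();
        bangalore.put("2016-05-12", 800L);
        bangalore.put("2016-05-14", 500L);

        // Prints {2016-05-12=2000, 2016-05-13=900, 2016-05-14=500}
        System.out.println(merge(Arrays.asList(california, bangalore)));
    }
}
```

If the same user can appear on several nodes, plain summation over-counts. In that case the nodes would need to return user ids (or the data would need to be partitioned by user) before counting.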
Querying in Distributed MySQL (contd.)
2) Daily Messages by User
Desired Output:
User             Date          Messages
Drew Houston     12-08-2016    700
Satya Nadella    12-08-2016    652
Sundar Pichai    12-08-2016    352
Tim Cook         12-08-2016    154
Query:
Homework!
Collections Framework in Java
How would you think about HashSet in Java?
How would you think about ArrayList in Java?
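One way to approach the two questions above is to compare the collections on the same data. The sketch below uses a hypothetical list of user_id values, as they might appear in transaction rows, to contrast the two: an ArrayList keeps every element in insertion order, while a HashSet keeps each value once and offers fast membership checks. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CollectionsDemo {

    // Counts distinct user ids: a HashSet silently drops duplicates.
    static int distinctUsers(List<Integer> userIds) {
        Set<Integer> unique = new HashSet<>(userIds);
        return unique.size();
    }

    public static void main(String[] args) {
        // Hypothetical user_id values; user 101 sent three messages.
        List<Integer> userIds = new ArrayList<>(Arrays.asList(101, 102, 101, 103, 101));

        System.out.println(userIds.size());         // 5: ArrayList keeps duplicates
        System.out.println(userIds.get(0));         // 101: access by position
        System.out.println(distinctUsers(userIds)); // 3: only 101, 102 and 103 remain

        // Membership test by hash lookup instead of scanning the whole list.
        System.out.println(new HashSet<>(userIds).contains(103)); // true
    }
}
```

As a rule of thumb, an ArrayList suits ordered data with repeats (every message), and a HashSet suits questions about uniqueness (how many different users).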
Concurrency in Java
Why threading?
How can it help you optimize performance / throughput?
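A minimal sketch of the idea: split the work into independent chunks, process each chunk on its own thread, and add up the partial results. Here each chunk is a hypothetical array of flags (1 = active user, 0 = inactive), and the class and method names are illustrative. It uses the standard ExecutorService from java.util.concurrent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelCount {

    // Runs one counting task per chunk on a thread pool and sums the partial counts.
    static long countActive(List<int[]> chunks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (final int[] chunk : chunks) {
                Callable<Long> task = () -> {
                    long active = 0;
                    for (int flag : chunk) {
                        if (flag == 1) {
                            active++;
                        }
                    }
                    return active;
                };
                partials.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // blocks until that task has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A counting task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<int[]> chunks = Arrays.asList(
                new int[] {1, 0, 1, 1},
                new int[] {0, 0, 1},
                new int[] {1, 1});
        System.out.println(countActive(chunks, 3)); // 6
    }
}
```

Threads pay off when the chunks are large or each task waits on I/O (for example, a query to a remote node). For tiny inputs like this one, the cost of creating threads outweighs the gain.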
Q & A Session
Questions, please!
Thank You
Chetan Khatri, Volunteer Teaching Assistant, Data Science Lab,
University of Kachchh.
Email: chetan@kutchuni.edu.in
GitHub Data Science Lab: https://github.com/dskskv
CCCS936 Repository: https://github.com/dskskv/CCCS936
