Sunday 11 December 16 Jorge Bouças, Bioinformatics Core Facility, MPI-AGE, Köln
Actionable data in life sciences
Performance
request for data analysis
reply with results
time
•  background / scientific question
•  metadata collection
•  data transfer
•  data analysis
•  validation
•  data transfer
No build test
No integration test
Tailor-made validation
structured
in place
actionable
24/7
Performance
[x] Network
[x] Storage
[x] CPUs
[x] Memory
[x] Software
[x] Algorithms
[ ] Human
“Only 8.3 percent of positions for computer scientists can be filled without difficulty.”
http://www.golem.de
[Figure: Data Science Venn diagram. Circles: Computer Science; Math & Statistics; Subject Matter Expertise / biology. Overlap labels: Machine Learning, Trad. Software, Trad. Research, Unicorn.]
Copyright 2014 by Steven Geringer Raleigh, NC. Permission is granted to use, distribute, or modify this image, provided that this copyright remains intact
“… It appears that the development of effective human
cooperation and the development of man-computer
symbiosis are "chicken-and-egg" problems. It will take
unusual human teamwork to set up a truly workable
man-computer partnership, and it will take man-computer
partnerships to engender and facilitate the human
cooperation.
…if the required solutions are not ready, it would not be
good to wait for them.”
Licklider JCR, Clark WE, On-line man-computer communication,
Proceedings of the May 1-3, 1962, spring joint computer conference
“On-line man-computer communication”
HPC
git
datashare
Berlin
Garching
Köln
TAPE
in-house
curl / wget
md5sum
bit -g
www
rsync
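The md5sum step listed above guards against corrupted or truncated transfers. A minimal Python sketch of the same check, assuming the sender publishes an MD5 hex digest alongside each file; the function names `md5_of_file` and `transfer_is_intact` are illustrative and are not part of bit:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat for multi-gigabyte files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(path, expected_md5):
    """Compare the local digest with the digest published by the sender."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A transfer script could call `transfer_is_intact` after each download and fetch the file again when it returns `False`.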
results
8 KB .. 8 GB
private link
21-day public link
write upload log on wiki, with permalinks
push code
https://to.data
bit -i <myfile.txt> -m <code and data message>
customer
Binding of Results & Code
> 30 projects / 3 analysts
1 project:
> 1000 GB data
> 1000 files
> 1000 lines of code (with dependencies)
> 10-40 change actions
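One way to picture the binding of results and code is a single log line that records the uploaded file, its share link, and the commit that produced it. The sketch below is hypothetical: the slides do not show the log format bit writes, and the function name `format_upload_log_entry`, the field layout, the example link, and the commit id are all assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timezone


def format_upload_log_entry(file_name, file_bytes, share_link, commit_id,
                            message, when=None):
    """Build one wiki log line tying a result file to a code commit.

    Hypothetical layout: timestamp, file name, checksum, link, commit, message.
    """
    when = when or datetime.now(timezone.utc)
    checksum = hashlib.md5(file_bytes).hexdigest()
    return (
        f"{when:%Y-%m-%d %H:%M} UTC | {file_name} | md5 {checksum} | "
        f"{share_link} | commit {commit_id} | {message}"
    )


if __name__ == "__main__":
    # Example values are placeholders, not real links or commits.
    print(format_upload_log_entry(
        "myfile.txt", b"hello", "https://example.org/s/abc123",
        "0a1b2c3", "first results",
    ))
```

With more than 30 projects and 10 to 40 change actions per project, a fixed line format like this keeps every result traceable to the code state that generated it.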
bit --start <DP_project_name>
bit -i <myfile.txt> -m <code and data message>
bit -c <folder_to_create>
bit -g <folder_or_file_to_download>
HPC HPC2
bit --sync <folder_or_file_to_sync> --sync_to <Uname@HPC2>
bit --sync <folder_or_file_to_sync> --sync_from <Uname@HPC2>
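The slides do not show how bit implements `--sync`. Since rsync appears earlier in the deck as a transfer tool, the following sketch assumes a sync could be expressed as an rsync call that mirrors the same relative path on the other cluster. The function name `build_rsync_argv` and the choice of the `-a` flag are assumptions; the function only builds the argument list and does not run anything.

```python
import posixpath


def build_rsync_argv(path, remote, direction):
    """Return an rsync argument list for pushing to or pulling from a remote.

    path      -- relative file or folder to sync, e.g. "project/data"
    remote    -- login in the form "user@host"
    direction -- "to" (local -> remote) or "from" (remote -> local)
    """
    path = path.rstrip("/")
    # Copy into the parent folder so the item keeps its name on the other side.
    parent = posixpath.dirname(path) or "."
    if direction == "to":
        return ["rsync", "-a", path, f"{remote}:{parent}/"]
    if direction == "from":
        return ["rsync", "-a", f"{remote}:{path}", f"{parent}/"]
    raise ValueError("direction must be 'to' or 'from'")
```

The returned list could be passed to `subprocess.run` once the paths have been checked; keeping command construction separate from execution makes the logic easy to test without network access.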
HPC git
bit --adduser
Garching
HPC
Köln
HPC
user1
user2
user3
pull code
github.com/owncloud/pyocclient
REST API
Why?
ownCloud
temporary HTTP link >> download
simplicity
GitHub
“With statement-by-statement compiling and testing and with
computer-aided book-keeping and program integration, a few very
talented men may be able to handle in weeks programming tasks that
ordinarily require many people and many months.”
Licklider JCR, Clark WE, On-line man-computer communication,
Proceedings of the May 1-3, 1962, spring joint computer conference
ownCloud + GitHub
data & metadata management
Front-end
“Back-end”
register
http://www.mpcdf.mpg.de/userspace/forms/onlineregistrationform
Sys. Admin. (MPI-AGE)
GitHub (MPI-MOLGEN)
user
Performance
request for data analysis
reply with results
time
•  background / scientific question
•  metadata collection
•  data transfer
•  data analysis
•  validation
•  data transfer
bit
[b]ermuda [i]nformation [t]riangle
github.com/mpg-age-bioinformatics/AGEpy
