The Dark Matter Mystery
The big data universe. Literally.
Dr Maggie Lieu
@space_mog
Bruno Merin & Beatriz Martinez
1 black hole
M87's supermassive black hole
(Figure: zooming in on M87, from a 0.5 deg field to 0.1 deg (5x smaller), then to the black hole itself (10,000,000x smaller).)
A telescope the size of the Earth!
Event Horizon Telescope (EHT)
‣ Radio telescopes from around the world
📸 : Flora Graham
5PB of data
1 black hole = 5000TB
1TB/day 8TB/day
Global average broadband
‣ Download: 57.91 Mbps
‣ Upload: 28.68 Mbps
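$$\text{Upload time} = \frac{\text{data volume}}{\text{rate}} = \frac{5\,000\,000\,000\ [\text{MB}] \times 8\ [\text{Mb/MB}]}{28.68\ [\text{Mb/s}]} \approx 1.4 \times 10^{9}\ \text{s} \approx 44\ \text{years!}$$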
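A quick check of the upload-time arithmetic, as a minimal Python sketch. It assumes decimal units (1 TB = 10^6 MB) and a 365.25-day year; the function name `upload_time_years` is illustrative only.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed Julian year for the conversion

def upload_time_years(volume_mb: float, rate_mbit_per_s: float) -> float:
    """Years needed to push volume_mb megabytes through a link of rate_mbit_per_s megabits per second."""
    volume_mbit = volume_mb * 8          # 1 MB = 8 Mb
    seconds = volume_mbit / rate_mbit_per_s
    return seconds / SECONDS_PER_YEAR

eht_volume_mb = 5000 * 1_000_000         # 5000 TB in MB (decimal units)
years = upload_time_years(eht_volume_mb, 28.68)
print(f"{years:.1f} years")
```

Swapping in the download rate (57.91 Mbps) roughly halves the answer, which is still decades.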
European Space Agency
📸 : MIT Haystack Observatory
📸 : Katie Bouman
2 years of processing
But space data is different…
Euclid
‣ Coming soon in 2022
Gravitational lensing
(Figure: light from background galaxies is bent by a foreground mass.)
Weak gravitational lensing
More mass = more distorted galaxies!
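Why weak lensing needs so many galaxies can be shown with a toy Python sketch. It assumes the complex-ellipticity convention ε = (a - b)/(a + b) · exp(2iφ), under which a reduced shear g (|g| ≤ 1) maps an intrinsic shape ε_s to ε = (ε_s + g)/(1 + g* ε_s). The intrinsic shape distribution and the 2% shear are made up for illustration. Because intrinsic orientations are random, the average observed ellipticity approaches g, but only once many galaxies are averaged.

```python
import cmath
import math
import random

def lens(e_source: complex, g: complex) -> complex:
    """Observed ellipticity of a source with intrinsic ellipticity e_source under reduced shear g (|g| <= 1)."""
    return (e_source + g) / (1 + g.conjugate() * e_source)

def random_intrinsic_ellipticity(rng: random.Random, sigma: float = 0.25) -> complex:
    """Hypothetical intrinsic shape: random orientation, |e| from a half-normal clipped at 0.9."""
    modulus = min(abs(rng.gauss(0.0, sigma)), 0.9)
    angle = rng.uniform(0.0, math.pi)            # position angle phi
    return modulus * cmath.exp(2j * angle)       # spin-2 quantity: |e| exp(2 i phi)

rng = random.Random(42)
true_shear = 0.02 + 0.0j                         # assumed weak shear of 2% along one axis
estimates = {}
for n_galaxies in (10, 1_000, 100_000):
    observed = [lens(random_intrinsic_ellipticity(rng), true_shear) for _ in range(n_galaxies)]
    estimates[n_galaxies] = sum(observed) / n_galaxies
    print(f"N = {n_galaxies:>7,}: estimated g = {estimates[n_galaxies].real:+.4f}")
```

With ten galaxies the estimate is dominated by intrinsic shape noise; with a hundred thousand it settles close to the input shear.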
(Figure: from real galaxy to observed image: shear, then atmosphere & telescope blur, then pixelisation by the detectors, then noise.)
photo cred: CFHT
Euclid Wide Survey
(Figure: all-sky map of the survey footprint, with the Moon shown for comparison.)
‣ 15,000 deg²
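To put 15,000 deg² in context, a small Python sketch compares it with the whole sky (4π steradians) and with the Moon. The Moon's apparent diameter of about 0.5 deg is an assumption used only for the comparison.

```python
import math

FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2   # whole celestial sphere, ~41,253 deg^2

def sky_fraction(area_deg2: float) -> float:
    """Fraction of the whole sky covered by a survey footprint of the given area."""
    return area_deg2 / FULL_SKY_DEG2

euclid_wide_deg2 = 15_000                         # from the slide
moon_deg2 = math.pi * (0.5 / 2) ** 2              # assumed Moon diameter of ~0.5 deg

print(f"Full sky: {FULL_SKY_DEG2:,.0f} deg^2")
print(f"Euclid Wide Survey: {sky_fraction(euclid_wide_deg2):.1%} of the sky")
print(f"Roughly {euclid_wide_deg2 / moon_deg2:,.0f} full Moons")
```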
Flagship simulations
‣ End-to-end simulations
‣ Largest simulated galaxy catalogue ever built
‣ 2 trillion dark matter particles
‣ Swiss National Supercomputing Centre: Piz Daint, 6th fastest computer in the world, over 5000 GPU nodes
‣ 80 hrs
‣ 270,000 EUR
‣ 5000 deg²
‣ Raw simulation data: 0.4PB
‣ Compressed to catalogues:
  ‣ Rockstar halo catalogues (5.5TB)
  ‣ 2D dark matter count maps (1TB)
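The step from raw particles to count maps is essentially a 2D histogram. Below is a minimal Python sketch on a flat square grid; the box size, particle number and grid resolution are made up, and a real lightcone map would bin on the sky (for example in HEALPix pixels) rather than on a flat grid.

```python
import random

def count_map(positions, box_size: float, n_cells: int):
    """Bin (x, y) particle positions in [0, box_size) into an n_cells x n_cells grid of counts."""
    grid = [[0] * n_cells for _ in range(n_cells)]
    cell = box_size / n_cells
    for x, y in positions:
        i = min(int(y // cell), n_cells - 1)   # row index from y (clamped at the upper edge)
        j = min(int(x // cell), n_cells - 1)   # column index from x
        grid[i][j] += 1
    return grid

rng = random.Random(0)
box = 100.0                                    # hypothetical box size, arbitrary units
particles = [(rng.uniform(0, box), rng.uniform(0, box)) for _ in range(200_000)]
grid = count_map(particles, box, 64)

raw_numbers = 2 * len(particles)               # two coordinates per particle
map_numbers = 64 * 64                          # one count per cell
print(f"total count: {sum(map(sum, grid))}")
print(f"compression: ~{raw_numbers / map_numbers:.0f}x fewer numbers")
```

The trade-off is that positions inside a cell are lost, so the grid resolution sets which scales survive the compression.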
(Figure: the Sun, Earth, Moon and the Lagrange points L1 and L2; L2 lies about 1,500,000 km from Earth.)
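The 1,500,000 km figure follows from the Hill-sphere approximation: for a small body of mass m orbiting a much larger mass M at distance a, L1 and L2 sit roughly r ≈ a (m / 3M)^(1/3) from the small body. A short Python sketch, assuming an Earth/Sun mass ratio of 3.003 × 10^-6 and a = 1 au:

```python
SUN_EARTH_KM = 1.496e8            # mean Sun-Earth distance (1 au)
EARTH_SUN_MASS_RATIO = 3.003e-6   # Earth mass / Sun mass
LIGHT_SPEED_KM_S = 299_792.458

def lagrange_l2_distance_km(separation_km: float, mass_ratio: float) -> float:
    """Hill-sphere approximation: distance of L1/L2 from the smaller body, r ~ a * (m / 3M)**(1/3)."""
    return separation_km * (mass_ratio / 3) ** (1 / 3)

r_l2 = lagrange_l2_distance_km(SUN_EARTH_KM, EARTH_SUN_MASS_RATIO)
print(f"L2 is ~{r_l2:,.0f} km from Earth")
print(f"One-way light time: ~{r_l2 / LIGHT_SPEED_KM_S:.1f} s")
```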
Collecting the data
‣ ESA deep space tracking stations (ESA-ESOC): DSA-1, DSA-2, DSA-3
‣ Ground station: Cebreros, Spain
‣ 4 hr communication window
‣ Steerable K-band (26 GHz)
‣ X-band (8.5 GHz)
‣ Data rate: 850 Gbit/day
‣ On-board 4 Tbit flash memory
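The numbers on this slide imply a sustained downlink rate during each pass and a buffer time for the on-board memory. A minimal Python sketch, assuming decimal units (1 Tbit = 1000 Gbit, 1 Gbit = 1000 Mbit) and that the whole day's data goes down in the single 4-hour window:

```python
GBIT_PER_DAY = 850      # science data per day (from the slide)
WINDOW_HOURS = 4        # daily communication window (from the slide)
ONBOARD_TBIT = 4        # on-board flash memory (from the slide)

rate_mbit_s = GBIT_PER_DAY * 1000 / (WINDOW_HOURS * 3600)   # Mbit/s needed during the pass
days_buffered = ONBOARD_TBIT * 1000 / GBIT_PER_DAY          # days of data the memory can hold

print(f"required downlink rate: ~{rate_mbit_s:.0f} Mbit/s during the pass")
print(f"on-board memory holds ~{days_buffered:.1f} days of data")
```

That is roughly twice the global average broadband upload speed quoted earlier, sustained for four hours every day.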
Telemetry
‣ Science centre: ESAC, Spain
‣ Former telemetry, tracking & commanding station
‣ Quick-look analysis, archiving and distribution
Storage
Attractive data
We are no longer in an era where we fight for data, but in an era where we choose our data!
Visualising the data
‣ ESASky
‣ HiPS (Hierarchical Progressive Surveys) maps, based on HEALPix
‣ Visualise TBs of data
‣ Render only kBs of data
Fernique et al. 2015
www.sky.esa.int
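HEALPix splits the sphere into 12 equal-area base pixels, and each step up in order splits every pixel into 4, so order k has N_pix = 12 · 4^k pixels. HiPS builds its tiles on this hierarchy, so a zoomed-out view needs only a few coarse tiles and finer tiles are fetched only for the region in view. A small Python sketch of the pixel counts and approximate pixel sizes (pixel size taken as the square root of pixel area, an approximation):

```python
import math

FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2

def healpix_npix(order: int) -> int:
    """Number of equal-area HEALPix pixels at a given order: Nside = 2**order, Npix = 12 * Nside**2."""
    nside = 2 ** order
    return 12 * nside * nside

def pixel_size_arcsec(order: int) -> float:
    """Approximate pixel side length, taken as sqrt(pixel area), in arcseconds."""
    return math.sqrt(FULL_SKY_DEG2 / healpix_npix(order)) * 3600

for order in (0, 3, 9, 17):
    print(f"order {order:>2}: {healpix_npix(order):>19,} pixels, ~{pixel_size_arcsec(order):,.2f} arcsec each")
```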
Code-to-data platform
‣ Science exploration platform:
  ‣ Jupyter notebooks on Spark clusters
‣ Cloud computing:
  ‣ Amazon Web Services
  ‣ Google Cloud, etc.
See Lieu+18
‣ Raw data is nasty!
‣ With GBs of data per day, traditional methods are not efficient
Analysing the data
Machine learning
Classification and detection
‣ K-means
Characterising spectra of galaxies, Rahmani+18
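As an illustration of K-means, here is a minimal from-scratch Python sketch of Lloyd's algorithm on 2D points. The data are hypothetical; real spectra would first be reduced to feature vectors, and this is not claimed to be the setup used in Rahmani+18.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # initialise with k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])   # empty cluster: keep its centroid
        if new_centroids == centroids:
            break                                    # converged
        centroids = new_centroids
    return centroids, labels

# Two hypothetical, well-separated groups of 2D features
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, labels = kmeans(data, k=2)
print(labels)
```

K-means needs k chosen in advance and favours roughly spherical clusters of similar size.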
Classification and detection
‣ DBSCAN
(Figure: points labelled as core, border and noise.)
Stars in a star-forming region, Canovas+19
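DBSCAN grows clusters from "core" points (at least `min_pts` neighbours within `eps`, counting the point itself), attaches "border" points that are reachable but not core, and labels everything else as noise. A minimal from-scratch Python sketch on made-up 2D positions, not the configuration used in Canovas+19:

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return a cluster label per point (NOISE for outliers)."""
    labels = [None] * len(points)                 # None = not yet visited

    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:                  # not core (may later become a border point)
            labels[i] = NOISE
            continue
        labels[i] = cluster                       # core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster               # previously noise, now a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:      # j is also core: expand through it
                queue.extend(j_neighbours)
        cluster += 1
    return labels

# Hypothetical star positions: two tight groups and one isolated star
stars = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(stars, eps=1.5, min_pts=3))
```

Unlike K-means, the number of clusters is not fixed in advance, and outliers are not forced into any group.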
Classification and detection
‣ K-nearest neighbour
Defining types of supernovae, Lochner+16
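K-nearest neighbours classifies a new object by a majority vote among the k closest labelled examples. A minimal Python sketch; the two light-curve features and their values are invented for illustration, and in practice features should be rescaled so that no single one dominates the distance.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label query by majority vote among its k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features per supernova: (rise time in days, decline rate in mag/day)
features = [(15, 0.05), (17, 0.06), (16, 0.04), (30, 0.01), (32, 0.02), (28, 0.015)]
labels = ["Ia", "Ia", "Ia", "II", "II", "II"]
print(knn_predict(features, labels, (18, 0.05)))
```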
Classification and detection
‣ Decision tree
(Figure: example decision tree with true/false splits on T > 2, S > 5 and F > 0.3, ending in classes A, B, C and D.)
Finding weird galaxies, Baron+16
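A decision tree asks a sequence of threshold questions like those in the figure. Below is a minimal CART-style learner in Python (Gini impurity, axis-aligned splits), written from scratch; the features and the "weird"/"normal" labels are made up, and this is not necessarily the method used in Baron+16.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k**2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) with the lowest weighted Gini, or None if nothing improves."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            true_side = [lab for r, lab in zip(rows, labels) if r[f] > threshold]
            false_side = [lab for r, lab in zip(rows, labels) if r[f] <= threshold]
            if not true_side or not false_side:
                continue
            score = (len(true_side) * gini(true_side) + len(false_side) * gini(false_side)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best

def build_tree(rows, labels, max_depth=3):
    """Nested tuples (feature, threshold, if_true, if_false); leaves are class labels."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]      # leaf: majority class
    f, t = split
    yes = [i for i, r in enumerate(rows) if r[f] > t]
    no = [i for i, r in enumerate(rows) if r[f] <= t]
    return (f, t,
            build_tree([rows[i] for i in yes], [labels[i] for i in yes], max_depth - 1),
            build_tree([rows[i] for i in no], [labels[i] for i in no], max_depth - 1))

def predict(tree, row):
    """Follow the threshold questions from the root down to a leaf."""
    while isinstance(tree, tuple):
        f, t, if_true, if_false = tree
        tree = if_true if row[f] > t else if_false
    return tree

# Hypothetical two-feature galaxies
rows = [(1.0, 0.2), (1.5, 0.4), (3.0, 0.5), (3.5, 0.6), (2.5, 0.1), (4.0, 0.05)]
labels = ["normal", "normal", "weird", "weird", "normal", "normal"]
tree = build_tree(rows, labels)
print(tree)
```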
Classification and detection
‣ Convolutional neural networks (& object detection)
Strong gravitational lensing, Schaefer+17
Asteroids, Lieu+19
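The core operation inside a convolutional neural network is a small kernel slid across the image (strictly a cross-correlation), followed by a nonlinearity such as ReLU. A minimal pure-Python sketch; in a real CNN the kernel weights are learned, whereas here the kernel and the streaked cutout are hand-made for illustration.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over the image and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(feature_map):
    """Zero out negative responses."""
    return [[max(0.0, v) for v in row] for row in feature_map]

# Hypothetical 5x5 cutout with a bright vertical streak (e.g. a trail) in column 2
image = [[0, 0, 1, 0, 0] for _ in range(5)]
vertical_line = [[-1, 2, -1]] * 3     # hand-picked kernel that responds to thin vertical lines
print(relu(conv2d(image, vertical_line)))
```

The feature map peaks exactly where the streak sits; stacking many such learned filters is what lets a CNN pick out lenses or trails.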
Classification and detection
‣ Transfer learning
(Figure labels: Freeze, Replace.)
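The freeze/replace idea can be sketched without a deep-learning library: keep an existing feature extractor fixed and train only a new output layer on the new task. In this minimal Python sketch the "pretrained" layer is a hypothetical fixed linear map with ReLU and the new head is logistic regression; a real application would freeze the layers of a trained CNN instead.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Stand-in for the frozen layers of a pretrained network (hypothetical weights, never updated)
FROZEN_W = [[1.0, -1.0], [0.5, 0.5]]

def frozen_features(x):
    """Apply the frozen layer: a linear map followed by ReLU."""
    return [max(0.0, row[0] * x[0] + row[1] * x[1]) for row in FROZEN_W]

def train_new_head(inputs, targets, lr=0.5, epochs=500):
    """'Replace' step: train only a new logistic-regression output layer on the frozen features."""
    feats = [frozen_features(x) for x in inputs]    # computed once, because the base never changes
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for f, y in zip(feats, targets):
            err = sigmoid(w[0] * f[0] + w[1] * f[1] + b) - y   # log-loss gradient w.r.t. the logit
            w = [w[k] - lr * err * f[k] for k in range(2)]
            b -= lr * err
    return w, b

def predict(x, w, b):
    f = frozen_features(x)
    return sigmoid(w[0] * f[0] + w[1] * f[1] + b)

# Hypothetical new task with only a handful of labelled examples
inputs = [(2, 0), (3, 1), (1, 0), (0, 2), (1, 3), (0, 1)]
targets = [1, 1, 1, 0, 0, 0]
w, b = train_new_head(inputs, targets)
print(round(predict((2.5, 0.5), w, b), 3), round(predict((0.5, 2.5), w, b), 3))
```

Only three numbers (w and b) are learned here, which is why transfer learning can work with few labelled examples.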
Citizen science
‣ Outsource tasks to the general public
‣ Zooniverse platform: makes it easy to build projects
‣ 100s of projects
‣ 250M classifications
‣ 2M volunteers
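Each object is shown to several volunteers, and their answers are combined. A minimal Python sketch of the simplest combination, per-object vote fractions plus a consensus answer; the subject IDs, answers and `min_votes` threshold are hypothetical, and real projects often apply more elaborate weighting.

```python
from collections import Counter, defaultdict

def aggregate(classifications, min_votes=3):
    """Turn (subject_id, answer) votes into per-subject vote fractions and a consensus answer."""
    votes = defaultdict(Counter)
    for subject, answer in classifications:
        votes[subject][answer] += 1
    results = {}
    for subject, counter in votes.items():
        total = sum(counter.values())
        if total < min_votes:
            continue                          # too few volunteers so far
        answer, n = counter.most_common(1)[0]
        results[subject] = {"consensus": answer, "fraction": n / total, "votes": total}
    return results

classifications = [
    ("g1", "spiral"), ("g1", "spiral"), ("g1", "elliptical"),
    ("g2", "elliptical"), ("g2", "elliptical"), ("g2", "elliptical"),
    ("g3", "spiral"),
]
results = aggregate(classifications)
print(results)
```

Keeping the vote fraction, not just the winning answer, preserves how uncertain the crowd was.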
Citizen science
Classify galaxy morphologies: Lintott+08
Citizen science
Find star-forming bubbles: Kendrew+12
Citizen science
Discover glitches in gravitational wave signals: Zevin+17
Data & model compression
‣ Neural networks & emulators
Emulate the halo mass function with mixture density networks, Lieu+ in prep.
Emulate cosmology with neural density estimators, Alsing+2019
Scaling relations with principal component analysis (PCA), Bothwell+2016
‣ Edge computing:
  ‣ Used in some form on Gaia
  ‣ Mars rovers
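PCA compresses correlated quantities into a few directions that carry most of the variance. A minimal from-scratch Python sketch that finds the leading principal component by power iteration on the covariance matrix; the data are hypothetical and lie close to the line y = 2x, so one component should capture almost all of the variance.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Leading PCA direction via power iteration on the covariance matrix, plus its variance fraction."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d                                # start vector (assumed not orthogonal to the answer)
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total = sum(cov[i][i] for i in range(d))
    return v, variance / total

# Hypothetical scaling relation: y is close to 2x
rows = [(0, 0.1), (1, 1.9), (2, 4.1), (3, 5.9), (4, 8.0)]
v, frac = first_principal_component(rows)
print([round(x, 3) for x in v], f"{frac:.2%} of the variance")
```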
Upcoming methods
‣ Continuous learning with GANs
Upcoming methods
‣ Federated learning
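In federated learning, the model travels to the data: each site trains on its own archive and only model parameters are sent back and averaged. A minimal Python sketch in the style of federated averaging (FedAvg), weighting each client by its number of samples; the linear model, learning rate and client data are hypothetical.

```python
def local_update(weights, data, lr=0.01, epochs=50):
    """One client's training: gradient descent on y = w0 + w1*x using only its own data."""
    w0, w1 = weights
    for _ in range(epochs):
        g0 = sum((w0 + w1 * x - y) for x, y in data) * 2 / len(data)
        g1 = sum((w0 + w1 * x - y) * x for x, y in data) * 2 / len(data)
        w0, w1 = w0 - lr * g0, w1 - lr * g1
    return (w0, w1)

def federated_average(client_weights, client_sizes):
    """Average parameters across clients, weighting each by its number of samples."""
    total = sum(client_sizes)
    return tuple(sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
                 for k in range(len(client_weights[0])))

# Hypothetical archives that never share raw data, each holding samples of y = 1 + 2x
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
]
global_weights = (0.0, 0.0)
for _ in range(200):                     # communication rounds
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = federated_average(updates, [len(d) for d in clients])
print(global_weights)
```

Only two numbers per client cross the network each round, instead of the raw data.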
Upcoming methods
‣ Super Model (figure labels: ESAC, Data)
10PB of data is nothing…
‣ First light: 2022
‣ Duration: 10 years
‣ Data: 15TB/day
‣ Search for transients (supernovae, asteroids, comets, gamma-ray bursts)
‣ Gravitational lensing (img: Pursiainen)
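The 15TB/day rate over 10 years adds up quickly. A short Python sketch of the total, assuming decimal units (1 PB = 1000 TB) and 365.25-day years, plus how long it would take over the global average upload speed from earlier in the talk:

```python
TB_PER_DAY = 15          # from the slide
SURVEY_YEARS = 10        # from the slide
UPLOAD_MBIT_S = 28.68    # global average upload speed quoted earlier

total_tb = TB_PER_DAY * 365.25 * SURVEY_YEARS
total_pb = total_tb / 1000                                   # decimal units
upload_years = total_tb * 1e6 * 8 / UPLOAD_MBIT_S / (365.25 * 24 * 3600)

print(f"~{total_pb:.0f} PB in total")
print(f"~{upload_years:.0f} years to upload over average broadband")
```

About 55PB in total, the same order as the 60PB shown for LSST in the comparison at the end of the talk.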
Can it get any worse…?
‣ SKA: Square Kilometre Array
‣ First light: 2030
‣ Data: 2TB/sec
SKA-Mid: Karoo, South Africa
SKA-Low: MRO, Australia
Centaurus A
‣ Optical: Hubble
‣ Radio: VLA
‣ Composite: Hubble/VLA/Chandra
Data volume comparison
‣ EHT: 5PB
‣ Euclid: 10PB
‣ LSST: 60PB
‣ SKA: 200,000,000PB
Biggest challenges in astronomy
‣ Collecting the data
  ‣ Retrieval
  ‣ Filtering good from bad
‣ Data storage
‣ Distributing the data
  ‣ Upload/download
‣ Combining data
  ‣ Complementary and multi-wavelength observations
‣ Data analysis
  ‣ Compression
  ‣ Source detection
‣ Visualising the data
Let’s collaborate!
Thank you for your attention