SQL+GPU+SSD=∞
Wasserschwein@Shinagawa
Self Introduction
▌Name: Wasserschwein@Shinagawa
▌PostgreSQL: 9 years (2006~)
▌Works: Security, FDW, etc.
▌Hobby: Mixing heterogeneous technologies with PostgreSQL
PostgreSQL Conference Japan - LT: SQL+GPU+SSD=∞
PG-Strom: What I’m making
GPGPU (very powerful computing capability) + PostgreSQL (very functional, widely used database)
What’s PG-Strom – Brief overview
▌Core ideas
① GPU native code generation on the fly
② Asynchronous massive parallel execution
▌Advantages
• Transparent acceleration with 100% query compatibility
• Commodity hardware and lower system integration cost
Diagram (query: SELECT * FROM l_tbl JOIN r_tbl on l_tbl.lid = r_tbl.rid;):
Parser → Planner → Executor, with PG-Strom plugged in through the Custom-Scan/Join Interface
PG-Strom → CUDA source code → nvrtc → CUDA driver → DMA data transfer → massive parallel execution on the GPU
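The first core idea, GPU native code generation on the fly, can be pictured as turning a scan qualifier into CUDA source text at query time. The sketch below is only an illustration under stated assumptions: the qualifier representation (`struct qual`), the helpers `sql_op_to_c` and `gen_filter_kernel`, and the kernel layout are all hypothetical and are not PG-Strom's actual code generator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified scan qualifier: <column> <op> <integer constant>. */
struct qual {
    const char *column;   /* column name, reused as the kernel parameter name */
    const char *op;       /* SQL operator: "=", "<>", "<", "<=", ">", ">=" */
    long        constant; /* right-hand-side constant */
};

/* Map a SQL comparison operator to its C/CUDA spelling. */
static const char *sql_op_to_c(const char *op)
{
    if (strcmp(op, "=") == 0)
        return "==";
    if (strcmp(op, "<>") == 0)
        return "!=";
    return op;            /* <, <=, >, >= are spelled the same in C */
}

/*
 * Emit CUDA C source for a kernel in which each thread evaluates the
 * qualifier on one row and stores 1 (keep) or 0 (discard) in match[].
 * Returns the source length, or -1 if buf is too small.
 */
static int gen_filter_kernel(const struct qual *q, char *buf, size_t bufsz)
{
    assert(q != NULL && buf != NULL);
    int n = snprintf(buf, bufsz,
        "extern \"C\" __global__ void filter_kernel(const long *%s,\n"
        "                                           unsigned char *match,\n"
        "                                           unsigned int nrows)\n"
        "{\n"
        "    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "    if (i < nrows)\n"
        "        match[i] = (%s[i] %s %ldL) ? 1 : 0;\n"
        "}\n",
        q->column, q->column, sql_op_to_c(q->op), q->constant);
    if (n < 0 || (size_t) n >= bufsz)
        return -1;
    return n;
}
```

For `WHERE x > 100`, `struct qual q = { "x", ">", 100 };` yields a kernel whose body contains `match[i] = (x[i] > 100L) ? 1 : 0;`. In a real system a string like this would be handed to nvrtc (`nvrtcCreateProgram`, `nvrtcCompileProgram`) and then loaded and launched through the CUDA driver.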
Supported Workloads – Scan, Join, Aggregation
▌SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 [, ...] GROUP BY cat;
• t0: 100M rows; t1~t10: 100K rows each; all data was preloaded.
▌Environment:
• PostgreSQL v9.5beta1 + PG-Strom (22-Oct), CUDA 7.0 + RHEL6.6 (x86_64)
• CPU: Xeon E5-2670v3, RAM: 384GB, GPU: NVIDIA Tesla K20c (2496 cores)
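The benchmark query ends with `AVG(x) ... GROUP BY cat`. A common way to compute an average in parallel, and a plausible model for GPU aggregation, is two-phase: each worker accumulates a partial (sum, count) per group, the partials are merged, and the division happens only at the end. The C sketch below shows that scheme; the fixed group count `NGROUPS`, the struct, and the function names are hypothetical assumptions, not PG-Strom's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define NGROUPS 4   /* hypothetical: cat is an integer in [0, NGROUPS) */

/* Partial aggregate state for AVG(x): running sum and count per group. */
struct avg_state {
    double sum[NGROUPS];
    long   count[NGROUPS];
};

/* Phase 1: one worker (e.g. one GPU thread block) aggregates its chunk. */
static void avg_partial(const int *cat, const double *x, size_t n,
                        struct avg_state *st)
{
    for (int g = 0; g < NGROUPS; g++) {
        st->sum[g] = 0.0;
        st->count[g] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        assert(cat[i] >= 0 && cat[i] < NGROUPS);
        st->sum[cat[i]] += x[i];
        st->count[cat[i]]++;
    }
}

/* Phase 2: merge two partial states; the merge order does not matter. */
static void avg_merge(struct avg_state *dst, const struct avg_state *src)
{
    for (int g = 0; g < NGROUPS; g++) {
        dst->sum[g] += src->sum[g];
        dst->count[g] += src->count[g];
    }
}

/* Final step: AVG = sum / count. Empty groups would not appear in a
 * GROUP BY result at all; this sketch simply returns 0.0 for them. */
static double avg_final(const struct avg_state *st, int g)
{
    return st->count[g] > 0 ? st->sum[g] / (double) st->count[g] : 0.0;
}
```

Keeping (sum, count) instead of a running average is what makes the partial states mergeable in any order, which is why this shape suits massively parallel execution.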
Chart: Query response time [sec] (0 to 300) by number of tables involved (2 to 8), PostgreSQL v9.5β vs PG-Strom; time consumption per component: Scan, Join, Aggregate, Others
Next target is I/O acceleration – from TPC-DS results
Chart: Time consumption per workload (PostgreSQL v9.5beta + PG-Strom), 0% to 100%, broken down into Scan, Join, Aggregate, Others
So, how can we accelerate I/O with the GPU?
NOTICE
The story I’d like to introduce next is...
just my idea at this moment
......so I’ll put my effort into implementing it.
A rough x86_64 hardware architecture
Diagram labels: GPU, SSD, CPU + RAM (×2), PCI-E, SAS, "Usual I/O bottleneck"
Simplified diagram for introduction
Diagram: GPU, SSD and CPU + RAM connected over PCI-E ("OK, it’s storage")
NVM Express (NVMe) SSD
SSD devices attached directly to PCI-E – lower latency and higher bandwidth
Examples: Samsung SSD 950 PRO, Intel SSD 750, HGST Ultrastar SN100, Intel SSD DC P3700
Data Flow in analytic queries
① Data load from storage to CPU/RAM
② Remove rows that fail the qualifiers (Selection)
③ Remove unreferenced columns (Projection)
④ Join with other tables (Join)
⇒ The job of the CPU
Diagram: the table moves from the SSD over PCI-E into CPU + RAM, where it is joined (+) with other tables
SSD-to-GPU Direct
Data is transferred directly between the SSD and the GPU, bypassing CPU/RAM
Also available on NVMe SSDs, not only on Fusion-IO
Data Flow in analytic queries (1/3) – Basic
Diagram: the table is read from the SSD into the GPU by SSD-to-GPU Direct over PCI-E; the GPU code is generated from SQL on the fly
• The GPU removes rows that fail the scan qualifiers
• Only the remaining rows are moved to CPU + RAM
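In the basic flow the GPU filters the rows it received directly from the SSD, and only the survivors travel on to CPU + RAM. A standard GPU technique for packing survivors densely is stream compaction: compute a match flag per row, take an exclusive prefix sum of the flags to give each survivor an output slot, then scatter. The C sketch below runs the three phases sequentially to show the data flow (on a GPU each phase would be a parallel kernel); the row type, the qualifier `x > threshold`, and `gpu_style_select` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-width row as it might be laid out in a table block. */
struct row {
    long id;
    long x;
    long y;
};

/*
 * Copy rows satisfying (x > threshold) from in[] to out[], preserving order.
 * flags[] and pos[] are scratch arrays of length n. Returns the number of
 * rows written, i.e. the rows that would be sent on to CPU + RAM.
 */
static size_t gpu_style_select(const struct row *in, size_t n, long threshold,
                               unsigned char *flags, size_t *pos,
                               struct row *out)
{
    assert(n == 0 || (in && flags && pos && out));

    /* Phase 1 (one thread per row on a GPU): evaluate the scan qualifier. */
    for (size_t i = 0; i < n; i++)
        flags[i] = (in[i].x > threshold);

    /* Phase 2 (parallel prefix sum on a GPU): the exclusive scan of the
     * flags gives every surviving row its slot in the output buffer. */
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        pos[i] = total;
        total += flags[i];
    }

    /* Phase 3 (one thread per row): scatter survivors to their slots. */
    for (size_t i = 0; i < n; i++)
        if (flags[i])
            out[pos[i]] = in[i];

    return total;
}
```

Only `total * sizeof(struct row)` bytes would leave the GPU, instead of the `n * sizeof(struct row)` bytes that the CPU-only flow loads into RAM before filtering.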
Data Flow in analytic queries (2/3) – Advanced
Diagram: the table is read from the SSD into the GPU by SSD-to-GPU Direct over PCI-E; the GPU code is generated from SQL on the fly
• The GPU removes rows that fail the scan qualifiers and drops unreferenced columns (projection)
• Only the remaining rows, carrying only the referenced columns, are moved to CPU + RAM
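The advanced flow also drops the columns the query never references, so each surviving row shrinks before it crosses PCI-E to CPU + RAM. The sketch below fuses selection and projection into one pass, as a generated kernel could; it assumes a hypothetical query that filters on `x` and references only `id` and `y`. All types and names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wide row stored on the SSD. */
struct wide_row {
    long id;
    long x;          /* used only by the scan qualifier */
    long y;          /* referenced by the target list */
    char note[64];   /* never referenced by the query */
};

/* Projected row: only the columns needed downstream. */
struct narrow_row {
    long id;
    long y;
};

/*
 * Selection + projection in a single pass: keep rows with x > threshold and
 * emit only (id, y). Returns the number of output rows and stores their
 * total size in *bytes_out.
 */
static size_t select_project(const struct wide_row *in, size_t n, long threshold,
                             struct narrow_row *out, size_t *bytes_out)
{
    assert(bytes_out != NULL);
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i].x > threshold) {      /* selection */
            out[m].id = in[i].id;       /* projection */
            out[m].y  = in[i].y;
            m++;
        }
    }
    *bytes_out = m * sizeof(struct narrow_row);
    return m;
}
```

On a GPU the output slots would be assigned with a parallel prefix sum or an atomic counter rather than the sequential `m++` used here; the point of the sketch is that the bytes shipped to the host scale with `sizeof(struct narrow_row)`, not the full row width.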
Data Flow in analytic queries (3/3) – Ultimate
Diagram: the table is read from the SSD into the GPU by SSD-to-GPU Direct over PCI-E; the inner relations (JOIN targets) are also placed on the GPU; the GPU code is generated from SQL on the fly
• Joined tuples are generated on the GPU side (+)
• Tuples are already joined by the time the data read from storage reaches CPU + RAM
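Generating joined tuples on the GPU needs a join algorithm that parallelizes per outer row. A hash join is one common choice: build a hash table over the inner relation once, then probe it once per outer row (one GPU thread per row). The C sketch below shows the build and probe phases sequentially with open addressing, using the `lid`/`rid` columns of the earlier example query; the table size, types, and names are hypothetical, and PG-Strom's actual join implementation may differ.

```c
#include <assert.h>
#include <stddef.h>

#define HT_SIZE 64          /* hypothetical; must exceed the inner row count */
#define EMPTY   (-1L)       /* marks an unused slot; keys must be >= 0 */

struct inner_row { long rid; long val; };   /* inner relation (JOIN target) */
struct outer_row { long lid; long x;   };   /* outer table read from the SSD */
struct joined    { long lid; long x; long val; };

struct hash_table {
    long key[HT_SIZE];
    long val[HT_SIZE];
};

static size_t slot_of(long key) { return (size_t) ((unsigned long) key % HT_SIZE); }

/* Build phase: insert every inner row (keys assumed unique and >= 0). */
static void ht_build(struct hash_table *ht, const struct inner_row *in, size_t n)
{
    assert(n < HT_SIZE);    /* guarantees at least one EMPTY slot remains */
    for (size_t s = 0; s < HT_SIZE; s++)
        ht->key[s] = EMPTY;
    for (size_t i = 0; i < n; i++) {
        assert(in[i].rid >= 0);
        size_t s = slot_of(in[i].rid);
        while (ht->key[s] != EMPTY)         /* linear probing */
            s = (s + 1) % HT_SIZE;
        ht->key[s] = in[i].rid;
        ht->val[s] = in[i].val;
    }
}

/* Probe phase: one lookup per outer row; emits a joined tuple per match. */
static size_t ht_probe(const struct hash_table *ht, const struct outer_row *out_rows,
                       size_t n, struct joined *result)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++) {
        size_t s = slot_of(out_rows[i].lid);
        while (ht->key[s] != EMPTY) {
            if (ht->key[s] == out_rows[i].lid) {
                result[m].lid = out_rows[i].lid;
                result[m].x   = out_rows[i].x;
                result[m].val = ht->val[s];
                m++;
                break;                      /* inner keys are unique */
            }
            s = (s + 1) % HT_SIZE;
        }
    }
    return m;
}
```

Because the inner hash table is built once and only read during probing, every outer row can be probed independently, which is what lets the join happen on the GPU before any tuple reaches CPU + RAM.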
Primitive Technologies
▌NVIDIA GPUDirect support in the NVMe device driver
• The NVMe and NVIDIA drivers need to interact
▌Per-relation usage statistics of shared_buffers
• To avoid SSD→GPU direct transfer on relations that are already loaded into shared_buffers
▌A new access mode for shared_buffers
• Nobody may dirty a buffer while an SSD→GPU direct transfer is in progress
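Two of the primitive technologies above are PostgreSQL-side decisions: skip the SSD→GPU direct path for relations whose blocks are already largely cached in shared_buffers, and keep a buffer from being dirtied while a direct transfer covering its block is in flight. The sketch below models both with hypothetical per-relation counters, a hypothetical `PRELOAD_THRESHOLD`, and a hypothetical per-buffer counter; none of these names exist in PostgreSQL, and a real implementation would need proper locking in shared memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-relation usage statistics of shared_buffers. */
struct rel_buffer_stats {
    long nblocks_total;    /* size of the relation in blocks */
    long nblocks_cached;   /* blocks currently held in shared_buffers */
};

/* Hypothetical threshold: above this cached fraction, read via shared_buffers. */
#define PRELOAD_THRESHOLD 0.5

/* Decide whether SSD->GPU direct transfer is worthwhile for a relation. */
static bool use_ssd_gpu_direct(const struct rel_buffer_stats *st)
{
    assert(st->nblocks_total > 0);
    double cached = (double) st->nblocks_cached / (double) st->nblocks_total;
    return cached < PRELOAD_THRESHOLD;
}

/* Hypothetical buffer header extended with a "direct transfer" access mode. */
struct buffer_desc {
    bool dirty;
    int  direct_xfer_count;   /* > 0 while SSD->GPU transfers cover this block */
};

static void begin_direct_xfer(struct buffer_desc *b) { b->direct_xfer_count++; }

static void end_direct_xfer(struct buffer_desc *b)
{
    assert(b->direct_xfer_count > 0);
    b->direct_xfer_count--;
}

/* Refuse to dirty the buffer during a transfer and return false; a real
 * system would wait for the transfer to finish instead of failing. */
static bool try_mark_dirty(struct buffer_desc *b)
{
    if (b->direct_xfer_count > 0)
        return false;
    b->dirty = true;
    return true;
}
```

A counter rather than a boolean lets several concurrent transfers cover the same block; the buffer becomes writable again only after the last one ends.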
We welcome all developers who would like to join the PG-Strom project!
Coming Soon?
