HopsFS – Breaking 1 million ops/sec barrier in Hadoop
Dr Jim Dowling
Associate Prof @ KTH
Senior Researcher @ SICS
CEO at Logical Clocks AB
www.hops.io
@hopshadoop
Evolution of Hadoop
2009 2017
Evolution of Hadoop
2009 2017
?
Tiny Brain
(NameNode, ResourceMgr)
Huge Body (DataNodes)
HDFS Scalability Bottleneck – the NameNode
•Limited namespace/metadata
- JVM Heap (~200 GB)
•Limited concurrency
- Single global namespace lock (single-writer, multiple readers)
HDFS CLIENT
HDFS DATANODE
NAMENODE
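To make the concurrency bottleneck concrete, here is a minimal sketch (illustrative, not HDFS source) of the single-writer/multiple-reader pattern implied above: every metadata operation, on any path, goes through one global read-write lock, so a single writer blocks the whole namespace.

import java.util.concurrent.locks.ReentrantReadWriteLock;

// Minimal sketch of a global namespace lock (illustrative, not HDFS code):
// reads on unrelated paths can run in parallel, but any write on any path
// serializes against the entire namespace.
class GlobalLockNamespace {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);

    String getFileInfo(String path) {
        fsLock.readLock().lock();              // shared: many concurrent readers
        try { return "info for " + path; }
        finally { fsLock.readLock().unlock(); }
    }

    void mkdir(String path) {
        fsLock.writeLock().lock();             // exclusive: blocks all other operations
        try { /* update the in-memory namespace tree */ }
        finally { fsLock.writeLock().unlock(); }
    }
}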
HopsFS
1. Scale-out Metadata
- Metadata in an in-memory distributed database
- Multiple stateless NameNodes
2. Remove the Global Namespace Lock
- Supports multiple concurrent read and write operations
HopsFS Architecture
MySQL Cluster: Network Database Engine (NDB)
•Open-Source, Distributed, In-Memory Database
- Scales to 48 database nodes
• 200 Million NoSQL Read Ops/Sec*
•NewSQL (Relational) DB
- Read Committed Transactions
- Row-level Locking
- User-defined partitioning
- Efficient cross-partition transactions
*https://www.mysql.com/why-mysql/benchmarks/mysql-cluster/
NameNode (Apache v2)
DAL API (Apache v2)
NDB-DAL-Impl (GPL v2)  |  Other DB (Other License)
hops-2.7.3.jar  |  ndb-2.7.3-7.5.6.jar
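The layering above (Apache-licensed NameNode and DAL API, GPL-licensed NDB implementation behind it) implies a pluggable storage abstraction. Below is a hypothetical sketch of such an interface; the names are illustrative, not the real hops-metadata-dal API.

// Hypothetical Data Access Layer (DAL) sketch: the NameNode codes against this
// interface, and an NDB-backed implementation (or any other database) plugs in
// behind it. Names are illustrative, not the actual hops DAL API.
interface InodeDataAccess {
    InodeRow findByParentAndName(long parentId, String name) throws StorageException;
    java.util.List<InodeRow> listByParent(long parentId) throws StorageException;
    void add(InodeRow inode) throws StorageException;
    void delete(long inodeId) throws StorageException;
}

// Plain row object; column names follow the INode table shown on the next slides.
class InodeRow {
    long inodeId;
    long parentId;
    String name;
    long size;
}

class StorageException extends Exception {
    StorageException(String msg) { super(msg); }
}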
HopsFS Metadata and Metadata Operations
/
user
F1 F2 F3
HopsFS Metadata & Metadata Partitioning
INode Table:   Inode_ID | Name | Parent_ID | ...
Block Table:   Block_ID | Inode_ID | ...
Replica Table: Inode_ID | Block_ID | DataNode_ID | ...
/
user
F1 F2 F3
The INode table stores:
➢ Inode ID
➢ Parent INode ID
➢ Name
➢ Size
➢ Access Attributes
➢ ...
HopsFS Metadata & Metadata Partitioning
INode Table:   Inode_ID | Name | Parent_ID | ...
Block Table:   Block_ID | Inode_ID | ...
Replica Table: Inode_ID | Block_ID | DataNode_ID | ...
/
user
F1 F2 F3
The Block table stores:
➢ File INode to Blocks Mapping
➢ Block Size
➢ ...
HopsFS Metadata & Metadata Partitioning
INode Table:   Inode_ID | Name | Parent_ID | ...
Block Table:   Block_ID | Inode_ID | ...
Replica Table: Inode_ID | Block_ID | DataNode_ID | ...
/
user
F1 F2 F3
The Replica table stores:
➢ Location of blocks on DataNodes
➢ ...
HopsFS Metadata & Metadata Partitioning
INode Table (Inode_ID | Name | Parent_ID):
  1 | /    | 0
  2 | user | 1
  3 | F1   | 2
  4 | F2   | 2
  5 | F3   | 2
Block Table (Inode_ID | Block_ID):
  3 | 1
  3 | 2
  3 | 3
Replica Table (Inode_ID | Block_ID | DataNode_ID):
  3 | 1 | 1
  3 | 1 | 2
  3 | 1 | 3
  3 | 2 | 4
  3 | 2 | 5
  3 | ... | ...
$> ls /user/*
MySQL Cluster partitions:
  Partition 1: /
  Partition 2: user
  Partition 3: F1, F2, F3
  Partition 4: blocks [{3,1},{3,2},{3,3}] and replicas [{3,1,1},{3,1,2},{3,1,3},{3,2,4} … {3,3,9}]
Inodes are partitioned by their parent inode ID, so all children of /user live in one partition and "ls /user/*" is answered by a single partition.
HopsFS Metadata & Metadata Partitioning
(Same INode, Block, and Replica tables and partition layout as the previous slide.)
$> cat /user/F1
A file's block and replica metadata is partitioned by the file's inode ID, so reading F1 (inode 3) touches only the single partition that holds inode 3's blocks and replicas (Partition 4).
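The partitioning rule behind both examples can be sketched in a few lines. This is my reading of the slides (the hash scheme and N partitions are assumptions), not HopsFS source: inode rows are distributed on parent_id, block and replica rows on inode_id.

// Illustrative partition-key choice (not HopsFS source). With N partitions:
//  - inode rows hash on parent_id, so all children of /user co-locate and
//    "ls /user/*" is a single-partition scan;
//  - block/replica rows hash on inode_id, so all metadata of F1 (inode 3)
//    co-locates and "cat /user/F1" reads one partition.
final class PartitionKeys {
    static int inodePartition(long parentId, int numPartitions) {
        return Math.floorMod(Long.hashCode(parentId), numPartitions);
    }

    static int blockPartition(long inodeId, int numPartitions) {
        return Math.floorMod(Long.hashCode(inodeId), numPartitions);
    }

    public static void main(String[] args) {
        int parts = 4;
        System.out.println("children of /user (parent 2): partition " + inodePartition(2, parts));
        System.out.println("blocks/replicas of F1 (inode 3): partition " + blockPartition(3, parts));
    }
}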
Leader Election using NDB*
• Leader NN coordinates replication/lease management
  - NDB as shared memory for electing the leader NN
• ZooKeeper not needed!
*Niazi, Berthou, Ismail, Dowling, ”Leader Election in a NewSQL Database”, DAIS 2015
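A minimal sketch of the election idea, as I read the cited paper: NameNodes share a small, transactionally updated table; each one periodically bumps its own counter, members whose counters stop advancing are evicted, and the surviving member with the smallest ID is the leader. Table and field names are illustrative, and an in-memory map stands in for the NDB table.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Leader election over a shared table (sketch only). A ConcurrentHashMap stands in
// for the NDB table; in HopsFS the reads/updates below would run inside row-locked
// transactions. Names are illustrative.
class LeaderElectionSketch {
    static final Map<Long, Long> heartbeatTable = new ConcurrentHashMap<>(); // nnId -> counter

    final long myId;
    LeaderElectionSketch(long myId) { this.myId = myId; }

    /** One election round: bump my counter, evict members whose counters stalled. */
    boolean runRound(Map<Long, Long> lastSeen) {
        heartbeatTable.merge(myId, 1L, Long::sum);                                 // heartbeat
        heartbeatTable.entrySet().removeIf(e ->
            e.getKey() != myId && e.getValue().equals(lastSeen.get(e.getKey()))); // suspected dead
        lastSeen.clear();
        lastSeen.putAll(heartbeatTable);
        long leader = heartbeatTable.keySet().stream().min(Long::compare).orElse(myId);
        return leader == myId;                                                     // smallest live ID leads
    }
}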
Metadata Locking
Metadata Locking (contd.)
● Exclusive Lock
● Shared Lock
Metadata Locking (contd.)
● Exclusive Lock
● Shared Lock
Subtree Lock
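The locking slides are mostly diagrams; roughly, the scheme they illustrate takes shared locks on a path's ancestors and an exclusive lock on the target, always in root-to-leaf order to avoid deadlock, while large operations (e.g. deleting a directory tree) take a single subtree lock instead of locking every descendant. The sketch below is illustrative, not HopsFS source.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Per-inode hierarchical locking sketch (not HopsFS source): shared locks on
// ancestors, exclusive lock on the target, acquired root-to-leaf so concurrent
// operations cannot deadlock. Unlocking (in reverse order) is omitted for brevity.
class PathLockingSketch {
    private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String path) {
        return locks.computeIfAbsent(path, p -> new ReentrantReadWriteLock());
    }

    /** Lock /a/b/c for writing: shared on "/", "/a", "/a/b"; exclusive on "/a/b/c". */
    void lockForWrite(String path) {
        String[] parts = path.split("/");
        StringBuilder prefix = new StringBuilder();
        lockFor("/").readLock().lock();                                // shared on the root
        for (int i = 1; i < parts.length; i++) {
            prefix.append("/").append(parts[i]);
            if (i == parts.length - 1) {
                lockFor(prefix.toString()).writeLock().lock();         // exclusive on the target
            } else {
                lockFor(prefix.toString()).readLock().lock();          // shared on each ancestor
            }
        }
    }
}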
Performance Evaluation for HopsFS
• On premise
  - Up to 72 servers
  - Dual Intel® Xeon® E5-2620 v3 @ 2.40 GHz
  - 256 GB RAM, 4 TB disks
• 10 GbE
  - 0.1 ms ping latency
Evaluation: Spotify Workload
HopsFS Higher Throughput with Same Hardware
HopsFS outperforms HA-HDFS with equivalent hardware (five servers):
● 1 Active NameNode
● 1 Standby NameNode
● 3 servers for Journal Nodes and ZooKeeper nodes
Evaluation: Spotify Workload (contd.)
16X the performance of HDFS. Further scaling is possible with more hardware.
Write Intensive Workloads
Workload                               | HopsFS ops/sec | HDFS ops/sec | Scaling Factor
Synthetic Workload (5.0% File Writes)  | 1.19 M         | 53.6 K       | 22
Synthetic Workload (10% File Writes)   | 1.04 M         | 35.2 K       | 30
Synthetic Workload (20% File Writes)   | 0.748 M        | 19.9 K       | 37
Scalability of HopsFS and HDFS for write-intensive workloads
Metadata Scalability
HopsFS can store 37 times more files than HDFS
Operational Latency
No. of File System Clients | HopsFS Latency (ms) | HDFS Latency (ms)
50                         | 3.0                 | 3.1
1500                       | 3.7                 | 15.5
6500                       | 6.8                 | 67.4
Erasure Coding with Data Locality
Reed-Solomon (140%)
ZFS with HopsFS
• RAID-0 (10 Gb/s): ~350 MB/s reads, ~250 MB/s writes
• RAID-5 + HopsFS Erasure Coding: ~500 MB/s reads, ~350 MB/s writes
• Triple-replicated files / Archive files
Strong Eventually Consistent Metadata
Components: Database, Epipe, Kafka, Elasticsearch, Hive Metastore
• Changelog for the HDFS Namespace
• Free-Text Search for Files/Dirs in the HopsFS Namespace
Extending Metadata in HopsFS
Metadata API (HopsFS->Elasticsearch)
public void attachMetadata(Json obj, String pathToFileorDir)
public void removeMetadata(String name, String pathToFileorDir)
•Design your own tables
- Use foreign keys for metadata integrity
- Transactions ensure metadata consistency
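A minimal usage sketch of the API above, assuming a hypothetical client object and a simple Json wrapper (both stand-ins are defined inline so the example compiles; only the two method signatures come from the slide):

// Usage sketch for the extended-metadata API. HopsFileSystemStub and Json are
// hypothetical stand-ins; only attachMetadata/removeMetadata come from the slide.
public class ExtendedMetadataExample {
    public static void main(String[] args) {
        HopsFileSystemStub fs = new HopsFileSystemStub();
        Json tags = new Json("{\"project\":\"genomics\",\"owner\":\"alice\"}");
        fs.attachMetadata(tags, "/user/alice/dataset1");   // becomes searchable via Elasticsearch
        fs.removeMetadata("owner", "/user/alice/dataset1");
    }

    static class Json {
        final String doc;
        Json(String doc) { this.doc = doc; }
    }

    static class HopsFileSystemStub {
        public void attachMetadata(Json obj, String pathToFileorDir) {
            System.out.println("attach " + obj.doc + " -> " + pathToFileorDir);
        }
        public void removeMetadata(String name, String pathToFileorDir) {
            System.out.println("remove " + name + " from " + pathToFileorDir);
        }
    }
}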
HopsYARN
Hops scalability is now limited by YARN
• The YARN scheduler is triggered on node heartbeats*
  - Scheduling decisions cost O(N), where N is the number of active applications
  - We reduced the cost to O(M), where M is the number of applications currently requesting resources; typically M << N (a sketch follows the graph below)
[Graph: cluster utilisation vs. number of Node Managers (1,000 to 19,000) for Hadoop(fix), Hadoop(OFF), and Hadoop(INFO)]
*Experiments based on workload from YARN paper at SOCC’13 using our own distributed benchmarking tool.
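A rough sketch of the O(N) → O(M) change described above (illustrative, not the Hops scheduler code): the scheduler keeps a separate set of applications with outstanding resource requests and scans only that set on each node heartbeat.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// O(N) -> O(M) heartbeat scheduling sketch (illustrative): N = active applications,
// M = applications with pending requests; only the pending set is scanned.
class HeartbeatSchedulerSketch {
    static class App {
        final String id;
        int pendingContainers;
        App(String id, int pending) { this.id = id; this.pendingContainers = pending; }
    }

    private final List<App> activeApps = new ArrayList<>();                   // size N
    private final Set<App> appsWithPendingRequests = new LinkedHashSet<>();   // size M << N

    void addApp(App app) {
        activeApps.add(app);
        if (app.pendingContainers > 0) appsWithPendingRequests.add(app);
    }

    /** Called on each node heartbeat with the node's free container slots. */
    void onNodeHeartbeat(int freeSlots) {
        List<App> satisfied = new ArrayList<>();
        for (App app : appsWithPendingRequests) {                             // O(M), not O(N)
            while (freeSlots > 0 && app.pendingContainers > 0) {
                freeSlots--;
                app.pendingContainers--;                                      // allocate one container
            }
            if (app.pendingContainers == 0) satisfied.add(app);
            if (freeSlots == 0) break;
        }
        appsWithPendingRequests.removeAll(satisfied);                         // keep the set small
    }
}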
Hops Distribution (2.7.3)
Resource Manager: HopsYARN
Storage: HopsFS
Platform: On-Premise, GCE, AWS
Processing: Spark, Flink, Tensorflow, Kafka, Logstash
Hopsworks, Elasticsearch, Kibana, Zeppelin
Hadoop Distributions Simplify Things
Install / Upgrade: Cloudera Mgr, Ambari, Karamel/Chef
YARN, HDFS, On-Premise
MR, Spark, Flink, Tensorflow, Kafka
Future of HopsFS
Hive Metastore is Moving in with HopsFS
HopsFS
Hive
MetaStore
Hive Metastore is Moving in with HopsFS
HopsFS + Hive MetaStore
Result: Strongly Consistent Hive Metadata
Removing the HDFS backing directory removes the table from the Hive Metastore.
Small Files in Hadoop
• At both Spotify and Yahoo, 20% of files are <= 4 KB
Small Files in HopsFS*
• In HopsFS, we can store small files co-located with the metadata in MySQL Cluster as on-disk data:
  inode_id | varbinary (on-disk column)
  32123432 | [File contents go here]
*Niazi et al., “Size Matters: Improving the Performance of Small Files in HDFS”, poster at EuroSys 2017
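A sketch of the read path this enables, as I read the poster (not HopsFS source): files whose contents fit in the inode's on-disk varbinary column are served straight from the database row; larger files follow the normal block/DataNode path.

// Small-file read path sketch (illustrative, not HopsFS source).
class SmallFileReadSketch {
    static final int SMALL_FILE_THRESHOLD = 4 * 1024;     // e.g. <= 4 KB, as on the previous slide

    static class Inode {
        long inodeId;
        long size;
        byte[] inlineData;                                 // on-disk varbinary column, or null
    }

    byte[] read(Inode inode) {
        if (inode.inlineData != null && inode.size <= SMALL_FILE_THRESHOLD) {
            return inode.inlineData;                       // served directly from the database row
        }
        return readFromDataNodes(inode);                   // regular block-location lookup + DataNode reads
    }

    private byte[] readFromDataNodes(Inode inode) {
        // placeholder for the normal HDFS-style block read path
        return new byte[0];
    }
}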
HopsFS Small Files Performance (Early Results)
30 NameNodes/DataNodes and 6 NDB nodes were used; small file size was 4 KB. HopsFS files were stored on Intel 750 Series SSDs.
Multi-Data-Center HopsFS
• Multi-Master Replication of Metadata with Conflict Detection/Resolution.
[Diagram: two sites, Hops-eu-west1 and Hops-eu-west2, each with NameNodes, DataNodes, and an NDB cluster; synchronous replication of blocks; asynchronous replication of metadata (~2000 ms delay); a network partition identification service; clients can use either site]
Summary
• Hops is the only European distribution of Hadoop
  - More scalable, tinker-friendly, and open-source
• HopsFS is a quantum leap in performance over HDFS
• HopsFS opens up new possibilities for building data processing frameworks, with support for small files, free-text search of the namespace, and extensible, strongly consistent metadata
The Hops Team
Active: Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Roberto Bampi, Fabio Buso, Fanti Machmount Al Samisti, Braulio Grana, Zahin Azher Rashid, Robin Andersson, ArunaKumari Yedurupaka, Tobias Johansson, August Bonds, Filotas Siskos.
Alumni: Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
Hops Heads
Scalable Benchmarker for YARN
[Diagram: a lead simulator starts simulators; simulators send heartbeats (for nodes and apps) to the Resource Manager, receive container allocations, and return results to the lead simulator on stop]