Cloud Deep Learning Chips
Training & Inference
Created: 2019.12.07

Updated: 2019.12.15/17/25

@Vengineer



This is a summary of training and inference
chips for deep learning in the cloud.
Each company's chip and product photos are
borrowed from the URLs listed on the same page.
Inference
Habana Labs: Goya (DRAM)
Intel Nervana: NNP-I (DRAM)
Google: TPU v1 (SRAM)
Groq (SRAM)
Alibaba: Hanguang (含光) 800 (SRAM)
Inference chips do not have their own interconnect.

Training
Google: TPU v2/v3 (HBM2)
Intel Nervana: NNP-T (HBM2)
Habana Labs: Gaudi (HBM2)
Alphaics: RAP (HBM2 option)
Huawei: Ascend 910 (HBM2)
Graphcore: GC2 (SRAM)
Cerebras: CS-1 (SRAM)
Training chips have their own interconnect.
https://vengineer.hatenablog.com/entry/2019/11/05/060000
https://github.com/basicmi/AI-Chip
https://twitter.com/jwangARK/status/1189560904872058880
Training Chips
Google Cloud TPU v2/v3
https://cloud.google.com/tpu/docs/system-architecture
Cloud TPU v2
180 TFLOPS
64 GB HBM
Cloud TPU v3
420 TFLOPS
128 GB HBM
Intel Nervana NNP-T
https://www.intel.ai/nervana-nnp/
nGraph
https://github.com/NervanaSystems/ngraph
Habana Gaudi
https://habana.ai/
2019.12.17(16): Acquired by Intel for $2B (approx. ¥220 billion)
Alphaics: Real AI Processing (RAP)
https://www.alphaics.ai/
Huawei: Ascend 910
https://ascend.huawei.com/
Enflame Technology: 邃思 (Suisi) DTU
http://www.enflame-tech.com/products.html
CloudBlazer
 PCIe Gen4 x16
 HBM2 16GB (512GB/s)
 T10: PCIe card (255W)
   FP32: 20 TFLOPS
   FP16/BF16: 80 TFLOPS
 T11: OAM (300W)
   FP32: 22 TFLOPS
   FP16/BF16: 86 TFLOPS
Baidu: Kunlun (advanced XPU)
https://news.samsung.com/global/baidu-and-samsung-electronics-ready-for-production-of-leading-edge-ai-chip-for-early-next-year
Samsung 14nm
 HBM2 16GB (512GB/s)
 260 TOPS / 150W
Graphcore GC2
https://www.graphcore.ai/
Cerebras Systems: Cerebras CS-1
https://www.cerebras.net/
Inference Chips
Google TPU v1
https://cloud.google.com/blog/products/gcp/an-in-depth-look-at-googles-first-tensor-processing-unit-tpu
Intel Nervana NNP-I
https://www.intel.ai/nervana-nnp/
Glow
https://github.com/pytorch/glow
NNP-I ×32 in 1U
Habana Goya
2019.12.17(16): Acquired by Intel for $2B (approx. ¥220 billion)
https://habana.ai/
Glow
https://github.com/pytorch/glow
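
Both NNP-I and Goya list Glow as their compiler stack. The pytorch/glow repository ships ONNX loaders and a model-compiler tool, so one plausible path from PyTorch to these chips is an ONNX export, as in the minimal sketch below (the ResNet-18 model and file name are only placeholders, not from the slides):

  import torch
  import torchvision

  # Any traceable model works; torchvision's ResNet-18 is only an example.
  model = torchvision.models.resnet18(pretrained=True).eval()
  dummy_input = torch.randn(1, 3, 224, 224)  # example NCHW input used for tracing

  # Write an ONNX file that a Glow-based toolchain can then load and compile.
  torch.onnx.export(model, dummy_input, "resnet18.onnx", opset_version=9)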
Groq: Tensor Streaming Architecture (TSA)
https://groq.com/
https://groq.com/groq-announces-worlds-first-architecture-capable-of-petaops-on-a-single-chip/
https://www.electronicdesign.com/industrial-automation/groq-s-ai-accelerator-eyes-hyperscalers-autonomous-vehicles
Alibaba: Hanguang (含光) 800
https://www.alibabacloud.com/blog/announcing-hanguang-800-alibabas-first-ai-inference-chip_595482
https://en.wikichip.org/wiki/t-head/hanguang_800
Collaboration
Glow: A community-driven approach to AI infrastructure
https://engineering.fb.com/ml-applications/glow-a-community-driven-approach-to-ai-infrastructure/
https://github.com/pytorch/glow
Intel and Baidu Continue Collaboration
across AI, AD and 5G
https://newsroom.intel.com/articles/intel-baidu-continue-collaboration-across-ai-ad-5g/
● BaiduBrain* (Baidu’s AI platform),
● PaddlePaddle* (Baidu’s deep learning platform)
● DuerOS* (Baidu’s AI-powered voice assistant platform)
● Apollo* (Baidu’s autonomous driving platform)
● Intel® Xeon® Scalable platform
● Intel® Optane™ DC Persistent Memory
● Intel® Optane™ DC SSD
● Silicon photonics
● Ethernet
● Intel AI accelerators and Intel software stack
Training PyTorch models on Cloud TPU
Pods
https://cloud.google.com/tpu/docs/tutorials/pytorch-pod
GitHub: https://github.com/pytorch/xla
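
As a rough single-device sketch of the pytorch/xla flow referenced above (the toy model and random batches are placeholders; the tutorial URL covers the full multi-host Pod setup):

  import torch
  import torch.nn as nn
  import torch_xla.core.xla_model as xm

  device = xm.xla_device()  # grab an XLA device (a TPU core on Cloud TPU)

  model = nn.Linear(10, 2).to(device)  # toy stand-in for a real network
  optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
  loss_fn = nn.CrossEntropyLoss()

  for step in range(10):
      data = torch.randn(8, 10, device=device)           # placeholder batch
      target = torch.randint(0, 2, (8,), device=device)  # placeholder labels
      optimizer.zero_grad()
      loss = loss_fn(model(data), target)
      loss.backward()
      # barrier=True forces the lazily recorded XLA graph to execute here.
      xm.optimizer_step(optimizer, barrier=True)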
MICROSOFT AND GRAPHCORE
COLLABORATE TO ACCELERATE
ARTIFICIAL INTELLIGENCE
https://www.graphcore.ai/posts/microsoft-and-graphcore-collaborate-to-accelerate-artificial-intelligence
Today we are very excited to share details of our collaboration with Microsoft,
announcing the preview of Graphcore® Intelligence Processing Units (IPUs) on
Microsoft Azure.
● Graphcore IPUs with Dell EMC DSS 8440 Server
● Graphcore also delivers a full training runtime for ONNX and is working closely with the ONNX
organisation to include this in the ONNX standard environment. Initial PyTorch support is
available in Q4 2019 with full advanced feature support becoming available in early 2020. (A hedged sketch of IPU PyTorch usage follows below.)
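
The PyTorch support in the last bullet was not yet public when these slides were written, so the following is only an illustration in the style of the PopTorch package Graphcore later shipped; every name here comes from that later API and is an assumption relative to this announcement:

  import torch
  import poptorch  # Graphcore's PyTorch-on-IPU package (released after this announcement)

  class ModelWithLoss(torch.nn.Module):
      # PopTorch expects the forward pass to also return the training loss.
      def __init__(self):
          super().__init__()
          self.net = torch.nn.Linear(10, 2)
          self.loss = torch.nn.CrossEntropyLoss()

      def forward(self, x, y):
          out = self.net(x)
          return out, self.loss(out, y)

  model = ModelWithLoss()
  optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

  # trainingModel compiles the graph for the IPU and runs update steps on-device.
  training_model = poptorch.trainingModel(model, optimizer=optimizer)

  x = torch.randn(8, 10)
  y = torch.randint(0, 2, (8,))
  out, loss = training_model(x, y)  # one compiled training step on the IPU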
Baidu, Facebook and Microsoft work together to define the OCP Accelerator Module specification
https://www.opencompute.org/blog/baidu-facebook-and-microsoft-work-together-to-define-the-ocp-accelerator-module-specification
https://146a55aca6f00848c565-a7635525d40ac1c70300198708936b4e.ssl.cf1.rackcdn.com/images/22fa829b159a4cea7b33aa12bc2c61909e52d077.pdf
Other than Apple and Amazon =>
I am a computer engineer,
not a deep learning craftsman




Thank you.
@Vengineer
Source code analysis craftsman
