Grace Blackwell Superchip Overview
The Grace Blackwell Superchip (GB200) combines one Grace CPU with two Blackwell GPUs in a single module aimed at high-performance computing and AI workloads.
Its HBM3E memory runs at 8 Gbps per pin over a 2x2x4096-bit bus (two GPUs, each with two 4096-bit memory interfaces), giving an aggregate memory bandwidth of 2x8 TB/sec (16 TB/sec) across 384 GB of GPU memory.
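The bandwidth figure follows from the data rate and the bus width. A minimal sketch of that arithmetic, assuming the 8 Gbps figure is a per-pin data rate and that 1 TB = 1000 GB (the function name is illustrative, not taken from any vendor tool):

```python
def memory_bandwidth_gb_per_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin rate (Gbit/s) times pin count, over 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8


# Bus width from the text: 2 GPUs x 2 interfaces x 4096 bits.
total_bus_bits = 2 * 2 * 4096
total_gb_per_s = memory_bandwidth_gb_per_s(8, total_bus_bits)
per_gpu_gb_per_s = memory_bandwidth_gb_per_s(8, 2 * 4096)

print(f"per GPU: {per_gpu_gb_per_s / 1000:.1f} TB/s")
print(f"total:   {total_gb_per_s / 1000:.1f} TB/s")
```

Under these assumptions the total comes to 16,384 GB/sec, which rounds to the quoted 2x8 TB/sec.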
Dense tensor throughput spans a wide range of precisions: up to 20 PFLOPS for FP4, 10 PFLOPS for FP8 (10 POPS for INT8), 5 PFLOPS for FP16, 2.5 PFLOPS for TF32, and 90 TFLOPS for FP64.
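The quoted dense figures follow a simple pattern: throughput halves at each step from FP4 up to TF32, while FP64 sits far outside that progression. A short sketch that checks this from the numbers in the text (the dictionary and helper names are illustrative):

```python
# Dense tensor throughput quoted in the text, in TFLOPS (TOPS for INT8).
dense_tflops = {
    "FP4": 20_000,
    "FP8/INT8": 10_000,
    "FP16": 5_000,
    "TF32": 2_500,
}
fp64_tflops = 90


def step_ratios(values):
    """Ratio between each entry and the one that follows it."""
    return [a / b for a, b in zip(values, values[1:])]


ratios = step_ratios(list(dense_tflops.values()))
tf32_to_fp64 = dense_tflops["TF32"] / fp64_tflops

print("step ratios FP4 -> TF32:", ratios)
print(f"TF32 / FP64: {tf32_to_fp64:.1f}x")
```

The gap between TF32 and FP64 is much larger than a factor of two, which suggests the low-precision tensor paths are where most of the peak throughput comes from.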
For connectivity, the superchip has two NVLink 5 interfaces at 1800 GB/sec each (3.6 TB/sec combined) and PCIe 6.0 connections that provide a further 256 GB/sec of bandwidth.
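To put the two interconnects in proportion, the sketch below estimates how long it would take to move the full 384 GB of GPU memory over each link. It assumes the 1800 GB/sec figure applies to each of the two NVLink interfaces and uses peak rates only, ignoring protocol overhead, so real transfers would be slower.

```python
NVLINK_GB_PER_S = 2 * 1800  # two NVLink 5 interfaces, 1800 GB/s each (assumed per interface)
PCIE_GB_PER_S = 256         # PCIe 6.0 figure from the text
GPU_MEMORY_GB = 384         # total GPU memory from the text


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized transfer time at peak bandwidth, with no protocol overhead."""
    return size_gb / bandwidth_gb_per_s


nvlink_s = transfer_seconds(GPU_MEMORY_GB, NVLINK_GB_PER_S)
pcie_s = transfer_seconds(GPU_MEMORY_GB, PCIE_GB_PER_S)

print(f"NVLink: {nvlink_s:.3f} s")
print(f"PCIe:   {pcie_s:.3f} s")
print(f"NVLink is about {pcie_s / nvlink_s:.1f}x faster at peak")
```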
The two Blackwell GPUs together contain 416 billion transistors.
The superchip's thermal design power (TDP) is 2700 W, which reflects both its performance level and its energy demands.
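One rough way to read the TDP alongside the throughput figures is peak operations per watt. The sketch below assumes peak throughput is reached at exactly the rated TDP, which real workloads rarely match, so the results are upper bounds rather than measured efficiency.

```python
TDP_WATTS = 2700

# Dense peak figures from the text, in TFLOPS.
peak_tflops = {"FP4 dense": 20_000, "FP16 dense": 5_000, "FP64 dense": 90}


def tflops_per_watt(tflops: float, watts: float = TDP_WATTS) -> float:
    """Peak throughput divided by rated power."""
    return tflops / watts


efficiency = {name: tflops_per_watt(value) for name, value in peak_tflops.items()}
for name, value in efficiency.items():
    print(f"{name}: {value:.3f} TFLOPS/W")
```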
The Blackwell GPUs are fabricated on TSMC's 4NP process, and the superchip combines the Grace CPU and Blackwell GPU architectures into a single high-performance computing platform.
Specification | GB200 Grace Blackwell Superchip
Configuration | 1 Grace CPU : 2 Blackwell GPUs
FP4 Tensor Core (with sparsity) | 40 PFLOPS
FP8/FP6 Tensor Core (with sparsity) | 20 PFLOPS
INT8 Tensor Core (with sparsity) | 20 POPS
FP16/BF16 Tensor Core (with sparsity) | 10 PFLOPS
TF32 Tensor Core (with sparsity) | 5 PFLOPS
FP32 | 180 TFLOPS
FP64 | 90 TFLOPS
FP64 Tensor Core | 90 TFLOPS
GPU Memory | Up to 384 GB HBM3e
GPU Memory Bandwidth | 16 TB/s
NVLink Bandwidth | 3.6 TB/s
CPU Core Count | 72 Arm® Neoverse V2 cores
CPU Memory | Up to 480 GB LPDDR5X
CPU Memory Bandwidth | Up to 512 GB/s
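
The tensor figures in the table are higher than the dense figures quoted in the overview text. A quick sketch comparing the two sets of numbers shows that each table entry, including TF32, is exactly twice its dense counterpart (the pairing of formats below is an illustrative reading of the two lists, not a vendor-published mapping):

```python
# (dense figure from the overview text, figure from the table), in PFLOPS / POPS
figures = {
    "FP4": (20, 40),
    "FP8": (10, 20),
    "INT8": (10, 20),
    "FP16": (5, 10),
    "TF32": (2.5, 5),
}

speedups = {fmt: table / dense for fmt, (dense, table) in figures.items()}

for fmt, ratio in speedups.items():
    print(f"{fmt}: table figure is {ratio:.1f}x the dense figure")
```

The FP64 Tensor Core figure is the exception: it is 90 TFLOPS in both the text and the table.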