计算机视觉研究院 (Computer Vision Research Institute) Column

Author: Edison_G



NVIDIA® GPUs are the primary compute engines driving the AI revolution, delivering enormous acceleration for AI training and inference workloads. NVIDIA GPUs also accelerate many kinds of HPC and data-analytics applications and systems, enabling customers to efficiently analyze and visualize data and turn it into insight. NVIDIA's accelerated computing platform sits at the core of many of the world's most important and fastest-growing industries.





1. Unprecedented Acceleration at Every Scale

The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale for AI, data analytics, and HPC to tackle the world's toughest computing challenges. As the engine of the NVIDIA data center platform, A100 can efficiently scale up to thousands of GPUs or, using new Multi-Instance GPU (MIG) technology, can be partitioned into seven isolated GPU instances to accelerate workloads of all sizes. A100's third-generation Tensor Core technology now accelerates more levels of precision for diverse workloads, speeding time to insight as well as time to market.
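The MIG partitioning described above can be sketched with simple arithmetic. The slice counts below (7 compute slices, 8 memory slices) are assumptions drawn from NVIDIA's public MIG documentation, not from this datasheet:

```python
# Sketch of the MIG partitioning arithmetic on a 40 GB A100.
# Assumption (from NVIDIA's public MIG docs): the GPU is divided into
# 8 memory slices and 7 compute slices.

TOTAL_MEMORY_GB = 40      # A100 (40 GB variant)
MEMORY_SLICES = 8         # MIG memory slices
COMPUTE_SLICES = 7        # MIG compute (SM) slices

# The smallest MIG profile, "1g.5gb", pairs 1 compute slice with
# 1 memory slice (~5 GB), so a single A100 yields up to 7 instances.
memory_per_slice_gb = TOTAL_MEMORY_GB / MEMORY_SLICES
max_instances = COMPUTE_SLICES

print(f"{max_instances} instances of 1g.{memory_per_slice_gb:.0f}gb")
```

This is why the specifications table later in the document lists "up to 7 MIGs @ 5 GB": the instance count is bounded by the compute slices, while the per-instance memory comes from the memory-slice size.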



2. SYSTEM SPECIFICATIONS (PEAK PERFORMANCE)


3. GROUNDBREAKING INNOVATIONS



The NVIDIA A100 Tensor Core GPU is the flagship product of the NVIDIA data center platform for deep learning, HPC, and data analytics. The platform accelerates over 700 HPC applications and every major deep learning framework. It’s available everywhere, from desktops to servers to cloud services, delivering both dramatic performance gains and cost-saving opportunities.



To learn more about the NVIDIA A100 Tensor Core GPU, visit www.nvidia.com/a100

1  BERT pre-training throughput using PyTorch, including (2/3) Phase 1 and (1/3) Phase 2 | Phase 1 Seq Len = 128, Phase 2 Seq Len = 512 | V100: NVIDIA DGX-1™ server with 8x NVIDIA V100 Tensor Core GPUs using FP32 precision | A100: NVIDIA DGX™ A100 server with 8x A100 using TF32 precision.

2  BERT large inference | NVIDIA T4 Tensor Core GPU: NVIDIA TensorRT™ (TRT) 7.1, precision = INT8, batch size 256 | V100: TRT 7.1, precision FP16, batch size 256 | A100 with 7 MIG instances of 1g.5gb: pre-production TRT, batch size 94, precision INT8 with sparsity.

3  V100 results use a single V100 SXM2; A100 results use a single A100 SXM4. AMBER based on PME-Cellulose, LAMMPS with Atomic Fluid LJ-2.5, FUN3D with dpw, Chroma with szscl21_24_128.

SPECIFICATIONS

Specification                        NVIDIA A100 for HGX            NVIDIA A100 for PCIe
Peak FP64                            9.7 TF                         9.7 TF
Peak FP64 Tensor Core                19.5 TF                        19.5 TF
Peak FP32                            19.5 TF                        19.5 TF
Peak TF32 Tensor Core                156 TF | 312 TF*               156 TF | 312 TF*
Peak BFLOAT16 Tensor Core            312 TF | 624 TF*               312 TF | 624 TF*
Peak FP16 Tensor Core                312 TF | 624 TF*               312 TF | 624 TF*
Peak INT8 Tensor Core                624 TOPS | 1,248 TOPS*         624 TOPS | 1,248 TOPS*
Peak INT4 Tensor Core                1,248 TOPS | 2,496 TOPS*       1,248 TOPS | 2,496 TOPS*
GPU Memory                           40 GB                          40 GB
GPU Memory Bandwidth                 1,555 GB/s                     1,555 GB/s
Interconnect                         NVIDIA NVLink 600 GB/s**;      NVIDIA NVLink 600 GB/s**;
                                     PCIe Gen4 64 GB/s              PCIe Gen4 64 GB/s
Multi-Instance GPU                   Various instance sizes,        Various instance sizes,
                                     up to 7 MIGs @ 5 GB            up to 7 MIGs @ 5 GB
Form Factor                          4/8 SXM on NVIDIA HGX™ A100    PCIe
Max TDP Power                        400 W                          250 W
Delivered Performance of Top Apps    100%                           90%

* With sparsity
** SXM GPUs via HGX A100 server boards; PCIe GPUs via NVLink Bridge for up to two GPUs
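As a sanity check, the headline peak numbers in the table can be reproduced from the publicly documented A100 configuration. The SM count, boost clock, per-SM rates, and memory-bus parameters below are assumptions taken from NVIDIA's architecture material, not from this table:

```python
# Rough derivation of the peak figures above. Assumed A100 configuration
# (from NVIDIA's public architecture material): 108 SMs, ~1.41 GHz boost
# clock, 64 FP32 cores per SM, 1024 FP16 Tensor Core FMAs per SM per
# clock, and a 5120-bit HBM2 bus at 1215 MHz (double data rate).

SMS = 108
BOOST_CLOCK_HZ = 1.41e9

# CUDA-core FP32: 64 FMAs per SM per clock, 2 FLOPs per FMA.
fp32_tflops = SMS * 64 * 2 * BOOST_CLOCK_HZ / 1e12            # ~19.5 TF

# Tensor Core FP16: 1024 FMAs per SM per clock. TF32 runs at half the
# FP16 rate, and 2:4 structured sparsity doubles effective throughput
# (the starred columns in the table).
fp16_tensor_tflops = SMS * 1024 * 2 * BOOST_CLOCK_HZ / 1e12   # ~312 TF
tf32_tensor_tflops = fp16_tensor_tflops / 2                   # ~156 TF
fp16_sparse_tflops = fp16_tensor_tflops * 2                   # ~624 TF

# HBM2 bandwidth: 5120-bit bus width, 1.215 GHz, double data rate.
mem_bw_gbps = (5120 / 8) * 2 * 1.215e9 / 1e9                  # ~1,555 GB/s

print(round(fp32_tflops, 1), round(fp16_tensor_tflops), round(mem_bw_gbps))
```

The same pattern explains every row: each lower-precision Tensor Core mode doubles the FMA rate of the one above it (TF32 → FP16/BF16 → INT8 → INT4), and sparsity doubles whichever mode it is applied to.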



If you would like to join us at 计算机视觉研究院 (Computer Vision Research Institute), please scan the QR code, and we will add you to the study group that matches your needs!

计算机视觉研究院 focuses on the field of deep learning, with research directions in face detection, face recognition, multi-object detection, object tracking, image segmentation, and related areas. The institute will continue to share the latest papers, algorithms, and new frameworks. What makes this new direction different is our emphasis on "research": going forward we will share hands-on practice in each area, so that everyone can experience real-world scenarios beyond pure theory and develop the habit of hands-on coding and active thinking!
