Computer Vision Research Institute Column
Author: Edison_G
NVIDIA® GPUs are the primary compute engines driving the AI revolution, delivering enormous acceleration for AI training and inference workloads. In addition, NVIDIA GPUs accelerate many types of HPC and data-analytics applications and systems, enabling customers to efficiently analyze and visualize data and turn it into insight. NVIDIA's accelerated computing platform is at the core of many of the world's most important and fastest-growing industries.
Computer Vision Research Institute
Press and hold to scan the QR code and follow us
EDC.CV
1. Unprecedented Acceleration at Every Scale
The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale for AI, data analytics, and HPC to tackle the world’s toughest computing challenges. As the engine of the NVIDIA data center platform, A100 can efficiently scale up to thousands of GPUs or, using new Multi-Instance GPU (MIG) technology, can be partitioned into seven isolated GPU instances to accelerate workloads of all sizes. A100’s third-generation Tensor Core technology now accelerates more levels of precision for diverse workloads, speeding time to insight as well as time to market.
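The MIG partitioning described above can be sketched numerically: an A100 has 40 GB of memory and can be split into up to seven isolated `1g.5gb` instances (each with one compute slice and 5 GB of memory). This is an illustrative model only, not an NVIDIA API; the `MigInstance` and `partition_a100` names are hypothetical.

```python
# Illustrative sketch (not an NVIDIA API) of MIG partitioning on an A100:
# up to seven isolated 1g.5gb instances, each with 1 compute slice and 5 GB.
from dataclasses import dataclass

@dataclass
class MigInstance:
    profile: str        # e.g. "1g.5gb" = 1 compute slice, 5 GB memory
    memory_gb: int
    compute_slices: int

def partition_a100(num_instances: int) -> list[MigInstance]:
    """Split one A100 into equal 1g.5gb instances (7 at most)."""
    if not 1 <= num_instances <= 7:
        raise ValueError("A100 MIG supports at most 7 instances")
    return [MigInstance("1g.5gb", memory_gb=5, compute_slices=1)
            for _ in range(num_instances)]

instances = partition_a100(7)
print(len(instances))                       # 7 isolated GPU instances
print(sum(i.memory_gb for i in instances))  # 35 GB carved from the 40 GB total
```

Note that seven 5 GB instances account for 35 GB of the 40 GB card; each instance gets its own isolated slice of memory and compute, which is how MIG serves many small workloads on one GPU.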
2. SYSTEM SPECIFICATIONS (PEAK PERFORMANCE)
3. GROUNDBREAKING INNOVATIONS
The NVIDIA A100 Tensor Core GPU is the flagship product of the NVIDIA data center platform for deep learning, HPC, and data analytics. The platform accelerates over 700 HPC applications and every major deep learning framework. It’s available everywhere, from desktops to servers to cloud services, delivering both dramatic performance gains and cost-saving opportunities.
To learn more about the NVIDIA A100 Tensor Core GPU, visit www.nvidia.com/a100
1 BERT pre-training throughput using PyTorch, including (2/3) Phase 1 and (1/3) Phase 2 | Phase 1 Seq Len = 128, Phase 2 Seq Len = 512 | V100: NVIDIA DGX-1™ server with 8x NVIDIA V100 Tensor Core GPUs using FP32 precision | A100: NVIDIA DGX™ A100 server with 8x A100 using TF32 precision.
2 BERT-Large inference | NVIDIA T4 Tensor Core GPU: NVIDIA TensorRT™ (TRT) 7.1, precision = INT8, batch size 256 | V100: TRT 7.1, precision = FP16, batch size 256 | A100 with 7 MIG instances of 1g.5gb; pre-production TRT, batch size 94, precision = INT8 with sparsity.
3 V100 used is a single V100 SXM2. A100 used is a single A100 SXM4. AMBER based on PME-Cellulose, LAMMPS with Atomic Fluid LJ-2.5, FUN3D with dpw, Chroma with szscl21_24_128.
SPECIFICATIONS
|  | NVIDIA A100 for HGX | NVIDIA A100 for PCIe |
| --- | --- | --- |
| Peak FP64 | 9.7 TF | 9.7 TF |
| Peak FP64 Tensor Core | 19.5 TF | 19.5 TF |
| Peak FP32 | 19.5 TF | 19.5 TF |
| Peak TF32 Tensor Core | 156 TF (312 TF*) | 156 TF (312 TF*) |
| Peak BFLOAT16 Tensor Core | 312 TF (624 TF*) | 312 TF (624 TF*) |
| Peak FP16 Tensor Core | 312 TF (624 TF*) | 312 TF (624 TF*) |
| Peak INT8 Tensor Core | 624 TOPS (1,248 TOPS*) | 624 TOPS (1,248 TOPS*) |
| Peak INT4 Tensor Core | 1,248 TOPS (2,496 TOPS*) | 1,248 TOPS (2,496 TOPS*) |
| GPU Memory | 40 GB | 40 GB |
| GPU Memory Bandwidth | 1,555 GB/s | 1,555 GB/s |
| Interconnect | NVIDIA NVLink 600 GB/s**; PCIe Gen4 64 GB/s | NVIDIA NVLink 600 GB/s**; PCIe Gen4 64 GB/s |
| Multi-Instance GPU | Various instance sizes with up to 7 MIGs @ 5 GB | Various instance sizes with up to 7 MIGs @ 5 GB |
| Form Factor | 4/8 SXM on NVIDIA HGX™ A100 | PCIe |
| Max TDP Power | 400 W | 250 W |
| Delivered Performance of Top Apps | 100% | 90% |
* With sparsity
** SXM GPUs via HGX A100 server boards; PCIe GPUs via NVLink Bridge for up to 2 GPUs
If you would like to join the Computer Vision Research Institute, scan the QR code and we will add you to the study group that matches your interests!
The Computer Vision Research Institute focuses on deep learning, with research directions including face detection, face recognition, multi-object detection, object tracking, and image segmentation. Going forward, the Institute will continue to share the latest papers, algorithms, and frameworks. What sets our new direction apart is an emphasis on "research": we will share hands-on practice in each area so that everyone can move beyond theory into real-world scenarios and build the habit of coding with their hands and thinking with their heads!