On September 1st 2020, NVIDIA revealed its new lineup of gaming GPUs: the RTX 3000 series, based on their Ampere architecture. We’ll discuss what’s new, the AI-powered software that comes with it, and all the details that make this generation really awesome.

(Meet the RTX 3000 Series GPUs)

NVIDIA’s main announcement was its shiny new GPUs, all built on a custom 8 nm manufacturing process, and all bringing in major speedups in both rasterization and ray-tracing performance.

On the low end of the lineup, there’s the RTX 3070, which comes in at $499. It is a bit expensive for the cheapest card unveiled by NVIDIA at the initial announcement, but it’s an absolute steal once you learn that it beats out the existing RTX 2080 Ti, a top-of-the-line card which regularly retailed for over $1400. However, after NVIDIA’s announcement, third-party sale prices for the 2080 Ti dropped, with a large number of cards being panic-sold on eBay for under $600.

There are no solid benchmarks out as of the announcement, so it’s unclear if the card is really objectively “better” than a 2080 Ti, or if NVIDIA is twisting the marketing a bit. The benchmarks being run were at 4K and likely had RTX on, which may make the gap look larger than it will be in purely rasterized games, as the Ampere-based 3000 series will perform over twice as well at ray tracing as Turing. But, with ray tracing now being something that doesn’t hurt performance much, and being supported in the latest generation of consoles, it’s a major selling point to have it running as fast as last gen’s flagship for almost a third of the price.

It’s also unclear if the price will stay that way. Third-party designs regularly add at least $50 to the price tag, and with how high demand will likely be, it won’t be surprising to see it selling for $600 come October 2020.

Just above that is the RTX 3080 at $699, which should be twice as fast as the RTX 2080, and come in around 25-30% faster than the 3070.

Then, at the top end, the new flagship is the RTX 3090, which is comically huge. NVIDIA is well aware, and referred to it as a “BFGPU,” which the company says stands for “Big Ferocious GPU.”

NVIDIA didn’t show off any direct performance metrics, but the company showed it running 8K games at 60 FPS, which is seriously impressive. Granted, NVIDIA is almost certainly using DLSS to hit that mark, but 8K gaming is 8K gaming.

Of course, there will eventually be a 3060, and other variations of more budget-oriented cards, but those usually come in later.

To actually cool these cards, NVIDIA needed a revamped cooler design. The 3080 is rated for 320 watts, which is quite high, so NVIDIA has opted for a dual-fan design, but instead of both fans being placed on the bottom, NVIDIA has put one fan on the top end, where the back plate usually goes. That fan directs air upward towards the CPU cooler and top of the case.

Judging by how much performance can be affected by bad airflow in a case, this makes perfect sense. However, the circuit board is very cramped because of this, which will likely affect third-party sale prices.

(DLSS: A Software Advantage)

Ray tracing isn’t the only benefit of these new cards. Really, it’s all a bit of a hack: the RTX 2000 and 3000 series aren’t that much better at doing actual ray tracing compared to older generations of cards. Ray tracing a full scene in 3D software like Blender usually takes a few seconds or even minutes per frame, so brute-forcing it in under 10 milliseconds is out of the question.

Of course, there is dedicated hardware for running ray calculations, called the RT cores, but largely, NVIDIA opted for a different approach. NVIDIA improved the denoising algorithms, which allow the GPUs to render a very cheap single pass that looks terrible and, through AI magic, turn that into something a gamer wants to look at. When combined with traditional rasterization-based techniques, it makes for a pleasant experience enhanced by ray-tracing effects.

However, to do this fast, NVIDIA has added AI-specific processing cores called Tensor cores. These process all the math required to run machine learning models, and do it very quickly. They’re a total game-changer for AI in the cloud server space, as AI is used extensively by many companies.

Beyond denoising, the main use of the Tensor cores for gamers is called DLSS, or deep learning super sampling. It takes in a low-quality frame and upscales it to full-native quality. This essentially means you can game with 1080p level framerates, while looking at a 4K picture.

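As a rough back-of-the-envelope check on why rendering at 1080p and upscaling is so much cheaper than rendering 4K natively, the pixel counts can be compared directly. The sketch below uses the standard 1080p and 4K UHD dimensions; treating rendering cost as proportional to pixel count is a simplifying assumption, not a claim about how any particular game scales.

```python
# Standard display resolutions as (width, height) in pixels.
RES_1080P = (1920, 1080)
RES_4K = (3840, 2160)


def pixel_count(resolution):
    """Return the total number of pixels in one frame."""
    width, height = resolution
    return width * height


# How many times more pixels a native 4K frame has than a 1080p frame.
ratio = pixel_count(RES_4K) / pixel_count(RES_1080P)

print(f"1080p: {pixel_count(RES_1080P):,} pixels per frame")
print(f"4K:    {pixel_count(RES_4K):,} pixels per frame")
print(f"4K has {ratio:.0f}x the pixels of 1080p")
```

Under that assumption, a GPU that renders internally at 1080p shades a quarter of the pixels per frame and leaves the rest to the upscaler.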

This also helps out with ray-tracing performance quite a bit—benchmarks from PCMag show an RTX 2080 Super running Control at ultra quality, with all ray-tracing settings cranked to the max. At 4K, it struggles with only 19 FPS, but with DLSS on, it gets a much better 54 FPS. DLSS is free performance for NVIDIA, made possible by the Tensor cores on Turing and Ampere. Any game that supports it and is GPU-limited can see serious speedups just from software alone.

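To put the quoted figures in perspective, frame rates can be converted to per-frame render times. This is plain arithmetic on the two numbers cited above (19 FPS and 54 FPS); the helper name is just for illustration.

```python
def frame_time_ms(fps):
    """Convert frames per second to milliseconds spent on each frame."""
    return 1000.0 / fps


native_fps = 19  # 4K with DLSS off, as quoted above
dlss_fps = 54    # 4K with DLSS on, as quoted above

speedup = dlss_fps / native_fps

print(f"DLSS off: {frame_time_ms(native_fps):.1f} ms per frame")
print(f"DLSS on:  {frame_time_ms(dlss_fps):.1f} ms per frame")
print(f"Speedup:  {speedup:.2f}x")
```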

DLSS isn’t new, and was announced as a feature when the RTX 2000 series launched two years ago. At the time, it was supported by very few games, as it required NVIDIA to train and tune a machine-learning model for each individual game.

However, in that time, NVIDIA has completely rewritten it, calling the new version DLSS 2.0. It’s a general-purpose API, which means any developer can implement it, and it’s already being picked up by most major releases. Rather than working on one frame, it takes in motion vector data from the previous frame, similarly to TAA. The result is much sharper than DLSS 1.0, and in some cases, actually looks better and sharper than even native resolution, so there’s not much reason to not turn it on.

There is one catch—when switching scenes entirely, like in cutscenes, DLSS 2.0 must render the very first frame at 50% quality while waiting on the motion vector data. This can result in a tiny drop in quality for a few milliseconds. But, 99% of everything you look at will be rendered properly, and most people don’t notice it in practice.

(Ampere Architecture: Built For AI)

Ampere is fast. Seriously fast, especially at AI calculations. The RT core is 1.7x faster than Turing, and the new Tensor core is 2.7x faster than Turing. The combination of the two is a true generational leap in raytracing performance.

Earlier this May, NVIDIA released the Ampere A100 GPU, a data center GPU designed for running AI. With it, they detailed a lot of what makes Ampere so much faster. For data-center and high-performance computing workloads, Ampere is in general around 1.7x faster than Turing. For AI training, it’s up to 6 times faster.

With Ampere, NVIDIA is using a new number format designed to replace the industry-standard “Floating-Point 32,” or FP32, in some workloads. Under the hood, every number your computer processes takes up a predefined number of bits in memory, whether that’s 8 bits, 16 bits, 32, 64, or even larger. Numbers that are larger are harder to process, so if you can use a smaller size, you’ll have less to crunch.

FP32 stores a 32-bit floating-point number: 1 bit for the sign, 8 bits for the range of the number (how big or small it can be), and 23 bits for the precision. NVIDIA’s claim is that these 23 precision bits aren’t entirely necessary for many AI workloads, and you can get similar results and much better performance out of just 10 of them. Reducing the size down to just 19 bits (1 sign bit, 8 range bits, and 10 precision bits), instead of 32, makes a big difference across many calculations.
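The effect of keeping only 10 precision bits can be emulated in software. The sketch below reinterprets a Python float as FP32 and clears the lowest 13 mantissa bits. It is only an illustration of the precision loss: the function name is hypothetical, and simple truncation is an assumption here, since the hardware's exact rounding behavior may differ.

```python
import struct

FP32_MANTISSA_BITS = 23
TF32_MANTISSA_BITS = 10
DROPPED_BITS = FP32_MANTISSA_BITS - TF32_MANTISSA_BITS  # 13 bits discarded

# 1 sign bit + 8 exponent (range) bits + 10 mantissa (precision) bits.
TF32_TOTAL_BITS = 1 + 8 + TF32_MANTISSA_BITS


def to_tf32_precision(x):
    """Return x stored as FP32 with its mantissa truncated to 10 bits."""
    # Reinterpret the FP32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Clear the 13 least significant mantissa bits (truncation, an assumption).
    mask = 0xFFFFFFFF & ~((1 << DROPPED_BITS) - 1)
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]


value = 3.14159265
approx = to_tf32_precision(value)
relative_error = abs(value - approx) / value

print(f"bits used:      {TF32_TOTAL_BITS}")
print(f"original:       {value}")
print(f"10-bit version: {approx}")
print(f"relative error: {relative_error:.2e}")
```

With truncation, the relative error for normal values stays below 2^-10, roughly 0.1%, which is the kind of loss NVIDIA argues many AI workloads tolerate.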

This new format is called Tensor Float 32, and the Tensor Cores in the A100 are optimized to handle the weirdly sized format. This is, on top of die shrinks and core count increases, how they’re getting the massive 6x speedup in AI training.

On top of the new number format, Ampere is seeing major performance speedups in specific calculations, like FP32 and FP64. These don’t directly translate to more FPS for the layman, but they’re part of what makes it nearly three times faster overall at Tensor operations.

Then, to speed up calculations even more, they’ve introduced the concept of fine-grained structured sparsity, which is a very fancy term for a pretty simple concept. Neural networks work with large lists of numbers, called weights, which affect the final output. The more numbers to crunch, the slower it will be.

However, not all of these numbers are actually useful. Some of them are literally just zero, and can basically be thrown out, which leads to massive speedups when you can crunch more numbers at the same time. Sparsity essentially compresses the numbers, which takes less effort to do calculations with. The new “Sparse Tensor Core” is built to operate on compressed data.

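On the A100, the structured pattern is 2:4: in every group of four weights, at most two are nonzero. A minimal sketch of compressing weights into that pattern and computing with the compressed form follows. The function names are hypothetical, and keeping the two largest-magnitude weights per group is one common pruning heuristic, assumed here for illustration rather than taken from NVIDIA's tooling.

```python
def compress_2_of_4(weights):
    """Keep the two largest-magnitude weights in each group of four.

    Returns (values, indices): the kept weights, and each kept weight's
    position (0-3) inside its group of four.
    """
    if len(weights) % 4 != 0:
        raise ValueError("number of weights must be a multiple of 4")
    values, indices = [], []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Rank positions by magnitude, largest first, and keep the top two.
        by_magnitude = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
        for i in sorted(by_magnitude[:2]):
            values.append(group[i])
            indices.append(i)
    return values, indices


def sparse_dot(values, indices, activations):
    """Dot product that touches only the kept weights."""
    total = 0.0
    for slot, (value, index) in enumerate(zip(values, indices)):
        group_start = (slot // 2) * 4  # two kept weights per group of four
        total += value * activations[group_start + index]
    return total


weights = [0.9, 0.0, -0.1, 0.5, 0.0, -0.7, 0.2, 0.0]
activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

values, indices = compress_2_of_4(weights)
print("kept values: ", values)
print("kept indices:", indices)
print("sparse dot:  ", sparse_dot(values, indices, activations))
```

The compressed form stores half as many weights plus a small position index per weight, so the multiply-accumulate loop does half the work of the dense version.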

Despite the changes, NVIDIA says that this shouldn’t noticeably affect accuracy of trained models at all.

For Sparse INT8 calculations, one of the smallest number formats, the peak performance of a single A100 GPU is about 1.25 peta-operations per second (1,248 TOPS; INT8 is an integer format, so these are integer operations rather than FLOPs), a staggeringly high number. Of course, that’s only when crunching one specific kind of number, but it’s impressive nonetheless.

Translated from: https://www.howtogeek.com/688625/nvidias-rtx-3000-series-gpus-heres-whats-new/
