What dedicated hardware units does a GPU have? Is dedicated GPU memory the same as video memory?

Reposted

mob64ca1401b651 2024-04-27 16:12:39


I. References

CUDA for Tegra
"Do you know how much video memory NVIDIA Jetson products really have?"

II. Key Concepts

1. integrated GPU (iGPU)

Integrated graphics.

2. discrete GPU (dGPU)

Discrete graphics card.

3. device memory

GPU video memory (VRAM).

4. host memory

Ordinary system memory (RAM).

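The Jetson series (including TX1, TX2, Xavier, and others) is built on SoC chips in which the CPU and GPU are integrated on a single chip and therefore share the same memory. The GPU can access data in that memory directly (at more than 100 GB/s) without being limited by PCIe (a little over 10 GB/s).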
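To put the two bandwidth figures in perspective, here is a minimal back-of-the-envelope sketch in Python. The function name `transfer_time_ms` and the 1 GB buffer size are illustrative assumptions, and the 100 GB/s and 10 GB/s values are simply the rough figures quoted above, not measurements.

```python
def transfer_time_ms(num_bytes: float, bandwidth_gb_per_s: float) -> float:
    """Idealized time in milliseconds to move num_bytes at the given bandwidth.

    Ignores latency and protocol overhead; 1 GB is taken as 1e9 bytes.
    """
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return num_bytes * 1e3 / (bandwidth_gb_per_s * 1e9)


buffer_bytes = 1e9  # hypothetical 1 GB buffer
pcie_ms = transfer_time_ms(buffer_bytes, 10)   # rough PCIe figure from the text
soc_ms = transfer_time_ms(buffer_bytes, 100)   # rough SoC DRAM figure from the text
print(f"PCIe: {pcie_ms:.0f} ms, SoC DRAM: {soc_ms:.0f} ms")
```

Under these idealized assumptions the same buffer takes roughly ten times longer to cross PCIe than to be read from shared SoC DRAM; real numbers depend on the specific module and PCIe generation.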

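The core modules of NVIDIA's embedded products are of this non-removable kind: the CPU and GPU are physically merged on the same die, so there is no separate video memory and system memory. The CPU part and the GPU part share the same memory.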

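In CUDA programming you can therefore drop the cudaMemcpy family of functions (on these devices they amount to a pointless copy within the same memory) and use zero copy or unified memory instead.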

III. NVIDIA® Tegra® Architecture

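Tegra is a system on a chip (SoC), which NVIDIA has described as a "computer on a chip". It combines the CPU, the GPU, and the functions of the northbridge and southbridge chips in a single package. The first generation was based on the ARM11 processor architecture; later generations moved to newer ARM cores. It is designed to give portable devices high performance at low power.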

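Its defining feature is that the CPU, GPU, and other cores are integrated on one tiny chip. At launch it was claimed to offer stronger rendering performance while its size and power consumption were only about one tenth of Intel's Atom, which made it possible to design smaller handheld devices combining web browsing, audio and video, gaming, GPS, and other functions, with much longer battery life.

Tegra was developed from scratch as a system-on-chip product, reportedly taking about 1,000 engineer-years, and was claimed to reduce the power consumption of mobile devices by a factor of up to one hundred.

Tegra uses a system-on-a-chip design: it integrates an ARM-architecture processor and an NVIDIA GeForce GPU along with other built-in functions, and it mainly targets small devices. Whereas Intel's x86 architecture started from the PC, the ARM-based Tegra is better seen as an evolution that started from mobile phone processors. It cannot run x86 PC operating systems such as Windows XP, but the lightweight operating systems used on ARM phones for years are a better fit for its high-speed, low-power goals.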

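To better promote the NVIDIA (英伟达™) brand in China and make its products more widely recognized and remembered there, NVIDIA Tegra™ officially adopted the Chinese name "图睿™" starting November 30, 2009. The combined name "NVIDIA (英伟达™) Tegra™ (图睿™)" is used in all public relations, sales, and other promotional materials from NVIDIA and from its partners. "图睿™" must be used together with the English name "Tegra™", never on its own, and the order cannot be reversed: "Tegra™" comes first and "图睿™" second.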

IV. Memory Sharing

In Tegra® devices, both the CPU (Host) and the iGPU share SoC DRAM memory.


In Tegra®, device memory, host memory, and unified memory are allocated on the same physical SoC DRAM. On a dGPU, device memory is allocated on the dGPU DRAM.

Memory Type	CPU	iGPU	Tegra®-connected dGPU
Device memory	Not directly accessible	Cached	Cached
Pageable host memory	Cached	Not directly accessible	Not directly accessible
Pinned host memory	Uncached where compute capability is less than 7.2; cached where compute capability is greater than or equal to 7.2	Uncached	Uncached
Unified memory	Cached	Cached	Not supported

On Tegra®, because device memory, host memory, and unified memory are allocated on the same physical SoC DRAM, duplicate memory allocations and data transfers can be avoided.

1. Device memory allocation

Host allocated memory = Total used physical memory - Device allocated memory
If (Host allocated memory < Free swap space) then Device allocatable memory = Total physical memory - Device allocated memory
If (Host allocated memory > Free swap space) then Device allocatable memory = Total physical memory - (Host allocated memory - Free swap space)
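The rules above can be sketched as a small Python function. This is only an illustration of the arithmetic: the function name, parameter names, and example numbers are hypothetical, and because the rules do not say what happens when host allocated memory exactly equals free swap space, the sketch assumes the first rule applies in that case.

```python
def device_allocatable_memory(total_physical, total_used_physical,
                              device_allocated, free_swap):
    """Estimate device-allocatable memory.

    All arguments must use the same unit (for example MB).
    """
    host_allocated = total_used_physical - device_allocated
    if host_allocated > free_swap:
        # Second rule: subtract the host memory that exceeds free swap.
        return total_physical - (host_allocated - free_swap)
    # First rule (host_allocated < free_swap). Treating the equal case
    # the same way is an assumption of this sketch.
    return total_physical - device_allocated


# Hypothetical numbers in MB, not taken from a real device.
plenty_of_swap = device_allocatable_memory(8000, 3000, 1000, 4000)
little_swap = device_allocatable_memory(8000, 3000, 1000, 500)
print(plenty_of_swap, little_swap)
```

In the first call host allocated memory is 2000 MB, below the 4000 MB of free swap, so only the 1000 MB already allocated on the device is subtracted. In the second call the host exceeds free swap by 1500 MB, and that excess is subtracted instead.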

2. Checking device memory

Device allocated memory is memory already allocated on the device. It can be obtained from the NvMapMemUsed field in /proc/meminfo or from the total field of /sys/kernel/debug/nvmap/iovmm/clients.
Total used physical memory can be obtained using the free -m command. The used field in the Mem row represents this information.
Total physical memory is obtained from the MemTotal field in /proc/meminfo.
Free swap space can be found by using the free -m command. The free field in the Swap row represents this information.
If the free command is not available, the same information can be obtained from /proc/meminfo as:

Total Used physical memory = MemTotal – MemFree
Free swap space = SwapFree
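As a rough sketch of how these fields could be gathered, the Python below parses text in the /proc/meminfo format and derives the quantities described above. The helper names `parse_meminfo` and `memory_summary` and the sample values are made up for illustration; the field names MemTotal, MemFree, SwapFree, and NvMapMemUsed are the ones mentioned in the text, and NvMapMemUsed is assumed to be absent on non-Tegra systems.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])  # values are in kB
    return fields


def memory_summary(fields):
    """Derive the quantities used in the allocation rules, in kB."""
    return {
        "total_physical_kb": fields["MemTotal"],
        "total_used_physical_kb": fields["MemTotal"] - fields["MemFree"],
        "free_swap_kb": fields["SwapFree"],
        # None when the field is missing (e.g. not a Tegra device).
        "device_allocated_kb": fields.get("NvMapMemUsed"),
    }


# Made-up sample text; on a real device read /proc/meminfo instead.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:         5000000 kB
SwapFree:        4000000 kB
NvMapMemUsed:    1000000 kB
"""

summary = memory_summary(parse_meminfo(SAMPLE))
print(summary)
```

On an actual Jetson board the sample string would be replaced with the contents of /proc/meminfo, for example `open("/proc/meminfo").read()`.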

V. Unified Memory (Unified Memory Addressing)

Unified Memory

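It simplifies both the code and the memory model.
The CPU side and the GPU side can share a single pointer, with no need to allocate space separately on each. This makes memory easier to manage and reduces the amount of code.
Memory is allocated with cudaMallocManaged rather than malloc.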
