Abstract

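This cuDNN 8.0.4 Developer Guide provides an overview of cuDNN features such as customizable data layouts, with support for flexible dimension ordering, striding, and subregions for the 4D tensors used as inputs and outputs to all of its routines. This flexibility allows easy integration into any neural network implementation.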
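To make "dimension ordering" and "striding" concrete, here is a minimal Python sketch that computes the per-dimension strides of a fully packed 4D tensor for a given memory order, and the linear offset of one element. The helper names `strides_for_layout` and `element_offset` are hypothetical and are not part of cuDNN; in the C API, explicit strides like these are what `cudnnSetTensor4dDescriptorEx()` accepts as `nStride`, `cStride`, `hStride`, and `wStride`.

```python
def strides_for_layout(dims, layout):
    """Return the stride (in elements) of each dimension of a packed tensor.

    dims   -- sizes keyed by dimension letter, e.g. {"N": 2, "C": 3, "H": 4, "W": 5}
    layout -- memory order from outermost to innermost, e.g. "NCHW" or "NHWC"
    """
    strides = {}
    step = 1
    # Walk from the innermost (fastest-varying) dimension outwards.
    for axis in reversed(layout):
        strides[axis] = step
        step *= dims[axis]
    return strides


def element_offset(index, strides):
    """Linear offset (in elements) of the element at the given N/C/H/W index."""
    return sum(index[axis] * strides[axis] for axis in index)


dims = {"N": 2, "C": 3, "H": 4, "W": 5}
nchw = strides_for_layout(dims, "NCHW")  # {"W": 1, "H": 5, "C": 20, "N": 60}
nhwc = strides_for_layout(dims, "NHWC")  # {"C": 1, "W": 3, "H": 15, "N": 60}
```

The same logical element lands at a different offset under each layout, which is why a routine that takes strides as part of the tensor description can work with either layout without copying the data.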

To access the cuDNN API reference, see the cuDNN API Reference Guide.

https://docs.nvidia.com/deeplearning/cudnn/api/index.html           

For previously released cuDNN developer documentation, see the cuDNN Archives.

https://docs.nvidia.com/deeplearning/cudnn/archives/index.html

1. Overview

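The NVIDIA® CUDA® Deep Neural Network library™ (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. The cuDNN Datatypes Reference describes all the types and enums of the cuDNN library API. The cuDNN API Reference describes the API of all the routines in the cuDNN library.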

The cuDNN library, as well as this API documentation, has been split into the following libraries:

  • cudnn_ops_infer - This entity contains the routines related to cuDNN context creation and destruction, tensor descriptor management, tensor utility routines, and the inference portion of common ML algorithms such as batch normalization, softmax, and dropout.
  • cudnn_ops_train - This entity contains common training routines and algorithms, such as batch normalization, softmax, and dropout. The cudnn_ops_train library depends on cudnn_ops_infer.
  • cudnn_cnn_infer - This entity contains all routines related to convolutional neural networks needed at inference time. The cudnn_cnn_infer library depends on cudnn_ops_infer.
  • cudnn_cnn_train - This entity contains all routines related to convolutional neural networks needed during training. The cudnn_cnn_train library depends on cudnn_ops_infer, cudnn_ops_train, and cudnn_cnn_infer.
  • cudnn_adv_infer - This entity contains all other features and algorithms. This includes RNNs, CTC loss, and multi-head attention. The cudnn_adv_infer library depends on cudnn_ops_infer.
  • cudnn_adv_train - This entity contains all the training counterparts of cudnn_adv_infer. The cudnn_adv_train library depends on cudnn_ops_infer, cudnn_ops_train, and cudnn_adv_infer.
  • cudnn - This is an optional shim layer between the application layer and the cuDNN code. This layer opportunistically opens the correct library for the API at runtime.
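The dependencies listed above form a small graph. As an illustration only, the following Python sketch records them in a dictionary and resolves, for one sub-library, the full set of libraries it needs with dependencies first. The dictionary contents come from the list above; the names `CUDNN_DEPENDENCIES` and `load_order` are hypothetical and not part of cuDNN.

```python
# Direct dependencies between the cuDNN 8 sub-libraries, as listed above.
CUDNN_DEPENDENCIES = {
    "cudnn_ops_infer": [],
    "cudnn_ops_train": ["cudnn_ops_infer"],
    "cudnn_cnn_infer": ["cudnn_ops_infer"],
    "cudnn_cnn_train": ["cudnn_ops_infer", "cudnn_ops_train", "cudnn_cnn_infer"],
    "cudnn_adv_infer": ["cudnn_ops_infer"],
    "cudnn_adv_train": ["cudnn_ops_infer", "cudnn_ops_train", "cudnn_adv_infer"],
}


def load_order(library):
    """Return `library` and everything it depends on, dependencies first."""
    ordered = []

    def visit(name):
        if name in ordered:
            return  # already scheduled
        for dependency in CUDNN_DEPENDENCIES[name]:
            visit(dependency)
        ordered.append(name)

    visit(library)
    return ordered


print(load_order("cudnn_cnn_train"))
```

An application that only runs CNN inference, for example, would need `cudnn_ops_infer` and `cudnn_cnn_infer` under this reading, whereas CNN training pulls in all four of the `ops` and `cnn` libraries.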

2. Programming Model

The cuDNN library exposes a host API but assumes that, for operations using the GPU, the necessary data is directly accessible from the device.

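An application using cuDNN must initialize a handle to the library context by calling cudnnCreate(). This handle is explicitly passed to every subsequent library function that operates on GPU data. Once the application finishes using cuDNN, it can release the resources associated with the library handle by calling cudnnDestroy(). This approach allows the user to explicitly control the library's functioning when using multiple host threads, GPUs, and CUDA streams.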

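For example, an application can use cudaSetDevice() to associate different devices with different host threads and, in each of those host threads, use a unique cuDNN handle that directs library calls to the device associated with it. cuDNN library calls made with different handles will therefore automatically run on different devices.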

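The device associated with a particular cuDNN context is assumed to remain unchanged between the corresponding cudnnCreate() and cudnnDestroy() calls. For the cuDNN library to use a different device within the same host thread, the application must set the new device by calling cudaSetDevice() and then create another cuDNN context, which will be associated with the new device, by calling cudnnCreate().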
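The handle lifecycle described above (select a device, create a handle, use it, destroy it) can be sketched from Python with `ctypes`, calling the C entry points `cudaSetDevice()`, `cudnnCreate()`, `cudnnGetVersion()`, and `cudnnDestroy()` directly. This is a rough illustration under several assumptions: the CUDA runtime and cuDNN shared libraries are discoverable by `ctypes.util.find_library` under the names `cudart` and `cudnn`, and the function name `cudnn_version_on_device` is hypothetical. On a machine without these libraries or without a GPU, the sketch simply returns `None`.

```python
import ctypes
import ctypes.util


def cudnn_version_on_device(device_id=0):
    """Create a cuDNN handle on one device, query the version, destroy the handle.

    Returns the integer reported by cudnnGetVersion(), or None when the
    libraries cannot be loaded or a call fails (for example, no GPU present).
    """
    cudart_name = ctypes.util.find_library("cudart")  # assumed library name
    cudnn_name = ctypes.util.find_library("cudnn")    # assumed library name
    if cudart_name is None or cudnn_name is None:
        return None
    try:
        cudart = ctypes.CDLL(cudart_name)
        cudnn = ctypes.CDLL(cudnn_name)
    except OSError:
        return None

    # Declare the C signatures; both status types are plain C enums.
    cudart.cudaSetDevice.argtypes = [ctypes.c_int]
    cudart.cudaSetDevice.restype = ctypes.c_int
    cudnn.cudnnCreate.argtypes = [ctypes.POINTER(ctypes.c_void_p)]
    cudnn.cudnnCreate.restype = ctypes.c_int
    cudnn.cudnnDestroy.argtypes = [ctypes.c_void_p]
    cudnn.cudnnDestroy.restype = ctypes.c_int
    cudnn.cudnnGetVersion.argtypes = []
    cudnn.cudnnGetVersion.restype = ctypes.c_size_t

    # The handle created below is bound to the device selected here.
    if cudart.cudaSetDevice(device_id) != 0:  # 0 is cudaSuccess
        return None
    handle = ctypes.c_void_p()  # cudnnHandle_t is an opaque pointer
    if cudnn.cudnnCreate(ctypes.byref(handle)) != 0:  # 0 is CUDNN_STATUS_SUCCESS
        return None
    try:
        return int(cudnn.cudnnGetVersion())
    finally:
        # Release the resources associated with the handle.
        cudnn.cudnnDestroy(handle)
```

To target a second device from the same host thread, the same sequence would be repeated with a different `device_id`, so that each device gets its own handle.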

cuDNN API Compatibility

Beginning in cuDNN 7, the binary compatibility of patch and minor releases is maintained as follows:

  • Any patch release x.y.z is forward- and backward-compatible with applications built against another cuDNN patch release x.y.w (meaning, of the same major and minor version number, but having w != z).
  • cuDNN minor releases beginning with cuDNN 7 are binary backward-compatible with applications built against the same or an earlier minor release (meaning, an application built against cuDNN 7.x is binary compatible with cuDNN library 7.y, where y >= x).
  • Applications compiled with cuDNN version 7.y are not guaranteed to work with a 7.x release when y > x.
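The three rules above can be expressed as a small predicate. The following Python sketch is only an illustration: the function name `is_binary_compatible` is hypothetical, versions are passed as `(major, minor, patch)` tuples, and treating a different major version (or anything before cuDNN 7) as "not guaranteed" is an assumption, since the rules above do not cover those cases.

```python
def is_binary_compatible(built_against, runtime):
    """Check whether an application built against one cuDNN version is
    covered by the compatibility rules when run with another version.

    Both arguments are (major, minor, patch) tuples.
    """
    built_major, built_minor, _built_patch = built_against
    run_major, run_minor, _run_patch = runtime

    # Assumption: the rules only cover cuDNN 7 and later, within one major version.
    if built_major < 7 or run_major != built_major:
        return False

    # Patch level never matters within the same major.minor; the runtime
    # minor version must be the same as or newer than the build-time one.
    return run_minor >= built_minor


print(is_binary_compatible((7, 3, 0), (7, 6, 5)))  # newer minor at runtime
print(is_binary_compatible((7, 6, 0), (7, 3, 1)))  # older minor at runtime
```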

3. Convolution Formulas

This section describes the various convolution formulas implemented in convolution functions.

The convolution terms described in the table below apply to all the convolution formulas that follow.

[Images: table of convolution terms and the convolution formulas; not preserved in this copy.]
