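Overview

In this tutorial we will learn about matrices, transformations, world/view/projection space matrices, and per-draw constant buffers.

  1. Matrices
  2. Transformations
  3. World/View/Projection Spaces
  4. Constant Buffers Per Draw

Okay, let's start with matrices.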

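Matrices

A matrix is a two-dimensional array. We can use matrices to represent transformations, including translation, rotation and scaling, as well as spaces, including world (all transformations), view and projection. In 3D graphics we usually use 4x4 matrices, although a 4x3 matrix can be used for things like skinned meshes, which saves some bandwidth when sending them to the GPU.

Vector-Matrix Multiplication

By multiplying a vertex by each space matrix in turn, we can transform it from object space to world space, then to camera space, and finally to projection space. Vertex positions start out in object space, which is the space the model was created in by the 3D modeling program. Objects in object space are usually centered on the point (0,0,0).
For the multiplication to work, the number of elements in the vector must equal the number of rows in the matrix. The result is another vector with as many elements as the matrix has columns.
To multiply, we take the dot product of the vector with each column of the matrix. A dot product gives us a single scalar value. This is how a dot product is calculated: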

[x1, y1, z1, w1] . [x2, y2, z2, w2] = [x1*x2 + y1*y2 + z1*z2 + w1*w2] = a scalar

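The dot product of the vector with one column gives us one element of the final vector. For example, the dot product of the vector with the first column gives us "x", and the dot product with the second column gives us "y". Let's look at an example: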

vector x matrix
               [ 5,  6,  7,  8]
[1, 2, 3, 4] x [ 9, 10, 11, 12] = [x, y, z, w]
               [13, 14, 15, 16]
               [17, 18, 19, 20]

dot product of the vector and first column
               [ 5]
[1, 2, 3, 4] . [ 9] = [1*5 + 2*9  + 3*13 + 4*17] = [5 + 18 + 39 + 68] =  130 = x
               [13]
               [17]

dot product of the vector and second column
               [ 6]
[1, 2, 3, 4] . [10] = [1*6 + 2*10 + 3*14 + 4*18] = [6 + 20 + 42 + 72] =  140 = y
               [14]
               [18]

dot product of the vector and third column
               [ 7]
[1, 2, 3, 4] . [11] = [1*7 + 2*11 + 3*15 + 4*19] = [7 + 22 + 45 + 76] =  150 = z
               [15]
               [19]

dot product of the vector and fourth column
               [ 8]
[1, 2, 3, 4] . [12] = [1*8 + 2*12 + 3*16 + 4*20] = [8 + 24 + 48 + 80] =  160 = w
               [16]
               [20]

the resulting vector is then:
[130, 140, 150, 160]
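
The same calculation can be sketched in a few lines of plain C++. This is only an illustration using hypothetical Vec4 and Mat4 types and a hypothetical mulVecMat() helper (none of them come from DirectX Math), and it assumes the row-vector order used in the example above:

```cpp
// Hypothetical helper types for illustration, not part of DirectX Math.
struct Vec4 { float v[4]; };
struct Mat4 { float m[4][4]; }; // m[row][column]

// Row vector times matrix: element c of the result is the dot product
// of the vector with column c of the matrix.
Vec4 mulVecMat(const Vec4& vec, const Mat4& mat) {
    Vec4 out = {{0.0f, 0.0f, 0.0f, 0.0f}};
    for (int c = 0; c < 4; ++c) {
        for (int r = 0; r < 4; ++r) {
            out.v[c] += vec.v[r] * mat.m[r][c];
        }
    }
    return out;
}

// The worked example from the text: [1, 2, 3, 4] x matrix.
Vec4 exampleVectorTimesMatrix() {
    const Vec4 vec = {{1, 2, 3, 4}};
    const Mat4 mat = {{{5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}, {17, 18, 19, 20}}};
    return mulVecMat(vec, mat);
}
```

Calling exampleVectorTimesMatrix() reproduces the four dot products worked out by hand above.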

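Matrix-Matrix Multiplication

Multiplying a matrix by a matrix yields another matrix with x rows and y columns, where x is the number of rows in the first matrix and y is the number of columns in the second matrix. Basically we do the same thing as in vector-matrix multiplication, once for each row of the first matrix. Let's look at an example: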

matrix x matrix
[21, 22, 23, 24]   [ 5,  6,  7,  8]   [_00, _01, _02, _03]
[25, 26, 27, 28] x [ 9, 10, 11, 12] = [_10, _11, _12, _13]
[29, 30, 31, 32]   [13, 14, 15, 16]   [_20, _21, _22, _23]
[33, 34, 35, 36]   [17, 18, 19, 20]   [_30, _31, _32, _33]

Get the first row of the final matrix
Get the first element of this row in the final matrix:
                   [ 5]
[21, 22, 23, 24] . [ 9] = [21*5 + 22*9  + 23*13 + 24*17] = [105 + 198 + 299 + 408] =  1010 = _00
                   [13]
                   [17]

Get the second element of this row in the final matrix:
                   [ 6]
[21, 22, 23, 24] . [10] = [21*6 + 22*10 + 23*14 + 24*18] = [126 + 220 + 322 + 432] =  1100 = _01
                   [14]
                   [18]

Get the third element of this row in the final matrix:
                   [ 7]
[21, 22, 23, 24] . [11] = [21*7 + 22*11 + 23*15 + 24*19] = [147 + 242 + 345 + 456] =  1190 = _02
                   [15]
                   [19]

Get the fourth element of this row in the final matrix:
                   [ 8]
[21, 22, 23, 24] . [12] = [21*8 + 22*12 + 23*16 + 24*20] = [168 + 264 + 368 + 480] =  1280 = _03
                   [16]
                   [20]

first row of final matrix is:
[1010, 1100, 1190, 1280]

now let's get the second row of the final matrix
Get the first element of this row in the final matrix:
                   [ 5]
[25, 26, 27, 28] . [ 9] = [25*5 + 26*9  + 27*13 + 28*17] = [125 + 234 + 351 + 476] =  1186 = _10
                   [13]
                   [17]

Get the second element of this row in the final matrix:
                   [ 6]
[25, 26, 27, 28] . [10] = [25*6 + 26*10 + 27*14 + 28*18] = [150 + 260 + 378 + 504] =  1292 = _11
                   [14]
                   [18]

Get the third element of this row in the final matrix:
                   [ 7]
[25, 26, 27, 28] . [11] = [25*7 + 26*11 + 27*15 + 28*19] = [175 + 286 + 405 + 532] =  1398 = _12
                   [15]
                   [19]

Get the fourth element of this row in the final matrix:
                   [ 8]
[25, 26, 27, 28] . [12] = [25*8 + 26*12 + 27*16 + 28*20] = [200 + 312 + 432 + 560] =  1504 = _13
                   [16]
                   [20]

second row of final matrix is:
[1186, 1292, 1398, 1504]

I won't do the last two rows here, but I'll give you the final result so that you can compare if you want to try it yourself:
[1010, 1100, 1190, 1280]
[1186, 1292, 1398, 1504]
[1362, 1484, 1606, 1728]
[1538, 1676, 1814, 1952]

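As you can see, each row of the first matrix is treated as a vector and multiplied with every column of the second matrix. The order in which matrices are multiplied matters, as we will see later.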
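
As a rough sketch, the whole product can be computed with three nested loops. The Mat4 type and the mulMat(), exampleA() and exampleB() helpers below are hypothetical names made up for this illustration:

```cpp
// Hypothetical helper type for illustration, not part of DirectX Math.
struct Mat4 { float m[4][4]; }; // m[row][column]

// Element [r][c] of the result is the dot product of row r of `a`
// with column c of `b`.
Mat4 mulMat(const Mat4& a, const Mat4& b) {
    Mat4 out = {};
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) {
            for (int k = 0; k < 4; ++k) {
                out.m[r][c] += a.m[r][k] * b.m[k][c];
            }
        }
    }
    return out;
}

// The two matrices from the worked example.
Mat4 exampleA() {
    return {{{21, 22, 23, 24}, {25, 26, 27, 28}, {29, 30, 31, 32}, {33, 34, 35, 36}}};
}

Mat4 exampleB() {
    return {{{5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}, {17, 18, 19, 20}}};
}
```

mulMat(exampleA(), exampleB()) gives the final matrix listed above, while mulMat(exampleB(), exampleA()) gives a different matrix, which is why the order of multiplication matters.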

Row-Major/Column-Major Ordering

Matrices can be stored in either row-major or column-major order.

Row Major Matrix          Column Major Matrix

[_00, _01, _02, _03]          [_00, _10, _20, _30]
[_10, _11, _12, _13]          [_01, _11, _21, _31]
[_20, _21, _22, _23]          [_02, _12, _22, _32]
[_30, _31, _32, _33]          [_03, _13, _23, _33]

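As shown above, a row-major matrix has a vector for each row ([_00, _01, _02, _03] is one vector), meaning the values of a row sit next to each other in memory. A column-major matrix has a vector for each column, so the values of a column sit next to each other in memory and the values of a row are spread apart.
The DirectX Math library stores matrices in row-major order.
HLSL stores matrices in column-major order by default, so that a vector-matrix multiplication can be done by taking the dot product of the vector with each stored row rather than with each column. This is convenient because each column of the original matrix (now a row as HLSL stores it) fits in a single GPU register, and the dot product with it is a single instruction. GPUs are SIMD (single instruction, multiple data) hardware, so all four components are processed together.
Although HLSL stores matrices in column-major order, it reads them one stored row at a time, which is how it can grab a row of the stored matrix and put it in a register for the calculation.
Let's take a quick look at the HLSL assembly for a vector multiplied by a matrix.
This is the matrix we pass from our application, in row-major order: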

[ 1,  2,  3,  4]
[ 5,  6,  7,  8]
[ 9, 10, 11, 12]
[13, 14, 15, 16]

This is how HLSL stores the matrix:

[ 1,  5,  9, 13]
[ 2,  6, 10, 14]
[ 3,  7, 11, 15]
[ 4,  8, 12, 16]

HLSL code:

output.pos = mul(input.pos, wvpMat);

HLSL assembly:

0: dp4 r0.x, v0.xyzw, cb0[0].xyzw  // r0.x <- output.pos.x
1: dp4 r0.y, v0.xyzw, cb0[1].xyzw  // r0.y <- output.pos.y
2: dp4 r0.z, v0.xyzw, cb0[2].xyzw  // r0.z <- output.pos.z
3: dp4 r0.w, v0.xyzw, cb0[3].xyzw  // r0.w <- output.pos.w

In HLSL, cb0[0] is now this (a column of the matrix in our application, but a row as HLSL stores it, which makes the multiplication easier):

[ 1,  5,  9, 13]

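cb0[0] is the first stored row of the matrix, and xyzw are the values from that row held in the register. Because HLSL stores the matrix in column-major order, it can load each "column" into a register with a single instruction. My guess is that dp4 is the preferred instruction for a dot product because it performs best.
Here is the HLSL assembly I got when doing matrix x vector instead (the operands are reversed, but the matrix does not have to be transposed before it is sent to HLSL):

HLSL code: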

output.pos = mul(wvpMat, input.pos);

HLSL assembly:

0: mul r0.xyzw, v0.xxxx, cb0[0].xyzw
1: mul r1.xyzw, v0.yyyy, cb0[1].xyzw
2: add r0.xyzw, r0.xyzw, r1.xyzw
3: mul r1.xyzw, v0.zzzz, cb0[2].xyzw
4: add r0.xyzw, r0.xyzw, r1.xyzw
5: mul r1.xyzw, v0.wwww, cb0[3].xyzw

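It takes more instructions to do the same thing here, and dp4 is not used at all. This is why HLSL stores matrices in column-major order, and it means that if we use the DirectX Math library, we need to transpose our matrices before sending them to the shader, which brings us to matrix transposition.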

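Matrix Transpose

Transposing a matrix is actually very simple. It swaps rows and columns, which turns a row-major matrix into a column-major one and a column-major matrix into a row-major one.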
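
A minimal sketch of a transpose, using a hypothetical Mat4 type and hypothetical helper names rather than anything from DirectX Math:

```cpp
// Hypothetical helper type for illustration, not part of DirectX Math.
struct Mat4 { float m[4][4]; }; // m[row][column]

// Swaps rows and columns: element [r][c] moves to [c][r].
Mat4 transpose(const Mat4& in) {
    Mat4 out = {};
    for (int r = 0; r < 4; ++r) {
        for (int c = 0; c < 4; ++c) {
            out.m[c][r] = in.m[r][c];
        }
    }
    return out;
}

// The row-major matrix from the HLSL example above.
Mat4 exampleRowMajor() {
    return {{{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}}};
}
```

transpose(exampleRowMajor()) has [1, 5, 9, 13] as its first row, matching the layout HLSL is shown storing above. In real code you would normally call the DirectX Math library's XMMatrixTranspose() instead of writing this yourself.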

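Matrix Types

Identity Matrix

The identity matrix is a matrix that, when multiplied with another matrix, yields that other matrix unchanged. You will usually want to initialize your matrices to the identity matrix, which looks like this:

Identity matrix: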

[1, 0, 0, 0]
[0, 1, 0, 0]
[0, 0, 1, 0]
[0, 0, 0, 1]

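Notice how the diagonal elements are 1 and all the others are zero? As mentioned above, anything multiplied by this matrix comes out unchanged. The identity matrix looks the same whether you use a row-major or a column-major layout.
You can create an identity matrix with the DirectX Math library's XMMatrixIdentity() function.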

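Transformations

Transformations are matrices that describe translation, rotation and scaling. To get a world matrix, you multiply these matrices together, which takes an object out of object space and into "world" space.
The order in which you multiply these matrices (and all matrices) is very important. For example, if you want to rotate an object 90 degrees and then move it 10 units, you put the rotation matrix before the translation matrix (rotmat * transmat). If you multiply in the opposite order (transmat * rotmat), the object is first moved 10 units and then rotated around the point (0,0,0).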
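
The difference between the two orders can be checked with a small sketch. The types and helpers below are hypothetical, and the matrices are written in the row-vector layout (translation in the bottom row), which is an assumption chosen to match the vector x matrix order used in the multiplication examples earlier:

```cpp
#include <cmath>

// Hypothetical helper types for illustration, not part of DirectX Math.
struct Vec4 { float v[4]; };
struct Mat4 { float m[4][4]; }; // m[row][column]

Mat4 mulMat(const Mat4& a, const Mat4& b) {
    Mat4 out = {};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k)
                out.m[r][c] += a.m[r][k] * b.m[k][c];
    return out;
}

Vec4 mulVecMat(const Vec4& vec, const Mat4& mat) {
    Vec4 out = {{0.0f, 0.0f, 0.0f, 0.0f}};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            out.v[c] += vec.v[r] * mat.m[r][c];
    return out;
}

// Row-vector layout: the translation sits in the bottom row.
Mat4 translation(float x, float y, float z) {
    return {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {x, y, z, 1}}};
}

// Row-vector layout of a rotation around the z axis.
Mat4 rotationZ(float radians) {
    const float c = std::cos(radians);
    const float s = std::sin(radians);
    return {{{c, s, 0, 0}, {-s, c, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}}};
}
```

With a 90 degree rotation and a translation of 10 units along x, the point (1,0,0) ends up at roughly (10,1,0) when transformed by rotationZ * translation, and at roughly (0,11,0) when transformed by translation * rotationZ.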

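Translation Matrix

A translation matrix "moves" an object by changing its position in 3D space. Objects start out in object space, which is usually centered on the point (0,0,0). To move an object away from (0,0,0), you multiply every vertex position in the object by a translation matrix. A translation matrix looks like this: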

[1, 0, 0, x]
[0, 1, 0, y]
[0, 0, 1, z]
[0, 0, 0, 1]

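where x, y and z are the position you want to move the object to. This layout is written for column vectors (matrix x vector). With the row-vector order used in the multiplication examples above (vector x matrix), which is also what DirectX Math uses, x, y and z sit in the bottom row instead.
You can create a translation matrix with the DirectX Math library's XMMatrixTranslation() function.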

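Rotation Matrices

There are three different rotation matrices, one for the x axis, one for the y axis and one for the z axis.
Rotation always happens around the point (0,0,0), which means that if you only want to spin an object in place, you must rotate it first and then move it with a translation matrix. If instead you want an object to "orbit" another object, you first translate it out to the distance it should orbit at, then rotate it, and finally translate it again to the position you want it to orbit around.

X-axis rotation matrix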

[1,      0,       0, 0]
[0, cos(A), -sin(A), 0]
[0, sin(A),  cos(A), 0]
[0,      0,       0, 1]

Y-axis rotation matrix

[ cos(A), 0, sin(A), 0]
[      0, 1,      0, 0]
[-sin(A), 0, cos(A), 0]
[      0, 0,      0, 1]

Z-axis rotation matrix

[cos(A), -sin(A), 0, 0]
[sin(A),  cos(A), 0, 0]
[      0,      0, 1, 0]
[      0,      0, 0, 1]

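where A is the angle to rotate by, in radians. These matrices are written for column vectors (matrix x vector); for the row-vector order (vector x matrix) used by DirectX Math, transpose them, which swaps the signs of the two sin(A) terms.
The DirectX Math library has several functions for creating rotation matrices: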

  • XMMatrixRotationX()
  • XMMatrixRotationY()
  • XMMatrixRotationZ()
  • XMMatrixRotationRollPitchYawFromVector()
  • XMMatrixRotationRollPitchYaw()
  • XMMatrixRotationQuaternion()
  • XMMatrixRotationAxis()
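
The "orbit" recipe described above (translate out to the orbit distance, rotate, then translate to the center) can be sketched as follows. Everything here is a hypothetical stand-in written in plain C++, again assuming the row-vector layout, so the y-axis rotation is the transpose of the one printed above:

```cpp
#include <cmath>

// Hypothetical helper types for illustration, not part of DirectX Math.
struct Vec3 { float x, y, z; };
struct Vec4 { float v[4]; };
struct Mat4 { float m[4][4]; }; // m[row][column]

Mat4 mulMat(const Mat4& a, const Mat4& b) {
    Mat4 out = {};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k)
                out.m[r][c] += a.m[r][k] * b.m[k][c];
    return out;
}

Vec4 mulVecMat(const Vec4& vec, const Mat4& mat) {
    Vec4 out = {{0.0f, 0.0f, 0.0f, 0.0f}};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            out.v[c] += vec.v[r] * mat.m[r][c];
    return out;
}

Mat4 translation(const Vec3& t) {
    return {{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {t.x, t.y, t.z, 1}}};
}

// Row-vector layout of a rotation around the y axis.
Mat4 rotationY(float radians) {
    const float c = std::cos(radians);
    const float s = std::sin(radians);
    return {{{c, 0, -s, 0}, {0, 1, 0, 0}, {s, 0, c, 0}, {0, 0, 0, 1}}};
}

// World matrix of an object orbiting `center` at distance `offset`:
// translate out to the offset, rotate, then translate to the center.
Mat4 orbitWorld(const Vec3& offset, float radians, const Vec3& center) {
    return mulMat(mulMat(translation(offset), rotationY(radians)), translation(center));
}
```

With an offset of (1.5,0,0), a center of (2,0,0) and a quarter turn, the object's local origin lands at roughly (2,0,-1.5), i.e. 1.5 units away from the center on a different side.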

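Scaling Matrix

A scaling matrix scales an object relative to the point (0,0,0). This means you will almost always want to scale before doing any other transformation. For example, if you translate an object first and then scale it, the distance it was moved is scaled as well, and a non-uniform scale applied after a rotation makes the object look stretched rather than nicely scaled with its original shape.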

[x, 0, 0, 0]
[0, y, 0, 0]
[0, 0, z, 0]
[0, 0, 0, 1]

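where x, y and z are the factors to scale each axis by. Setting x, y and z to 1 keeps the object at its original size and effectively makes this matrix an identity matrix. Again, remember that you will usually want to scale before doing any other transformation.
You can create a scaling matrix with the DirectX Math library's XMMatrixScaling() function.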

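World/View/Projection Space Matrices

When you create an object in a 3D modeling program (such as 3ds Max), it is created in object space. A 3D model loaded into your program starts in object space, and you move it from space to space by multiplying each vertex by the matrix of the space you want to move it into, ending up in projection space as it passes through the graphics pipeline.
To get a 3D model from object space to projection space, you first multiply all of its vertices by the world matrix, which contains the transformations that translate, scale and rotate it in the virtual world. After that you multiply the vertices by the view matrix, which represents the camera. Once in view space, you multiply all the vertices by the projection matrix, which moves them into projection space.

World Matrix

The world matrix is the combination of the transformation matrices multiplied together, and it moves a 3D model from object space into the space of the virtual world. We talked about transformations above.
So to get a world matrix, multiply all the transformation matrices in the correct order; the resulting matrix is the world space matrix.
Every object in the scene has its own world matrix, which means the constant buffer containing that matrix needs to be updated for each 3D model.

View Space

View space is really camera space. In 3D graphics the math is much simpler when the whole world moves around the camera. We do that by multiplying everything by a view space matrix, which is built from the camera's position, the direction it is looking in, its right vector and its up vector. When you move around the virtual world, you are not actually moving the camera; you are moving everything else the opposite way from how the camera "would" have moved.
This is what a view space matrix looks like: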

[right.x, up.x, forward.x, position.x]
[right.y, up.y, forward.y, position.y]
[right.z, up.z, forward.z, position.z]
[0,       0,    0,         1         ]

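The right, up and forward vectors are normalized vectors (also called unit vectors), which means they have a length of 1.0 units. They describe the camera's right direction, up direction and forward direction (the direction the camera is facing) in the virtual world. The position vector holds the x, y, z coordinates of the camera in the virtual world.
The layout above is a simplified picture. The matrix that XMMatrixLookAtLH() returns keeps the right, up and forward vectors in its first three columns, and its bottom row holds the negated dot products of those three vectors with the camera position, which is what moves the world the opposite way from the camera.
By multiplying world space vertices by the view space matrix, you move those vertices from world space into camera space.
You can get a view matrix with the DirectX Math library's XMMatrixLookAtLH() function.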
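
To make the construction of a view matrix more concrete, here is a hedged sketch of a left-handed look-at matrix in plain C++. The types and the lookAtLH() and transformPoint() helpers are hypothetical stand-ins; the layout follows the row-vector convention (axes in the columns, negated dot products with the camera position in the bottom row), and it is meant as an approximation of what a look-at function computes, not as the library's actual source:

```cpp
#include <cmath>

// Hypothetical helper types for illustration, not part of DirectX Math.
struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; }; // m[row][column]

Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(const Vec3& a) {
    const float len = std::sqrt(dot(a, a));
    return {a.x / len, a.y / len, a.z / len};
}

// Left-handed look-at view matrix for row vectors (vector x matrix).
Mat4 lookAtLH(const Vec3& eye, const Vec3& target, const Vec3& worldUp) {
    const Vec3 forward = normalize(sub(target, eye));
    const Vec3 right = normalize(cross(worldUp, forward));
    const Vec3 up = cross(forward, right);
    return {{{right.x, up.x, forward.x, 0.0f},
             {right.y, up.y, forward.y, 0.0f},
             {right.z, up.z, forward.z, 0.0f},
             {-dot(right, eye), -dot(up, eye), -dot(forward, eye), 1.0f}}};
}

// Transforms a point (w = 1) by the matrix, row-vector style.
Vec3 transformPoint(const Vec3& p, const Mat4& mat) {
    return {p.x * mat.m[0][0] + p.y * mat.m[1][0] + p.z * mat.m[2][0] + mat.m[3][0],
            p.x * mat.m[0][1] + p.y * mat.m[1][1] + p.z * mat.m[2][1] + mat.m[3][1],
            p.x * mat.m[0][2] + p.y * mat.m[1][2] + p.z * mat.m[2][2] + mat.m[3][2]};
}
```

For a camera at (0,0,-4) looking at the origin, the origin ends up 4 units straight ahead in view space, and the camera's own position ends up at the view-space origin.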

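Projection Space

There are two types of projection space, perspective projection and orthographic projection.

Orthographic Projection

An orthographic projection is a projection space in which objects are always the same size no matter how far they are from the camera. If you have ever used a 3D modeling program, you will have seen it in the top, left, right, front, etc. viewports.
There are various ways to build an orthographic projection matrix; I will show you how Microsoft does it in the DirectX Math library with the XMMatrixOrthographicLH() function: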

w = 2/width
h = 2/height
a = 1.0f / (FarZ-NearZ)
b = -a * NearZ

orthographic projection matrix
[w, 0, 0, 0]
[0, h, 0, 0]
[0, 0, a, 0]
[0, 0, b, 1]

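width and height are the width and height of the view volume that fills the viewing window (viewport).
FarZ is the farthest distance at which an object can still be seen. Nothing farther away than FarZ is rendered.
NearZ is the closest an object can be to the camera before it is no longer rendered. Nothing closer to the camera than NearZ (including anything behind the camera) is rendered.
You can get an orthographic projection matrix with the DirectX Math library's XMMatrixOrthographicLH() function.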

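Perspective Projection

Perspective projection is how we humans see the world. Objects that are farther away appear smaller than objects close to our eyes. We want that effect in our virtual world, so we multiply our vertices (in view space) by a perspective projection matrix.
There are many different ways to construct a perspective projection matrix, so I will only show how the DirectX Math library does it when you use the XMMatrixPerspectiveFovLH() function.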

aspectRatio = width/height
h = 1 / tan(fovy*0.5)
w = h / aspectRatio
a = zfar / (zfar - znear)
b = (-znear * zfar) / (zfar - znear)

perspective projection matrix
[w, 0, 0, 0]
[0, h, 0, 0]
[0, 0, a, 1]
[0, 0, b, 0]

You can get a perspective projection matrix with the DirectX Math library's XMMatrixPerspectiveFovLH() or XMMatrixPerspectiveLH() function.
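
Here is a sketch of the same matrix in plain C++, followed by the divide by w that the hardware performs after the vertex shader. The types and the perspectiveFovLH() and projectToNdc() helpers are hypothetical names used only for this illustration:

```cpp
#include <cmath>

// Hypothetical helper types for illustration, not part of DirectX Math.
struct Vec4 { float v[4]; };
struct Mat4 { float m[4][4]; }; // m[row][column]

Vec4 mulVecMat(const Vec4& vec, const Mat4& mat) {
    Vec4 out = {{0.0f, 0.0f, 0.0f, 0.0f}};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            out.v[c] += vec.v[r] * mat.m[r][c];
    return out;
}

// Builds the perspective projection matrix from the formulas in the text.
Mat4 perspectiveFovLH(float fovY, float aspectRatio, float nearZ, float farZ) {
    const float h = 1.0f / std::tan(fovY * 0.5f);
    const float w = h / aspectRatio;
    const float a = farZ / (farZ - nearZ);
    const float b = (-nearZ * farZ) / (farZ - nearZ);
    return {{{w, 0, 0, 0}, {0, h, 0, 0}, {0, 0, a, 1}, {0, 0, b, 0}}};
}

// Projects a view-space position and applies the perspective divide.
Vec4 projectToNdc(const Vec4& viewPos, const Mat4& proj) {
    const Vec4 clip = mulVecMat(viewPos, proj);
    return {{clip.v[0] / clip.v[3], clip.v[1] / clip.v[3], clip.v[2] / clip.v[3], 1.0f}};
}
```

The 1 in the third row copies the view-space z into w, so the divide makes distant points shrink toward the center of the screen. A point on the near plane ends up with a depth of 0 and a point on the far plane with a depth of 1.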

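From Local Space to Projection Space

You can multiply several matrices together and store the resulting matrix, which saves both space and compute cycles. For example, since the projection matrix usually changes rarely and the view matrix usually changes only once per frame, you can compute the view/projection matrix once per frame and then multiply each object's world matrix by it, instead of computing world * view * projection in full every time.
Putting it all together, to get a 3D model from object (a.k.a. local) space to projection space, we multiply each vertex by the world matrix, then the view matrix, then the projection matrix. It looks like this: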

finalvertex.pos = vertex.pos * worldMatrix * viewMatrix * projectionMatrix;

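In our vertex shader we will actually use a world/view/projection matrix (a single matrix combining all three spaces) to move each vertex into projection space, like this: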

output.pos = mul(input.pos, wvpMat);
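
Constant Buffers Per Draw (or Per Object)

I think this needs some explanation, mainly because when I first started out I crashed my computer a few times before I remembered the alignment requirements.
I will start with a quick review of root signatures.

Root Signatures

The "root" in root signature comes from the idea of a plant's roots feeding nutrients (data) to the rest of the plant (the pipeline). A root signature tells the rest of the pipeline what data to expect and where to find it.
There are three types of root parameters you can use when creating a root signature: root constants, root descriptors and descriptor tables.

Root Constants

Root constants each take up 1 DWORD of space in the root signature. There is no indirection when a shader accesses them, which means shaders can access them faster than data reached through a root descriptor or a descriptor table. A float constant is 4 bytes, or 1 DWORD, so a float4 is 16 bytes, or 4 DWORDs.
You will want to use root constants for data that changes very often or needs to be accessed as fast as possible. There is a limit to how much space a root signature can take up, which we will talk about later.
You can set root constants with the command list's SetGraphicsRoot32BitConstant() function.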

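Root Descriptors

A root descriptor is a 64-bit GPU virtual address that points to the block of GPU memory where a resource is stored. This means each one takes up 2 DWORDs of space.
Because root descriptors are only virtual addresses, they are not bounds-checked the way descriptors in a descriptor heap are, so you need to make sure your shaders do not access uninitialized memory through a root descriptor.
There is one level of indirection when a shader accesses data through a root descriptor. The root descriptor holds a memory address, so the shader first has to read that address and then fetch the data at it, which makes one indirection.
Only constant buffer views (CBVs) and SRV/UAV buffers containing 32-bit FLOAT/UINT/SINT data can be used as root descriptors. Buffers that need format conversion cannot, and neither can textures such as Texture2D, so textures can only be accessed through descriptor tables.
Buffers that change frequently (for example per draw or per object) are good candidates for root descriptors. One example is a constant buffer containing the world/view/projection matrix, which changes for every object/draw call. In this tutorial we will use a root descriptor to bind the constant buffer containing our world/view/projection matrix.
You can change a root descriptor with the command list's SetGraphicsRootConstantBufferView() function.

Descriptor Tables

A descriptor table contains a 16-bit offset into the currently bound descriptor heap (only one heap of each type can be set at any one time), along with a number giving the range of descriptors to use. A descriptor table takes up only 1 DWORD of space in the root signature, but there are two levels of indirection when a shader accesses data through it.
The shader must first get the offset into the descriptor heap from the descriptor table, then get the memory location from the descriptor in the descriptor heap, and finally fetch the data that memory location points to, which makes two indirections.
When using descriptor tables, you must make sure a descriptor heap is bound.
Resources such as textures are accessed through descriptor tables.

Static Samplers

Finally, we have static samplers. These are samplers that do not change and are stored directly in the root signature. They are actually stored in GPU memory, so they do not count toward the root signature's total memory limit the way root constants, root descriptors and descriptor tables do.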

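Versioning

Versioning is the term used for when data bound to the pipeline needs to change several times within a single command list. For example, we have a constant buffer containing a world/view/projection matrix, and this matrix must change for every draw call. We need a way to version that constant buffer, so that when we change it for the second draw call, the first draw call still uses the matrix it needs.
With root constants and root descriptors we get versioning for free. That is why you want data that changes frequently to be a root descriptor or a root constant. Every time you change a root parameter (such as a root descriptor), DirectX makes a complete copy of the root signature data and changes the parameter you asked for, so that previous draw calls still use the parameters that were set for them.
When data is accessed through a descriptor table, we have to do our own versioning. There are a few ways to do this, but the simplest is to append a new descriptor to the end of the descriptor heap every time you need to change the data for a draw call (where that data is accessed through a descriptor table). Alternatively, you can reuse stale descriptors (ones that have been used before but are definitely not in use now).

Root Signature Limits

The DirectX 12 driver limits root signatures to 64 DWORDs. There is a little more to it than that: some hardware has only 16 DWORDs of dedicated root signature memory, and once the root signature goes past those 16 DWORDs, one of the 16 is used as the address of a chunk of memory holding the rest of the root signature. Because of this, you will want to put the parameters that change most often at the start of the root signature. On those devices, changing a root parameter within the first 16 DWORDs only versions the first 16 DWORDs of the root signature.
There are three tiers of hardware, each with its own resource binding limits (the lower the tier, the smaller the limits), which you can find here: feature levels in d3d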
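
The costs quoted above (1 DWORD per root constant, 2 per root descriptor, 1 per descriptor table, 64 in total) are easy to keep track of with a little bookkeeping. The sketch below is plain C++ with made-up names (RootParamKind, RootParam and so on); none of these types exist in the D3D12 API, they only model the arithmetic:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping types, not part of the D3D12 API.
enum class RootParamKind { Constants, Descriptor, DescriptorTable };

struct RootParam {
    RootParamKind kind;
    std::uint32_t num32BitValues; // only meaningful for Constants
};

constexpr std::uint32_t kRootSignatureLimitDwords = 64;

// Adds up the DWORD cost of each root parameter as described in the text.
std::uint32_t rootSignatureCostDwords(const std::vector<RootParam>& params) {
    std::uint32_t total = 0;
    for (const RootParam& p : params) {
        switch (p.kind) {
            case RootParamKind::Constants:       total += p.num32BitValues; break; // 1 DWORD each
            case RootParamKind::Descriptor:      total += 2; break; // 64-bit GPU virtual address
            case RootParamKind::DescriptorTable: total += 1; break;
        }
    }
    return total;
}

bool fitsRootSignatureLimit(const std::vector<RootParam>& params) {
    return rootSignatureCostDwords(params) <= kRootSignatureLimitDwords;
}
```

The root signature in this tutorial has a single root descriptor, so it costs 2 DWORDs, far below the limit.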

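Creating a Root Signature

You can create a root signature in code at runtime (which is what we do in these tutorials), or you can define it directly in shader code. When root signatures are defined in shaders, either only one shader may contain the root signature code, or all shaders that do define one must define exactly the same root signature.
The root signature is used when creating a PSO, because the hardware needs to know what data the shaders expect, and where and how it is stored, so that it can optimize the pipeline state. By default a shader uses the root signature defined in the shader code (if there is one), but you can override it by defining a root signature in code and using that one when creating the PSO. If several shaders contain different root signatures, PSO creation will fail.

Changing Root Signatures

Although the cost of actually changing the root signature is small, you generally do not want to change it often. The reason is that when you change the root signature, all bound data becomes unbound, and you have to bind the root parameters again for the new root signature. If you bind the same root signature that is already bound, this does not happen, and all bound root parameters stay bound.

Constant Buffer Versioning

I know I just talked about versioning with root signatures above, but now I will go into more detail about constant buffers and how to update them per draw call, per frame.
In this tutorial we do not use a descriptor heap as we did in the last tutorial. Instead we now store the descriptor directly in the root signature, using a root descriptor parameter. Root descriptors do the versioning for us, which makes things easier.
Constant buffer data is stored in a resource heap. We need to make sure we do not update data in a resource heap that shaders are still reading for a previous frame. We solve this by creating 3 resource heaps, one per frame. Our scene has two objects, each with its own constant buffer data. We could create one resource heap per object per frame, but that is wasteful, so instead we store the constant buffers of both objects in each of the three resource heaps, and then set the root descriptor to the memory address of the constant buffer we need for the next draw call.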

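Constant Buffer Alignment

This is where things can get messy if you do not know about the alignment requirements of buffers.
For single textures and buffers, the size of the resource heap must be a multiple of 64KB. That means, for example, that if we store a constant buffer containing a single float (only 4 bytes) in a resource, we must still allocate 64 * 1024 bytes, which is 65,536 bytes. If we have a constant buffer of 16,385 floats (65,540 bytes), which is just over 64KB, we need to allocate (1024 * 64 * 2) bytes, or 128KB.
Multisampled texture resources must be 4MB aligned.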
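
The rounding in the two examples above can be written as a small helper. The names alignUp(), kBufferHeapAlignment and bufferHeapSize() are made up for this sketch:

```cpp
#include <cstdint>

// Rounds `size` up to the next multiple of `alignment` (alignment must be > 0).
std::uint64_t alignUp(std::uint64_t size, std::uint64_t alignment) {
    return ((size + alignment - 1) / alignment) * alignment;
}

constexpr std::uint64_t kBufferHeapAlignment = 64 * 1024; // 64KB

// Size to allocate for a buffer resource heap holding `dataBytes` bytes.
std::uint64_t bufferHeapSize(std::uint64_t dataBytes) {
    return alignUp(dataBytes, kBufferHeapAlignment);
}
```

A single float (4 bytes) still needs 65,536 bytes, and 16,385 floats (65,540 bytes) need 131,072 bytes, matching the numbers above.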
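The constant buffers themselves must be stored at offsets of 256 bytes from the beginning of the resource heap. This is something that may catch you out when you first start working with constant buffers. When you set a root descriptor, you give it the memory location of the data you want to use, and that address must be the address of the resource heap plus a multiple of 256 bytes.
One way to deal with this is simply to pad your constant buffer structure out to 256 bytes, for example: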

struct ConstantBuffer
{
    float4x4 wvpMat; // 16 floats, 64 bytes

    // now pad the constant buffer to be 256 byte aligned
    // (256 - 64 = 192 bytes, which is 48 floats)
    float padding[48];
};

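This way we can set the offset of the next constant buffer in the heap to resourceHeapAddress + sizeof(ConstantBuffer). This works, but when you memcpy you will probably want to copy only the part of the constant buffer that actually holds data, which in the case above is 64 bytes. Otherwise, if you memcpy sizeof(ConstantBuffer) as usual, you end up copying 192 extra bytes (the 48 padding floats) for no reason.
What we do in this tutorial is create a variable called ConstantBufferPerObjectAlignedSize, which is a multiple of 256, and then store and access the next constant buffer by adding that value to the virtual address of the beginning of the resource heap.
If you try to set a root descriptor to an address that is not a multiple of 256 bytes from the start of the resource heap, you may end up crashing the operating system, as I did by mistake when I was starting out.
So to recap, in this tutorial we create 3 resource heaps and allocate 64KB for each. In each resource heap we store two constant buffers, one for each object in the scene. The first object's constant buffer is stored at the beginning of the resource heap, and the second object's constant buffer is stored at the beginning of the resource heap plus 256 bytes. If the constant buffer were 260 bytes in size, the second constant buffer would be stored at the beginning of the resource heap plus 512 bytes.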
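
The offsets in the recap can be computed with the same bit trick the tutorial code uses for ConstantBufferPerObjectAlignedSize. The two function names below are hypothetical:

```cpp
#include <cstddef>

// Rounds a constant buffer size up to the next multiple of 256 bytes.
constexpr std::size_t alignedConstantBufferSize(std::size_t structSize) {
    return (structSize + 255) & ~static_cast<std::size_t>(255);
}

// Byte offset, from the start of the resource heap, of the constant
// buffer belonging to the object with the given index.
constexpr std::size_t constantBufferOffset(std::size_t objectIndex, std::size_t structSize) {
    return objectIndex * alignedConstantBufferSize(structSize);
}
```

For the 64-byte constant buffer in this tutorial the second object's data sits at offset 256, and for a 260-byte constant buffer it would sit at offset 512.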

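Right-Handed vs. Left-Handed Coordinate Systems

There are two types of coordinate system. Direct3D has traditionally used a left-handed coordinate system, while some software (such as 3ds Max) uses a right-handed one. The only difference is that the z axis is flipped from one to the other. In these tutorials we use a left-handed coordinate system.

Left-handed coordinate system

In a left-handed coordinate system the positive y axis points up, the positive x axis points right, and the positive z axis points forward, away from you.

Right-handed coordinate system

In a right-handed coordinate system the positive y axis points up, the positive x axis points right, and the positive z axis points toward you.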

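XMMATRIX/XMVECTOR vs. XMFLOAT4X4/XMFLOAT4 (a.k.a. the DirectX Math Library)

Please read the DirectX Math Programming Guide, especially the Getting Started section. A lot of the problems people run into come from storing and passing around XMMATRIX and XMVECTOR. These are SIMD types and come with a number of restrictions, so it is usually easier to store and pass data using the storage types XMFLOAT4X4 and XMFLOAT4, and load them into an XMMATRIX or XMVECTOR when you want to operate on them.
If you insist on storing and passing around XMMATRIX and XMVECTOR types, please read this.
Using XMMATRIX and XMVECTOR as local or global variables is fine and causes no problems; it is storing them as class members or passing them between functions where things get messy.
In these tutorials we store and pass data as XMFLOAT4X4 and XMFLOAT4. When we want to do any work on those variables, we load them into XMMATRIX or XMVECTOR variables, do the work, and then store the result back into an XMFLOAT4X4 or XMFLOAT4.
You can load an XMFLOAT4X4 into an XMMATRIX with this function:
XMLoadFloat4x4()
You can store an XMMATRIX into an XMFLOAT4X4 with this function:
XMStoreFloat4x4()
You can load an XMFLOAT4 into an XMVECTOR with this function:
XMLoadFloat4()
You can store an XMVECTOR into an XMFLOAT4 with this function:
XMStoreFloat4()
XMFLOAT4X4 and XMFLOAT4 are for storing and passing only. You cannot operate on them directly, for example by adding or multiplying them; the DirectX Math library has no math operations that work on them.
XMVECTOR and XMMATRIX are where the power of the DirectX Math library lies. You can add them together, multiply them together, and all the math functions in the library work on them. They are SIMD types, which means 4 values can be held in a register at once and operated on with a single instruction. Operations on SIMD types can be up to 4 times faster than on regular types, because all 4 values in a vector are processed at the same time. Look up SIMD for a better understanding.
I think that is everything there is to talk about for this tutorial, so now we can get into the code.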

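The Code

In this tutorial we will make one cube orbit another cube. This involves creating the cube, creating the resource heaps, creating a root signature with one root descriptor parameter, mapping the resource heaps, updating the cubes' world matrices, setting the root descriptor to point at the right place in the resource heap for the current cube, and finally drawing the cubes. The alignment requirements must be followed strictly, otherwise you will get a headache when your computer freezes every time you run the program (which happened to me a few times before I realized I had not aligned the constant buffers correctly in the resource heap).

New Variables

We have removed everything to do with the descriptor heap that we used in the last tutorial.
You will see that we have updated our constant buffer structure and now call it ConstantBufferPerObject, because this constant buffer contains a world/view/projection matrix and is updated for each object. That means the constant buffer is updated several times per command list, so we need a way to version it, so that changing it does not affect the constant buffer of a previous object. Root descriptors are versioned automatically by the driver, so we can take advantage of that: when we update the root descriptor, previous draw calls still use the root descriptor that was set for them. If we decided to access the constant buffer through a descriptor table, we would have to do the versioning ourselves, which is not necessarily bad, but it does make the code more complex and you usually end up using more descriptor space.
I have talked about this a lot, so hopefully it has sunk in by now, and it is also commented in the code: constant buffers must be aligned to 256-byte offsets within the resource heap. This has to do with constant reads. Do not try to set a root descriptor to anything other than a multiple of 256 bytes from the beginning of the resource heap, otherwise your computer might spontaneously combust, haha. This is where the variable ConstantBufferPerObjectAlignedSize comes in. We set it to the next multiple of 256 at or above the size of our constant buffer structure. Right now our constant buffer is only 64 bytes (a 4x4 matrix of floats), so this variable is set to 256. It is used when updating the constant buffers in the resource heap and when setting the root descriptor.
Next we create an instance of our constant buffer structure, cbPerObject. This is just a staging copy of the data, which we then memcpy into the right constant buffer in the resource heap.
After that we have our resource heaps, called constantBufferUploadHeaps. We use upload heaps because we will be updating them often (twice per frame). You will see that we have 3 resource heaps, one for each frame. That way, when we update the constant buffers for the next frame, we do not interfere with the constant buffer data of a previous frame that may still be in use. We could create a single resource heap and store everything in it, but honestly it is a lot easier to keep each frame separate.
Next we have 3 pointers (UINT8*) named cbvGPUAddress, one for each resource heap. Despite the name, these hold the CPU-visible pointers returned by Map(); the GPU virtual address used for the root descriptor comes from the resource's GetGPUVirtualAddress(). We add ConstantBufferPerObjectAlignedSize to the pointer to get to the second constant buffer (the first constant buffer is stored at the beginning of the resource heap).
We store the projection matrix, the view matrix, and the camera's position, target and up vectors in XMFLOAT4 and XMFLOAT4X4 variables. These types are for storing and passing data around, while XMMATRIX and XMVECTOR are for doing the actual math on the data. We also store the cubes' positions, rotations and world transformation matrices in these types.
Then we have numCubeIndices. This is just a variable that holds the number of indices to draw for each cube.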

// this is the structure of our constant buffer.
struct ConstantBufferPerObject {
    XMFLOAT4X4 wvpMat;
};

// Constant buffers must be 256-byte aligned which has to do with constant reads on the GPU.
// We are only able to read at 256 byte intervals from the start of a resource heap, so we will
// make sure that we add padding between the two constant buffers in the heap (one for cube1 and one for cube2)
// Another way to do this would be to add a float array in the constant buffer structure for padding. In this case
// we would need to add a float padding[48]; after the wvpMat variable. This would align our structure to 256 bytes (4 bytes per float)
// The reason i didn't go with this way, was because there would actually be wasted cpu cycles when memcpy our constant
// buffer data to the mapped address. currently we memcpy the size of our structure, which is 64 bytes here, but if we
// were to add the padding array, we would memcpy 256 bytes if we memcpy the size of our structure, which is 192 wasted bytes
// being copied.
int ConstantBufferPerObjectAlignedSize = (sizeof(ConstantBufferPerObject) + 255) & ~255;
    
ConstantBufferPerObject cbPerObject; // this is the constant buffer data we will send to the gpu 
                                        // (which will be placed in the resource we created above)
    
ID3D12Resource* constantBufferUploadHeaps[frameBufferCount]; // this is the memory on the gpu where constant buffers for each frame will be placed
    
UINT8* cbvGPUAddress[frameBufferCount]; // this is a pointer to each of the constant buffer resource heaps
    
XMFLOAT4X4 cameraProjMat; // this will store our projection matrix
XMFLOAT4X4 cameraViewMat; // this will store our view matrix
    
XMFLOAT4 cameraPosition; // this is our cameras position vector
XMFLOAT4 cameraTarget; // a vector describing the point in space our camera is looking at
XMFLOAT4 cameraUp; // the worlds up vector
    
XMFLOAT4X4 cube1WorldMat; // our first cubes world matrix (transformation matrix)
XMFLOAT4X4 cube1RotMat; // this will keep track of our rotation for the first cube
XMFLOAT4 cube1Position; // our first cubes position in space
    
XMFLOAT4X4 cube2WorldMat; // our second cubes world matrix (transformation matrix)
XMFLOAT4X4 cube2RotMat; // this will keep track of our rotation for the second cube
XMFLOAT4 cube2PositionOffset; // our second cube will rotate around the first cube, so this is the position offset from the first cube
    
int numCubeIndices; // the number of indices to draw the cube

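New Root Signature

We will now use a root descriptor to access the constant buffer in our shader. To do that, we need to define a root signature with a root descriptor parameter.
We start by filling out a D3D12_ROOT_DESCRIPTOR structure: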

typedef struct D3D12_ROOT_DESCRIPTOR {
  UINT ShaderRegister;
  UINT RegisterSpace;
} D3D12_ROOT_DESCRIPTOR;
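  • ShaderRegister - The register the root descriptor is bound to. We are using a CBV, so it lives in a b register. We defined the constant buffer in our vertex shader at register b0, so we set this parameter to 0.
  • RegisterSpace - Registers are in register space 0 by default. Spaces are mostly a convenience that lets you use the same register number more than once, for example in two different shaders. We did not specify a space in our shader, which means it is in the default register space 0, so we set this parameter to 0.

Next we create a root parameter by filling out a D3D12_ROOT_PARAMETER structure. This structure was explained in the previous tutorial, so I will not go over it again here. In this tutorial we are creating a root descriptor rather than a descriptor table parameter, so we set the Descriptor member to the root descriptor we just filled out. Only the vertex shader uses this constant buffer, so we set the parameter's visibility to the vertex shader only. Making a parameter visible only to the shaders that need it generally gives better performance, because the GPU and the DirectX driver can optimize more when the data is broadcast to selected shaders only. On some hardware, making a parameter visible to all shaders may actually perform better, because the hardware can send a single broadcast to all shaders instead of separate broadcasts to several. At the moment there is no way to find out whether all-shader visibility performs better than visibility to specific shaders, so in general you will want to make a parameter visible only to the shaders that need it.
Then we create a root signature description by filling out a CD3DX12_ROOT_SIGNATURE_DESC structure (provided by the d3dx12.h header). This was also explained in the previous tutorial, so it is not explained here.
After that we serialize the root signature. This converts the root signature into bytecode that the GPU can read and process.
Once the root signature description has been created and the root signature serialized (converted to bytecode the GPU can read), we create the root signature.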

// create root signature
    
// create a root descriptor, which explains where to find the data for this root parameter
D3D12_ROOT_DESCRIPTOR rootCBVDescriptor;
rootCBVDescriptor.RegisterSpace = 0;
rootCBVDescriptor.ShaderRegister = 0;
    
// create a root parameter and fill it out
D3D12_ROOT_PARAMETER  rootParameters[1]; // only one parameter right now
rootParameters[0].ParameterType = D3D12_ROOT_PARAMETER_TYPE_CBV; // this is a constant buffer view root descriptor
rootParameters[0].Descriptor = rootCBVDescriptor; // this is the root descriptor for this root parameter
rootParameters[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX; // our vertex shader will be the only shader accessing this parameter for now
    
CD3DX12_ROOT_SIGNATURE_DESC rootSignatureDesc;
rootSignatureDesc.Init(_countof(rootParameters), // we have 1 root parameter
    rootParameters, // a pointer to the beginning of our root parameters array
    0,
    nullptr,
    D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT | // we can deny shader stages here for better performance
    D3D12_ROOT_SIGNATURE_FLAG_DENY_HULL_SHADER_ROOT_ACCESS |
    D3D12_ROOT_SIGNATURE_FLAG_DENY_DOMAIN_SHADER_ROOT_ACCESS |
    D3D12_ROOT_SIGNATURE_FLAG_DENY_GEOMETRY_SHADER_ROOT_ACCESS |
    D3D12_ROOT_SIGNATURE_FLAG_DENY_PIXEL_SHADER_ROOT_ACCESS);
    
ID3DBlob* signature;
hr = D3D12SerializeRootSignature(&rootSignatureDesc, D3D_ROOT_SIGNATURE_VERSION_1, &signature, nullptr);
if (FAILED(hr))
{
    return false;
}
    
hr = device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&rootSignature));
if (FAILED(hr))
{
    return false;
}

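Creating the Cube Geometry

We are going to start working with 3D objects now, so here I create a cube, which should be more interesting than the quad we have been using. All I did was add some vertices and indices to the code we already had from the previous tutorials, and set the new variable numCubeIndices to the number of indices we want to draw.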

// Create vertex buffer

// a cube
Vertex vList[] = {
    // front face
    { -0.5f,  0.5f, -0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
    {  0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
    { -0.5f, -0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
    {  0.5f,  0.5f, -0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

    // right side face
    {  0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
    {  0.5f,  0.5f,  0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
    {  0.5f, -0.5f,  0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
    {  0.5f,  0.5f, -0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

    // left side face
    { -0.5f,  0.5f,  0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
    { -0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
    { -0.5f, -0.5f,  0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
    { -0.5f,  0.5f, -0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

    // back face
    {  0.5f,  0.5f,  0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
    { -0.5f, -0.5f,  0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
    {  0.5f, -0.5f,  0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
    { -0.5f,  0.5f,  0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

    // top face
    { -0.5f,  0.5f, -0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
    { 0.5f,  0.5f,  0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
    { 0.5f,  0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
    { -0.5f,  0.5f,  0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

    // bottom face
    {  0.5f, -0.5f,  0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
    { -0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
    {  0.5f, -0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
    { -0.5f, -0.5f,  0.5f, 0.0f, 1.0f, 0.0f, 1.0f },
};

int vBufferSize = sizeof(vList);

// create default heap
// default heap is memory on the GPU. Only the GPU has access to this memory
// To get data into this heap, we will have to upload the data using
// an upload heap
device->CreateCommittedResource(
    &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT), // a default heap
    D3D12_HEAP_FLAG_NONE, // no flags
    &CD3DX12_RESOURCE_DESC::Buffer(vBufferSize), // resource description for a buffer
    D3D12_RESOURCE_STATE_COPY_DEST, // we will start this heap in the copy destination state since we will copy data
                                    // from the upload heap to this heap
    nullptr, // optimized clear value must be null for this type of resource. used for render targets and depth/stencil buffers
    IID_PPV_ARGS(&vertexBuffer));

// we can give resource heaps a name so when we debug with the graphics debugger we know what resource we are looking at
vertexBuffer->SetName(L"Vertex Buffer Resource Heap");

// create upload heap
// upload heaps are used to upload data to the GPU. CPU can write to it, GPU can read from it
// We will upload the vertex buffer using this heap to the default heap
ID3D12Resource* vBufferUploadHeap;
device->CreateCommittedResource(
    &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // upload heap
    D3D12_HEAP_FLAG_NONE, // no flags
    &CD3DX12_RESOURCE_DESC::Buffer(vBufferSize), // resource description for a buffer
    D3D12_RESOURCE_STATE_GENERIC_READ, // GPU will read from this buffer and copy its contents to the default heap
    nullptr,
    IID_PPV_ARGS(&vBufferUploadHeap));
vBufferUploadHeap->SetName(L"Vertex Buffer Upload Resource Heap");

// store vertex buffer in upload heap
D3D12_SUBRESOURCE_DATA vertexData = {};
vertexData.pData = reinterpret_cast<BYTE*>(vList); // pointer to our vertex array
vertexData.RowPitch = vBufferSize; // size of all our triangle vertex data
vertexData.SlicePitch = vBufferSize; // also the size of our triangle vertex data

// we are now creating a command with the command list to copy the data from
// the upload heap to the default heap
UpdateSubresources(commandList, vertexBuffer, vBufferUploadHeap, 0, 0, 1, &vertexData);

// transition the vertex buffer data from copy destination state to vertex buffer state
commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(vertexBuffer, D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER));

// Create index buffer

// a cube (6 faces, 2 triangles per face)
DWORD iList[] = {
    // front face
    0, 1, 2, // first triangle
    0, 3, 1, // second triangle

    // right face
    4, 5, 6, // first triangle
    4, 7, 5, // second triangle

    // left face
    8, 9, 10, // first triangle
    8, 11, 9, // second triangle

    // back face
    12, 13, 14, // first triangle
    12, 15, 13, // second triangle

    // top face
    16, 17, 18, // first triangle
    16, 19, 17, // second triangle

    // bottom face
    20, 21, 22, // first triangle
    20, 23, 21, // second triangle
};

int iBufferSize = sizeof(iList);

numCubeIndices = sizeof(iList) / sizeof(DWORD);

// create default heap to hold index buffer
device->CreateCommittedResource(
    &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT), // a default heap
    D3D12_HEAP_FLAG_NONE, // no flags
    &CD3DX12_RESOURCE_DESC::Buffer(iBufferSize), // resource description for a buffer
    D3D12_RESOURCE_STATE_COPY_DEST, // start in the copy destination state
    nullptr, // optimized clear value must be null for this type of resource
    IID_PPV_ARGS(&indexBuffer));

// we can give resource heaps a name so when we debug with the graphics debugger we know what resource we are looking at
indexBuffer->SetName(L"Index Buffer Resource Heap");

// create upload heap to upload index buffer
ID3D12Resource* iBufferUploadHeap;
device->CreateCommittedResource(
    &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // upload heap
    D3D12_HEAP_FLAG_NONE, // no flags
    &CD3DX12_RESOURCE_DESC::Buffer(iBufferSize), // resource description for a buffer
    D3D12_RESOURCE_STATE_GENERIC_READ, // GPU will read from this buffer and copy its contents to the default heap
    nullptr,
    IID_PPV_ARGS(&iBufferUploadHeap));
iBufferUploadHeap->SetName(L"Index Buffer Upload Resource Heap");

// store index buffer in upload heap
D3D12_SUBRESOURCE_DATA indexData = {};
indexData.pData = reinterpret_cast<BYTE*>(iList); // pointer to our index array
indexData.RowPitch = iBufferSize; // size of all our index buffer
indexData.SlicePitch = iBufferSize; // also the size of our index buffer

// we are now creating a command with the command list to copy the data from
// the upload heap to the default heap
UpdateSubresources(commandList, indexBuffer, iBufferUploadHeap, 0, 0, 1, &indexData);

// transition the index buffer data from copy destination state to index buffer state
commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(indexBuffer, D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_INDEX_BUFFER));

创建常量缓冲区资源堆

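Now we create the constant buffer resource heaps. We use upload heaps because we will be updating the data frequently (twice per frame). You can see we create 3 resource heaps, one for each frame buffer, so that when we update the constant buffers for the next frame we do not interfere with constant buffers that a previous frame is still accessing.
Buffer resource heaps must be sized in multiples of 64KB, which is why we allocate 1024 * 64 bytes when creating each resource heap. We discussed creating committed resources in the previous tutorials, so I will not go over it again here.
Once a resource heap has been created, we map it to get a pointer to the start of the resource heap, which we use to write the constant buffer data. The address set on the root descriptor later is the matching GPU virtual address of the same resource.
Notice that we have two memcpy calls. We have two constant buffers, one per cube. We store both cubes' constant buffers in the same heap, making sure the second constant buffer is offset from the beginning of the resource heap by a multiple of 256 bytes. To do this we add ConstantBufferPerObjectAlignedSize to the mapped pointer, which points to the beginning of the resource heap. We will also set this data in the update function for each cube.
If you have several variables in a constant buffer and only one or a few of them are updated each frame, you can copy just the updated variables. To keep things simple, though, we just copy the whole constant buffer structure (our constant buffer only has one variable anyway).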

// create the constant buffer resource heap
// We will update the constant buffer one or more times per frame, so we will use only an upload heap
// unlike previously we used an upload heap to upload the vertex and index data, and then copied over
// to a default heap. If you plan to use a resource for more than a couple frames, it is usually more
// efficient to copy to a default heap where it stays on the gpu. In this case, our constant buffer
// will be modified and uploaded at least once per frame, so we only use an upload heap

// first we will create a resource heap (upload heap) for each frame for the cubes constant buffers
// As you can see, we are allocating 64KB for each resource we create. Buffer resource heaps must be
// an alignment of 64KB. We are creating 3 resources, one for each frame. Each constant buffer is 
// only a 4x4 matrix of floats in this tutorial. So with a float being 4 bytes, we have 
// 16 floats in one constant buffer, and we will store 2 constant buffers in each
// heap, one for each cube, thats only 64x2 bytes, or 128 bytes we are using for each
// resource, and each resource must be at least 64KB (65536 bytes)
for (int i = 0; i < frameBufferCount; ++i)
{
    // create the upload resource for this frame (it holds both cubes' constant buffers)
    hr = device->CreateCommittedResource(
        &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // this heap will be used to upload the constant buffer data
        D3D12_HEAP_FLAG_NONE, // no flags
        &CD3DX12_RESOURCE_DESC::Buffer(1024 * 64), // size of the resource heap. Must be a multiple of 64KB for single-textures and constant buffers
        D3D12_RESOURCE_STATE_GENERIC_READ, // will be data that is read from so we keep it in the generic read state
        nullptr, // we do not use an optimized clear value for constant buffers
        IID_PPV_ARGS(&constantBufferUploadHeaps[i]));
    constantBufferUploadHeaps[i]->SetName(L"Constant Buffer Upload Resource Heap");

    ZeroMemory(&cbPerObject, sizeof(cbPerObject));

    CD3DX12_RANGE readRange(0, 0);    // We do not intend to read from this resource on the CPU. (so end is less than or equal to begin)
        
    // map the resource heap to get a CPU pointer to the beginning of the heap
    hr = constantBufferUploadHeaps[i]->Map(0, &readRange, reinterpret_cast<void**>(&cbvGPUAddress[i]));

    // Because of the constant read alignment requirements, constant buffer views must be 256 byte aligned. Our buffers are smaller than 256 bytes,
    // so we need to add spacing between the two buffers, so that the second buffer starts at 256 bytes from the beginning of the resource heap.
    memcpy(cbvGPUAddress[i], &cbPerObject, sizeof(cbPerObject)); // cube1's constant buffer data
    memcpy(cbvGPUAddress[i] + ConstantBufferPerObjectAlignedSize, &cbPerObject, sizeof(cbPerObject)); // cube2's constant buffer data
}
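
The two memcpy calls above can be tried out without a GPU by letting a plain byte array stand in for the mapped upload heap. This is only a sketch: PerObjectConstants is a hypothetical stand-in for ConstantBufferPerObject, and writeConstants() and readConstants() are made-up helpers:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for ConstantBufferPerObject: 16 floats, 64 bytes.
struct PerObjectConstants { float wvpMat[16]; };

// Size of one constant buffer slot, rounded up to a multiple of 256 bytes.
constexpr std::size_t kAlignedSize =
    (sizeof(PerObjectConstants) + 255) & ~static_cast<std::size_t>(255);

// Copies only sizeof(PerObjectConstants) bytes into the slot of `objectIndex`.
void writeConstants(std::uint8_t* mappedHeap, std::size_t objectIndex,
                    const PerObjectConstants& data) {
    std::memcpy(mappedHeap + objectIndex * kAlignedSize, &data, sizeof(data));
}

// Reads the constants of `objectIndex` back out of the heap.
PerObjectConstants readConstants(const std::uint8_t* mappedHeap, std::size_t objectIndex) {
    PerObjectConstants out;
    std::memcpy(&out, mappedHeap + objectIndex * kAlignedSize, sizeof(out));
    return out;
}
```

With a std::vector<std::uint8_t> of 64 * 1024 bytes as the heap, cube 1's data lands at offset 0 and cube 2's data at offset 256, and only 64 bytes are copied each time.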

Building the World/View/Projection Matrices

We start by creating a projection matrix with the XMMatrixPerspectiveFovLH() function:

XMMATRIX XMMatrixPerspectiveFovLH(
  [in] float FovAngleY,
  [in] float AspectRatio,
  [in] float NearZ,
  [in] float FarZ
);
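  • FovAngleY - The top-down field-of-view angle, in radians.
  • AspectRatio - The aspect ratio of the viewport, usually Width / Height.
  • NearZ - The closest distance to the camera at which an object is rendered. Anything closer than this, or behind the camera, is not rendered.
  • FarZ - The farthest distance from the camera at which an object is rendered. Anything farther away is not rendered.

Notice that we create an XMMATRIX when we build the projection matrix. That is because all DirectX Math operations work on XMVECTOR and XMMATRIX, which hold several values in a register so they can be operated on at once. Once we have the projection matrix, we store it in the XMFLOAT4X4 variable cameraProjMat with the XMStoreFloat4x4() function.
Next we set up the camera. We set its position, which is 2 units up and 4 units back. We tell it to look at the point (0,0,0), which is where our first cube sits and our second cube orbits, and we set the world's up vector to the y axis.
See how we load the XMFLOAT4s describing the camera into XMVECTORs? We do that because, as I have mentioned many times, DirectX Math works on the XMVECTOR and XMMATRIX types, and we pass these XMVECTORs to the function that creates the view matrix.
Once we have set the camera's position, target and up vector and loaded them into XMVECTORs, we create the view matrix with the XMMatrixLookAtLH() function.
Next is setting up the initial world matrices for our cubes. We start with the first cube and set its position. We then load the position into an XMVECTOR and create a translation matrix from it. That is really all the first cube needs, since it has not moved yet; we could just as well initialize its world matrix to the identity matrix. Notice how we initialize the rotation matrix to the identity matrix. We will be updating this matrix every frame from its own previous value, so we need to make sure it starts out with a known value.
The position of cube 2 is really an offset from the position of cube 1. We will orbit cube2 around cube1, and the way we do that is to translate cube2 by its offset from cube1, apply the rotation, and then translate it to cube1's actual position. This makes it rotate around cube1.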

// build projection and view matrix
XMMATRIX tmpMat = XMMatrixPerspectiveFovLH(XMConvertToRadians(45.0f), (float)Width / (float)Height, 0.1f, 1000.0f); // 45 degree vertical field of view
XMStoreFloat4x4(&cameraProjMat, tmpMat);

// set starting camera state
cameraPosition = XMFLOAT4(0.0f, 2.0f, -4.0f, 0.0f);
cameraTarget = XMFLOAT4(0.0f, 0.0f, 0.0f, 0.0f);
cameraUp = XMFLOAT4(0.0f, 1.0f, 0.0f, 0.0f);

// build view matrix
XMVECTOR cPos = XMLoadFloat4(&cameraPosition);
XMVECTOR cTarg = XMLoadFloat4(&cameraTarget);
XMVECTOR cUp = XMLoadFloat4(&cameraUp);
tmpMat = XMMatrixLookAtLH(cPos, cTarg, cUp);
XMStoreFloat4x4(&cameraViewMat, tmpMat);

// set starting cubes position
// first cube
cube1Position = XMFLOAT4(0.0f, 0.0f, 0.0f, 0.0f); // set cube 1's position
XMVECTOR posVec = XMLoadFloat4(&cube1Position); // create xmvector for cube1's position

tmpMat = XMMatrixTranslationFromVector(posVec); // create translation matrix from cube1's position vector
XMStoreFloat4x4(&cube1RotMat, XMMatrixIdentity()); // initialize cube1's rotation matrix to identity matrix
XMStoreFloat4x4(&cube1WorldMat, tmpMat); // store cube1's world matrix

// second cube
cube2PositionOffset = XMFLOAT4(1.5f, 0.0f, 0.0f, 0.0f);
posVec = XMLoadFloat4(&cube2PositionOffset) + XMLoadFloat4(&cube1Position); // create xmvector for cube2's position
                                                                            // we are rotating around cube1 here, so add cube2's position to cube1

tmpMat = XMMatrixTranslationFromVector(posVec); // create translation matrix from cube2's position offset vector
XMStoreFloat4x4(&cube2RotMat, XMMatrixIdentity()); // initialize cube2's rotation matrix to identity matrix
XMStoreFloat4x4(&cube2WorldMat, tmpMat); // store cube2's world matrix
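
The view matrix built above moves the world so that the camera sits at the origin looking down +z. The sketch below, in plain C++ without DirectXMath, shows the idea under the assumption of a left-handed look-at basis; Vec3, ViewBasis, LookAtLH and WorldToView are hypothetical names, and the result is kept as basis vectors rather than packed into a matrix.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 Sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 Cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
Vec3 Normalize(Vec3 v) {
    const float len = std::sqrt(Dot(v, v));
    assert(len > 0.0f);
    return { v.x / len, v.y / len, v.z / len };
}

// Camera axes for a left-handed look-at view, plus the camera position.
struct ViewBasis { Vec3 right, up, forward, eye; };

ViewBasis LookAtLH(Vec3 eye, Vec3 target, Vec3 worldUp) {
    ViewBasis b;
    b.forward = Normalize(Sub(target, eye));        // camera z axis points at the target
    b.right = Normalize(Cross(worldUp, b.forward)); // camera x axis
    b.up = Cross(b.forward, b.right);               // camera y axis
    b.eye = eye;
    return b;
}

// Expresses a world-space point in camera (view) space.
Vec3 WorldToView(const ViewBasis& b, Vec3 p) {
    const Vec3 d = Sub(p, b.eye);
    return { Dot(d, b.right), Dot(d, b.up), Dot(d, b.forward) };
}
```

With the tutorial's camera at (0, 2, -4) looking at (0, 0, 0), the camera position maps to the view-space origin and the target lands straight ahead on the +z axis, at a distance equal to the length of (0, 2, -4).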

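The Update() Function

Now for our "game logic". This is where we update the scene every frame.
We start by creating three rotation matrices, one for each of the three Cartesian axes x, y and z. We create them with the corresponding XMMatrixRotationX(), XMMatrixRotationY() and XMMatrixRotationZ() functions and multiply them together into a single rotation. Notice that we multiply them with the cube's current rotation matrix, which accumulates the new rotation onto the rotation the cube already has. We then build the translation for cube1's position. We never move cube1 in this tutorial, so the position does not actually change. Next we create its world matrix. Remember that the order of matrix multiplication matters.
Next we load the view and projection matrices and create the wvp matrix for cube1. Again, order matters: the cube must first be moved into world space, then into camera space, and finally into projection space.
Once we have the wvp matrix, we store it in our constant buffer object and then copy the contents of that object into the constant buffer in the current frame's resource heap.
Cube1's constant buffer sits at the very start of the resource heap, so the address we give memcpy is the pointer we got from Map().
Next we do the same for cube2. Notice that we reuse a lot here, including the constant buffer object. We have already copied the constant buffer object holding cube1's wvp matrix to the resource heap, so we can reuse it to hold cube2's wvp matrix.
Finally we copy the constant buffer object to cube2's location in the current frame's resource heap. Notice how we offset the memory address from the start of the heap. Because constant reads require 256-byte alignment, cube2's constant buffer is stored 256 bytes from the start of the resource heap. We add ConstantBufferPerObjectAlignedSize to the mapped address, which takes us 256 bytes into the heap, where cube2's constant buffer lives.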

void Update()
{
    // update app logic, such as moving the camera or figuring out what objects are in view

    // create rotation matrices
    XMMATRIX rotXMat = XMMatrixRotationX(0.0001f);
    XMMATRIX rotYMat = XMMatrixRotationY(0.0002f);
    XMMATRIX rotZMat = XMMatrixRotationZ(0.0003f);

    // add rotation to cube1's rotation matrix and store it
    XMMATRIX rotMat = XMLoadFloat4x4(&cube1RotMat) * rotXMat * rotYMat * rotZMat;
    XMStoreFloat4x4(&cube1RotMat, rotMat);

    // create translation matrix for cube 1 from cube 1's position vector
    XMMATRIX translationMat = XMMatrixTranslationFromVector(XMLoadFloat4(&cube1Position));

    // create cube1's world matrix by first rotating the cube, then positioning the rotated cube
    XMMATRIX worldMat = rotMat * translationMat;

    // store cube1's world matrix
    XMStoreFloat4x4(&cube1WorldMat, worldMat);

    // update constant buffer for cube1
    // create the wvp matrix and store in constant buffer
    XMMATRIX viewMat = XMLoadFloat4x4(&cameraViewMat); // load view matrix
    XMMATRIX projMat = XMLoadFloat4x4(&cameraProjMat); // load projection matrix
    XMMATRIX wvpMat = XMLoadFloat4x4(&cube1WorldMat) * viewMat * projMat; // create wvp matrix
    XMMATRIX transposed = XMMatrixTranspose(wvpMat); // must transpose wvp matrix for the gpu
    XMStoreFloat4x4(&cbPerObject.wvpMat, transposed); // store transposed wvp matrix in constant buffer

    // copy our ConstantBuffer instance to the mapped constant buffer resource
    memcpy(cbvGPUAddress[frameIndex], &cbPerObject, sizeof(cbPerObject));

    // now do cube2's world matrix
    // create rotation matrices for cube2
    rotXMat = XMMatrixRotationX(0.0003f);
    rotYMat = XMMatrixRotationY(0.0002f);
    rotZMat = XMMatrixRotationZ(0.0001f);

    // add rotation to cube2's rotation matrix and store it
    rotMat = rotZMat * (XMLoadFloat4x4(&cube2RotMat) * (rotXMat * rotYMat));
    XMStoreFloat4x4(&cube2RotMat, rotMat);

    // create translation matrix for cube 2 to offset it from cube 1 (its position relative to cube1)
    XMMATRIX translationOffsetMat = XMMatrixTranslationFromVector(XMLoadFloat4(&cube2PositionOffset));

    // we want cube 2 to be half the size of cube 1, so we scale it by .5 in all dimensions
    XMMATRIX scaleMat = XMMatrixScaling(0.5f, 0.5f, 0.5f);

    // reuse worldMat. 
    // first we scale cube2. scaling happens relative to point 0,0,0, so you will almost always want to scale first
    // then we translate it. 
    // then we rotate it. rotation always rotates around point 0,0,0
    // finally we move it to cube 1's position, which will cause it to rotate around cube 1
    worldMat = scaleMat * translationOffsetMat * rotMat * translationMat;

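    wvpMat = worldMat * viewMat * projMat; // create wvp matrix from the world matrix built above (cube2WorldMat is only stored at the end of this function, so loading it here would use last frame's value)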
    transposed = XMMatrixTranspose(wvpMat); // must transpose wvp matrix for the gpu
    XMStoreFloat4x4(&cbPerObject.wvpMat, transposed); // store transposed wvp matrix in constant buffer

    // copy our ConstantBuffer instance to the mapped constant buffer resource
    memcpy(cbvGPUAddress[frameIndex] + ConstantBufferPerObjectAlignedSize, &cbPerObject, sizeof(cbPerObject));

    // store cube2's world matrix
    XMStoreFloat4x4(&cube2WorldMat, worldMat);
}
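
The order scale, translate by the offset, rotate, translate to cube1 is what produces the orbit. The following is a minimal sketch in plain C++ that reproduces that composition with row vectors, assuming hypothetical helpers (Mat4, Translation, Scaling, RotationY, Mul, OrbitWorld, DistanceFromCenter) rather than the DirectXMath types used in Update().

```cpp
#include <cassert>
#include <cmath>

// Row-major 4x4 matrix used with row vectors (v * M).
struct Mat4 { float m[4][4]; };

Mat4 Identity() {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}
Mat4 Translation(float x, float y, float z) {
    Mat4 r = Identity();
    r.m[3][0] = x; r.m[3][1] = y; r.m[3][2] = z;
    return r;
}
Mat4 Scaling(float s) {
    Mat4 r = Identity();
    r.m[0][0] = s; r.m[1][1] = s; r.m[2][2] = s;
    return r;
}
Mat4 RotationY(float angle) {
    Mat4 r = Identity();
    r.m[0][0] = std::cos(angle); r.m[0][2] = -std::sin(angle);
    r.m[2][0] = std::sin(angle); r.m[2][2] = std::cos(angle);
    return r;
}
Mat4 Mul(const Mat4& a, const Mat4& b) {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Scale first, move out by the offset, rotate about the origin, then move to the center.
Mat4 OrbitWorld(float scale, float offsetX, float angle, float cx, float cy, float cz) {
    assert(scale > 0.0f);
    const Mat4 local = Mul(Scaling(scale), Translation(offsetX, 0.0f, 0.0f));
    return Mul(Mul(local, RotationY(angle)), Translation(cx, cy, cz));
}

// Distance from (cx, cy, cz) to where `world` places the object's own origin.
// With row vectors, the origin (0, 0, 0, 1) maps to the matrix's fourth row.
float DistanceFromCenter(const Mat4& world, float cx, float cy, float cz) {
    const float dx = world.m[3][0] - cx;
    const float dy = world.m[3][1] - cy;
    const float dz = world.m[3][2] - cz;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}
```

Whatever the angle, the orbiting object stays exactly one offset length away from the center. If the rotation were applied before the offset translation instead, the object would only spin in place at a fixed position.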

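Updating the Root Descriptor

Now we get to the part where we draw our cubes. As before, the first thing we do is set the root signature. Like many people working with DX12, you may wonder why you need to set the root signature on the command list when the PSO was already created with one. There are a few reasons. One is that the root signature given at PSO creation is only used to create the PSO and is not stored in it; the PSO assumes the correct root signature will be bound when it is used. Another is that an application may not yet know which PSO it will need, but still has to start updating and setting root parameter data, and it can set root arguments before it sets the PSO. A further reason for the separation is that changing the root signature unbinds the root arguments. The DirectX team did not want a PSO change to silently unbind the data you had bound. This way, when you switch to another PSO that uses the same root signature, all the data stays bound to the pipeline.
OK, on with the tutorial. We need to tell the shaders where to find cube1's constant buffer. We do this by setting the root descriptor parameter (the root parameter we created during initialization) to the location of the constant buffer in the current frame's resource heap. Basically, we just set the root descriptor to the GPU virtual address of the current resource heap, since cube1's constant buffer is stored at its beginning.
We then draw cube1, telling the draw call how many indices to draw.
Now we need to update the root descriptor to point to the second cube's constant buffer data. This is where the automatic versioning of root descriptors (and root constants, although not used here) comes in handy. We can reuse the root descriptor we previously set for cube1's constant buffer. When we change a root argument, the next draw call effectively gets its own copy of the root arguments with the change applied, in this case an updated root descriptor. When the command list is executed, the first draw call uses the original copy, whose root descriptor points to cube1's constant buffer, and the second draw call uses the second copy, whose root descriptor points to the second cube's constant buffer data.
As you can see, when we set the root descriptor for the second cube, we again point it at an offset that is 256-byte aligned from the start of the current frame's resource heap. We use the variable ConstantBufferPerObjectAlignedSize for this, since we made it the size of our constant buffer structure rounded up to the next multiple of 256 (the structure in this tutorial is 64 bytes, one 4x4 matrix of floats, so the variable is 256). If you set the root descriptor to an address that is not a multiple of 256 bytes from the start of the resource heap, your computer may freeze here.
Then we draw the second cube.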

// set root signature
commandList->SetGraphicsRootSignature(rootSignature); // set the root signature

// draw triangle
commandList->RSSetViewports(1, &viewport); // set the viewports
commandList->RSSetScissorRects(1, &scissorRect); // set the scissor rects
commandList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST); // set the primitive topology
commandList->IASetVertexBuffers(0, 1, &vertexBufferView); // set the vertex buffer (using the vertex buffer view)
commandList->IASetIndexBuffer(&indexBufferView);

// first cube

// set cube1's constant buffer
commandList->SetGraphicsRootConstantBufferView(0, constantBufferUploadHeaps[frameIndex]->GetGPUVirtualAddress());

// draw first cube
commandList->DrawIndexedInstanced(numCubeIndices, 1, 0, 0, 0);

// second cube

// set cube2's constant buffer. You can see we are adding the aligned size of ConstantBufferPerObject to the constant buffer
// resource heap's address. This is because cube1's constant buffer is stored at the beginning of the resource heap, while
// cube2's constant buffer data is stored after it (256 bytes from the start of the heap).
commandList->SetGraphicsRootConstantBufferView(0, constantBufferUploadHeaps[frameIndex]->GetGPUVirtualAddress() + ConstantBufferPerObjectAlignedSize);

// draw second cube
commandList->DrawIndexedInstanced(numCubeIndices, 1, 0, 0, 0);
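
One way to picture the versioning described above is that every recorded draw keeps the root descriptor value that was current when the draw was recorded. The toy model below is only an illustration in plain C++; ToyCommandList and RecordedDraw are hypothetical, and real command lists and drivers work very differently internally.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only: what one recorded draw remembers.
struct RecordedDraw {
    std::uint64_t cbvAddress;  // root descriptor value captured when the draw was recorded
    std::uint32_t indexCount;
};

class ToyCommandList {
public:
    // Plays the role of SetGraphicsRootConstantBufferView for root parameter 0.
    void SetRootCbv(std::uint64_t gpuAddress) {
        assert(gpuAddress % 256 == 0);  // constant buffer views must be 256-byte aligned
        currentCbv_ = gpuAddress;
    }

    // Plays the role of DrawIndexedInstanced: snapshots the current root argument.
    void DrawIndexed(std::uint32_t indexCount) {
        draws_.push_back({ currentCbv_, indexCount });
    }

    const std::vector<RecordedDraw>& Draws() const { return draws_; }

private:
    std::uint64_t currentCbv_ = 0;
    std::vector<RecordedDraw> draws_;
};
```

Setting the descriptor, drawing, setting it again 256 bytes further on and drawing again leaves two recorded draws that each point at their own constant buffer, which is the behavior the two draw calls above rely on.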

Cleanup

The last thing we do is release the three resource heaps (one per frame buffer) when the program exits:

for (int i = 0; i < frameBufferCount; ++i)
{
    SAFE_RELEASE(constantBufferUploadHeaps[i]);
}
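
The SAFE_RELEASE macro used here releases an object only if the pointer is non-null and then clears the pointer, so releasing twice is harmless. A minimal sketch of that behavior, assuming a hypothetical FakeResource type that counts references in place of a real COM object:

```cpp
#include <cassert>

// Same macro as in the tutorial's stdafx.h.
#define SAFE_RELEASE(p) { if ( (p) ) { (p)->Release(); (p) = 0; } }

// Hypothetical stand-in for a COM object: starts with one reference and
// destroys itself when the last reference is released.
struct FakeResource {
    unsigned long refCount = 1;
    int* releasedFlag;  // set to 1 when the object is destroyed

    explicit FakeResource(int* flag) : releasedFlag(flag) {}

    unsigned long Release() {
        assert(refCount > 0);
        const unsigned long remaining = --refCount;
        if (remaining == 0) {
            *releasedFlag = 1;
            delete this;
        }
        return remaining;
    }
};
```

After the macro runs, the pointer is null, so a second SAFE_RELEASE on the same variable skips the call instead of touching freed memory.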

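Moving an Object from Object Space to Projection Space in HLSL

I almost forgot the most important part of this tutorial's code: actually moving an object from object space to projection space!
Let's take a quick look at the constant buffer in VertexShader.hlsl.
We create a constant buffer structure with the cbuffer keyword and bind it to register b0. Constant buffers are bound to b registers, textures (SRVs) to t registers, and UAVs to u registers. We have a constant buffer here, so we bind it to a "b" register. You can bind it to any b register you like, but it makes sense to start at register 0 and increment the register as you add more buffers.
In HLSL a matrix is represented by float4x4. Our constant buffer contains a single matrix, which is the world/view/projection matrix of the bound mesh.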

cbuffer ConstantBuffer : register(b0)
{
    float4x4 wvpMat;
};

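Now we do the actual transformation from object space to projection space. To multiply a vertex by a matrix, use the mul() function in HLSL. Again, the order of the arguments matters here. If you switch the arguments, you end up with different assembly, and the vector is then treated as a column vector instead of a row vector. HLSL generates the most efficient assembly when the vector is the first argument and the matrix is the second.
The result of this multiplication is the vertex's new position in projection space. This is the vertex shader, so it happens for every vertex you tell the pipeline to draw.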

VS_OUTPUT main(VS_INPUT input)
{
    VS_OUTPUT output;
    output.pos = mul(input.pos, wvpMat);
    output.color = input.color;
    return output;
}
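
The transpose in Update() and the argument order of mul() are two sides of the same convention. HLSL packs cbuffer matrices column-major by default, while XMFLOAT4X4 stores its rows contiguously, so the matrix is transposed before upload and the shader still computes the row-vector product. The sketch below, in plain C++ with hypothetical Vec4, Mat4, MulRowVector, MulColumnVector and Transpose helpers, checks the underlying identity that v x M equals transpose(M) x v, using the vector and matrix from the multiplication example earlier in this tutorial.

```cpp
#include <cassert>

struct Vec4 { float v[4]; };
struct Mat4 { float m[4][4]; };  // m[row][column]

// Row vector times matrix: each output element is the dot product of v with a column of m.
Vec4 MulRowVector(const Vec4& v, const Mat4& m) {
    Vec4 r = {};
    for (int col = 0; col < 4; ++col)
        for (int k = 0; k < 4; ++k)
            r.v[col] += v.v[k] * m.m[k][col];
    return r;
}

// Matrix times column vector: each output element is the dot product of a row of m with v.
Vec4 MulColumnVector(const Mat4& m, const Vec4& v) {
    Vec4 r = {};
    for (int row = 0; row < 4; ++row)
        for (int k = 0; k < 4; ++k)
            r.v[row] += m.m[row][k] * v.v[k];
    return r;
}

// Swaps rows and columns.
Mat4 Transpose(const Mat4& m) {
    Mat4 t = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            t.m[j][i] = m.m[i][j];
    return t;
}
```

Multiplying [1, 2, 3, 4] by the example matrix as a row vector, and multiplying the transposed matrix by the same values as a column vector, both give [130, 140, 150, 160].
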

Full Code

VertexShader.hlsl

struct VS_INPUT
{
    float4 pos : POSITION;
    float4 color: COLOR;
};

struct VS_OUTPUT
{
    float4 pos: SV_POSITION;
    float4 color: COLOR;
};

cbuffer ConstantBuffer : register(b0)
{
    float4x4 wvpMat;
};

VS_OUTPUT main(VS_INPUT input)
{
    VS_OUTPUT output;
    output.pos = mul(input.pos, wvpMat);
    output.color = input.color;
    return output;
}

PixelShader.hlsl

struct VS_OUTPUT
{
    float4 pos: SV_POSITION;
    float4 color: COLOR;
};

float4 main(VS_OUTPUT input) : SV_TARGET
{
    // return interpolated color
    return input.color;
}

stdafx.h

#pragma once

#ifndef WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN    // Exclude rarely-used stuff from Windows headers.
#endif

#include <windows.h>
#include <d3d12.h>
#include <dxgi1_4.h>
#include <D3Dcompiler.h>
#include <DirectXMath.h>
#include "d3dx12.h"
#include <string>

// this will only call release if an object exists (prevents exceptions calling release on non-existent objects)
#define SAFE_RELEASE(p) { if ( (p) ) { (p)->Release(); (p) = 0; } }

using namespace DirectX; // we will be using the directxmath library

// Handle to the window
HWND hwnd = NULL;

// name of the window (not the title)
LPCTSTR WindowName = L"BzTutsApp";

// title of the window
LPCTSTR WindowTitle = L"Bz Window";

// width and height of the window
int Width = 800;
int Height = 600;

// is window full screen?
bool FullScreen = false;

// we will exit the program when this becomes false
bool Running = true;

// create a window
bool InitializeWindow(HINSTANCE hInstance,
    int ShowWnd,
    bool fullscreen);

// main application loop
void mainloop();

// callback function for windows messages
LRESULT CALLBACK WndProc(HWND hWnd,
    UINT msg,
    WPARAM wParam,
    LPARAM lParam);

// direct3d stuff
const int frameBufferCount = 3; // number of buffers we want, 2 for double buffering, 3 for triple buffering

ID3D12Device* device; // direct3d device

IDXGISwapChain3* swapChain; // swapchain used to switch between render targets

ID3D12CommandQueue* commandQueue; // container for command lists

ID3D12DescriptorHeap* rtvDescriptorHeap; // a descriptor heap to hold resources like the render targets

ID3D12Resource* renderTargets[frameBufferCount]; // number of render targets equal to buffer count

ID3D12CommandAllocator* commandAllocator[frameBufferCount]; // we want enough allocators for each buffer * number of threads (we only have one thread)

ID3D12GraphicsCommandList* commandList; // a command list we can record commands into, then execute them to render the frame

ID3D12Fence* fence[frameBufferCount];    // an object that is locked while our command list is being executed by the gpu. We need as many 
                                         //as we have allocators (more if we want to know when the gpu is finished with an asset)

HANDLE fenceEvent; // a handle to an event when our fence is unlocked by the gpu

UINT64 fenceValue[frameBufferCount]; // this value is incremented each frame. each fence will have its own value

int frameIndex; // current rtv we are on

int rtvDescriptorSize; // size of the rtv descriptor on the device (all front and back buffers will be the same size)

// function declarations

bool InitD3D(); // initializes direct3d 12

void Update(); // update the game logic

void UpdatePipeline(); // update the direct3d pipeline (update command lists)

void Render(); // execute the command list

void Cleanup(); // release com objects and clean up memory

void WaitForPreviousFrame(); // wait until gpu is finished with command list

ID3D12PipelineState* pipelineStateObject; // pso containing a pipeline state

ID3D12RootSignature* rootSignature; // root signature defines data shaders will access

D3D12_VIEWPORT viewport; // area that output from rasterizer will be stretched to.

D3D12_RECT scissorRect; // the area to draw in. pixels outside that area will not be drawn onto

ID3D12Resource* vertexBuffer; // a default buffer in GPU memory that we will load vertex data for our triangle into
ID3D12Resource* indexBuffer; // a default buffer in GPU memory that we will load index data for our triangle into

D3D12_VERTEX_BUFFER_VIEW vertexBufferView; // a structure containing a pointer to the vertex data in gpu memory
                                           // the total size of the buffer, and the size of each element (vertex)

D3D12_INDEX_BUFFER_VIEW indexBufferView; // a structure holding information about the index buffer

ID3D12Resource* depthStencilBuffer; // This is the memory for our depth buffer. it will also be used for a stencil buffer in a later tutorial
ID3D12DescriptorHeap* dsDescriptorHeap; // This is a heap for our depth/stencil buffer descriptor

// this is the structure of our constant buffer.
struct ConstantBufferPerObject {
    XMFLOAT4X4 wvpMat;
};

// Constant buffers must be 256-byte aligned which has to do with constant reads on the GPU.
// We are only able to read at 256 byte intervals from the start of a resource heap, so we will
// make sure that we add padding between the two constant buffers in the heap (one for cube1 and one for cube2)
// Another way to do this would be to add a float array in the constant buffer structure for padding. In this case
// we would need to add a float padding[48]; after the wvpMat variable. This would pad our structure to 256 bytes (4 bytes per float)
// The reason I didn't go with this way is that cpu cycles would be wasted when we memcpy our constant
// buffer data to the mapped address. Currently we memcpy the size of our structure, which is 64 bytes here, but if we
// were to add the padding array, we would memcpy 256 bytes if we memcpy the size of our structure, which is 192 wasted bytes
// being copied.
int ConstantBufferPerObjectAlignedSize = (sizeof(ConstantBufferPerObject) + 255) & ~255;

ConstantBufferPerObject cbPerObject; // this is the constant buffer data we will send to the gpu 
                                        // (which will be placed in the resource we created above)

ID3D12Resource* constantBufferUploadHeaps[frameBufferCount]; // this is the memory on the gpu where constant buffers for each frame will be placed

UINT8* cbvGPUAddress[frameBufferCount]; // this is a pointer to the mapped memory of each of the constant buffer resource heaps (a CPU pointer returned by Map, despite the name)

XMFLOAT4X4 cameraProjMat; // this will store our projection matrix
XMFLOAT4X4 cameraViewMat; // this will store our view matrix

XMFLOAT4 cameraPosition; // this is our cameras position vector
XMFLOAT4 cameraTarget; // a vector describing the point in space our camera is looking at
XMFLOAT4 cameraUp; // the worlds up vector

XMFLOAT4X4 cube1WorldMat; // our first cubes world matrix (transformation matrix)
XMFLOAT4X4 cube1RotMat; // this will keep track of our rotation for the first cube
XMFLOAT4 cube1Position; // our first cubes position in space

XMFLOAT4X4 cube2WorldMat; // our second cubes world matrix (transformation matrix)
XMFLOAT4X4 cube2RotMat; // this will keep track of our rotation for the second cube
XMFLOAT4 cube2PositionOffset; // our second cube will rotate around the first cube, so this is the position offset from the first cube

int numCubeIndices; // the number of indices to draw the cube

main.cpp

#include "stdafx.h"

struct Vertex {
    Vertex(float x, float y, float z, float r, float g, float b, float a) : pos(x, y, z), color(r, g, b, a) {}
    XMFLOAT3 pos;
    XMFLOAT4 color;
};

int WINAPI WinMain(HINSTANCE hInstance,    //Main windows function
    HINSTANCE hPrevInstance,
    LPSTR lpCmdLine,
    int nShowCmd)

{
    // create the window
    if (!InitializeWindow(hInstance, nShowCmd, FullScreen))
    {
        MessageBox(0, L"Window Initialization - Failed",
            L"Error", MB_OK);
        return 1;
    }

    // initialize direct3d
    if (!InitD3D())
    {
        MessageBox(0, L"Failed to initialize direct3d 12",
            L"Error", MB_OK);
        Cleanup();
        return 1;
    }

    // start the main loop
    mainloop();

    // we want to wait for the gpu to finish executing the command list before we start releasing everything
    WaitForPreviousFrame();

    // close the fence event
    CloseHandle(fenceEvent);

    // clean up everything
    Cleanup();

    return 0;
}

// create and show the window
bool InitializeWindow(HINSTANCE hInstance,
    int ShowWnd,
    bool fullscreen)

{
    if (fullscreen)
    {
        HMONITOR hmon = MonitorFromWindow(hwnd,
            MONITOR_DEFAULTTONEAREST);
        MONITORINFO mi = { sizeof(mi) };
        GetMonitorInfo(hmon, &mi);

        Width = mi.rcMonitor.right - mi.rcMonitor.left;
        Height = mi.rcMonitor.bottom - mi.rcMonitor.top;
    }

    WNDCLASSEX wc;

    wc.cbSize = sizeof(WNDCLASSEX);
    wc.style = CS_HREDRAW | CS_VREDRAW;
    wc.lpfnWndProc = WndProc;
    wc.cbClsExtra = NULL;
    wc.cbWndExtra = NULL;
    wc.hInstance = hInstance;
    wc.hIcon = LoadIcon(NULL, IDI_APPLICATION);
    wc.hCursor = LoadCursor(NULL, IDC_ARROW);
    wc.hbrBackground = (HBRUSH)(COLOR_WINDOW + 2);
    wc.lpszMenuName = NULL;
    wc.lpszClassName = WindowName;
    wc.hIconSm = LoadIcon(NULL, IDI_APPLICATION);

    if (!RegisterClassEx(&wc))
    {
        MessageBox(NULL, L"Error registering class",
            L"Error", MB_OK | MB_ICONERROR);
        return false;
    }

    hwnd = CreateWindowEx(NULL,
        WindowName,
        WindowTitle,
        WS_OVERLAPPEDWINDOW,
        CW_USEDEFAULT, CW_USEDEFAULT,
        Width, Height,
        NULL,
        NULL,
        hInstance,
        NULL);

    if (!hwnd)
    {
        MessageBox(NULL, L"Error creating window",
            L"Error", MB_OK | MB_ICONERROR);
        return false;
    }

    if (fullscreen)
    {
        SetWindowLong(hwnd, GWL_STYLE, 0);
    }

    ShowWindow(hwnd, ShowWnd);
    UpdateWindow(hwnd);

    return true;
}

void mainloop() {
    MSG msg;
    ZeroMemory(&msg, sizeof(MSG));

    while (Running)
    {
        if (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
        {
            if (msg.message == WM_QUIT)
                break;

            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
        else {
            // run game code
            Update(); // update the game logic
            Render(); // execute the command queue (rendering the scene is the result of the gpu executing the command lists)
        }
    }
}

LRESULT CALLBACK WndProc(HWND hwnd,
    UINT msg,
    WPARAM wParam,
    LPARAM lParam)

{
    switch (msg)
    {
    case WM_KEYDOWN:
        if (wParam == VK_ESCAPE) {
            if (MessageBox(0, L"Are you sure you want to exit?",
                L"Really?", MB_YESNO | MB_ICONQUESTION) == IDYES)
            {
                Running = false;
                DestroyWindow(hwnd);
            }
        }
        return 0;

    case WM_DESTROY: // x button on top right corner of window was pressed
        Running = false;
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProc(hwnd,
        msg,
        wParam,
        lParam);
}

bool InitD3D()
{
    HRESULT hr;

    // -- Create the Device -- //

    IDXGIFactory4* dxgiFactory;
    hr = CreateDXGIFactory1(IID_PPV_ARGS(&dxgiFactory));
    if (FAILED(hr))
    {
        return false;
    }

    IDXGIAdapter1* adapter; // adapters are the graphics card (this includes the embedded graphics on the motherboard)

    int adapterIndex = 0; // we'll start looking for directx 12  compatible graphics devices starting at index 0

    bool adapterFound = false; // set this to true when a good one was found

    // find first hardware gpu that supports d3d 12
    while (dxgiFactory->EnumAdapters1(adapterIndex, &adapter) != DXGI_ERROR_NOT_FOUND)
    {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);

        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
        {
            // we dont want a software device, so move on to the next adapter
            // (without this increment the loop would test the same adapter forever)
            adapterIndex++;
            continue;
        }

        // we want a device that is compatible with direct3d 12 (feature level 11 or higher)
        hr = D3D12CreateDevice(adapter, D3D_FEATURE_LEVEL_11_0, __uuidof(ID3D12Device), nullptr);
        if (SUCCEEDED(hr))
        {
            adapterFound = true;
            break;
        }

        adapterIndex++;
    }

    if (!adapterFound)
    {
        return false;
    }

    // Create the device
    hr = D3D12CreateDevice(
        adapter,
        D3D_FEATURE_LEVEL_11_0,
        IID_PPV_ARGS(&device)
        );
    if (FAILED(hr))
    {
        return false;
    }

    // -- Create a direct command queue -- //

    D3D12_COMMAND_QUEUE_DESC cqDesc = {};
    cqDesc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
    cqDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT; // direct means the gpu can directly execute this command queue

    hr = device->CreateCommandQueue(&cqDesc, IID_PPV_ARGS(&commandQueue)); // create the command queue
    if (FAILED(hr))
    {
        return false;
    }

    // -- Create the Swap Chain (double/triple buffering) -- //

    DXGI_MODE_DESC backBufferDesc = {}; // this is to describe our display mode
    backBufferDesc.Width = Width; // buffer width
    backBufferDesc.Height = Height; // buffer height
    backBufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM; // format of the buffer (rgba 32 bits, 8 bits for each channel)

    // describe our multi-sampling. We are not multi-sampling, so we set the count to 1 (we need at least one sample of course)
    DXGI_SAMPLE_DESC sampleDesc = {};
    sampleDesc.Count = 1; // multisample count (no multisampling, so we just put 1, since we still need 1 sample)

    // Describe and create the swap chain.
    DXGI_SWAP_CHAIN_DESC swapChainDesc = {};
    swapChainDesc.BufferCount = frameBufferCount; // number of buffers we have
    swapChainDesc.BufferDesc = backBufferDesc; // our back buffer description
    swapChainDesc.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT; // this says the pipeline will render to this swap chain
    swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD; // dxgi will discard the buffer (data) after we call present
    swapChainDesc.OutputWindow = hwnd; // handle to our window
    swapChainDesc.SampleDesc = sampleDesc; // our multi-sampling description
    swapChainDesc.Windowed = !FullScreen; // set to true, then if in fullscreen must call SetFullScreenState with true for full screen to get uncapped fps

    IDXGISwapChain* tempSwapChain;

    dxgiFactory->CreateSwapChain(
        commandQueue, // the queue will be flushed once the swap chain is created
        &swapChainDesc, // give it the swap chain description we created above
        &tempSwapChain // store the created swap chain in a temp IDXGISwapChain interface
        );

    swapChain = static_cast<IDXGISwapChain3*>(tempSwapChain);

    frameIndex = swapChain->GetCurrentBackBufferIndex();

    // -- Create the Back Buffers (render target views) Descriptor Heap -- //

    // describe an rtv descriptor heap and create
    D3D12_DESCRIPTOR_HEAP_DESC rtvHeapDesc = {};
    rtvHeapDesc.NumDescriptors = frameBufferCount; // number of descriptors for this heap.
    rtvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_RTV; // this heap is a render target view heap

    // This heap will not be directly referenced by the shaders (not shader visible), as this will store the output from the pipeline
    // otherwise we would set the heap's flag to D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE
    rtvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
    hr = device->CreateDescriptorHeap(&rtvHeapDesc, IID_PPV_ARGS(&rtvDescriptorHeap));
    if (FAILED(hr))
    {
        return false;
    }

    // get the size of a descriptor in this heap (this is a rtv heap, so only rtv descriptors should be stored in it).
    // descriptor sizes may vary from device to device, which is why there is no set size and we must ask the 
    // device to give us the size. we will use this size to increment a descriptor handle offset
    rtvDescriptorSize = device->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_RTV);

    // get a handle to the first descriptor in the descriptor heap. a handle is basically a pointer,
    // but we cannot literally use it like a c++ pointer.
    CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(rtvDescriptorHeap->GetCPUDescriptorHandleForHeapStart());

    // Create a RTV for each buffer (double buffering is two buffers, triple buffering is 3).
    for (int i = 0; i < frameBufferCount; i++)
    {
        // first we get the n'th buffer in the swap chain and store it in the n'th
        // position of our ID3D12Resource array
        hr = swapChain->GetBuffer(i, IID_PPV_ARGS(&renderTargets[i]));
        if (FAILED(hr))
        {
            return false;
        }

        // then we "create" a render target view which binds the swap chain buffer (ID3D12Resource[n]) to the rtv handle
        device->CreateRenderTargetView(renderTargets[i], nullptr, rtvHandle);

        // we increment the rtv handle by the rtv descriptor size we got above
        rtvHandle.Offset(1, rtvDescriptorSize);
    }

    // -- Create the Command Allocators -- //

    for (int i = 0; i < frameBufferCount; i++)
    {
        hr = device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&commandAllocator[i]));
        if (FAILED(hr))
        {
            return false;
        }
    }

    // -- Create a Command List -- //

    // create the command list with the first allocator
    hr = device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, commandAllocator[frameIndex], NULL, IID_PPV_ARGS(&commandList));
    if (FAILED(hr))
    {
        return false;
    }

    // -- Create a Fence & Fence Event -- //

    // create the fences
    for (int i = 0; i < frameBufferCount; i++)
    {
        hr = device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence[i]));
        if (FAILED(hr))
        {
            return false;
        }
        fenceValue[i] = 0; // set the initial fence value to 0
    }

    // create a handle to a fence event
    fenceEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);
    if (fenceEvent == nullptr)
    {
        return false;
    }

    // create root signature

    // create a root descriptor, which explains where to find the data for this root parameter
    D3D12_ROOT_DESCRIPTOR rootCBVDescriptor;
    rootCBVDescriptor.RegisterSpace = 0;
    rootCBVDescriptor.ShaderRegister = 0;

    // create a root parameter and fill it out
    D3D12_ROOT_PARAMETER  rootParameters[1]; // only one parameter right now
    rootParameters[0].ParameterType = D3D12_ROOT_PARAMETER_TYPE_CBV; // this is a constant buffer view root descriptor
    rootParameters[0].Descriptor = rootCBVDescriptor; // this is the root descriptor for this root parameter
    rootParameters[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_VERTEX; // our vertex shader will be the only shader accessing this parameter for now

    CD3DX12_ROOT_SIGNATURE_DESC rootSignatureDesc;
    rootSignatureDesc.Init(_countof(rootParameters), // we have 1 root parameter
        rootParameters, // a pointer to the beginning of our root parameters array
        0,
        nullptr,
        D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT | // we can deny shader stages here for better performance
        D3D12_ROOT_SIGNATURE_FLAG_DENY_HULL_SHADER_ROOT_ACCESS |
        D3D12_ROOT_SIGNATURE_FLAG_DENY_DOMAIN_SHADER_ROOT_ACCESS |
        D3D12_ROOT_SIGNATURE_FLAG_DENY_GEOMETRY_SHADER_ROOT_ACCESS |
        D3D12_ROOT_SIGNATURE_FLAG_DENY_PIXEL_SHADER_ROOT_ACCESS);

    ID3DBlob* signature;
    hr = D3D12SerializeRootSignature(&rootSignatureDesc, D3D_ROOT_SIGNATURE_VERSION_1, &signature, nullptr);
    if (FAILED(hr))
    {
        return false;
    }

    hr = device->CreateRootSignature(0, signature->GetBufferPointer(), signature->GetBufferSize(), IID_PPV_ARGS(&rootSignature));
    if (FAILED(hr))
    {
        return false;
    }

    // create vertex and pixel shaders

    // when debugging, we can compile the shader files at runtime.
    // but for release versions, we can compile the hlsl shaders
    // with fxc.exe to create .cso files, which contain the shader
    // bytecode. We can load the .cso files at runtime to get the
    // shader bytecode, which of course is faster than compiling
    // them at runtime

    // compile vertex shader
    ID3DBlob* vertexShader; // d3d blob for holding vertex shader bytecode
    ID3DBlob* errorBuff; // a buffer holding the error data if any
    hr = D3DCompileFromFile(L"VertexShader.hlsl",
        nullptr,
        nullptr,
        "main",
        "vs_5_0",
        D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION,
        0,
        &vertexShader,
        &errorBuff);
    if (FAILED(hr))
    {
        OutputDebugStringA((char*)errorBuff->GetBufferPointer());
        return false;
    }

    // fill out a shader bytecode structure, which is basically just a pointer
    // to the shader bytecode and the size of the shader bytecode
    D3D12_SHADER_BYTECODE vertexShaderBytecode = {};
    vertexShaderBytecode.BytecodeLength = vertexShader->GetBufferSize();
    vertexShaderBytecode.pShaderBytecode = vertexShader->GetBufferPointer();

    // compile pixel shader
    ID3DBlob* pixelShader;
    hr = D3DCompileFromFile(L"PixelShader.hlsl",
        nullptr,
        nullptr,
        "main",
        "ps_5_0",
        D3DCOMPILE_DEBUG | D3DCOMPILE_SKIP_OPTIMIZATION,
        0,
        &pixelShader,
        &errorBuff);
    if (FAILED(hr))
    {
        OutputDebugStringA((char*)errorBuff->GetBufferPointer());
        return false;
    }

    // fill out shader bytecode structure for pixel shader
    D3D12_SHADER_BYTECODE pixelShaderBytecode = {};
    pixelShaderBytecode.BytecodeLength = pixelShader->GetBufferSize();
    pixelShaderBytecode.pShaderBytecode = pixelShader->GetBufferPointer();

    // create input layout

    // The input layout is used by the Input Assembler so that it knows
    // how to read the vertex data bound to it.

    D3D12_INPUT_ELEMENT_DESC inputLayout[] =
    {
        { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 },
        { "COLOR", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 0, 12, D3D12_INPUT_CLASSIFICATION_PER_VERTEX_DATA, 0 }
    };

    // fill out an input layout description structure
    D3D12_INPUT_LAYOUT_DESC inputLayoutDesc = {};

    // we can get the number of elements in an array by "sizeof(array) / sizeof(arrayElementType)"
    inputLayoutDesc.NumElements = sizeof(inputLayout) / sizeof(D3D12_INPUT_ELEMENT_DESC);
    inputLayoutDesc.pInputElementDescs = inputLayout;

    // create a pipeline state object (PSO)

    // In a real application you will have many PSOs. You need a separate PSO for each different shader
    // or combination of shaders, each different blend state or rasterizer state,
    // each topology type (point, line, triangle, patch), and each different number
    // of render targets.

    // VS is the only required shader for a PSO. A PSO with only a VS makes sense when, for example,
    // it only outputs data through the stream output and not to a render target, in which case
    // nothing after the stream output stage is needed.

    D3D12_GRAPHICS_PIPELINE_STATE_DESC psoDesc = {}; // a structure to define a pso
    psoDesc.InputLayout = inputLayoutDesc; // the structure describing our input layout
    psoDesc.pRootSignature = rootSignature; // the root signature that describes the input data this pso needs
    psoDesc.VS = vertexShaderBytecode; // structure describing where to find the vertex shader bytecode and how large it is
    psoDesc.PS = pixelShaderBytecode; // same as VS but for pixel shader
    psoDesc.PrimitiveTopologyType = D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE; // type of topology we are drawing
    psoDesc.RTVFormats[0] = DXGI_FORMAT_R8G8B8A8_UNORM; // format of the render target
    psoDesc.SampleDesc = sampleDesc; // must be the same sample description as the swapchain and depth/stencil buffer
    psoDesc.SampleMask = 0xffffffff; // sample mask selects which samples of a multi-sampled render target are written. 0xffffffff enables all samples
    psoDesc.RasterizerState = CD3DX12_RASTERIZER_DESC(D3D12_DEFAULT); // a default rasterizer state.
    psoDesc.BlendState = CD3DX12_BLEND_DESC(D3D12_DEFAULT); // a default blend state.
    psoDesc.NumRenderTargets = 1; // we are only binding one render target
    psoDesc.DepthStencilState = CD3DX12_DEPTH_STENCIL_DESC(D3D12_DEFAULT); // a default depth stencil state

    // create the pso
    hr = device->CreateGraphicsPipelineState(&psoDesc, IID_PPV_ARGS(&pipelineStateObject));
    if (FAILED(hr))
    {
        return false;
    }

    // Create vertex buffer

    // a cube (6 faces, 4 vertices per face)
    Vertex vList[] = {
        // front face
        { -0.5f,  0.5f, -0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
        {  0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
        { -0.5f, -0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
        {  0.5f,  0.5f, -0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

        // right side face
        {  0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
        {  0.5f,  0.5f,  0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
        {  0.5f, -0.5f,  0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
        {  0.5f,  0.5f, -0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

        // left side face
        { -0.5f,  0.5f,  0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
        { -0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
        { -0.5f, -0.5f,  0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
        { -0.5f,  0.5f, -0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

        // back face
        {  0.5f,  0.5f,  0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
        { -0.5f, -0.5f,  0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
        {  0.5f, -0.5f,  0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
        { -0.5f,  0.5f,  0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

        // top face
        { -0.5f,  0.5f, -0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
        {  0.5f,  0.5f,  0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
        {  0.5f,  0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
        { -0.5f,  0.5f,  0.5f, 0.0f, 1.0f, 0.0f, 1.0f },

        // bottom face
        {  0.5f, -0.5f,  0.5f, 1.0f, 0.0f, 0.0f, 1.0f },
        { -0.5f, -0.5f, -0.5f, 1.0f, 0.0f, 1.0f, 1.0f },
        {  0.5f, -0.5f, -0.5f, 0.0f, 0.0f, 1.0f, 1.0f },
        { -0.5f, -0.5f,  0.5f, 0.0f, 1.0f, 0.0f, 1.0f },
    };

    int vBufferSize = sizeof(vList);

    // create default heap
    // default heap is memory on the GPU. Only the GPU has access to this memory
    // To get data into this heap, we will have to upload the data using
    // an upload heap
    device->CreateCommittedResource(
        &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT), // a default heap
        D3D12_HEAP_FLAG_NONE, // no flags
        &CD3DX12_RESOURCE_DESC::Buffer(vBufferSize), // resource description for a buffer
        D3D12_RESOURCE_STATE_COPY_DEST, // we will start this heap in the copy destination state since we will copy data
                                        // from the upload heap to this heap
        nullptr, // optimized clear value must be null for this type of resource. used for render targets and depth/stencil buffers
        IID_PPV_ARGS(&vertexBuffer));

    // we can give resource heaps a name so when we debug with the graphics debugger we know what resource we are looking at
    vertexBuffer->SetName(L"Vertex Buffer Resource Heap");

    // create upload heap
    // upload heaps are used to upload data to the GPU. CPU can write to it, GPU can read from it
    // We will upload the vertex buffer using this heap to the default heap
    ID3D12Resource* vBufferUploadHeap;
    device->CreateCommittedResource(
        &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // upload heap
        D3D12_HEAP_FLAG_NONE, // no flags
        &CD3DX12_RESOURCE_DESC::Buffer(vBufferSize), // resource description for a buffer
        D3D12_RESOURCE_STATE_GENERIC_READ, // GPU will read from this buffer and copy its contents to the default heap
        nullptr,
        IID_PPV_ARGS(&vBufferUploadHeap));
    vBufferUploadHeap->SetName(L"Vertex Buffer Upload Resource Heap");

    // store vertex buffer in upload heap
    D3D12_SUBRESOURCE_DATA vertexData = {};
    vertexData.pData = reinterpret_cast<BYTE*>(vList); // pointer to our vertex array
    vertexData.RowPitch = vBufferSize; // size of all our triangle vertex data
    vertexData.SlicePitch = vBufferSize; // also the size of our triangle vertex data

    // we are now creating a command with the command list to copy the data from
    // the upload heap to the default heap
    UpdateSubresources(commandList, vertexBuffer, vBufferUploadHeap, 0, 0, 1, &vertexData);

    // transition the vertex buffer data from copy destination state to vertex buffer state
    commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(vertexBuffer, D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER));

    // Create index buffer

    // a cube (6 faces, 2 triangles per face)
    DWORD iList[] = {
        // front face
        0, 1, 2, // first triangle
        0, 3, 1, // second triangle

        // right side face (vertices 4-7)
        4, 5, 6, // first triangle
        4, 7, 5, // second triangle

        // left side face (vertices 8-11)
        8, 9, 10, // first triangle
        8, 11, 9, // second triangle

        // back face
        12, 13, 14, // first triangle
        12, 15, 13, // second triangle

        // top face
        16, 17, 18, // first triangle
        16, 19, 17, // second triangle

        // bottom face
        20, 21, 22, // first triangle
        20, 23, 21, // second triangle
    };

    int iBufferSize = sizeof(iList);

    numCubeIndices = sizeof(iList) / sizeof(DWORD);

    // create default heap to hold index buffer
    device->CreateCommittedResource(
        &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT), // a default heap
        D3D12_HEAP_FLAG_NONE, // no flags
        &CD3DX12_RESOURCE_DESC::Buffer(iBufferSize), // resource description for a buffer
        D3D12_RESOURCE_STATE_COPY_DEST, // start in the copy destination state
        nullptr, // optimized clear value must be null for this type of resource
        IID_PPV_ARGS(&indexBuffer));

    // we can give resource heaps a name so when we debug with the graphics debugger we know what resource we are looking at
    indexBuffer->SetName(L"Index Buffer Resource Heap");

    // create upload heap to upload index buffer
    ID3D12Resource* iBufferUploadHeap;
    device->CreateCommittedResource(
        &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // upload heap
        D3D12_HEAP_FLAG_NONE, // no flags
        &CD3DX12_RESOURCE_DESC::Buffer(iBufferSize), // resource description for a buffer sized for the index data
        D3D12_RESOURCE_STATE_GENERIC_READ, // GPU will read from this buffer and copy its contents to the default heap
        nullptr,
        IID_PPV_ARGS(&iBufferUploadHeap));
    iBufferUploadHeap->SetName(L"Index Buffer Upload Resource Heap");

    // store index buffer in upload heap
    D3D12_SUBRESOURCE_DATA indexData = {};
    indexData.pData = reinterpret_cast<BYTE*>(iList); // pointer to our index array
    indexData.RowPitch = iBufferSize; // size of all our index buffer
    indexData.SlicePitch = iBufferSize; // also the size of our index buffer

    // we are now creating a command with the command list to copy the data from
    // the upload heap to the default heap
    UpdateSubresources(commandList, indexBuffer, iBufferUploadHeap, 0, 0, 1, &indexData);

    // transition the index buffer data from copy destination state to index buffer state
    commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(indexBuffer, D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_INDEX_BUFFER));

    // Create the depth/stencil buffer

    // create a depth stencil descriptor heap so we can get a pointer to the depth stencil buffer
    D3D12_DESCRIPTOR_HEAP_DESC dsvHeapDesc = {};
    dsvHeapDesc.NumDescriptors = 1;
    dsvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_DSV;
    dsvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
    hr = device->CreateDescriptorHeap(&dsvHeapDesc, IID_PPV_ARGS(&dsDescriptorHeap));
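    if (FAILED(hr))
    {
        // without a descriptor heap the calls below would dereference a null pointer, so stop here
        Running = false;
        return false;
    }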

    D3D12_DEPTH_STENCIL_VIEW_DESC depthStencilDesc = {};
    depthStencilDesc.Format = DXGI_FORMAT_D32_FLOAT;
    depthStencilDesc.ViewDimension = D3D12_DSV_DIMENSION_TEXTURE2D;
    depthStencilDesc.Flags = D3D12_DSV_FLAG_NONE;

    D3D12_CLEAR_VALUE depthOptimizedClearValue = {};
    depthOptimizedClearValue.Format = DXGI_FORMAT_D32_FLOAT;
    depthOptimizedClearValue.DepthStencil.Depth = 1.0f;
    depthOptimizedClearValue.DepthStencil.Stencil = 0;

    device->CreateCommittedResource(
        &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_DEFAULT),
        D3D12_HEAP_FLAG_NONE,
        &CD3DX12_RESOURCE_DESC::Tex2D(DXGI_FORMAT_D32_FLOAT, Width, Height, 1, 0, 1, 0, D3D12_RESOURCE_FLAG_ALLOW_DEPTH_STENCIL),
        D3D12_RESOURCE_STATE_DEPTH_WRITE,
        &depthOptimizedClearValue,
        IID_PPV_ARGS(&depthStencilBuffer)
        );
    dsDescriptorHeap->SetName(L"Depth/Stencil Resource Heap");

    device->CreateDepthStencilView(depthStencilBuffer, &depthStencilDesc, dsDescriptorHeap->GetCPUDescriptorHandleForHeapStart());

    // create the constant buffer resource heap
    // We will update the constant buffer one or more times per frame, so we use only an upload heap.
    // Previously we used an upload heap to upload the vertex and index data and then copied it
    // to a default heap. If you plan to use a resource for more than a couple of frames, it is usually more
    // efficient to copy it to a default heap, where it stays on the GPU. In this case, our constant buffer
    // will be modified and uploaded at least once per frame, so we only use an upload heap.

    // First we create a resource heap (upload heap) for each frame for the cubes' constant buffers.
    // We allocate 64KB for each resource we create, because buffer resource heaps must be
    // aligned to 64KB. We are creating 3 resources, one for each frame. Each constant buffer is
    // only a 4x4 matrix of floats in this tutorial. With a float being 4 bytes, we have
    // 16 floats (64 bytes) in one constant buffer, and we store 2 constant buffers in each
    // heap, one for each cube. That is only 64x2 = 128 bytes of data in each
    // resource, while each resource must be at least 64KB (65536 bytes).
    for (int i = 0; i < frameBufferCount; ++i)
    {
        // create resource for cube 1
        hr = device->CreateCommittedResource(
            &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD), // this heap will be used to upload the constant buffer data
            D3D12_HEAP_FLAG_NONE, // no flags
            &CD3DX12_RESOURCE_DESC::Buffer(1024 * 64), // size of the resource heap. Must be a multiple of 64KB for single-textures and constant buffers
            D3D12_RESOURCE_STATE_GENERIC_READ, // will be data that is read from so we keep it in the generic read state
            nullptr, // we do not use an optimized clear value for constant buffers
            IID_PPV_ARGS(&constantBufferUploadHeaps[i]));
        constantBufferUploadHeaps[i]->SetName(L"Constant Buffer Upload Resource Heap");

        ZeroMemory(&cbPerObject, sizeof(cbPerObject));

        CD3DX12_RANGE readRange(0, 0);    // We do not intend to read from this resource on the CPU. (so end is less than or equal to begin)
        
        // map the resource heap to get a CPU pointer to the beginning of the heap (despite its name, cbvGPUAddress holds
        // a CPU-visible address; we keep the resource mapped and write to it with memcpy)
        hr = constantBufferUploadHeaps[i]->Map(0, &readRange, reinterpret_cast<void**>(&cbvGPUAddress[i]));

        // Because of the constant read alignment requirements, constant buffer views must be 256-byte aligned. Our buffers are smaller than 256 bytes,
        // so we need to add spacing between the two buffers, so that the second buffer starts 256 bytes from the beginning of the resource heap.
        memcpy(cbvGPUAddress[i], &cbPerObject, sizeof(cbPerObject)); // cube1's constant buffer data
        memcpy(cbvGPUAddress[i] + ConstantBufferPerObjectAlignedSize, &cbPerObject, sizeof(cbPerObject)); // cube2's constant buffer data
    }

    // Now we execute the command list to upload the initial assets (cube vertex and index data)
    commandList->Close();
    ID3D12CommandList* ppCommandLists[] = { commandList };
    commandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);

    // increment the fence value now, otherwise the buffer might not be uploaded by the time we start drawing
    fenceValue[frameIndex]++;
    hr = commandQueue->Signal(fence[frameIndex], fenceValue[frameIndex]);
    if (FAILED(hr))
    {
        Running = false;
    }

    // create a vertex buffer view for the cube. We get the GPU memory address of the vertex buffer using the GetGPUVirtualAddress() method
    vertexBufferView.BufferLocation = vertexBuffer->GetGPUVirtualAddress();
    vertexBufferView.StrideInBytes = sizeof(Vertex);
    vertexBufferView.SizeInBytes = vBufferSize;

    // create an index buffer view for the cube. We get the GPU memory address of the index buffer using the GetGPUVirtualAddress() method
    indexBufferView.BufferLocation = indexBuffer->GetGPUVirtualAddress();
    indexBufferView.Format = DXGI_FORMAT_R32_UINT; // 32-bit unsigned integer (this is what a dword is, double word, a word is 2 bytes)
    indexBufferView.SizeInBytes = iBufferSize;

    // Fill out the Viewport
    viewport.TopLeftX = 0;
    viewport.TopLeftY = 0;
    viewport.Width = Width;
    viewport.Height = Height;
    viewport.MinDepth = 0.0f;
    viewport.MaxDepth = 1.0f;

    // Fill out a scissor rect
    scissorRect.left = 0;
    scissorRect.top = 0;
    scissorRect.right = Width;
    scissorRect.bottom = Height;

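    // build projection matrix (the view matrix is built further below)
    // XMConvertToRadians avoids the precision loss of approximating pi as 3.14
    XMMATRIX tmpMat = XMMatrixPerspectiveFovLH(XMConvertToRadians(45.0f), (float)Width / (float)Height, 0.1f, 1000.0f);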
    XMStoreFloat4x4(&cameraProjMat, tmpMat);

    // set starting camera state
    cameraPosition = XMFLOAT4(0.0f, 2.0f, -4.0f, 0.0f);
    cameraTarget = XMFLOAT4(0.0f, 0.0f, 0.0f, 0.0f);
    cameraUp = XMFLOAT4(0.0f, 1.0f, 0.0f, 0.0f);

    // build view matrix
    XMVECTOR cPos = XMLoadFloat4(&cameraPosition);
    XMVECTOR cTarg = XMLoadFloat4(&cameraTarget);
    XMVECTOR cUp = XMLoadFloat4(&cameraUp);
    tmpMat = XMMatrixLookAtLH(cPos, cTarg, cUp);
    XMStoreFloat4x4(&cameraViewMat, tmpMat);

    // set starting cubes position
    // first cube
    cube1Position = XMFLOAT4(0.0f, 0.0f, 0.0f, 0.0f); // set cube 1's position
    XMVECTOR posVec = XMLoadFloat4(&cube1Position); // create xmvector for cube1's position

    tmpMat = XMMatrixTranslationFromVector(posVec); // create translation matrix from cube1's position vector
    XMStoreFloat4x4(&cube1RotMat, XMMatrixIdentity()); // initialize cube1's rotation matrix to identity matrix
    XMStoreFloat4x4(&cube1WorldMat, tmpMat); // store cube1's world matrix

    // second cube
    cube2PositionOffset = XMFLOAT4(1.5f, 0.0f, 0.0f, 0.0f);
    posVec = XMLoadFloat4(&cube2PositionOffset) + XMLoadFloat4(&cube1Position); // create xmvector for cube2's position
                                                                                // we are rotating around cube1 here, so add cube2's position to cube1

    tmpMat = XMMatrixTranslationFromVector(posVec); // create translation matrix from cube2's position offset vector
    XMStoreFloat4x4(&cube2RotMat, XMMatrixIdentity()); // initialize cube2's rotation matrix to identity matrix
    XMStoreFloat4x4(&cube2WorldMat, tmpMat); // store cube2's world matrix

    return true;
}

void Update()
{
    // update app logic, such as moving the camera or figuring out what objects are in view

    // create rotation matrices
    XMMATRIX rotXMat = XMMatrixRotationX(0.0001f);
    XMMATRIX rotYMat = XMMatrixRotationY(0.0002f);
    XMMATRIX rotZMat = XMMatrixRotationZ(0.0003f);

    // add rotation to cube1's rotation matrix and store it
    XMMATRIX rotMat = XMLoadFloat4x4(&cube1RotMat) * rotXMat * rotYMat * rotZMat;
    XMStoreFloat4x4(&cube1RotMat, rotMat);

    // create translation matrix for cube 1 from cube 1's position vector
    XMMATRIX translationMat = XMMatrixTranslationFromVector(XMLoadFloat4(&cube1Position));

    // create cube1's world matrix by first rotating the cube, then positioning the rotated cube
    XMMATRIX worldMat = rotMat * translationMat;

    // store cube1's world matrix
    XMStoreFloat4x4(&cube1WorldMat, worldMat);

    // update constant buffer for cube1
    // create the wvp matrix and store in constant buffer
    XMMATRIX viewMat = XMLoadFloat4x4(&cameraViewMat); // load view matrix
    XMMATRIX projMat = XMLoadFloat4x4(&cameraProjMat); // load projection matrix
    XMMATRIX wvpMat = XMLoadFloat4x4(&cube1WorldMat) * viewMat * projMat; // create wvp matrix
    XMMATRIX transposed = XMMatrixTranspose(wvpMat); // must transpose wvp matrix for the gpu
    XMStoreFloat4x4(&cbPerObject.wvpMat, transposed); // store transposed wvp matrix in constant buffer

    // copy our ConstantBuffer instance to the mapped constant buffer resource
    memcpy(cbvGPUAddress[frameIndex], &cbPerObject, sizeof(cbPerObject));

    // now do cube2's world matrix
    // create rotation matrices for cube2
    rotXMat = XMMatrixRotationX(0.0003f);
    rotYMat = XMMatrixRotationY(0.0002f);
    rotZMat = XMMatrixRotationZ(0.0001f);

    // add rotation to cube2's rotation matrix and store it
    rotMat = rotZMat * (XMLoadFloat4x4(&cube2RotMat) * (rotXMat * rotYMat));
    XMStoreFloat4x4(&cube2RotMat, rotMat);

    // create translation matrix for cube 2 to offset it from cube 1 (its position relative to cube1)
    XMMATRIX translationOffsetMat = XMMatrixTranslationFromVector(XMLoadFloat4(&cube2PositionOffset));

    // we want cube 2 to be half the size of cube 1, so we scale it by .5 in all dimensions
    XMMATRIX scaleMat = XMMatrixScaling(0.5f, 0.5f, 0.5f);

    // reuse worldMat.
    // first we scale cube2. scaling happens relative to point 0,0,0, so you will almost always want to scale first
    // then we translate it.
    // then we rotate it. rotation always rotates around point 0,0,0
    // finally we move it to cube 1's position, which will cause it to rotate around cube 1
    worldMat = scaleMat * translationOffsetMat * rotMat * translationMat;

    // store cube2's world matrix before building the wvp matrix, so that the wvp matrix
    // uses this frame's transform instead of the one stored during the previous update
    XMStoreFloat4x4(&cube2WorldMat, worldMat);

    wvpMat = XMLoadFloat4x4(&cube2WorldMat) * viewMat * projMat; // create wvp matrix
    transposed = XMMatrixTranspose(wvpMat); // must transpose wvp matrix for the gpu
    XMStoreFloat4x4(&cbPerObject.wvpMat, transposed); // store transposed wvp matrix in constant buffer

    // copy our ConstantBuffer instance to the mapped constant buffer resource
    memcpy(cbvGPUAddress[frameIndex] + ConstantBufferPerObjectAlignedSize, &cbPerObject, sizeof(cbPerObject));
}

void UpdatePipeline()
{
    HRESULT hr;

    // We have to wait for the gpu to finish with the command allocator before we reset it
    WaitForPreviousFrame();

    // we can only reset an allocator once the gpu is done with it
    // resetting an allocator frees the memory that the command list was stored in
    hr = commandAllocator[frameIndex]->Reset();
    if (FAILED(hr))
    {
        Running = false;
    }

    // reset the command list. by resetting the command list we are putting it into
    // a recording state so we can start recording commands into the command allocator.
    // the command allocator that we reference here may have multiple command lists
    // associated with it, but only one can be recording at any time. Make sure
    // that any other command lists associated to this command allocator are in
    // the closed state (not recording).
    // The second parameter is the initial pipeline state object. Passing NULL would give
    // a default initial pipeline state, which is enough when only clearing the rtv.
    // Since we draw geometry here, we pass the pso we created during initialization.
    hr = commandList->Reset(commandAllocator[frameIndex], pipelineStateObject);
    if (FAILED(hr))
    {
        Running = false;
    }

    // here we start recording commands into the commandList (which all the commands will be stored in the commandAllocator)

    // transition the "frameIndex" render target from the present state to the render target state so the command list draws to it starting from here
    commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(renderTargets[frameIndex], D3D12_RESOURCE_STATE_PRESENT, D3D12_RESOURCE_STATE_RENDER_TARGET));

    // here we again get the handle to our current render target view so we can set it as the render target in the output merger stage of the pipeline
    CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHandle(rtvDescriptorHeap->GetCPUDescriptorHandleForHeapStart(), frameIndex, rtvDescriptorSize);

    // get a handle to the depth/stencil buffer
    CD3DX12_CPU_DESCRIPTOR_HANDLE dsvHandle(dsDescriptorHeap->GetCPUDescriptorHandleForHeapStart());

    // set the render target for the output merger stage (the output of the pipeline)
    commandList->OMSetRenderTargets(1, &rtvHandle, FALSE, &dsvHandle);

    // Clear the render target by using the ClearRenderTargetView command
    const float clearColor[] = { 0.0f, 0.2f, 0.4f, 1.0f };
    commandList->ClearRenderTargetView(rtvHandle, clearColor, 0, nullptr);

    // clear the depth/stencil buffer
    commandList->ClearDepthStencilView(dsDescriptorHeap->GetCPUDescriptorHandleForHeapStart(), D3D12_CLEAR_FLAG_DEPTH, 1.0f, 0, 0, nullptr);

    // set root signature
    commandList->SetGraphicsRootSignature(rootSignature); // set the root signature

    // set the state shared by both cube draws
    commandList->RSSetViewports(1, &viewport); // set the viewports
    commandList->RSSetScissorRects(1, &scissorRect); // set the scissor rects
    commandList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST); // set the primitive topology
    commandList->IASetVertexBuffers(0, 1, &vertexBufferView); // set the vertex buffer (using the vertex buffer view)
    commandList->IASetIndexBuffer(&indexBufferView);

    // first cube

    // set cube1's constant buffer
    commandList->SetGraphicsRootConstantBufferView(0, constantBufferUploadHeaps[frameIndex]->GetGPUVirtualAddress());

    // draw first cube
    commandList->DrawIndexedInstanced(numCubeIndices, 1, 0, 0, 0);

    // second cube

    // set cube2's constant buffer. You can see we are adding ConstantBufferPerObjectAlignedSize to the constant buffer
    // resource heap's address. This is because cube1's constant buffer is stored at the beginning of the resource heap, while
    // cube2's constant buffer data is stored after it (256 bytes from the start of the heap).
    commandList->SetGraphicsRootConstantBufferView(0, constantBufferUploadHeaps[frameIndex]->GetGPUVirtualAddress() + ConstantBufferPerObjectAlignedSize);

    // draw second cube
    commandList->DrawIndexedInstanced(numCubeIndices, 1, 0, 0, 0);

    // transition the "frameIndex" render target from the render target state to the present state. If the debug layer is enabled, you will receive a
    // warning if present is called on the render target when it's not in the present state
    commandList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(renderTargets[frameIndex], D3D12_RESOURCE_STATE_RENDER_TARGET, D3D12_RESOURCE_STATE_PRESENT));

    hr = commandList->Close();
    if (FAILED(hr))
    {
        Running = false;
    }
}

void Render()
{
    HRESULT hr;

    UpdatePipeline(); // record this frame's commands into the command list

    // create an array of command lists (only one command list here)
    ID3D12CommandList* ppCommandLists[] = { commandList };

    // execute the array of command lists
    commandQueue->ExecuteCommandLists(_countof(ppCommandLists), ppCommandLists);

    // this command goes in at the end of our command queue. we will know when our command queue 
    // has finished because the fence value will be set to "fenceValue" from the GPU since the command
    // queue is being executed on the GPU
    hr = commandQueue->Signal(fence[frameIndex], fenceValue[frameIndex]);
    if (FAILED(hr))
    {
        Running = false;
    }

    // present the current backbuffer
    hr = swapChain->Present(0, 0);
    if (FAILED(hr))
    {
        Running = false;
    }
}

void Cleanup()
{
    // wait for the gpu to finish all frames
    for (int i = 0; i < frameBufferCount; ++i)
    {
        frameIndex = i;
        WaitForPreviousFrame();
    }

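    // get swapchain out of full screen before exiting
    // GetFullscreenState returns an HRESULT (S_OK is 0), so we test the BOOL it fills in rather than its return value
    BOOL fs = false;
    if (SUCCEEDED(swapChain->GetFullscreenState(&fs, NULL)) && fs)
        swapChain->SetFullscreenState(false, NULL);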

    SAFE_RELEASE(device);
    SAFE_RELEASE(swapChain);
    SAFE_RELEASE(commandQueue);
    SAFE_RELEASE(rtvDescriptorHeap);
    SAFE_RELEASE(commandList);

    for (int i = 0; i < frameBufferCount; ++i)
    {
        SAFE_RELEASE(renderTargets[i]);
        SAFE_RELEASE(commandAllocator[i]);
        SAFE_RELEASE(fence[i]);
    }

    SAFE_RELEASE(pipelineStateObject);
    SAFE_RELEASE(rootSignature);
    SAFE_RELEASE(vertexBuffer);
    SAFE_RELEASE(indexBuffer);

    SAFE_RELEASE(depthStencilBuffer);
    SAFE_RELEASE(dsDescriptorHeap);

    for (int i = 0; i < frameBufferCount; ++i)
    {
        SAFE_RELEASE(constantBufferUploadHeaps[i]);
    }
}

void WaitForPreviousFrame()
{
    HRESULT hr;

    // swap the current rtv buffer index so we draw on the correct buffer
    frameIndex = swapChain->GetCurrentBackBufferIndex();

    // if the current fence value is still less than "fenceValue", then we know the GPU has not finished executing
    // the command queue since it has not reached the "commandQueue->Signal(fence, fenceValue)" command
    if (fence[frameIndex]->GetCompletedValue() < fenceValue[frameIndex])
    {
        // we have the fence create an event which is signaled once the fence's current value is "fenceValue"
        hr = fence[frameIndex]->SetEventOnCompletion(fenceValue[frameIndex], fenceEvent);
        if (FAILED(hr))
        {
            Running = false;
        }

        // We wait until the fence has triggered the event, which happens when its current value reaches "fenceValue". Once its value
        // has reached "fenceValue", we know the command queue has finished executing
        WaitForSingleObject(fenceEvent, INFINITE);
    }

    // increment fenceValue for next frame
    fenceValue[frameIndex]++;
}
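
The constant buffer comments in the listing above rely on two numbers: each cube's constants take 64 bytes, and each constant buffer view must start on a 256-byte boundary, which is why cube2's data is written at an offset of ConstantBufferPerObjectAlignedSize. The following is a minimal sketch of how such an aligned size and the per-object offsets can be computed. The names `PerObjectConstants`, `AlignUp`, `kCbvAlignment`, `kAlignedSize` and `ObjectOffset` are hypothetical and only serve this illustration; the value 256 is hard-coded here so the snippet compiles without d3d12.h (D3D12 exposes it as D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT).

```cpp
#include <cstddef>

// Hypothetical stand-in for the tutorial's per-object constant buffer:
// a single 4x4 float matrix (16 floats * 4 bytes = 64 bytes).
struct PerObjectConstants
{
    float wvpMat[4][4];
};

// Round `size` up to the next multiple of `alignment`.
// `alignment` must be a power of two (256 for constant buffer views).
constexpr std::size_t AlignUp(std::size_t size, std::size_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

// Assumption: 256 mirrors D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT.
constexpr std::size_t kCbvAlignment = 256;

// Space reserved for one object's constants inside the upload buffer.
constexpr std::size_t kAlignedSize = AlignUp(sizeof(PerObjectConstants), kCbvAlignment);

// Byte offset of the n-th object's constants from the start of the buffer.
constexpr std::size_t ObjectOffset(std::size_t objectIndex)
{
    return objectIndex * kAlignedSize;
}

static_assert(sizeof(PerObjectConstants) == 64, "one 4x4 float matrix is 64 bytes");
static_assert(kAlignedSize == 256, "64 bytes rounds up to one 256-byte slot");
static_assert(ObjectOffset(1) == 256, "the second object starts 256 bytes in");
```

With this layout, the CPU write for object n goes to the mapped pointer plus `ObjectOffset(n)`, and the same offset is added to the resource's GPU virtual address when the constant buffer view is bound for that object's draw.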

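The Update function above builds cube2's world matrix as scale, then offset, then rotation, then a move to cube1's position, and its comments say this order is what makes cube2 orbit cube1. The sketch below illustrates why the order matters, using plain arrays and the same row-vector convention (v' = v * M) as the multiplication examples earlier in this article. It does not use DirectXMath; `Mat4`, `Vec4` and all function names are hypothetical helpers written for this illustration only.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Row-vector convention, as in the tutorial: v' = v * M.
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;

Mat4 Identity()
{
    Mat4 m{}; // value-initialized: all zeros
    for (std::size_t i = 0; i < 4; ++i) m[i][i] = 1.0f;
    return m;
}

Mat4 Translation(float x, float y, float z)
{
    Mat4 m = Identity();
    m[3][0] = x; m[3][1] = y; m[3][2] = z; // translation lives in the fourth row
    return m;
}

Mat4 Scaling(float s)
{
    Mat4 m = Identity();
    m[0][0] = s; m[1][1] = s; m[2][2] = s;
    return m;
}

Mat4 RotationY(float radians)
{
    Mat4 m = Identity();
    const float c = std::cos(radians);
    const float s = std::sin(radians);
    m[0][0] = c; m[0][2] = -s;
    m[2][0] = s; m[2][2] = c;
    return m;
}

Mat4 Multiply(const Mat4& a, const Mat4& b)
{
    Mat4 r{};
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

Vec4 Transform(const Vec4& v, const Mat4& m)
{
    Vec4 r{};
    for (std::size_t j = 0; j < 4; ++j)
        for (std::size_t k = 0; k < 4; ++k)
            r[j] += v[k] * m[k][j];
    return r;
}

// scale -> offset -> rotate -> move to parent: the object orbits the parent.
Mat4 OrbitWorld(float radians, float offsetX, float parentX)
{
    Mat4 m = Multiply(Scaling(0.5f), Translation(offsetX, 0.0f, 0.0f));
    m = Multiply(m, RotationY(radians));
    return Multiply(m, Translation(parentX, 0.0f, 0.0f));
}

// scale -> rotate -> offset -> move to parent: the object spins in place.
Mat4 SpinWorld(float radians, float offsetX, float parentX)
{
    Mat4 m = Multiply(Scaling(0.5f), RotationY(radians));
    m = Multiply(m, Translation(offsetX, 0.0f, 0.0f));
    return Multiply(m, Translation(parentX, 0.0f, 0.0f));
}
```

Applying both matrices to the object's center (0, 0, 0, 1) with a quarter turn, an offset of 1.5 and a parent at x = 2 gives roughly (2, 0, -1.5) for the orbit order and (3.5, 0, 0) for the spin order. In the first case the rotation acts on the already offset point and swings it around the origin before the final translation; in the second case the rotation acts on a point that is still at the origin, so only the object's orientation changes.
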
Reference links:

  1. https://docs.microsoft.com/en-us/windows/win32/direct3d12/directx-12-programming-guide
  2. http://www.d3dcoder.net/
  3. https://www.braynzarsoft.net/viewtutorial/q16390-04-directx-12-braynzar-soft-tutorials
  4. https://developer.nvidia.com/dx12-dos-and-donts
  5. https://www.3dgep.com/learning-directx-12-1/
  6. https://gpuopen.com/learn/lets-learn-directx12/
  7. https://alain.xyz/blog/raw-directx12
  8. https://www.rastertek.com/tutdx12.html
  9. https://digitalerr0r.net/2015/08/19/quickstart-directx-12-programming/
  10. https://walbourn.github.io/getting-started-with-direct3d-12/
  11. https://docs.aws.amazon.com/lumberyard/latest/userguide/graphics-rendering-directx.html
  12. http://diligentgraphics.com/diligent-engine/samples/
  13. https://www.programmersought.com/article/2904113865/
  14. https://www.tutorialspoint.com/directx/directx_first_hlsl.htm
  15. http://rbwhitaker.wikidot.com/hlsl-tutorials
  16. https://www.ronja-tutorials.com/post/002-hlsl/
