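For the full list of documents, see: Summary Series on Developing a Cross-Platform Complex Graphics Rendering Project for Mobile in Rust (Table of Contents)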

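The gfx-hal interface mirrors Vulkan almost 1:1, so the explanation below uses the Vulkan interface. Vulkan's API is very fine-grained, which makes it several times harder to learn than OpenGL / ES. In my experience, mobile graphics developers find Vulkan easier to learn when it is explained against the OpenGL ES API. Starting from the per-frame rendering part, and skipping the initialization of the data structures involved, makes it easier to grasp Vulkan's core flow.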

OpenGL / ES per-frame rendering flow example

// Prepare the render target
glBindFramebuffer();
glFramebufferTexture2D(); glCheckFramebufferStatus(); // only when rendering to a texture
glViewport(x, y, width, height);
// Prepare the data the shaders will read
glUseProgram(x);
glBindBuffer(i);
loop i in 0..VertexVarCount {
    glEnableVertexAttribArray(i);
    glVertexAttribPointer(i, ...);
}
loop i in 0..UniformVarCount {
    switch UniformType {
        case NoTexture: glUniformX(i, data); break;
        case Texture: {
            glActiveTexture(j);
            glBindTexture(type, texture_name);
            glUniform1i(location, j);
            break;
        }
        default: ERROR();
    }
}
// Configure other fragment operations, e.g. glBlend, glStencil
glDrawArrays/Elements/ArraysInstanced...
// The draw call is now complete. To avoid disturbing later draws, restore the
// fragment operations set above to their defaults.
// Then, if needed, call the EGL function (not a GL function) that swaps the
// front and back buffers; when rendering to a texture this step does not apply.
eglSwapBuffers()/[EAGLContext presentRenderbuffer];

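As you can see, the OpenGL / ES API hides most of the details, so the overall amount of code looks small. It is still hard to understand at first, but after long use it becomes routine and feels like the natural way to do things. As a result, on first contact with Vulkan I ran into many details I had never known about and was rather lost.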

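The drawback of OpenGL / ES becoming routine is that, when something goes wrong, it is hard to tell right away whether the problem lies in the project code (for example, the state machine was not set up properly) or in the driver. On iOS this is tolerable; on Android you are completely in the dark. You might point to the vendors' profiling tools and Google's gapid, but they are all painful to use. Qualcomm's technical support advised us to use a rooted device running Android 8.0. The problems, however, usually show up on Android 4.x, so that advice felt like a joke.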

Rendering to a view

The core call flow for per-frame rendering to a view in gfx-hal (Vulkan) is:

EventSource ->[CommandPool -> CommandBuffer
                -> Submit -> Submission
                -> QueueGroup -> CommandQueue]
-> GraphicsHardware

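Notes:

  • EventSource: the signal source, for example a camera callback delivering a frame, the screen's vsync signal, or user input.
  • CommandQueue: a queue that executes a particular type of task, such as rendering or compute.
  • QueueGroup: a collection of CommandQueues.
  • GraphicsHardware: the graphics hardware.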
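Concrete code for the flow:

  • Reset the Fence so it can be used when the Submission is submitted to the queue later.
device.reset_fence(&frame_fence);
  • Reset the CommandPool, which resets every CommandBuffer created from this pool. If a CommandBuffer is still in use, the developer must implement the resource synchronization.
command_pool.reset();
  • Acquire an Image index from the SwapChain
let frame = swap_chain.acquire_image(!0, FrameSync::Semaphore(&mut frame_semaphore));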
  • Create and configure a CommandBuffer from the CommandPool; finishing the recording yields a valid Submit object
let mut cmd_buffer = command_pool.acquire_command_buffer(false);
// A series of OpenGL / ES-like fragment operations and bindings of data to the Program
// Two Pipeline operations worth noting
cmd_buffer.bind_graphics_pipeline(&pipeline);
cmd_buffer.bind_graphics_descriptor_sets(&pipeline_layout, 0, Some(&desc_set), &[]);
// Operations tied to the RenderPass
let mut encoder = cmd_buffer.begin_render_pass_inline(&render_pass,...);
let submit = cmd_buffer.finish();
  • Create a Submission from the Submit
let submission = Submission::new()
    .wait_on(&[(&frame_semaphore, PipelineStage::BOTTOM_OF_PIPE)])
    .submit(Some(submit));
  • Submit the Submission to the queue
queue.submit(submission, Some(&mut frame_fence));
  • Wait on the Fence until the GPU has finished executing the submitted commands
device.wait_for_fence(&frame_fence, !0);
  • Swap the front and back buffers, the equivalent of eglSwapBuffers
swap_chain.present(&mut queue_group.queues[0], frame, &[]);
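
The reset / submit / wait sequence around the Fence in the steps above can be illustrated with a host-only model. The sketch below is an assumption-laden analogy, not gfx-hal code: `Fence` is a hypothetical type built from `Mutex` and `Condvar`, and a plain thread plays the role of the GPU. It shows the contract the steps rely on: the fence starts unsignaled, the "GPU" signals it when the submitted work is done, and the CPU blocks in `wait` until then.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Host-side model of a fence: a flag the "GPU" sets and the CPU waits on.
/// Hypothetical type for illustration; not the gfx-hal or Vulkan fence.
#[derive(Default)]
pub struct Fence {
    signaled: Mutex<bool>,
    cond: Condvar,
}

impl Fence {
    /// Back to unsignaled so the fence can guard the next submission.
    pub fn reset(&self) {
        *self.signaled.lock().unwrap() = false;
    }

    /// Called by the "GPU" once the submitted work has finished.
    pub fn signal(&self) {
        *self.signaled.lock().unwrap() = true;
        self.cond.notify_all();
    }

    /// Blocks the calling (CPU) thread until the fence is signaled.
    pub fn wait(&self) {
        let mut signaled = self.signaled.lock().unwrap();
        while !*signaled {
            signaled = self.cond.wait(signaled).unwrap();
        }
    }

    pub fn is_signaled(&self) -> bool {
        *self.signaled.lock().unwrap()
    }
}

/// Resets the fence, hands `work` to a thread standing in for the GPU,
/// waits on the fence, and returns the result of the work.
pub fn submit_and_wait(fence: &Arc<Fence>, work: Vec<u32>) -> u32 {
    fence.reset();
    let gpu_fence = Arc::clone(fence);
    let gpu = thread::spawn(move || {
        let result: u32 = work.iter().sum();
        gpu_fence.signal(); // work done: let the CPU continue
        result
    });
    fence.wait(); // the analogue of device.wait_for_fence(&frame_fence, !0)
    gpu.join().unwrap()
}
```

Waiting on the fence immediately after every submit, as the steps above do, is the simplest correct scheme; it also means the CPU and GPU never overlap, which is a trade-off rather than a requirement.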

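More on configuring the CommandBuffer

OpenGL / ES 2/3.x has no CommandPool or CommandBuffer data structures; only the latest OpenGL minor version added SPIR-V and Command support, and OpenGL ES has not been updated to match. Metal's CommandBuffer interface is defined differently from Vulkan's. In Metal you create an MTLCommandBuffer, create an Encoder from that buffer together with a RenderPassDescriptor, encode the resources used by this render, and finally commit the buffer to the queue for the GPU to execute. Vulkan essentially moves Metal's Encoder operations into the CommandBuffer, leaving only a very thin Encoder.

Overall flow:

  • Allocate an available Command Buffer from the Command Pool
  • Configure the viewport and related state
  • Configure the vertex data buffers
  • Configure the mapping between Uniforms and Buffers
  • Set the output target RenderPass
  • Choose the draw method: draw/draw_indexed/draw_indirect and so on
  • Finish the configuration

Code example: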

let submit = {
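    // Take an instance (in practice a RawCommandBuffer) out of the pool and
    // combine it with a thread-safety wrapper into a thread-safe CommandBuffer.
    // This is a recurring HAL "pattern"; many other data structures follow it.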
    let mut cmd_buffer = command_pool.acquire_command_buffer(false);

    cmd_buffer.set_viewports(0, &[viewport]);
    cmd_buffer.set_scissors(0, &[viewport.rect]);
    cmd_buffer.bind_graphics_pipeline(&pipeline);
    cmd_buffer.bind_vertex_buffers(0, pso::VertexBufferSet(vec![(&vertex_buffer, 0)]));
    cmd_buffer.bind_graphics_descriptor_sets(&pipeline_layout, 0, Some(&desc_set)); //TODO

    {
        let mut encoder = cmd_buffer.begin_render_pass_inline(
            &render_pass,
            &framebuffers[frame.id()],
            viewport.rect,
            &[command::ClearValue::Color(command::ClearColor::Float([0.8, 0.8, 0.8, 1.0]))],
        );
        encoder.draw(0..6, 0..1);
    }

    cmd_buffer.finish()
};

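This code shows two key CommandBuffer operations: bind_graphics_pipeline(GraphicsPipeline) and bind_graphics_descriptor_sets(PipelineLayout, DescriptorSet). GraphicsPipeline corresponds to a Program in OpenGL / ES, while PipelineLayout and DescriptorSet describe how the shader's Uniform variables read data from Buffers. Initializing these data structures is extremely complex (reading it the first time made me want to swear), so it is covered in a separate document.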

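Rendering to a texture

The render-to-texture (RtT) scenario has no SwapChain. In that case, either set RenderPass.Attachment.format to the texture's format or hard-code it. Then Submit to the Queue and the flow is finished; there is no need, and no way, to call swap_chain.present(). To learn when the GPU finishes this CommandBuffer, or how long it took, add the corresponding callback to the CommandBuffer. Enough talk; the code tells the truth.

Configuring RenderPass.Attachment.format

Core flow

let render_pass = {
    // The attachment is the key to render-to-texture
    let attachment = pass::Attachment { /* ... */ };
    let subpass = pass::SubpassDesc { /* ... */ };
    let dependency = pass::SubpassDependency { /* ... */ };
    // ...
};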

Details:

let render_pass = {
    let attachment = pass::Attachment {
        format: Some(format),
        samples: 1,
        ops: pass::AttachmentOps::new(
            pass::AttachmentLoadOp::Clear,
            pass::AttachmentStoreOp::Store,
        ),
        stencil_ops: pass::AttachmentOps::DONT_CARE,
        layouts: image::Layout::Undefined..image::Layout::Present,
    };

    let subpass = pass::SubpassDesc {
        colors: &[(0, image::Layout::ColorAttachmentOptimal)],
        depth_stencil: None,
        inputs: &[],
        resolves: &[],
        preserves: &[],
    };

    let dependency = pass::SubpassDependency {
        passes: pass::SubpassRef::External..pass::SubpassRef::Pass(0),
        stages: PipelineStage::COLOR_ATTACHMENT_OUTPUT..PipelineStage::COLOR_ATTACHMENT_OUTPUT,
        accesses: image::Access::empty()
            ..(image::Access::COLOR_ATTACHMENT_READ | image::Access::COLOR_ATTACHMENT_WRITE),
    };

    device
        .create_render_pass(&[attachment], &[subpass], &[dependency])
        .expect("Can't create render pass")
}; // End: RenderPass init

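Submitting to the CommandQueue

Submit exactly as when rendering to a view, minus the swap_chain.present() step. How can you verify that stopping here is enough? Reading the source is one option; on Metal, Xcode's Capture GPU Frame is another. How do you run Xcode Capture GPU Frame on a Cargo project? See my other document: My Failed Capture GPU Frame Attempts with Xcode External Build System: Solution, Retrospective, and Hard-Won Lessons.

// ... lots of previous stuff
queue.submit(submission, Some(&mut frame_fence)); // done
device.wait_for_fence(&frame_fence, !0);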

HAL data structure definitions

FrameSync definition

/// Synchronization primitives which will be signalled once a frame got retrieved.
///
/// The semaphore or fence _must_ be unsignalled.
pub enum FrameSync<'a, B: Backend> {
    /// Semaphore used for synchronization.
    ///
    /// Will be signaled once the frame backbuffer is available.
    Semaphore(&'a B::Semaphore),

    /// Fence used for synchronization.
    ///
    /// Will be signaled once the frame backbuffer is available.
    Fence(&'a B::Fence),
}

CommandBuffer (key data structure)

/// A strongly-typed command buffer that will only implement methods that are valid for the operations
/// it supports.
pub struct CommandBuffer<'a, B: Backend, C, S: Shot = OneShot, L: Level = Primary> {
    pub(crate) raw: &'a mut B::CommandBuffer,
    pub(crate) _marker: PhantomData<(C, S, L)>
}
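
The `PhantomData<(C, S, L)>` marker in the struct above is what lets a command buffer "only implement methods that are valid for the operations it supports". The following sketch shows the general technique under simplified assumptions: it keeps only a capability parameter `C`, uses a `Vec<&'static str>` as the hypothetical "raw" buffer, and defines its own `Graphics`, `Transfer`, and `Supports` items. These are illustrative stand-ins and should not be read as the exact gfx-hal definitions.

```rust
use std::marker::PhantomData;

// Hypothetical capability markers; zero-sized, used only at the type level.
pub struct Graphics;
pub struct Transfer;

/// `C: Supports<T>` reads as "capability C can do everything T can".
pub trait Supports<T> {}
impl Supports<Transfer> for Transfer {}
impl Supports<Transfer> for Graphics {}
impl Supports<Graphics> for Graphics {}

/// Typed wrapper around an untyped "raw" buffer (here just a command log).
pub struct CommandBuffer<'a, C> {
    raw: &'a mut Vec<&'static str>,
    _marker: PhantomData<C>,
}

impl<'a, C> CommandBuffer<'a, C> {
    pub fn new(raw: &'a mut Vec<&'static str>) -> Self {
        CommandBuffer { raw, _marker: PhantomData }
    }
}

// Available on any buffer whose capability covers Transfer.
impl<'a, C: Supports<Transfer>> CommandBuffer<'a, C> {
    pub fn copy_buffer(&mut self) {
        self.raw.push("copy_buffer");
    }
}

// Available only on buffers whose capability covers Graphics.
impl<'a, C: Supports<Graphics>> CommandBuffer<'a, C> {
    pub fn draw(&mut self) {
        self.raw.push("draw");
    }
}

pub fn record_graphics(raw: &mut Vec<&'static str>) {
    let mut cb: CommandBuffer<'_, Graphics> = CommandBuffer::new(raw);
    cb.copy_buffer();
    cb.draw();
}

pub fn record_transfer(raw: &mut Vec<&'static str>) {
    let mut cb: CommandBuffer<'_, Transfer> = CommandBuffer::new(raw);
    cb.copy_buffer();
    // cb.draw(); // would not compile: `Transfer: Supports<Graphics>` is not implemented
}
```

The marker costs nothing at runtime; the check that a transfer-only buffer never records a draw happens entirely at compile time.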

Submit

/// Thread-safe finished command buffer for submission.
pub struct Submit<B: Backend, C, S, L>(pub(crate) B::CommandBuffer, pub(crate) PhantomData<(C, S, L)>);
impl<B: Backend, C, S, L> Submit<B, C, S, L> {
    fn new(buffer: B::CommandBuffer) -> Self {
        Submit(buffer, PhantomData)
    }
}
unsafe impl<B: Backend, C, S, L> Send for Submit<B, C, S, L> {}

Submission

/// Submission information for a command queue, generic over a particular
/// backend and a particular queue type.
pub struct Submission<'a, B: Backend, C> {
    cmd_buffers: SmallVec<[Cow<'a, B::CommandBuffer>; 16]>,
    wait_semaphores: SmallVec<[(&'a B::Semaphore, pso::PipelineStage); 16]>,
    signal_semaphores: SmallVec<[&'a B::Semaphore; 16]>,
    marker: PhantomData<C>,
}

/// The submit interface
/// Append a new list of finished command buffers to this submission.
///
/// All submits for this call must be of the same type.
/// Submission will be automatically promoted to the minimum required capability
/// to hold all passed submits.
pub fn submit<I, K>(mut self, submits: I) -> Submission<'a, B, <(C, K) as Upper>::Result>
where
    I: IntoIterator,
    I::Item: Submittable<'a, B, K, Primary>,
    (C, K): Upper
{
    self.cmd_buffers.extend(submits.into_iter().map(
        |s| { unsafe { s.into_buffer() } }
    ));
    Submission {
        cmd_buffers: self.cmd_buffers,
        wait_semaphores: self.wait_semaphores,
        signal_semaphores: self.signal_semaphores,
        marker: PhantomData,
    }
}
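
The return type `Submission<'a, B, <(C, K) as Upper>::Result>` is the "automatic promotion" mentioned in the doc comment: the capability of the submission is recomputed at the type level each time a submit is appended. The sketch below reproduces that idea in isolation. It is a simplified assumption, not the gfx-hal source: the backend parameter, lifetimes, and iterator input are dropped, and the marker types, the `Capability` trait, and the particular set of `Upper` impls are hypothetical.

```rust
use std::marker::PhantomData;

// Hypothetical capability markers.
pub struct Transfer;
pub struct Graphics;
pub struct Compute;
pub struct General;

/// Lets the sketch report a capability by name at runtime.
pub trait Capability {
    const NAME: &'static str;
}
impl Capability for Transfer { const NAME: &'static str = "Transfer"; }
impl Capability for Graphics { const NAME: &'static str = "Graphics"; }
impl Capability for Compute { const NAME: &'static str = "Compute"; }
impl Capability for General { const NAME: &'static str = "General"; }

/// Maps a pair of capabilities to the smallest capability covering both.
pub trait Upper {
    type Result: Capability;
}
impl Upper for (Transfer, Graphics) { type Result = Graphics; }
impl Upper for (Transfer, Compute) { type Result = Compute; }
impl Upper for (Graphics, Graphics) { type Result = Graphics; }
impl Upper for (Graphics, Compute) { type Result = General; }
impl Upper for (Compute, Graphics) { type Result = General; }

/// A finished command buffer tagged with the capability `K` it needs.
pub struct Submit<K> {
    label: &'static str,
    marker: PhantomData<K>,
}

impl<K> Submit<K> {
    pub fn new(label: &'static str) -> Self {
        Submit { label, marker: PhantomData }
    }
}

pub struct Submission<C> {
    cmd_buffers: Vec<&'static str>,
    marker: PhantomData<C>,
}

impl Submission<Transfer> {
    /// An empty submission starts at the weakest capability.
    pub fn new() -> Self {
        Submission { cmd_buffers: Vec::new(), marker: PhantomData }
    }
}

impl<C> Submission<C> {
    /// Appending a submit changes the *type* of the submission.
    pub fn submit<K>(mut self, submit: Submit<K>) -> Submission<<(C, K) as Upper>::Result>
    where
        (C, K): Upper,
    {
        self.cmd_buffers.push(submit.label);
        Submission { cmd_buffers: self.cmd_buffers, marker: PhantomData }
    }

    pub fn len(&self) -> usize {
        self.cmd_buffers.len()
    }
}

impl<C: Capability> Submission<C> {
    pub fn capability(&self) -> &'static str {
        C::NAME
    }
}

pub fn demo() -> (&'static str, usize) {
    let submission = Submission::<Transfer>::new()
        .submit(Submit::<Graphics>::new("draw"))      // Transfer + Graphics -> Graphics
        .submit(Submit::<Compute>::new("dispatch"));  // Graphics + Compute  -> General
    (submission.capability(), submission.len())
}
```

With this design a queue can demand, say, `Submission<General>` in its signature, and a mismatch between what was recorded and what the queue supports becomes a type error instead of a runtime failure.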

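Discussion of CommandBuffer reuse and performance in Vulkan and Metal

Updated continuously, based on practice.

CommandBuffer reuse

Once a Metal CommandBuffer has been committed to the Queue, it cannot be used again. A Vulkan CommandBuffer can be submitted multiple times, provided it was not recorded with the one-time-submit usage flag. Apple's documentation says: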

After a command buffer has been committed for execution, the only valid operations on the command buffer are to wait for it to be scheduled or completed (using synchronous calls or handler blocks) and to check the status of the command buffer execution. When used, scheduled and completed handlers are blocks that are invoked in execution order. These handlers should perform quickly; if expensive or blocking work needs to be scheduled, defer that work to another thread.

In a multithreaded app, it’s advisable to break your overall task into subtasks that can be encoded separately. Create a command buffer for each chunk of work, then call the enqueue() method on these command buffer objects to establish the order of execution. Fill each buffer object (using multiple threads) and commit them. The command queue automatically schedules and executes these command buffers as they become available.

developer.apple.com/documentati…
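
The difference in reuse rules maps naturally onto Rust ownership, which is one way to picture it. The sketch below is only an analogy with hypothetical types (`Queue`, `OneShotBuffer`, `ReusableBuffer`): the Metal-like buffer is consumed by `commit`, so using it again is a compile error, while the Vulkan-like buffer is submitted by reference and stays usable. It deliberately ignores the synchronization a real resubmission needs.

```rust
/// Stand-in for a GPU queue: "executing" a buffer just logs its label.
#[derive(Default)]
pub struct Queue {
    pub executed: Vec<String>,
}

/// Metal-like buffer: `commit` takes `self` by value, so it is single-use.
pub struct OneShotBuffer {
    label: String,
}

impl OneShotBuffer {
    pub fn new(label: &str) -> Self {
        OneShotBuffer { label: label.to_string() }
    }

    pub fn commit(self, queue: &mut Queue) {
        queue.executed.push(self.label);
    }
}

/// Vulkan-like buffer: `submit` borrows, so the recording can be resubmitted.
pub struct ReusableBuffer {
    label: String,
}

impl ReusableBuffer {
    pub fn new(label: &str) -> Self {
        ReusableBuffer { label: label.to_string() }
    }

    pub fn submit(&self, queue: &mut Queue) {
        queue.executed.push(self.label.clone());
    }
}

pub fn run_three_frames() -> Vec<String> {
    let mut queue = Queue::default();
    // Recorded once, submitted every frame.
    let reusable = ReusableBuffer::new("vk-static-scene");
    for frame in 0..3 {
        // Must be created again every frame.
        let one_shot = OneShotBuffer::new(&format!("mtl-frame-{}", frame));
        one_shot.commit(&mut queue);
        // one_shot.commit(&mut queue); // would not compile: `one_shot` was moved
        reusable.submit(&mut queue);
    }
    queue.executed
}
```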

Different function names for submitting to the queue

Metal and Vulkan use different words for submitting a CommandBuffer to the Queue: Metal = commit(), Vulkan = submit().