armv8, dup

error
/tmp/ccPrZnMZ.s: Assembler messages:
/tmp/ccPrZnMZ.s:111: Error: operand mismatch -- `dup v0.4s,x3'
/tmp/ccPrZnMZ.s:111: Info: did you mean this?
/tmp/ccPrZnMZ.s:111: Info: dup v0.4s, w3
/tmp/ccPrZnMZ.s:111: Info: other valid variant(s):
/tmp/ccPrZnMZ.s:111: Info: dup v0.8b, w3
/tmp/ccPrZnMZ.s:111: Info: dup v0.16b, w3
/tmp/ccPrZnMZ.s:111: Info: dup v0.4h, w3
/tmp/ccPrZnMZ.s:111: Info: dup v0.8h, w3
/tmp/ccPrZnMZ.s:111: Info: dup v0.2s, w3
/tmp/ccPrZnMZ.s:111: Info: dup v0.2d, x3

code

void batch_assembly(float* src, float* out, int count, float u, float std, float w, float b)
{
    int i = 10;
    asm volatile(
        "dup    v0.4s, %4           \n"
        "dup    v1.4s, %w4           \n"
        "dup    v2.4s, %w5           \n"
        "dup    v3.4s, %w6           \n"
        "1:                         \n"
        "prfm pldl1keep, [%1, #128] \n"
        "ld1  {v0.4s}, [%1], #16    \n"
        "fabs  v0.4s, v0.4s         \n"
        "subs %w3, %w3, #4          \n"
        "st1  {v0.4s}, [%0], #16    \n"
        "bgt 1b                     \n "
        :"=r"(out)          // %0, x0
        :"r"(src),          // %1, x1
        "0"(out),           // %2, same register as %0
        "r"(count),         // %3, w2
        "r"(u),             // %4, x3
        "r"(std),           // %5
        "r"(w),             // %6
        "r"(b)              // %7
        :"cc", "memory", "v0", "v1", "v2", "v3"
    );    
}

The line that fails is "dup v0.4s, %4 \n".

explanation

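  1. The input operand "r"(u) asks the compiler to put u in a general-purpose register. With no modifier, %4 is printed using that register's 64-bit name, so here it expands to x3.
  2. dup vd.4s, rn fills 32-bit lanes, so its second operand must be a 32-bit W register, not a 64-bit X register. Which W register holds u? The assembler's suggestion, dup v0.4s, w3, shows that it is w3.
  3. Change "dup v0.4s, %4 \n" to either "dup v0.4s, w3 \n" or "dup v0.4s, %w4 \n". The second form is the safer one, because a hard-coded w3 breaks as soon as the compiler picks a different register. The 4 in %w4 is the operand's position counting outputs first and then inputs, starting from 0, which makes u operand 4. The w is an operand modifier that prints the operand as a 32-bit W register; I take it to stand for "word". None of this comes up on ARMv7, where the general-purpose registers are all 32 bits wide and a plain %4 is enough.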
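To try the modifiers in isolation, here is a minimal sketch that is not taken from the original function. The helpers splat_u32x4, splat_u64x2 and splat_f32x4 are hypothetical names. They use named operands (%w[val], %x[val]) instead of positional ones, and a plain C loop stands in for the asm on non-AArch64 hosts so the file builds anywhere. It assumes GCC or Clang extended asm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: broadcast a 32-bit value into the four lanes of out. */
void splat_u32x4(uint32_t value, uint32_t *out)
{
    assert(out != NULL);
#if defined(__aarch64__)
    __asm__ volatile(
        /* %w[val] prints the W (32-bit) name of the register holding value.
           A bare %[val] would print the X name, which .4s does not accept. */
        "dup  v0.4s, %w[val]     \n"
        "st1  {v0.4s}, [%[dst]]  \n"
        : /* no outputs: the result is written through dst */
        : [val] "r"(value), [dst] "r"(out)
        : "memory", "v0");
#else
    /* Portable stand-in so the sketch also builds on other targets. */
    for (int i = 0; i < 4; ++i)
        out[i] = value;
#endif
}

/* Hypothetical helper: the .2d arrangement is the one variant that takes
   a 64-bit X register, so the x modifier is used here. */
void splat_u64x2(uint64_t value, uint64_t *out)
{
    assert(out != NULL);
#if defined(__aarch64__)
    __asm__ volatile(
        "dup  v0.2d, %x[val]     \n"
        "st1  {v0.2d}, [%[dst]]  \n"
        :
        : [val] "r"(value), [dst] "r"(out)
        : "memory", "v0");
#else
    for (int i = 0; i < 2; ++i)
        out[i] = value;
#endif
}

/* Hypothetical helper: broadcast a float by passing its bit pattern
   through a 32-bit integer, which keeps the "r" operand an integer. */
void splat_f32x4(float value, float *out)
{
    uint32_t bits;
    uint32_t lanes[4];

    assert(out != NULL);
    memcpy(&bits, &value, sizeof bits);   /* copy the float's bits unchanged */
    splat_u32x4(bits, lanes);
    memcpy(out, lanes, sizeof lanes);
}
```

Calling splat_u32x4(7, lanes) leaves 7 in all four elements of lanes. Named operands also avoid having to recount positions whenever an operand is added or removed.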