[C Language] Thoughts on Zero- and One-Length Arrays at the End of a Struct

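Original

itcast0 2022-03-16 17:28:28 Blogger's category: C and C++ ©Copyright

Tags: C zero-length array, array length 0, flexible array member · Category: Coding Life

©Copyright belongs to the author: an original work by itcast0 on the 51CTO blog. Contact the author for permission to repost; unauthorized reposting will be pursued legally.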

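Note: in ISO/IEC 9899:1999 (C99), a zero-length array such as char tag_data[0] is illegal. It is a GNU C extension, which gcc accepts. C99 does, however, allow the equivalent flexible array member written with empty brackets, char tag_data[]. I have not tested whether the newest C/C++ standards accept the zero-length form. Microsoft's Visual Studio compilers accept it with a warning (C4200, "nonstandard extension").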

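The main reason for ending a struct with a zero- or one-length array is to make a memory buffer easy to manage. If you use a pointer member instead of an array, you must allocate twice: once for the struct and once for the buffer the pointer refers to. The two blocks are not contiguous, so each one has to be allocated and released separately. With an array, a single allocation covers both (see the example below). Releasing works the same way: with the array, one free() is enough; with the pointer, you must first free the buffer the pointer refers to and then the struct itself, and that order cannot be reversed.

In short, the technique allocates one contiguous block of memory, which also reduces memory fragmentation.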

In the Linux headers

[root@deng /]# vim usr/include/linux/if_pppox.h

On Linux, /usr/include/linux/if_pppox.h contains the following structure:

struct pppoe_tag {
    __u16 tag_type;
    __u16 tag_len;
    char tag_data[0];
} __attribute ((packed));

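The last member is a variable-length array. For TLV (Type-Length-Value) structures, or any other struct that needs a variable-length part, this is the best way to define it, and it is very convenient to use: when creating one, malloc the size of the struct plus the length of the variable data; the variable part can then be accessed like an ordinary array; when releasing it, simply free the whole struct. For example: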

struct pppoe_tag *sample_tag;
__u16 sample_tag_len = 10;

sample_tag = (struct pppoe_tag *)malloc(sizeof(struct pppoe_tag) + sizeof(char) * sample_tag_len);
sample_tag->tag_type = 0xffff;
sample_tag->tag_len = sample_tag_len;
sample_tag->tag_data[0] = ...;
...

To release it:

free(sample_tag);

Could char *tag_data be used instead? In fact it differs considerably from char tag_data[0]. To illustrate this, I wrote the following program:

Example 1: test_size.c

1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <string.h>

10 struct tag1
20 {
30     int a;
40     int b;
50 }__attribute ((packed));

70 struct tag2
80 {
90     int a;
100     int b;
110     char *c;
120 }__attribute ((packed));
130
140 struct tag3
150 {
160     int a;
170     int b;
180     char c[0];      /* GNU extension: zero-length array */
190 }__attribute ((packed));
200
210 struct tag4
220 {
230     int a;
240     int b;
250     char c[1];
260 }__attribute ((packed));
270
280 int main(void)
290 {
300     struct tag2 l_tag2;
310     struct tag3 l_tag3;
320     struct tag4 l_tag4;
330
340     memset(&l_tag2, 0, sizeof(struct tag2));
350     memset(&l_tag3, 0, sizeof(struct tag3));
360     memset(&l_tag4, 0, sizeof(struct tag4));
370
380     printf("size of tag1 = %zu\n", sizeof(struct tag1));
390     printf("size of tag2 = %zu\n", sizeof(struct tag2));
400     printf("size of tag3 = %zu\n", sizeof(struct tag3));
410     printf("size of tag4 = %zu\n", sizeof(struct tag4));
420     printf("l_tag2 = %p,&l_tag2.c = %p,l_tag2.c = %p\n", (void *)&l_tag2, (void *)&l_tag2.c, (void *)l_tag2.c);
430     printf("l_tag3 = %p,l_tag3.c = %p\n", (void *)&l_tag3, (void *)l_tag3.c);
440     printf("l_tag4 = %p,l_tag4.c = %p\n", (void *)&l_tag4, (void *)l_tag4.c);
450     exit(0);
460 }

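__attribute ((packed)) forces the compiler not to pad the structs for 4-byte alignment, which makes the point easier to see.

The program printed the following (on 32-bit x86 Linux, where int and pointers are both 4 bytes; addresses vary between systems):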

size of tag1 = 8
size of tag2 = 12
size of tag3 = 8
size of tag4 = 9
l_tag2 = 0xbffffad0,&l_tag2.c = 0xbffffad8,l_tag2.c = (nil)
l_tag3 = 0xbffffac8,l_tag3.c = 0xbffffad0
l_tag4 = 0xbffffabc,l_tag4.c = 0xbffffac4

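From the program and its output we can see the following. tag1 consists of two 32-bit integers, so it occupies 8 bytes. tag2 contains two 32-bit integers plus a char * pointer, so it occupies 12 bytes (on a 64-bit system the pointer alone would take 8). tag3 is where the difference between char c[0] and char *c really shows: the c in char c[0] is not a pointer but an offset that names the space immediately following a and b, so it takes up no space at all (sizeof(struct tag3) is still 8, and l_tag3.c lies exactly 8 bytes past l_tag3). tag4 confirms this further: c[1] adds exactly one byte, and l_tag4.c again lies 8 bytes past the start of the struct. Therefore, if the last member of struct pppoe_tag above were defined as char *tag_data, it would not only cost an extra 4-byte pointer variable but would also be less convenient to use: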

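Method one: when creating the object, first allocate a block for struct pppoe_tag, then allocate a separate block for tag_data. When releasing it, you must first free the memory used by tag_data and then the memory used by pppoe_tag.

Method two: when creating the object, allocate a single block the size of struct pppoe_tag plus the data. As line 420 of Example 1 shows, tag_data is a real pointer variable (null after the memset), so it must be initialized to point at the memory that follows struct pppoe_tag.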

struct pppoe_tag {
    __u16 tag_type;
    __u16 tag_len;
    char *tag_data;
} __attribute ((packed));

struct pppoe_tag *sample_tag;
__u16 sample_tag_len = 10;

Method one:

sample_tag = (struct pppoe_tag *)malloc(sizeof(struct pppoe_tag));
sample_tag->tag_len = sample_tag_len;
sample_tag->tag_data = malloc(sizeof(char) * sample_tag_len);
sample_tag->tag_data[0] = ...;

To release it:

free(sample_tag->tag_data);
free(sample_tag);

Method two:

sample_tag = (struct pppoe_tag *)malloc(sizeof(struct pppoe_tag) + sizeof(char) * sample_tag_len);
sample_tag->tag_len = sample_tag_len;
sample_tag->tag_data = ((char *)sample_tag) + sizeof(struct pppoe_tag);
sample_tag->tag_data[0] = ...;

To release it:

free(sample_tag);

So whichever method you choose, neither is as convenient as defining the member as char tag_data[0].

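Having said all this, the underlying issue is the difference between arrays and pointers in C (which is exactly the memory-management question raised above: an array member occupies a contiguous range immediately after the rest of the struct, whereas a pointer refers to a contiguous block that may be anywhere in memory). Is the a in char a[1] the same as the b in char *b? Programming Abstractions in C (Roberts, E. S., China Machine Press, June 2004), page 82, says: "arr is defined to be identical to &arr[0]". In other words, the a in char a[1] behaves like a constant equal to &a[0], whereas char *b declares a real pointer variable b. That is why a = b is not allowed while b = a is. Both support subscripting, but is there any essential difference between a[0] and b[0]? An example makes this clear.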

Example 2:

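#include <stdio.h>
#include <stdlib.h>

int main()
{
    char a[10];
    char *b;        /* deliberately uninitialized: compile and disassemble only, do not run */

    a[2] = 0xfe;
    b[2] = 0xfe;    /* writes through an indeterminate pointer: undefined behavior if executed */
    exit(0);
}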

After compiling it (32-bit x86), objdump shows the following assembly:

080483f0 <main>:
80483f0: 55 push %ebp
80483f1: 89 e5 mov %esp,%ebp
80483f3: 83 ec 18 sub $0x18,%esp
80483f6: c6 45 f6 fe movb $0xfe,0xfffffff6(%ebp)
80483fa: 8b 45 f0 mov 0xfffffff0(%ebp),%eax
80483fd: 83 c0 02 add $0x2,%eax
8048400: c6 00 fe movb $0xfe,(%eax)
8048403: 83 c4 f4 add $0xfffffff4,%esp
8048406: 6a 00 push $0x0
8048408: e8 f3 fe ff ff call 8048300 <_init+0x68>
804840d: 83 c4 10 add $0x10,%esp
8048410: c9 leave
8048411: c3 ret
8048412: 8d b4 26 00 00 00 00 lea 0x0(%esi,1),%esi
8048419: 8d bc 27 00 00 00 00 lea 0x0(%edi,1),%edi

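As you can see, a[2] = 0xfe uses direct addressing: 0xfe is written straight to &a[0] + 2, an address the compiler already knows as an offset from the frame pointer (0xfffffff6(%ebp), i.e. %ebp - 10). b[2] = 0xfe uses indirect addressing: the contents of b (an address) are first loaded into %eax, 2 is added, and only then is 0xfe written to the computed address. So a[0] and b[0] are fundamentally different.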

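When an array is used as a function parameter, however, it is no different from a pointer:

int do1(char a[], int len);

int do2(char *a, int len);

In these two functions, a is exactly the same thing in both: a genuine pointer variable.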

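One more note: the last member of struct pppoe_tag is defined as char tag_data[0]. Some compilers do not support zero-length arrays; in that case the only option is to define it as char tag_data[1]. It is used the same way, except that sizeof the struct then already includes that one element.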

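In the OpenOffice source code I came across the following data structure, a Unicode string type. It also ends with a one-element array, probably for compatibility across compilers.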

typedef struct _rtl_uString
{
    sal_Int32 refCount;
    sal_Int32 length;
    sal_Unicode buffer[1];
} rtl_uString;

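This is a variable-length string. The idea is roughly as follows:

rtl_uString *str = malloc(256);

str->length = 0;   /* characters stored so far, not the 256 bytes allocated */

str->buffer now refers to a buffer of 256 - 8 = 248 bytes (the two sal_Int32 fields take 8 bytes), which is room for 124 sal_Unicode units, since sal_Unicode is a 16-bit type.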

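Summary: the reposted material above makes it clear that the advantage of this technique is simpler memory management. Assuming an idealized memory state, the allocated space is laid out in order (in practice it may differ because of fragmentation and similar effects), and the trailing array lets us go straight, with no gap, from the struct to the buffer allocated after it. This is very common on Linux. On Windows I have only seen something similar in MFC, and I no longer remember other cases. As I recall, MFC explains that you can take the pointer to the allocated struct (this) and simply add 1 to reach the actual data area (for details, see my blog, CE category: "Applications and detailed explanation of memory pool techniques"). It took me quite a while to figure that out at the time.

Many things look complicated but are really built on fundamentals, so lay a solid foundation: it is the precondition and guarantee for any tall building. Learning is the same. Avoid aiming too high too fast; move forward step by step, regularly summarize what you have learned, and let theory and practice keep verifying each other. Only then can you go further and see more beautiful scenery.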

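[Flexible array members

In C99, the last member of a structure may be an array of unknown size; this is called a flexible array member. It must be preceded by at least one other member. A flexible array member lets a structure contain an array of variable size. The size that sizeof reports for such a structure does not include the memory for the flexible array. A structure containing a flexible array member is allocated dynamically with malloc(), and the allocation should be larger than the structure itself to accommodate the expected size of the flexible array.]

Source: C语言大全 (C: The Complete Reference), "Flexible array members"

Now let's look at flexible array members in the C99 standard: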

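Variable-length structs: the trick of a zero-element array

Sometimes we need a struct that implements a variable-length structure. How can this be done?

Look at this struct definition: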

typedef struct st_type
{
    int nCnt;
    int item[0];
} type_a;

(Some compilers reject this with an error; in that case, change it to:)

typedef struct st_type
{
    int nCnt;
    int item[];
} type_a;

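This defines a variable-length struct. sizeof(type_a) yields only 4 (with a 4-byte int), i.e. sizeof(nCnt) = sizeof(int): the zero-element array takes no space, and we can now perform variable-length operations.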

C version:

type_a *p = (type_a *)malloc(sizeof(type_a) + 100 * sizeof(int));

C++ version (both item[0] and item[] are compiler extensions in C++, not standard C++):

type_a *p = (type_a *)new char[sizeof(type_a) + 100 * sizeof(int)];

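This produces a type_a with room for 100 elements, and p->item[n] gives simple access to the variable-length part. The principle is simple: once more memory than sizeof(type_a) has been allocated, int item[] becomes meaningful. It refers to the memory right after int nCnt and needs no storage of its own, and the extra memory allocated can be manipulated through it. It is a very handy technique.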

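Releasing it is just as simple:

C version:

free(p);

C++ version (the memory was allocated as a char array, so it must be deleted as one):

delete[] reinterpret_cast<char *>(p);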

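This is in fact called a flexible array member. C89 does not support it; C99 added it to the standard as a special case. However, what C99 supports is an incomplete array type, not a zero-length array: the form int item[0]; is not valid ISO C, and the form C99 supports is int item[];. Some compilers support int item[0]; as a nonstandard extension, which already existed before C99 was published; since C99, some compilers treat the two forms the same way.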

Here is the relevant text from C99:

6.7.2.1 Structure and union specifiers

As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. With two exceptions, the flexible array member is ignored. First, the size of the structure shall be equal to the offset of the last element of an otherwise identical structure that replaces the flexible array member with an array of unspecified length.[106] Second, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.

For example, in VC++ 6 either form compiles and works, but it produces the warning C4200: nonstandard extension used : zero-sized array in struct/union.

In Dev-C++ both forms can be used as well, and no warning is issued.

Reference example:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

struct a_t{
    char ch1;
    char ch2;
    char str[0];    /* GNU extension; the C99 standard spelling is char str[]; */
};

struct a1_t{
    char ch1;
    char ch2;
};

int main(void)
{
    printf("sizeof(a) = %zu\n", sizeof(struct a_t));
    printf("sizeof(a1) = %zu\n", sizeof(struct a1_t));

    /* One allocation: the two header bytes plus 10 bytes for str. */
    struct a_t *a = malloc(sizeof(struct a_t) + 10);
    if (a == NULL)
        return 1;
    a->ch1 = 96;
    a->ch2 = 97;

    memset(a->str, 0, 10);          /* zero-fill so str stays NUL-terminated */
    memcpy(a->str, "hello", 5);

    printf("====> %s\n", a->str);
    free(a);
    return 0;
}