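I recently read a few blog posts about these evaluation metrics and wrote some code based on their definitions. Here is a short summary.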
Ground truth
Object tracking datasets generally ship with this file, as shown in the figure below.
Precision plot
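It looks only at the box coordinates and ignores the size of the tracked box.
Put simply, the vertical axis of the curve is precision, the fraction of frames whose location error stays below the threshold, and the horizontal axis is the threshold; sweeping the threshold traces out the curve. The higher the precision, the better the tracker. The threshold usually quoted is 20 pixels.
My own reading: the range that matters is 0-20 pixels, since high precision at a small threshold means the tracker stays close to the target.
Each row of the arrays is laid out as (top-left x, top-left y, width, height).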
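% data: ground truth, one row per frame as [x y width height]
data=importdata('E:\BaiduNetdiskDownload\OTB100\Tiger1\groundtruth_rect.txt');
% data_test: results of algorithm B, same layout as data
% data_old:  results of algorithm A, same layout as data
% Both are assumed to be loaded in the workspace before this script runs.
sub=data-data_test;      % ground truth - algorithm B
subold=data-data_old;    % ground truth - algorithm A
frame=254;               % number of frames to evaluate
interval=51;             % number of thresholds
c=zeros(1,interval);     % frames within threshold, algorithm B
d=zeros(1,interval);     % frames within threshold, algorithm A
error_threshold=(0:2:100);
% Count the number of frames that meet the error threshold conditions
for i=1:frame
    for j=1:interval
        % a frame counts when both the x and the y offset of the
        % top-left corner are below the threshold
        if abs(sub(i,1))<error_threshold(j) && abs(sub(i,2))<error_threshold(j)
            c(j)=c(j)+1;
        end
        if abs(subold(i,1))<error_threshold(j) && abs(subold(i,2))<error_threshold(j)
            d(j)=d(j)+1;
        end
    end
end
% Turn the counts into fractions of frames
c=c/frame;
d=d/frame;
subplot(2,2,1)
plot(error_threshold,c,'r')
xlabel('Location error threshold')
ylabel('Precision')
hold on
plot(error_threshold,d,'b')
legend('algorithm B','algorithm A')
subplot(2,2,2)
plot(error_threshold,(c-d),'g')
xlabel('Location error threshold')
ylabel('Precision difference (B - A)')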
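The listing above compares the x and y offsets of the top-left corner separately. The precision plot is more commonly defined on the Euclidean distance between the two box centers, so here is a minimal sketch of that variant. The boxes `gt` and `res` are toy values made up for illustration, and using `<=` rather than `<` for the comparison is an assumption.

```matlab
% Toy data, made up for illustration: rows are [x y width height].
gt  = [10 10 20 20; 12 10 20 20; 14 12 20 20; 16 12 20 20];
res = [10 10 20 20; 15 14 20 20; 30 12 24 20; 60 50 20 20];

% Box centers: top-left corner plus half the width/height.
gt_center  = gt(:,1:2)  + gt(:,3:4)/2;
res_center = res(:,1:2) + res(:,3:4)/2;

% Euclidean center location error per frame, in pixels.
cle = sqrt(sum((gt_center - res_center).^2, 2));

% Precision at each threshold: fraction of frames with error <= threshold.
error_threshold = 0:2:100;
precision = zeros(size(error_threshold));
for j = 1:numel(error_threshold)
    precision(j) = mean(cle <= error_threshold(j));
end

% Single number usually quoted for this plot: precision at 20 pixels.
precision_at_20 = precision(error_threshold == 20);
```

With these toy boxes the per-frame errors are 0, 5, 18 and about 58 pixels, so three of the four frames fall within the 20-pixel threshold.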
Success plot
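This metric is usually described in terms of the overlap score: intuitively, how much area the ground-truth box and the tracked box have in common, expressed as a single number.
Computing it therefore takes all four variables into account, the coordinates as well as the width and height. The horizontal axis is the overlap threshold and the vertical axis is the success rate, the fraction of frames whose overlap score exceeds the threshold; the threshold usually quoted is 0.5. Sketching the two rectangles makes the computation clear: the overlapping area is the intersection, and the union is the sum of the two rectangle areas minus the intersection. The overlap score is the intersection divided by the union.
My own reading: here the large thresholds are the ones to look at, because passing a large threshold requires a large overlapping area.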
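The intersection-and-union rule described above can be checked on a small worked example. This is only a sketch: the helper names `inter` and `overlap` and the two boxes are made up for illustration.

```matlab
% Boxes are [x y width height].
% Intersection: overlap of the x-ranges times overlap of the y-ranges,
% each clamped at 0 so that disjoint boxes give an area of 0.
inter = @(a,b) max(0, min(a(1)+a(3), b(1)+b(3)) - max(a(1), b(1))) * ...
               max(0, min(a(2)+a(4), b(2)+b(4)) - max(a(2), b(2)));
% Overlap score: intersection divided by the union, where the union is
% the sum of both areas minus the intersection.
overlap = @(a,b) inter(a,b) / (a(3)*a(4) + b(3)*b(4) - inter(a,b));

gt  = [0 0 10 10];      % made-up ground-truth box
box = [5 0 10 10];      % made-up tracker box, shifted 5 px to the right
s = overlap(gt, box);   % intersection 50, union 150
is_success = s > 0.5;   % does this frame pass the 0.5 threshold?
```

Here the score is 50/150, roughly 0.33, so the frame does not count as a success at threshold 0.5 even though half of each box is covered.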
% data: ground truth, one row per frame as [x y width height]
data=importdata('E:\BaiduNetdiskDownload\OTB100\Tiger1\groundtruth_rect.txt');
% data_test: algorithm B (CAM+KALMAN), same layout as data
% data_old:  algorithm A (CAM), same layout as data
% Both are assumed to be loaded in the workspace before this script runs.
frame=254;
interval=51;
overlap_threshold=linspace(0,1,interval);   % 51 thresholds from 0 to 1
e=zeros(1,interval);                        % frames above threshold, algorithm A
f=zeros(1,interval);                        % frames above threshold, algorithm B
% Count the number of frames that meet the overlap threshold conditions
for i=1:frame
    % data_old = CAM
    x1=data_old(i,1);  y1=data_old(i,2);  w1=data_old(i,3);  h1=data_old(i,4);
    % data_test = CAM+KALMAN
    x2=data_test(i,1); y2=data_test(i,2); w2=data_test(i,3); h2=data_test(i,4);
    % data = GROUND TRUTH
    x3=data(i,1);      y3=data(i,2);      w3=data(i,3);      h3=data(i,4);
    % calculate algorithm A against GROUND TRUTH
    % intersection: overlap along x times overlap along y, clamped at 0
    area_and1=max(0,min(x1+w1,x3+w3)-max(x1,x3))*max(0,min(y1+h1,y3+h3)-max(y1,y3));
    % union: sum of both areas minus the intersection
    area_or1=w1*h1+w3*h3-area_and1;
    overlap_score1=area_and1/area_or1;
    % calculate algorithm B against GROUND TRUTH
    area_and2=max(0,min(x2+w2,x3+w3)-max(x2,x3))*max(0,min(y2+h2,y3+h3)-max(y2,y3));
    area_or2=w2*h2+w3*h3-area_and2;
    overlap_score2=area_and2/area_or2;
    for j=1:interval
        if overlap_score1>overlap_threshold(j)
            e(j)=e(j)+1;
        end
        if overlap_score2>overlap_threshold(j)
            f(j)=f(j)+1;
        end
    end
end
% Turn the counts into fractions of frames
e=e/frame;
f=f/frame;
subplot(1,2,1)
plot(overlap_threshold,e,'r')
xlabel('Overlap threshold')
ylabel('Success')
hold on
plot(overlap_threshold,f,'b')
legend('algorithm A','algorithm B')
subplot(1,2,2)
plot(overlap_threshold,(e-f),'g')
xlabel('Overlap threshold')
ylabel('Success difference (A - B)')
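Both of these common evaluations usually initialize the tracker in the first frame with the target position from the ground truth, then run the tracking algorithm to obtain the average precision and success rate. This procedure is called one-pass evaluation (OPE). It has two drawbacks. First, a tracker may be sensitive to the initial position given in the first frame, so starting from a different position or a different frame can change the result considerably. Second, most algorithms have no mechanism for re-initializing after a tracking failure.
The output of the code above is shown in the figure below:
Robustness evaluation
The evaluation described above has serious limitations. Robustness is evaluated instead by perturbing the initialization in time (temporally, starting from different frames) and in space (spatially, using different bounding boxes). This gives two variants: temporal robustness evaluation (TRE) and spatial robustness evaluation (SRE).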
Temporal robustness evaluation
: Each tracking algorithm is evaluated numerous times from different starting frames across an image sequence. In each test, an algorithm is evaluated from a particular starting frame, with the initialization of the corresponding ground-truth object state, until the end of an image sequence. The tracking results of all the tests are averaged to generate the TRE score.
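Within one image or video sequence, each tracking algorithm is run several times, each time starting from a different frame (for example, once from frame 1, once from frame 10, once from frame 20, and so on). The bounding box used for initialization is the ground-truth annotation of the starting frame. The results of all runs are then averaged to give the TRE score.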
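The TRE procedure can be sketched as a loop over starting frames. Everything below is an illustrative assumption rather than the benchmark's own code: `run_tracker` is a hypothetical stand-in that never moves its box, the ground truth is synthetic, the starting frames are chosen arbitrarily, and each run is summarized by its success rate at overlap threshold 0.5 instead of a full curve.

```matlab
% Synthetic ground truth: a 20x20 target moving 1 px right per frame.
n = 30;
gt = [(1:n)', 10*ones(n,1), 20*ones(n,1), 20*ones(n,1)];

% Hypothetical stand-in for a real tracker: it repeats its initial box
% for every remaining frame.
run_tracker = @(init_box, n_frames) repmat(init_box, n_frames, 1);

% Row-wise overlap score of two N-by-4 lists of [x y width height] boxes.
iw  = @(a,b) max(0, min(a(:,1)+a(:,3), b(:,1)+b(:,3)) - max(a(:,1), b(:,1)));
ih  = @(a,b) max(0, min(a(:,2)+a(:,4), b(:,2)+b(:,4)) - max(a(:,2), b(:,2)));
iou = @(a,b) (iw(a,b).*ih(a,b)) ./ ...
             (a(:,3).*a(:,4) + b(:,3).*b(:,4) - iw(a,b).*ih(a,b));

start_frames = [1 11 21];               % assumed choice of starting frames
seg_score = zeros(size(start_frames));
for k = 1:numel(start_frames)
    s = start_frames(k);
    % Initialize with the ground truth of frame s, track to the end.
    res = run_tracker(gt(s,:), n - s + 1);
    % Success rate of this run at overlap threshold 0.5.
    seg_score(k) = mean(iou(res, gt(s:end,:)) > 0.5);
end
tre_score = mean(seg_score);            % average over all runs
```

Because the toy tracker loses the target after a few frames, shorter runs score higher, which illustrates how the choice of starting frame changes the result of a single run.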
Spatial robustness evaluation
: To evaluate whether a tracking method is sensitive to initialization errors, we generate the object states by slightly shifting or scaling the ground-truth bounding box of a target object. In this work, we use eight spatial shifts (four center shifts and four corner shifts), and four scale variations (see Fig. 2). The amount for shift is 10 percent of the target size, and the scale ratio varies from 80 to 120 percent of the ground truth at the increment of 10 percent. The SRE score is the average of these 12 evaluations.
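Some algorithms are sensitive to the bounding box given at initialization, and the ground truth used for evaluation is annotated by hand, so differences in annotation may affect some trackers. To evaluate whether a tracker is sensitive to initialization, the authors generate the initial bounding boxes by slightly shifting the ground truth and by scaling it up and down. The shift is 10% of the target size, and the scale ranges from 80% to 120% of the ground truth in steps of 10%. The SRE score is the average of these results.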
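The twelve SRE initializations can be generated as below. This is a sketch under stated assumptions: the quoted text does not spell out the geometry, so treating the four center shifts as axis-aligned translations, the four corner shifts as diagonal translations, and the scaling as being about the box center are all my own reading, and the box `gt` is made up.

```matlab
gt = [100 80 40 60];            % made-up ground-truth box [x y width height]
w = gt(3); h = gt(4);
dx = 0.1*w; dy = 0.1*h;         % shift amount: 10 percent of the target size

% Eight shifts as (x, y) offsets. Assumed reading: four axis-aligned
% "center" shifts followed by four diagonal "corner" shifts.
shifts = [-dx 0; dx 0; 0 -dy; 0 dy; -dx -dy; dx -dy; -dx dy; dx dy];
shifted = [gt(1)+shifts(:,1), gt(2)+shifts(:,2), repmat([w h], 8, 1)];

% Four scale variations about the box center (assumed). A ratio of
% 100 percent is the unperturbed box, so it is left out.
scales = [0.8 0.9 1.1 1.2]';
cx = gt(1) + w/2; cy = gt(2) + h/2;
scaled = [cx - scales*w/2, cy - scales*h/2, scales*w, scales*h];

init_boxes = [shifted; scaled]; % one row per SRE run, 12 rows in total
```

Each row of `init_boxes` would be used to initialize one run of the tracker, and the 12 results are then averaged.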