Preface

This post collects and summarizes datasets for face landmark detection.

Face 2D Keypoint ‒ MMPose 1.1.0 documentation

https://github.com/open-mmlab/mmpose/blob/main/docs/en/dataset_zoo/2d_face_keypoint.md

Common landmark counts per face are 5/15/68/98/106, among others.

Datasets

300W dataset

68 points. The Indoor and Outdoor directories each contain 300 face images, each with a 68-point annotation file.
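The 300W annotations, like those of the other iBUG sets below, are .pts text files. Each file has a short header (`version: 1`, `n_points: 68`), followed by one `x y` pair per line between `{` and `}`. Below is a minimal parser that assumes exactly that layout. The function names are illustrative.

```python
from pathlib import Path


def parse_pts(text):
    """Parse iBUG-style .pts text into a list of (x, y) float tuples."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = {}
    i = 0
    # Header lines such as "version: 1" and "n_points: 68" run until "{"
    while lines[i] != "{":
        key, _, value = lines[i].partition(":")
        header[key.strip()] = value.strip()
        i += 1
    n_points = int(header["n_points"])
    body = lines[i + 1:]
    end = body.index("}")
    points = [tuple(float(v) for v in ln.split()) for ln in body[:end]]
    if len(points) != n_points:
        raise ValueError(f"expected {n_points} points, got {len(points)}")
    return points


def load_pts(path):
    """Read and parse one .pts file from disk."""
    return parse_pts(Path(path).read_text())
```

Validating the point count against the header catches truncated files early, before they reach training.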

Dataset download

https://ibug.doc.ic.ac.uk/download/annotations/300w.zip.001

https://ibug.doc.ic.ac.uk/download/annotations/300w.zip.002

https://ibug.doc.ic.ac.uk/download/annotations/300w.zip.003

https://ibug.doc.ic.ac.uk/download/annotations/300w.zip.004

Download links for other datasets

xm2vts: annotation files only, 2360 files with 68 points each.

https://ibug.doc.ic.ac.uk/download/annotations/xm2vts.zip

frgc: annotation files only, 4950 files with 68 points each.

https://ibug.doc.ic.ac.uk/download/annotations/frgc.zip

lfpw:

The test and training sets together hold 2070 files, i.e. 2070/2 = 1035 images, each with a 68-point annotation file.

https://ibug.doc.ic.ac.uk/download/annotations/lfpw.zip

helen:

The test and training sets together hold 4660 files, i.e. 4660/2 = 2330 faces, each with a 68-point annotation file.

https://ibug.doc.ic.ac.uk/download/annotations/helen.zip

AFW:

674 files, i.e. 674/2 = 337 faces, each with a 68-point annotation file.

https://ibug.doc.ic.ac.uk/download/annotations/afw.zip

ibug:

270 files, i.e. 270/2 = 135 faces, each with a 68-point annotation file.

https://ibug.doc.ic.ac.uk/download/annotations/ibug.zip
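The counts above are file counts divided by two, because each image ships with exactly one .pts file. The sketch below verifies this on an extracted folder. It assumes images are .jpg/.jpeg/.png and that each annotation shares its image's path stem. `count_pairs` is a hypothetical helper.

```python
from pathlib import Path, PurePath

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def count_pairs(paths):
    """Return (number of files, number of images that have a same-stem .pts file)."""
    paths = [PurePath(p) for p in paths]
    # Path without extension, e.g. "helen/trainset/100032540_1"
    annotated = {p.with_suffix("") for p in paths if p.suffix == ".pts"}
    n_pairs = sum(
        1 for p in paths
        if p.suffix.lower() in IMAGE_EXTS and p.with_suffix("") in annotated
    )
    return len(paths), n_pairs


def count_pairs_in_folder(folder):
    """Walk a folder recursively and count image/.pts pairs."""
    return count_pairs(p for p in Path(folder).rglob("*") if p.is_file())
```

If `n_pairs * 2` equals the file count, every file belongs to a pair. Otherwise some images lack annotations, or vice versa.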

WFLW dataset

Look at Boundary: A Boundary-Aware Face Alignment Algorithm

WFLW Dataset - Machine Learning Datasets

Provided by SenseTime: 98 landmarks plus face attribute annotations for occlusion, pose, make-up, illumination, blur, and expression. The training set has 7500 images (list_98pt_rect_attr_train.txt) and the test set has 2500 images (list_98pt_rect_attr_test.txt).

Label format: 196 + 4 + 6 + 1 = 207 columns per line.

coordinates of 98 landmarks (196) + coordinates of upper left corner and lower right corner of detection rectangle (4) + attributes annotations (6) + image name (1)
  x0 y0 ... x97 y97 x_min_rect y_min_rect x_max_rect y_max_rect pose expression illumination make-up occlusion blur image_name
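The 207-column layout above maps directly onto a small parser. The sketch below splits one annotation line into landmarks, detection rectangle, the six attribute flags, and the image name. The `WFLWSample` class and the attribute key names are illustrative choices, not part of the official release.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Attribute order as listed in the format line above
ATTR_NAMES = ("pose", "expression", "illumination", "make_up", "occlusion", "blur")


@dataclass
class WFLWSample:
    landmarks: List[Tuple[float, float]]      # 98 (x, y) pairs
    rect: Tuple[float, float, float, float]   # x_min, y_min, x_max, y_max
    attributes: Dict[str, int]                # 6 attribute flags
    image_name: str


def parse_wflw_line(line):
    fields = line.split()
    if len(fields) != 207:
        raise ValueError(f"expected 207 columns, got {len(fields)}")
    coords = [float(v) for v in fields[:196]]
    # x0 y0 x1 y1 ... -> [(x0, y0), (x1, y1), ...]
    landmarks = list(zip(coords[0::2], coords[1::2]))
    rect = tuple(float(v) for v in fields[196:200])
    attributes = dict(zip(ATTR_NAMES, (int(v) for v in fields[200:206])))
    return WFLWSample(landmarks, rect, attributes, fields[206])
```

To load a whole split, apply `parse_wflw_line` to every non-empty line of list_98pt_rect_attr_train.txt.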

Figure: an annotated WFLW sample image.

Why are there two wflw_annotations.tar files?

LaPa dataset

GitHub - JDAI-CV/lapa-dataset: A large-scale dataset for face parsing (AAAI2020); 106 points.

GitHub - lucia123/lapa-dataset: A large-scale dataset for face parsing (AAAI2020)

A New Dataset and Boundary-Attention Semantic Segmentation for Face Parsing

AAAI 2020, LaPa-Dataset from the JD AI-CV research team.

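More than 22,000 public face images with rich variation in expression, pose, and occlusion. Every LaPa image comes with an 11-class pixel-level label map and 106 landmarks. The dataset is mainly intended for face parsing.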

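LaPa splits: test 2000 + train 18168 + val 2000. In each split, labels holds the pixel-level label maps and landmarks holds the 106-point landmark labels. The data can effectively help reduce the loss on large-pose test sets.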

LaPa-Dataset: JD's 106-point face landmark dataset (OneboWang's blog)

LaPa overview

  1. JD trained an HourglassNet as a semi-automatic face annotation tool;
  2. A small number of hard samples were then corrected by hand.

Dataset composition

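  1. Original images, pixel-level label maps, and 106-point landmark labels are provided;
  2. You have to work out the landmark order yourself: the dataset does not say which facial location each index corresponds to, so visualize the points to check;
  3. The pixel-level labels are for face parsing. Besides releasing the dataset, the paper uses it to build face semantic segmentation and parsing models. That part is unrelated to everyday landmark work, so interested readers can study the face parsing algorithm in the paper on their own.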
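The index-to-location mapping is not documented, so drawing each point's index over its image is the quickest way to learn the order. The sketch below uses only the standard library. It assumes each LaPa landmark file holds an optional point-count line followed by one `x y` pair per line; check this against your copy. Both function names are hypothetical. Indices are drawn 0-based.

```python
from xml.sax.saxutils import quoteattr


def parse_lapa_landmarks(text):
    """Parse a LaPa landmark .txt into (x, y) pairs (layout assumed, see above)."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if len(lines[0]) == 1:  # leading point-count line, e.g. "106"
        expected = int(lines[0][0])
        lines = lines[1:]
        if len(lines) != expected:
            raise ValueError(f"header says {expected} points, found {len(lines)}")
    return [(float(x), float(y)) for x, y in lines]


def landmarks_to_svg(points, image_href, width, height):
    """Build an SVG that shows the image with every point and its index on top."""
    items = [f"<image href={quoteattr(image_href)} width=\"{width}\" height=\"{height}\"/>"]
    for idx, (x, y) in enumerate(points):
        items.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="2" fill="red"/>')
        items.append(
            f'<text x="{x + 3:.1f}" y="{y - 3:.1f}" font-size="8" fill="yellow">{idx}</text>'
        )
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
            + "".join(items) + "</svg>")
```

Write the returned string to a .svg file next to the image and open it in a browser. The numbered dots make the contour, eyebrow, eye, nose, and mouth ranges easy to read off.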

Directory layout

./LaPa
├── test
│   ├── images
│   ├── labels
│   └── landmarks
├── train
│   ├── images
│   ├── labels
│   └── landmarks
└── val
    ├── images
    ├── labels
    └── landmarks


JD-landmark

Grand Challenge of 106-p Facial Landmark Localization

https://sites.google.com/view/hailin-shi

Grand Challenge of 106-Point Facial Landmark Localization

106 landmarks.

Note that only one face is annotated per image. One pitfall: points #75 and #105 coincide, and so do #84 and #106.
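You can check the #75/#105 and #84/#106 overlap on your own copy with a small sketch that lists landmark pairs lying within a tolerance of each other. It reports 1-based indices by default, on the assumption that the numbering above is 1-based (a #106 among 106 points suggests so). `coincident_pairs` and its default tolerance are illustrative.

```python
import math
from itertools import combinations


def coincident_pairs(points, tol=1e-3, one_based=True):
    """Return index pairs (i, j), i < j, of landmarks at most tol pixels apart."""
    offset = 1 if one_based else 0
    pairs = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        if math.dist(p, q) <= tol:
            pairs.append((i + offset, j + offset))
    return pairs
```

Running this over a handful of JD-landmark samples should return the two pairs above. Any extra pair would point to an annotation problem in that particular file.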


WFLW and JD-landmark can be merged into one 98-point dataset by removing points 56/66/57/65/58/64/75/84 from JD-landmark.
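Here is a minimal sketch of the 106 to 98 reduction. The text does not say whether these point numbers are 0-based or 1-based, so the sketch makes the caller choose explicitly. It only drops points. Whether the remaining 98 points come out in WFLW's order is not stated, so verify that visually before mixing the two sources. `jd106_to_98` is a hypothetical helper.

```python
# Points to drop, as listed in the text (index base not stated there)
JD_DROP = (56, 66, 57, 65, 58, 64, 75, 84)


def jd106_to_98(points, *, one_based, drop=JD_DROP):
    """Remove the listed points from a 106-point JD annotation, keeping order."""
    if len(points) != 106:
        raise ValueError(f"expected 106 points, got {len(points)}")
    offset = 1 if one_based else 0
    removed = {i - offset for i in drop}
    if len(removed) != 106 - 98:
        raise ValueError("drop list must name 8 distinct points")
    return [p for i, p in enumerate(points) if i not in removed]
```

Making `one_based` keyword-only and required prevents silently applying the wrong index base to the whole dataset.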

Merged dataset link: https://pan.baidu.com/s/179crM6svNbK3w28Z0ycBHg extraction code: 7guh

 

Kaggle dataset

Facial Keypoints Detection: 4+15 points.

P1_Facial_Keypoints

GitHub - udacity/P1_Facial_Keypoints: First project for CVND: facial keypoint detection.

The dataset lives in the P1_Facial_Keypoints repo, in the data subdirectory.

Landmarks are stored as CSV. Each row holds the image name followed by 68*2 coordinates, 137 columns in total, so this set also uses 68 keypoints.
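The sketch below reads that CSV layout into (name, points) pairs. It assumes the file starts with one column-name row and that the coordinate columns are interleaved as x0, y0, x1, y1, and so on. Check both against the actual file. `read_keypoint_csv` is a hypothetical helper.

```python
import csv


def read_keypoint_csv(lines, n_points=68, has_header=True):
    """Yield (image_name, [(x, y), ...]) from CSV rows of name + 2 * n_points numbers."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the column-name row
    for row in reader:
        if not row:
            continue
        coords = [float(v) for v in row[1:]]
        if len(coords) != 2 * n_points:
            raise ValueError(f"{row[0]}: expected {2 * n_points} values, got {len(coords)}")
        # columns assumed interleaved: x0, y0, x1, y1, ...
        yield row[0], list(zip(coords[0::2], coords[1::2]))
```

Usage, with a placeholder path: `with open("data/keypoints.csv", newline="") as f: samples = list(read_keypoint_csv(f))`.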

https://aistudio.baidu.com/aistudio/projectdetail/1487972

Facial keypoint detection with GoogLeNet plus a spatial attention module (SAM), with automatic expression stickers (Mr.郑先生_'s CSDN blog)

Analysis and summary

Datasets with 68 landmarks:

300w(600) / lfpw(1035) / helen(2330) / AFW(337) / ibug(135);

600+1035+2330+337+135=4437;

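300w_name: covers both indoor and outdoor;

lfpw_train/test_name: file names collide between the training and test sets, so they must be told apart, e.g. with train/test or 0/1;

helen: train/test sets with probably no name collisions, so the files can be used directly. This still needs checking, because the names follow no pattern;

afw: similar to helen; it is unclear whether the two overlap;

ibug: names follow a pattern, but it is unclear whether they collide with lfpw;

Therefore it is best to prefix every file with the name of its source dataset, combine everything into a new dataset, and then split that into train/valid.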
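The collision check and the prefix renaming can be sketched with plain dictionaries, as below. The keys use the same tags as the script (lfpw0/lfpw1 for the two LFPW splits). The helper names and sample file names are illustrative.

```python
from collections import defaultdict


def find_collisions(datasets):
    """datasets: {tag: [file names]} -> {file name: [tags containing it]} for duplicates."""
    owners = defaultdict(list)
    for tag, names in datasets.items():
        for name in names:
            owners[name].append(tag)
    return {name: tags for name, tags in owners.items() if len(tags) > 1}


def add_prefixes(datasets):
    """Rename every file to '<tag>_<name>' and confirm nothing collides afterwards."""
    renamed = {tag: [f"{tag}_{n}" for n in names] for tag, names in datasets.items()}
    flat = [n for names in renamed.values() for n in names]
    clashes = find_collisions({"all": flat})
    if clashes:
        raise ValueError(f"still colliding after prefixing: {sorted(clashes)}")
    return renamed
```

Feeding it the real file lists (e.g. from `os.listdir` on each source folder) answers the open questions above about helen/afw and ibug/lfpw before anything is copied.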

gen68kp.sh

#!/bin/sh
# Generate a 68-keypoint face landmark dataset from 300w/lfpw/helen/afw/ibug.
# Every file is copied into ./68kp with a prefix naming its source dataset,
# so files with the same name in different datasets no longer collide:
#   300w  01_Indoor/02_Outdoor  300w_name
#   lfpw  trainset/testset      lfpw0_name / lfpw1_name
#   helen trainset/testset      helen_name
#   afw                         afw_name
#   ibug                        ibug_name
set -e

script_path="$(pwd)"
kp68path="$script_path/68kp"
mkdir -p "$kp68path"

# copy_with_prefix SRC_DIR PREFIX: copy each file in SRC_DIR to $kp68path/PREFIX<name>
copy_with_prefix() {
    for file in "$1"/*; do
        [ -f "$file" ] || continue   # skip missing dirs / subdirectories
        echo "$file"
        cp "$file" "$kp68path/$2$(basename "$file")"
    done
}

# 300w
copy_with_prefix "$script_path/300w/300w/01_Indoor" "300w_"
copy_with_prefix "$script_path/300w/300w/02_Outdoor" "300w_"
# lfpw: train and test share file names, so tag them with 0/1
copy_with_prefix "$script_path/lfpw/trainset" "lfpw0_"
copy_with_prefix "$script_path/lfpw/testset" "lfpw1_"
# helen/afw/ibug (directory names assumed to follow the same layout)
copy_with_prefix "$script_path/helen/trainset" "helen_"
copy_with_prefix "$script_path/helen/testset" "helen_"
copy_with_prefix "$script_path/afw" "afw_"
copy_with_prefix "$script_path/ibug" "ibug_"

# jpg ---> png (alternative with ImageMagick: convert "$file" "${file%.jpg}.png")
for file in "$kp68path"/*.jpg; do
    [ -f "$file" ] || continue
    pngname="${file%.jpg}.png"
    # delete the jpg only if the conversion succeeded
    ffmpeg -y -loglevel error -i "$file" -pix_fmt rgb24 "$pngname" && rm "$file"
done

# Split the dataset into train/valid, each with png/pts:
# dataset68
# ├── train
# │   ├── png
# │   └── pts
# └── valid
#     ├── png
#     └── pts
dataset_path="$script_path/dataset68"
cd "$script_path"
find "$script_path/68kp/" -name "*.png" > "$script_path/image.txt"
rm -rf "$dataset_path"
mkdir -p "$dataset_path/train/png" "$dataset_path/train/pts" \
         "$dataset_path/valid/png" "$dataset_path/valid/pts"
python genpath.py  # split the dataset

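gen68kp.sh ends by calling genpath.py, which the post does not show. The sketch below is a hypothetical version of what that script needs to do. It shuffles the paths in image.txt with a fixed seed and holds out a fraction for validation. It then copies each .png and its same-stem .pts into dataset68/{train,valid}/{png,pts}. The 10% validation ratio and the seed are assumptions, not values from the text.

```python
import random
import shutil
from pathlib import Path


def plan_split(images, valid_ratio=0.1, seed=0):
    """Shuffle image paths reproducibly and split them into train/valid lists."""
    images = sorted(images)  # fixed order first, so the seed fully determines the split
    random.Random(seed).shuffle(images)
    n_valid = int(round(len(images) * valid_ratio))
    return {"train": images[n_valid:], "valid": images[:n_valid]}


def copy_split(split, dataset_path):
    """Copy each png and its same-stem .pts into dataset_path/<split>/{png,pts}."""
    root = Path(dataset_path)
    for name, files in split.items():
        for png in map(Path, files):
            pts = png.with_suffix(".pts")
            if not pts.exists():
                print(f"skip {png}: no .pts annotation")
                continue
            for kind, src in (("png", png), ("pts", pts)):
                dst = root / name / kind
                dst.mkdir(parents=True, exist_ok=True)
                shutil.copy(src, dst / src.name)
```

A possible entry point for genpath.py: `copy_split(plan_split(Path("image.txt").read_text().splitlines()), "dataset68")`. Splitting after prefixing and merging means images from every source dataset end up in both train and valid.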

Datasets with 98 landmarks:

wflw(10000) / JD-landmarks-98 (unavailable?)

Merge WFLW and JD-landmark into one 98-point dataset by removing points 56/66/57/65/58/64/75/84 from JD-landmark.

Link to the merged dataset: https://pan.baidu.com/s/179crM6svNbK3w28Z0ycBHg extraction code: 7guh

The merged set has 25393 images. Subtracting WFLW's 10000 gives 25393 - 10000 = 15393 images in JD-landmarks-98.

Datasets with 106 landmarks:

JD-landmark (?) / LaPa (22000)

You can train on the 68-point data first and move on to the 98/106-point data later; the datasets need to be preprocessed for that.

References

  1. Face 2D Keypoint ‒ MMPose 1.1.0 documentation
  2. mmlab_2d_face_keypoint
  3. Look at Boundary: A Boundary-Aware Face Alignment Algorithm
  4. GitHub - lucia123/lapa-dataset: A large-scale dataset for face parsing (AAAI2020)
  5. GitHub - JDAI-CV/lapa-dataset: A large-scale dataset for face parsing (AAAI2020)
  6. Grand Challenge of 106-p Facial Landmark Localization
  7. Facial Keypoints Detection