Paper: https://cs.nyu.edu/~silberman/papers/indoor_seg_support.pdf

Dataset page: https://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html

The dataset consists of the following parts:

  1. Labeled dataset: a subset of the video data, consisting of 1449 pairs of RGB and depth frames with dense multi-class labels; the depth maps have been preprocessed to fill in missing values.
  2. Raw dataset: the raw RGB, depth and accelerometer data as recorded by the Kinect.
  3. Toolbox: useful functions for manipulating the data and labels.
  4. Train/test splits used for evaluation.

Labeled dataset



RGB image (left), preprocessed depth (middle), a set of labels for the image (right)

 

The labeled dataset is a subset of the raw dataset. It consists of synchronized pairs of RGB and depth frames, and every image carries dense multi-class labels. In addition to the label maps (right), it also includes a set of preprocessed depth maps (middle) whose missing values have been filled in with a colorization method. Unlike the raw dataset, the labeled dataset is distributed as a single MATLAB .mat file containing the following variables (a minimal loading sketch follows the list):

  • accelData – Nx4 matrix of accelerometer values recorded when each frame was taken. The columns contain the roll, yaw, pitch and tilt angle of the device.
  • depths – HxWxN matrix of depth maps where H and W are the height and width, respectively and N is the number of images. The values of the depth elements are in meters.
  • images – HxWx3xN matrix of RGB images where H and W are the height and width, respectively, and N is the number of images.
  • labels – HxWxN matrix of label masks where H and W are the height and width, respectively and N is the number of images. The labels range from 1..C where C is the total number of classes. If a pixel’s label value is 0, then that pixel is ‘unlabeled’.
  • names – Cx1 cell array of the English names of each class.
  • namesToIds – map from English label names to class IDs (with C key-value pairs).
  • rawDepths – HxWxN matrix of depth maps where H and W are the height and width, respectively, and N is the number of images. These depth maps are the raw output from the Kinect.
  • scenes – Nx1 cell array of the name of the scene from which each image was taken.
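
As a minimal sketch of how the labeled data can be used (assuming the file has been downloaded as nyu_depth_v2_labeled.mat into the MATLAB working directory; the frame index and pixel coordinates below are arbitrary examples):

 % Minimal sketch: load the labeled dataset and inspect one frame.
 % Assumes nyu_depth_v2_labeled.mat is in the current working directory.
 load('nyu_depth_v2_labeled.mat');   % loads images, depths, rawDepths, labels, names, ...

 k        = 1;                       % arbitrary frame index
 rgb      = images(:, :, :, k);      % HxWx3 RGB image
 depth    = depths(:, :, k);         % HxW in-painted depth map, in meters
 rawDepth = rawDepths(:, :, k);      % HxW raw Kinect depth map, in meters
 label    = labels(:, :, k);         % HxW label mask, 0 = unlabeled

 % Look up the English class name at an arbitrary pixel, if it is labeled.
 id = label(100, 100);
 if id > 0
     disp(names{id});
 end

 % View the RGB frame, the filled depth and the label map side by side.
 figure;
 subplot(1, 3, 1); imshow(rgb);
 subplot(1, 3, 2); imagesc(depth); axis image; title('filled depth');
 subplot(1, 3, 3); imagesc(label); axis image; title('labels');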

Raw dataset



RGB image (left), depth image (right)

 

The raw dataset contains the raw images and accelerometer dumps captured by the Kinect. The RGB and depth cameras are sampled at 20-30 FPS (frames per second), and the rate varies over time. Because the frames are not synchronized, every folder records a timestamp for each RGB image, depth map and accelerometer dump (a rough matching sketch follows the note below). The dataset is split into folders, one per captured scene, named like 'living_room_0012' or 'office_0014'. The file structure looks like this:

/
 ../bedroom_0001/
 ../bedroom_0001/a-1294886363.011060-3164794231.dump
 ../bedroom_0001/a-1294886363.016801-3164794231.dump
 ...
 ../bedroom_0001/d-1294886362.665769-3143255701.pgm
 ../bedroom_0001/d-1294886362.793814-3151264321.pgm
 ...
 ../bedroom_0001/r-1294886362.238178-3118787619.ppm
 ../bedroom_0001/r-1294886362.814111-3152792506.ppm
  • Files prefixed with a- are accelerometer dumps
  • Files prefixed with d- are depth images
  • Files prefixed with r- are raw RGB images
    Note: because the raw dataset has not been preprocessed, the raw depth images must be projected onto the RGB coordinate frame to align them with the RGB images.
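
Since each timestamp is embedded in the file name (the first number after the a-/d-/r- prefix), a rough way to pair frames is to parse those timestamps and, for each RGB image, take the depth image whose timestamp is closest. The toolbox function get_synched_frames.m does this properly; the sketch below only illustrates the idea, and the scene path is a placeholder:

 % Rough sketch of timestamp-based matching for one scene folder (placeholder path).
 % For real use, prefer get_synched_frames.m from the toolbox.
 sceneDir = 'bedroom_0001';

 rgbFiles   = dir(fullfile(sceneDir, 'r-*.ppm'));
 depthFiles = dir(fullfile(sceneDir, 'd-*.pgm'));

 % The timestamp is the first number after the prefix, e.g. r-1294886362.238178-...
 getTime    = @(name) sscanf(name, '%*c-%f', 1);
 rgbTimes   = arrayfun(@(f) getTime(f.name), rgbFiles);
 depthTimes = arrayfun(@(f) getTime(f.name), depthFiles);

 % For each RGB frame, pick the depth frame with the nearest timestamp.
 for i = 1 : numel(rgbFiles)
     [~, j] = min(abs(depthTimes - rgbTimes(i)));
     fprintf('%s  <->  %s\n', rgbFiles(i).name, depthFiles(j).name);
 end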

Toolbox

  • demo_synched_projected_frames.m – Demos synchronization of the raw rgb and depth images as well as alignment of the rgb and raw depth.
  • eval_seg.m – Evaluates the predicted segmentation against the ground truth label map.
  • get_projected_depth.m – Projects the raw depth image onto the rgb image plane. This file contains the calibration parameters that are specific to the kinect used to gather the data.
  • get_accel_data.m – Extracts the accelerometer data from the binary files in the raw dataset.
  • get_projection_mask.m – Gets a mask for the images that crops areas whose depth had to be inferred, rather than directly projected from the Kinect signal.
  • get_synched_frames.m – Returns a list of synchronized rgb and depth frames from a given scene in the raw dataset.
  • get_train_test_split.m – Splits the data in a way that ensures that images from the same scene are not found in both train and test sets.
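
As a hedged end-to-end sketch, the toolbox functions above might be combined for one raw scene roughly as follows; the exact signatures, field names and return values are assumptions inferred from the descriptions, so check the toolbox sources (and demo_synched_projected_frames.m) before relying on them:

 % Hypothetical usage of the toolbox on one raw scene; signatures, field names
 % and return values below are assumptions, not verified against the toolbox.
 sceneDir = 'bedroom_0001';

 % Pair up RGB and depth frames by timestamp.
 frames = get_synched_frames(sceneDir);

 % Read the first synchronized pair (field names are assumed).
 rgb      = imread(fullfile(sceneDir, frames(1).rawRgbFilename));
 rawDepth = imread(fullfile(sceneDir, frames(1).rawDepthFilename));

 % Project the raw depth onto the RGB image plane so the two are aligned.
 depthAligned = get_projected_depth(rawDepth);

 % Mask out border regions whose depth had to be inferred rather than measured.
 mask = get_projection_mask();
 depthAligned(~mask) = 0;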