3D目标检测数据集——kitti数据集

KITTI官网网址：The KITTI Vision Benchmark Suite
下载数据集：The KITTI Vision Benchmark Suite
KITTI数据集论文：CMSY9
github可视化代码：GitHub - kuixu/kitti_object_vis: KITTI Object Visualization (Birdview, Volumetric LiDar point cloud )

kitti数据集

简介

KITTI数据集是由德国卡尔斯鲁厄理工学院 Karlsruhe Institute of Technology (KIT) 和美国芝加哥丰田技术研究院 Toyota Technological Institute at Chicago (TTI-C) 于2012年联合创办，是目前国际上最为常用的自动驾驶场景下的计算机视觉算法评测数据集之一。

该数据集用于评测立体图像(stereo)，光流(optical flow)，视觉测距(visual odometry)，3D物体检测(object detection)和3D跟踪(tracking)等计算机视觉技术在车载环境下的性能。

KITTI数据集包含市区、乡村和高速公路等场景采集的真实图像数据，每张图像中最多达15辆车和30个行人，还有各种程度的遮挡与截断。 KITTI数据集针对3D目标检测任务提供了14999张图像以及对应的点云，其中7481组用于训练，7518组用于测试，针对场景中的汽车、行人、自行车三类物体进行标注，共计80256个标记对象。

传感器布置图

KITTI数据集采集车的传感器布置平面如上图所示，车辆装配有2个灰度摄像机（cam0、cam1），2个彩色摄像机（cam2、cam3），一个Velodyne 64线3D激光雷达，4个光学镜头，以及1个GPS导航系统，在上图中使用了红色标记。

2个一百四十万像素的PointGray Flea2灰度相机
2个一百四十万像素的PointGray Flea2彩色相机
1个64线的Velodyne激光雷达，10Hz，角分辨率为0.09度，每秒约一百三十万个点，水平视场360°，垂直视场26.8°，至多120米的距离范围
4个Edmund的光学镜片，水平视角约为90°，垂直视角约为35°
1个OXTS RT 3003的惯性导航系统(GPS/IMU)，6轴，100Hz，分别率为0.02米，0.1°

采集车以及坐标系

下图中蓝色的坐标系表示激光点云坐标系；

红色的坐标系表示相机坐标系；

绿色的坐标系标红惯导坐标系；

数据结构

-- training
|-- calib
|-- image_2
|-- label_2
`-- velodyne

calib

calib文件是相机、雷达、惯导等传感器的校正参数；如下training/calib/000001.txt，

P0: 7.070493000000e+02 0.000000000000e+00 6.040814000000e+02 0.000000000000e+00 0.000000000000e+00 7.070493000000e+02 1.805066000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P1: 7.070493000000e+02 0.000000000000e+00 6.040814000000e+02 -3.797842000000e+02 0.000000000000e+00 7.070493000000e+02 1.805066000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P2: 7.070493000000e+02 0.000000000000e+00 6.040814000000e+02 4.575831000000e+01 0.000000000000e+00 7.070493000000e+02 1.805066000000e+02 -3.454157000000e-01 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 4.981016000000e-03
P3: 7.070493000000e+02 0.000000000000e+00 6.040814000000e+02 -3.341081000000e+02 0.000000000000e+00 7.070493000000e+02 1.805066000000e+02 2.330660000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 3.201153000000e-03
R0_rect: 9.999128000000e-01 1.009263000000e-02 -8.511932000000e-03 -1.012729000000e-02 9.999406000000e-01 -4.037671000000e-03 8.470675000000e-03 4.123522000000e-03 9.999556000000e-01
Tr_velo_to_cam: 6.927964000000e-03 -9.999722000000e-01 -2.757829000000e-03 -2.457729000000e-02 -1.162982000000e-03 2.749836000000e-03 -9.999955000000e-01 -6.127237000000e-02 9.999753000000e-01 6.931141000000e-03 -1.143899000000e-03 -3.321029000000e-01
Tr_imu_to_velo: 9.999976000000e-01 7.553071000000e-04 -2.035826000000e-03 -8.086759000000e-01 -7.854027000000e-04 9.998898000000e-01 -1.482298000000e-02 3.195559000000e-01 2.024406000000e-03 1.482454000000e-02 9.998881000000e-01 -7.997231000000e-01

P0-3：并不是cam0-3的相机内参；而是参考相机0到相机i的外参变换矩阵（各相机之间无旋转，且只在X方向有平移）与相机i的内参矩阵的乘积；

R0_rect：R0_rect表示的是矩形校正矩阵(rectifying rotation matrix)、即0号相机坐标系到矫正坐标系的旋转矩阵；矩形校正是立体视觉计算中常用的一步，用于使得成对的立体相机的成像平面对齐，这样可以简化如立体匹配这样的后续处理步骤。实质上，它通过一个旋转将两个相机成像平面调整为共面，且使得它们的光轴平行。R0_rect 矩阵用于将点从未校正的相机坐标系变换到校正后的坐标系。这一步是使用其他有关投影数据(如相机内部矩阵、深度信息等)之前必须进行的步骤。

Tr_velo_to_cam：从雷达到相机的旋转平移矩阵；

Tr_imu_to_velo：从惯导到相机的旋转平移矩阵；

将一个三维点投影到图像上如P2图像上，计算公式如下：

image_2：2D图像数据
label_2：Ground Truth

Car 0.00 0 -1.67 642.24 178.50 680.14 208.68 1.38 1.49 3.32 2.41 1.66 34.98 -1.60

其中，

第4个字段：观察角度: 表示在相机坐标系下，以相机原点为中心，以相机原点到物体中心的连线为半径，将物体绕相机y轴绕至相机z轴，此时物体方向与相机x轴之间的夹角。

第7个字段3d标注的坐标是在相机坐标系下的目标3D框底面中心坐标；

第8个字段表示3D物体的空间方向，表示在相机坐标系下，物体的全局方向角（物体前进方向与相机坐标系x轴的夹角）；

velodyne：

velodyne文件是激光雷达的测量数据，以浮点二进制文件格式存储，每行包含8个数据，每个数据由四位十六进制数表示（浮点数），每个数据通过空格隔开。一个点云数据由四个浮点数数据构成，分别表示点云的x、y、z、r（强度 or 反射值）。

可视化

测试用例在kitti_object.py 的基础上进行了封装，模块较为清晰，可视化测试如下：

import mayavi.mlab as mlab
from kitti_object import kitti_object, show_image_with_boxes, show_lidar_on_image, \show_lidar_with_boxes, show_lidar_topview_with_boxes, get_lidar_in_image_fov, \show_lidar_with_depth
from viz_util import draw_lidar
import cv2
from PIL import Image
import timeclass visualization:# data_idx: determine data_idxdef __init__(self, root_dir='kitti_object_vis-master/data/object', data_idx=2):dataset = kitti_object(root_dir=root_dir)# Load data from datasetobjects = dataset.get_label_objects(data_idx)print("There are {} objects.".format(len(objects)))img = dataset.get_image(data_idx)img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)img_height, img_width, img_channel = img.shapepc_velo = dataset.get_lidar(data_idx)[:, 0:3]  # 显示bev视图需要改动为[:, 0:4]calib = dataset.get_calibration(data_idx)# init the paramsself.objects = objectsself.img = imgself.img_height = img_heightself.img_width = img_widthself.img_channel = img_channelself.pc_velo = pc_veloself.calib = calib# 1. 图像显示def show_image(self):Image.fromarray(self.img).show()cv2.waitKey(0)# 2. 图片上绘制2D bboxdef show_image_with_2d_boxes(self):show_image_with_boxes(self.img, self.objects, self.calib, show3d=False)cv2.waitKey(0)# 3. 图片上绘制3D bboxdef show_image_with_3d_boxes(self):show_image_with_boxes(self.img, self.objects, self.calib, show3d=True)cv2.waitKey(0)# 4. 图片上绘制Lidar投影def show_image_with_lidar(self):show_lidar_on_image(self.pc_velo, self.img, self.calib, self.img_width, self.img_height)mlab.show()# 5. Lidar绘制3D bboxdef show_lidar_with_3d_boxes(self):show_lidar_with_boxes(self.pc_velo, self.objects, self.calib, True, self.img_width, self.img_height)mlab.show()# 6. Lidar绘制FOV图def show_lidar_with_fov(self):imgfov_pc_velo, pts_2d, fov_inds = get_lidar_in_image_fov(self.pc_velo, self.calib,0, 0, self.img_width, self.img_height, True)draw_lidar(imgfov_pc_velo)mlab.show()# 7. Lidar绘制3D图def show_lidar_with_3dview(self):draw_lidar(self.pc_velo)mlab.show()# 8. Lidar绘制BEV图def show_lidar_with_bev(self):from kitti_util import draw_top_image, lidar_to_toptop_view = lidar_to_top(self.pc_velo)top_image = draw_top_image(top_view)cv2.imshow("top_image", top_image)cv2.waitKey(0)# 9. Lidar绘制BEV图+2D bboxdef show_lidar_with_bev_2d_bbox(self):show_lidar_topview_with_boxes(self.pc_velo, self.objects, self.calib)mlab.show()if __name__ == '__main__':kitti_vis = visualization()kitti_vis.show_image()# kitti_vis.show_image_with_2d_boxes()# kitti_vis.show_image_with_3d_boxes()# kitti_vis.show_image_with_lidar()# kitti_vis.show_lidar_with_3d_boxes()# kitti_vis.show_lidar_with_fov()# kitti_vis.show_lidar_with_3dview()# kitti_vis.show_lidar_with_bev()# kitti_vis.show_lidar_with_bev_2d_bbox()# print('...')# cv2.waitKey(0)