DETRs with Hybrid Matching (H-DETR)

Abstract

DETRs

Network Architecture

Three Hybrid Matching Schemes

Hybrid-Branch

Hybrid-Epoch

Hybrid-Layer

Bipartite Matching

Ablation Experiments

Code

Conclusion

Abstract

DETRs with Hybrid Matching targets a weakness of DETR: one-to-one matching yields few positive samples, so training is inefficient and many queries are left underused. The paper proposes a hybrid matching strategy that combines the original one-to-one matching branch with an auxiliary one-to-many matching branch during training. Each ground-truth label can then match multiple queries, which increases the number of positive samples and improves training efficiency. At inference, only the original one-to-one branch is used, so the model keeps DETR's end-to-end design and inference cost while gaining accuracy. The method, named H-DETR, is effective across object detection, instance segmentation, panoptic segmentation, pose estimation, and object tracking, and improves a range of DETR variants. On COCO, H-DETR improves mAP by 1.7% over Deformable-DETR.

DETRs

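Paper: [2207.13080] DETRs with Hybrid Matching

Project: H-DETR

H-DETR can be viewed as a training method based on one-to-many matching that applies broadly to representative DETR methods such as Deformable-DETR, PETRv2, PETR, and TransTrack, overcoming the drawbacks of one-to-one matching and improving training efficiency. The improvements are shown in the figure below:

Before looking at how H-DETR improves on DETR, it helps to understand how Transformers are applied to object detection; see the earlier blog post on DETR.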

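Network Architecture

The network architecture is shown in the figure below:

First, given an input image I, the backbone and the Transformer encoder extract a sequence of enhanced pixel embeddings $X=\left \{ x_{0},x_{1},\cdots ,x_{N} \right \}$ . Second, these pixel embeddings and a default set of object query embeddings $Q=\left \{ q_{0},q_{1},\cdots ,q_{n} \right \}$ are fed into the Transformer decoder. Third, after each Transformer decoder layer, task-specific prediction heads are applied to the updated object query embeddings Q, producing a set of independent predictions $P=\left \{ p_{0},p_{1},\cdots ,p_{n} \right \}$ . Finally, one-to-one bipartite matching is performed between the predictions P and the ground-truth boxes and labels $G=\left \{ g_{0},g_{1},\cdots ,g_{m} \right \}$ .

H-DETR introduces its one-to-many matching at the queries Q.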

Three Hybrid Matching Schemes

Hybrid-Branch

One-to-one queries: $Q=\left \{ q_{1},q_{2},\cdots ,q_{n} \right \}$ ; one-to-many queries: $\widehat{Q}=\left \{ \widehat{q}_{1},\widehat{q}_{2},\cdots ,\widehat{q}_{T} \right \}$ .

One-to-one matching

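The first group of queries Q is processed by L Transformer decoder layers, and a prediction is made from the output of each decoder layer. Bipartite matching is then performed between the {predictions, ground truth} pair at every layer, giving the loss:

$L_{one2one}=\sum_{l=1}^{L}L_{Hungarian}(P^{l},G)$

$P^{l}$ denotes the predictions output by the l-th Transformer decoder layer.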

One-to-many matching

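Likewise, the second group of queries $\widehat{Q}$ is processed by the same L Transformer decoder layers, producing L sets of predictions. To perform one-to-many matching, the ground truth is repeated K times to obtain an augmented target set $\widehat{G}=\left \{ G^{1},G^{2},\cdots ,G^{K} \right \}$ , where $G^{1}=G^{2}=\cdots =G^{K}=G$ . Bipartite matching is then performed between the {predictions, augmented ground truth} pair at every layer, giving the loss:

$L_{one2many}=\sum_{l=1}^{L}L_{Hungarian}(\widehat{P}^{l}, \widehat{G})$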
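As a rough illustration of the one-to-many step, the sketch below repeats the columns of a query-to-ground-truth cost matrix K times and runs ordinary one-to-one assignment on the enlarged matrix. The function name one_to_many_match, the random cost matrix, and the sizes are assumptions made for illustration; they are not taken from the H-DETR code base.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def one_to_many_match(cost, k):
    """Match queries against the ground truth repeated k times.

    cost: array of shape (T, m), the cost between T queries and m objects.
    Returns a list of (query_index, ground_truth_index) pairs.
    """
    num_gt = cost.shape[1]
    # Repeat the m ground-truth columns k times: shape (T, k * m).
    # Column j of the repeated matrix corresponds to ground truth j % m.
    repeated = np.tile(cost, (1, k))
    query_idx, col_idx = linear_sum_assignment(repeated)
    return [(int(q), int(c % num_gt)) for q, c in zip(query_idx, col_idx)]


# Hypothetical example: T = 8 queries, m = 2 objects, K = 3 repetitions.
rng = np.random.default_rng(0)
cost = rng.random((8, 2))
matches = one_to_many_match(cost, k=3)
```

When T is at least K times m, each ground-truth object receives K distinct matched queries, and the remaining queries are left unmatched (treated as background).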

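Hybrid-Epoch

One-to-many matching is used for a fraction $\rho$ of the training epochs and one-to-one matching for the remaining $1-\rho$ fraction. Both phases use the same queries $\widetilde{Q}=\left \{\widetilde{q}_{1},\widetilde{q}_{2},\cdots ,\widetilde{q}_{M} \right \}$ .

As in Hybrid-Branch, the losses are: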

One-to-one matching

$L_{one2one}=\sum_{l=1}^{L}L_{Hungarian}(\widetilde{P}^{l},G)$

One-to-many matching

$L_{one2many}=\sum_{l=1}^{L}L_{Hungarian}(\widetilde{P}^{l}, \widetilde{G})$
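The epoch schedule can be sketched as a small helper that decides which matching a given epoch uses. This is only an illustration: the function name matching_mode is hypothetical, and it assumes the one-to-many epochs come first and the one-to-one epochs last.

```python
def matching_mode(epoch, total_epochs, rho):
    """Return the matching type used in a 0-indexed training epoch.

    Assumption: the first rho * total_epochs epochs use one-to-many
    matching and the remaining epochs use one-to-one matching.
    """
    switch_epoch = int(rho * total_epochs)
    return "one2many" if epoch < switch_epoch else "one2one"


# Hypothetical schedule: 12 epochs with rho = 0.75.
schedule = [matching_mode(e, total_epochs=12, rho=0.75) for e in range(12)]
```

Since inference relies on one-to-one matching to avoid NMS, ending training with the one-to-one phase is the natural ordering under this assumption.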

Hybrid-Layer

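Hybrid-Layer differs from the previous two schemes in that it splits the L Transformer decoder layers into two parts, the first $L_{1}$ layers and the remaining $L_{2}$ layers.

One-to-many matching

The outputs of the first $L_{1}$ Transformer decoder layers use one-to-many matching, with the loss:

$L_{one2many}=\sum_{l=1}^{L_{1}}L_{Hungarian}(\overline{P}^{l}, \overline{G})$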

One-to-one matching

The outputs of the remaining $L_{2}$ Transformer decoder layers use one-to-one matching, with the loss:

$L_{one2one}=\sum_{l=L_{1}+1}^{L_{1}+L_{2}}L_{Hungarian}(\overline{P}^{l}, G)$
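The layer split can be made concrete with a short sketch that lists, for each decoder layer, which matching supervises its output. The helper name hybrid_layer_plan and the values L1 = 2, L2 = 4 are hypothetical and chosen only to show the index ranges.

```python
def hybrid_layer_plan(l1, l2):
    """Map each 1-indexed decoder layer to the matching applied to its output.

    Layers 1..l1 use one-to-many matching; layers l1+1..l1+l2 use one-to-one.
    """
    plan = {}
    for layer in range(1, l1 + l2 + 1):
        plan[layer] = "one2many" if layer <= l1 else "one2one"
    return plan


# Hypothetical split of a 6-layer decoder.
plan = hybrid_layer_plan(l1=2, l2=4)
```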

In all three schemes, the components drawn in the same color in the figure share parameters.

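Bipartite Matching

Predictions are matched one-to-one with the ground-truth boxes through bipartite graph matching, so non-maximum suppression (NMS) is not needed.

Suppose points a, b, and c can each reach points X, Y, and Z at different costs; the table of all these costs is called the cost matrix. The linear_sum_assignment function in SciPy computes the optimal assignment, that is, the pairing of a, b, c with X, Y, Z that minimizes the total cost.

Here a, b, and c stand for the N predicted boxes, and X, Y, and Z stand for the ground-truth boxes. Computing the cost for every (predicted box, ground-truth box) pair yields the final cost matrix. The cost is computed as follows:

SciPy's linear_sum_assignment is then used to find the optimal assignment for the cost matrix. This gives a one-to-one matching between predicted boxes and ground-truth boxes, with no redundant boxes.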
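To make the a, b, c versus X, Y, Z example concrete, here is a minimal sketch using scipy.optimize.linear_sum_assignment. The numbers in the cost matrix are made up for illustration; in a detector they would come from the classification and box costs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical cost matrix: rows are predictions a, b, c;
# columns are ground-truth boxes X, Y, Z.
cost = np.array([
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
])

# row_ind[i] is matched with col_ind[i]; the sum of the selected
# entries is the minimum over all one-to-one assignments.
row_ind, col_ind = linear_sum_assignment(cost)
total_cost = cost[row_ind, col_ind].sum()
```

For this matrix the optimal pairing is a with Y, b with X, and c with Z, for a total cost of 5. Note that picking the cheapest entry greedily (b with Y at cost 0) would not give the optimal total.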

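Finally, the classification loss and the box loss are computed between the matched predictions and ground-truth boxes, and the model is optimized by backpropagation. The loss function is: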

$L_{Hungarian}(y,\hat{y})=\sum_{i=1}^{N}\left[-\log\hat{p}_{\hat{\sigma}(i)}(c_{i})+\mathbf{1}_{\{c_{i}\neq \varnothing\}}L_{box}(b_{i},\hat{b}_{\hat{\sigma}(i)})\right]$
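The loss above can be sketched in plain Python for a given matching. This is a simplified illustration, not the author's implementation: the function name hungarian_loss and the toy inputs are hypothetical, and L_box is replaced by a plain L1 distance (an assumption; practical DETR implementations typically also include an IoU-based term).

```python
import math


def hungarian_loss(pred_probs, pred_boxes, gt_labels, gt_boxes, sigma, no_object):
    """Sum classification and box losses over matched pairs.

    sigma[i] is the index of the prediction matched to target i.
    Targets are padded with the no_object class; those contribute only
    the classification term. L_box is simplified to an L1 distance here.
    """
    total = 0.0
    for i, (label, box) in enumerate(zip(gt_labels, gt_boxes)):
        j = sigma[i]
        total += -math.log(pred_probs[j][label])
        if label != no_object:
            total += sum(abs(a - b) for a, b in zip(box, pred_boxes[j]))
    return total


# Toy example: 2 predictions, classes {0, 1}, class 2 means "no object".
pred_probs = [[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]]
pred_boxes = [[0.1, 0.1, 0.3, 0.3], [0.5, 0.5, 0.25, 0.25]]
gt_labels = [1, 2]                        # one real object, one padding slot
gt_boxes = [[0.5, 0.5, 0.2, 0.2], None]   # the padding slot has no box
sigma = [1, 0]                            # target 0 -> prediction 1, target 1 -> prediction 0
loss = hungarian_loss(pred_probs, pred_boxes, gt_labels, gt_boxes, sigma, no_object=2)
```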

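Together, these schemes combine the strengths of both matching strategies: one-to-one matching is necessary for removing NMS, while one-to-many matching increases the number of queries matched to ground truth and improves training efficiency.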

Ablation Experiments

Evaluated on the COCO validation set, H-Deformable-DETR reaches 59.4% AP, surpassing DINO-DETR and other strong methods, as shown in the figure below:

Code

Hybrid matching loss function:
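As a minimal sketch of how the two branch losses could be combined during training, the snippet below sums the per-layer one-to-one losses and adds the per-layer one-to-many losses scaled by a weight. The function name hybrid_loss, the weight lambda_one2many, and the numbers are assumptions for illustration and are not taken from the project code.

```python
def hybrid_loss(one2one_layer_losses, one2many_layer_losses, lambda_one2many=1.0):
    """Combine per-layer Hungarian losses from both matching branches.

    Each argument is a list with one loss value per decoder layer.
    lambda_one2many is a hypothetical weight on the auxiliary branch.
    """
    loss_one2one = sum(one2one_layer_losses)
    loss_one2many = sum(one2many_layer_losses)
    return loss_one2one + lambda_one2many * loss_one2many


# Toy per-layer losses for a 3-layer decoder.
total = hybrid_loss([1.0, 0.5, 0.25], [2.0, 1.0, 1.0], lambda_one2many=0.5)
```

At inference only the one-to-one branch is kept, so the weight affects training only.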

The backbone is ResNet-50, pretrained on ImageNet. The PyTorch training script for the model is as follows:

#-------------------------------------#
#   Train on the dataset
#-------------------------------------#
import datetime
import os
from functools import partial

import numpy as np
import torch
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader

from nets.detr import DETR
from nets.detr_training import (build_loss, get_lr_scheduler, set_optimizer_lr,
                                weights_init)
from utils.callbacks import EvalCallback, LossHistory
from utils.dataloader import DetrDataset, detr_dataset_collate
from utils.utils import (get_classes, seed_everything, show_config,
                         worker_init_fn)
from utils.utils_fit import fit_one_epoch

if __name__ == "__main__":
    #---------------------------------#
    #   Cuda    Whether to use CUDA.
    #           Set to False if there is no GPU.
    #---------------------------------#
    Cuda            = True
    #----------------------------------------------#
    #   seed    Fixes the random seed so that each
    #           independent run gives the same result.
    #----------------------------------------------#
    seed            = 11
    #---------------------------------------------------------------------#
    #   distributed     Whether to use single-machine multi-GPU distributed training.
    #                   The terminal commands only support Ubuntu, where
    #                   CUDA_VISIBLE_DEVICES selects the GPUs.
    #                   On Windows, DP mode uses all GPUs by default; DDP is not supported.
    #   DP mode:
    #       set     distributed = False
    #       run     CUDA_VISIBLE_DEVICES=0,1 python train.py
    #   DDP mode:
    #       set     distributed = True
    #       run     CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py
    #---------------------------------------------------------------------#
    distributed     = False
    #---------------------------------------------------------------------#
    #   fp16        Whether to use mixed-precision training.
    #               Cuts GPU memory roughly in half; requires PyTorch 1.7.1 or later.
    #---------------------------------------------------------------------#
    fp16            = False
    #---------------------------------------------------------------------#
    #   classes_path    Points to the txt file under model_data for your dataset.
    #                   Be sure to update it to match your own dataset before training.
    #---------------------------------------------------------------------#
    classes_path    = 'model_data/voc_classes.txt'
    #---------------------------------------------------------------------#
    #   See the README for downloading the weight files. The pretrained weights
    #   are generic across datasets because the features are generic. The most
    #   important part is the backbone weights used for feature extraction.
    #   Pretrained weights should be used in almost all cases; without them the
    #   backbone weights are too random and training results will be poor.
    #
    #   If training was interrupted, set model_path to a weight file under logs
    #   to resume, and adjust the freeze/unfreeze parameters below so that the
    #   epoch count stays continuous.
    #
    #   model_path = '' loads no whole-model weights.
    #
    #   These are whole-model weights, loaded in train.py; the `pretrained` flag
    #   below does not affect this loading.
    #   To start from backbone pretrained weights only: model_path = '', pretrained = True.
    #   To train from scratch: model_path = '', pretrained = False, Freeze_Train = False.
    #
    #   Training from scratch usually performs very poorly and is strongly
    #   discouraged. If you must, first train a classification model on ImageNet
    #   to obtain backbone weights, then train from those.
    #---------------------------------------------------------------------#
    model_path      = 'model_data/detr_resnet50_weights_coco.pth'
    #------------------------------------------------------#
    #   input_shape     Input shape
    #------------------------------------------------------#
    input_shape     = [800, 800]
    #---------------------------------------------#
    #   resnet50
    #   resnet101
    #---------------------------------------------#
    backbone        = "resnet50"
    #---------------------------------------------------------------------#
    #   pretrained      Whether to use backbone pretrained weights, which are
    #                   loaded when the model is built.
    #                   If model_path is set, this value has no effect.
    #                   If model_path is empty and pretrained = True, only the backbone is loaded.
    #                   If model_path is empty, pretrained = False and Freeze_Train = False,
    #                   training starts from scratch with no frozen stage.
    #---------------------------------------------------------------------#
    pretrained      = False
    #---------------------------------------------------------------------#
    #   Training has two stages: frozen and unfrozen. The frozen stage exists
    #   for machines with limited GPU memory. With a very weak GPU, set
    #   Freeze_Epoch equal to UnFreeze_Epoch to run frozen training only.
    #
    #   Suggested settings (adjust to your needs):
    #   (1) Starting from whole-model pretrained weights, AdamW:
    #       Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 100, Freeze_Train = True,
    #       optimizer_type = 'adamw', Init_lr = 1e-4, weight_decay = 1e-4. (frozen)
    #       Init_Epoch = 0, UnFreeze_Epoch = 100, Freeze_Train = False,
    #       optimizer_type = 'adamw', Init_lr = 1e-4, weight_decay = 1e-4. (not frozen)
    #       UnFreeze_Epoch can be adjusted between 100 and 300.
    #   (2) Starting from backbone pretrained weights, AdamW:
    #       Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 300, Freeze_Train = True,
    #       optimizer_type = 'adamw', Init_lr = 1e-4, weight_decay = 1e-4. (frozen)
    #       Init_Epoch = 0, UnFreeze_Epoch = 300, Freeze_Train = False,
    #       optimizer_type = 'adamw', Init_lr = 1e-4, weight_decay = 1e-4. (not frozen)
    #       Backbone weights are not necessarily suited to detection, so more
    #       training is needed to escape local optima. UnFreeze_Epoch can be
    #       adjusted between 150 and 300 (YOLOv5 and YOLOX both recommend 300).
    #       Adam converges faster than SGD, so UnFreeze_Epoch could in theory be
    #       smaller, but more epochs are still recommended.
    #   (3) batch_size:
    #       As large as the GPU allows. Out-of-memory errors are unrelated to
    #       dataset size; reduce batch_size if they occur.
    #       Because of BatchNorm, batch_size must be at least 2.
    #       Freeze_batch_size should normally be 1-2x Unfreeze_batch_size; a larger
    #       gap is discouraged because it affects automatic learning-rate scaling.
    #---------------------------------------------------------------------#
    #------------------------------------------------------------------#
    #   Frozen-stage parameters
    #   The backbone is frozen, so the feature extractor does not change.
    #   Uses less GPU memory; only fine-tunes the rest of the network.
    #   Init_Epoch          Epoch to start from. It may exceed Freeze_Epoch, e.g.
    #                       Init_Epoch = 60, Freeze_Epoch = 50, UnFreeze_Epoch = 100
    #                       skips the frozen stage, starts at epoch 60 and adjusts
    #                       the learning rate accordingly (used when resuming).
    #   Freeze_Epoch        Number of frozen-training epochs
    #                       (ignored when Freeze_Train = False)
    #   Freeze_batch_size   Batch size for frozen training
    #                       (ignored when Freeze_Train = False)
    #------------------------------------------------------------------#
    Init_Epoch          = 0
    Freeze_Epoch        = 50
    Freeze_batch_size   = 8
    #------------------------------------------------------------------#
    #   Unfrozen-stage parameters
    #   The backbone is no longer frozen, so all parameters are updated.
    #   Uses more GPU memory.
    #   UnFreeze_Epoch          Total number of training epochs.
    #                           SGD needs longer to converge, so use a larger value;
    #                           Adam can use a relatively smaller one.
    #   Unfreeze_batch_size     Batch size after unfreezing
    #------------------------------------------------------------------#
    UnFreeze_Epoch      = 300
    Unfreeze_batch_size = 4
    #------------------------------------------------------------------#
    #   Freeze_Train    Whether to use frozen training.
    #                   By default the backbone is frozen first, then unfrozen.
    #------------------------------------------------------------------#
    Freeze_Train        = True

    #------------------------------------------------------------------#
    #   Other training parameters: learning rate, optimizer, LR decay
    #------------------------------------------------------------------#
    #------------------------------------------------------------------#
    #   Init_lr         Maximum learning rate. In DETR the backbone learning
    #                   rate is 0.1x that of the Transformer modules.
    #   Min_lr          Minimum learning rate, by default 0.01x the maximum.
    #------------------------------------------------------------------#
    Init_lr             = 1e-4
    Min_lr              = Init_lr * 0.01
    #------------------------------------------------------------------#
    #   optimizer_type  Optimizer to use: adam, adamw or sgd.
    #                   Adam:  Init_lr = 1e-4 recommended
    #                   AdamW: Init_lr = 1e-4 recommended
    #                   SGD:   Init_lr = 1e-2 recommended
    #   momentum        Momentum parameter used inside the optimizer
    #   weight_decay    Weight decay, helps prevent overfitting.
    #                   Adam handles weight_decay incorrectly; set it to 0 with Adam.
    #------------------------------------------------------------------#
    optimizer_type      = "adamw"
    momentum            = 0.9
    weight_decay        = 1e-4
    #------------------------------------------------------------------#
    #   lr_decay_type   Learning-rate decay schedule: step or cos
    #------------------------------------------------------------------#
    lr_decay_type       = "cos"
    #------------------------------------------------------------------#
    #   save_period     Save weights every save_period epochs
    #------------------------------------------------------------------#
    save_period         = 10
    #------------------------------------------------------------------#
    #   save_dir        Folder for weights and log files
    #------------------------------------------------------------------#
    save_dir            = 'logs'
    #------------------------------------------------------------------#
    #   eval_flag       Whether to evaluate on the validation set during training.
    #                   Installing pycocotools gives a better evaluation experience.
    #   eval_period     Evaluate every eval_period epochs. Frequent evaluation is
    #                   discouraged because it is slow.
    #   The mAP obtained here differs from that of get_map.py because:
    #   (1) it is measured on the validation set;
    #   (2) the evaluation settings here are conservative, to speed things up.
    #------------------------------------------------------------------#
    eval_flag           = True
    eval_period         = 10
    #------------------------------------------------------------------#
    #   The official code marks this as "TODO this is a hack".
    #   Stability is unknown; disabled by default.
    #------------------------------------------------------------------#
    aux_loss            = False
    #------------------------------------------------------------------#
    #   num_workers     Number of worker processes for data loading.
    #                   Speeds up loading but uses more memory; set to 2 or 0
    #                   on machines with little memory.
    #------------------------------------------------------------------#
    num_workers         = 4

    #----------------------------------------------------#
    #   Image paths and labels
    #----------------------------------------------------#
    train_annotation_path   = '2007_train.txt'
    val_annotation_path     = '2007_val.txt'

    seed_everything(seed)
    #------------------------------------------------------#
    #   Select the GPUs to use
    #------------------------------------------------------#
    ngpus_per_node  = torch.cuda.device_count()
    if distributed:
        dist.init_process_group(backend="nccl")
        local_rank  = int(os.environ["LOCAL_RANK"])
        rank        = int(os.environ["RANK"])
        device      = torch.device("cuda", local_rank)
        if local_rank == 0:
            print(f"[{os.getpid()}] (rank = {rank}, local_rank = {local_rank}) training...")
            print("Gpu Device Count : ", ngpus_per_node)
    else:
        device          = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        local_rank      = 0
        rank            = 0

    #----------------------------------------------------#
    #   Get the class names
    #----------------------------------------------------#
    class_names, num_classes = get_classes(classes_path)

    #------------------------------------------------------#
    #   Build the DETR model
    #------------------------------------------------------#
    model = DETR(backbone, 'sine', 256, num_classes, 100, pretrained=pretrained)
    if model_path != '':
        if local_rank == 0:
            print('Load weights {}.'.format(model_path))

        #------------------------------------------------------#
        #   Load by matching pretrained keys against model keys
        #------------------------------------------------------#
        model_dict      = model.state_dict()
        pretrained_dict = torch.load(model_path, map_location = device)
        load_key, no_load_key, temp_dict = [], [], {}
        for k, v in pretrained_dict.items():
            if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):
                temp_dict[k] = v
                load_key.append(k)
            else:
                no_load_key.append(k)
        model_dict.update(temp_dict)
        model.load_state_dict(model_dict)
        #------------------------------------------------------#
        #   Show the keys that did not match
        #------------------------------------------------------#
        if local_rank == 0:
            print("\nSuccessful Load Key:", str(load_key)[:500], "……\nSuccessful Load Key Num:", len(load_key))
            print("\nFail To Load Key:", str(no_load_key)[:500], "……\nFail To Load Key num:", len(no_load_key))
            print("\n\033[1;33;44mNote: it is normal for the head not to load; it is an error if the backbone does not load.\033[0m")

    #----------------------#
    #   Build the loss
    #----------------------#
    detr_loss = build_loss(num_classes)
    #----------------------#
    #   Record the loss
    #----------------------#
    if local_rank == 0:
        time_str        = datetime.datetime.strftime(datetime.datetime.now(), '%Y_%m_%d_%H_%M_%S')
        log_dir         = os.path.join(save_dir, "loss_" + str(time_str))
        loss_history    = LossHistory(log_dir, model, input_shape=input_shape)
    else:
        loss_history    = None

    #------------------------------------------------------------------#
    #   torch 1.2 does not support amp; use torch 1.7.1 or later for fp16.
    #------------------------------------------------------------------#
    if fp16:
        from torch.cuda.amp import GradScaler as GradScaler
        scaler = GradScaler()
    else:
        scaler = None

    model_train     = model.train()
    if Cuda:
        if distributed:
            #----------------------------#
            #   Multi-GPU parallel run
            #----------------------------#
            model_train = model_train.cuda(local_rank)
            detr_loss   = detr_loss.cuda(local_rank)
            model_train = torch.nn.parallel.DistributedDataParallel(model_train, device_ids=[local_rank], find_unused_parameters=True)
        else:
            model_train = torch.nn.DataParallel(model)
            cudnn.benchmark = True
            model_train = model_train.cuda()
            detr_loss   = detr_loss.cuda()

    #---------------------------#
    #   Read the dataset txt files
    #---------------------------#
    with open(train_annotation_path) as f:
        train_lines = f.readlines()
    with open(val_annotation_path) as f:
        val_lines   = f.readlines()
    num_train   = len(train_lines)
    num_val     = len(val_lines)

    if local_rank == 0:
        show_config(
            classes_path = classes_path, model_path = model_path, input_shape = input_shape, \
            Init_Epoch = Init_Epoch, Freeze_Epoch = Freeze_Epoch, UnFreeze_Epoch = UnFreeze_Epoch, \
            Freeze_batch_size = Freeze_batch_size, Unfreeze_batch_size = Unfreeze_batch_size, Freeze_Train = Freeze_Train, \
            Init_lr = Init_lr, Min_lr = Min_lr, optimizer_type = optimizer_type, momentum = momentum, lr_decay_type = lr_decay_type, \
            save_period = save_period, save_dir = save_dir, num_workers = num_workers, num_train = num_train, num_val = num_val
        )
        #---------------------------------------------------------#
        #   Total epochs: number of full passes over the data.
        #   Total steps: number of gradient-descent updates.
        #   Each epoch contains several steps, one update per step.
        #   Only a minimum is suggested here; only the unfrozen part is counted.
        #----------------------------------------------------------#
        wanted_step = 5e4 if optimizer_type == "sgd" else 1.5e4
        total_step  = num_train // Unfreeze_batch_size * UnFreeze_Epoch
        if total_step <= wanted_step:
            if num_train // Unfreeze_batch_size == 0:
                raise ValueError('The dataset is too small to train on; please enlarge it.')
            wanted_epoch = wanted_step // (num_train // Unfreeze_batch_size) + 1
            print("\n\033[1;33;44m[Warning] With the %s optimizer, a total of more than %d training steps is recommended.\033[0m"%(optimizer_type, wanted_step))
            print("\033[1;33;44m[Warning] This run has %d training samples, Unfreeze_batch_size %d and %d epochs, giving %d total steps.\033[0m"%(num_train, Unfreeze_batch_size, UnFreeze_Epoch, total_step))
            print("\033[1;33;44m[Warning] Total steps %d is below the recommended %d; consider setting the total epochs to %d.\033[0m"%(total_step, wanted_step, wanted_epoch))

    #------------------------------------------------------#
    #   Backbone features are generic; frozen training speeds up training
    #   and protects the weights from being damaged early on.
    #   Init_Epoch is the starting epoch.
    #   Freeze_Epoch is the number of frozen epochs.
    #   UnFreeze_Epoch is the total number of epochs.
    #   Reduce the batch size on OOM / insufficient GPU memory.
    #------------------------------------------------------#
    if True:
        UnFreeze_flag = False
        #------------------------------------#
        #   Freeze part of the network
        #------------------------------------#
        if Freeze_Train:
            for param in model.backbone.parameters():
                param.requires_grad = False
        # ------------------------------------#
        #   Freeze the BN layers
        # ------------------------------------#
        model.freeze_bn()

        #-------------------------------------------------------------------#
        #   Without frozen training, use Unfreeze_batch_size directly
        #-------------------------------------------------------------------#
        batch_size = Freeze_batch_size if Freeze_Train else Unfreeze_batch_size

        #-------------------------------------------------------------------#
        #   Adapt the learning rate to the current batch_size
        #-------------------------------------------------------------------#
        if optimizer_type in ['adam', 'adamw']:
            Init_lr_fit = Init_lr
            Min_lr_fit  = Min_lr
        else:
            nbs             = 64
            lr_limit_max    = 5e-2
            lr_limit_min    = 5e-4
            Init_lr_fit     = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max)
            Min_lr_fit      = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2)

        #---------------------------------------#
        #   Choose the optimizer by optimizer_type
        #---------------------------------------#
        param_dicts = [
            {"params": [p for n, p in model.named_parameters() if "backbone" not in n]},
            {
                "params": [p for n, p in model.named_parameters() if "backbone" in n],
                "lr": Init_lr_fit / 10,
            },
        ]
        optimizer = {
            'adam'  : optim.Adam(param_dicts, Init_lr_fit, betas = (momentum, 0.999), weight_decay=weight_decay),
            'adamw' : optim.AdamW(param_dicts, Init_lr_fit, betas = (momentum, 0.999), weight_decay=weight_decay),
            'sgd'   : optim.SGD(param_dicts, Init_lr_fit, momentum = momentum, nesterov=True, weight_decay=weight_decay),
        }[optimizer_type]
        lr_scale_ratio = [1, 0.1]

        #---------------------------------------#
        #   Get the learning-rate decay function
        #---------------------------------------#
        lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch)

        #---------------------------------------#
        #   Number of steps per epoch
        #---------------------------------------#
        epoch_step      = num_train // batch_size
        epoch_step_val  = num_val // batch_size

        if epoch_step == 0 or epoch_step_val == 0:
            raise ValueError("The dataset is too small to continue training; please enlarge it.")

        #---------------------------------------#
        #   Build the data loaders
        #---------------------------------------#
        train_dataset   = DetrDataset(train_lines, input_shape, num_classes, train = True)
        val_dataset     = DetrDataset(val_lines, input_shape, num_classes, train = False)

        if distributed:
            train_sampler   = torch.utils.data.distributed.DistributedSampler(train_dataset, shuffle=True,)
            val_sampler     = torch.utils.data.distributed.DistributedSampler(val_dataset, shuffle=False,)
            batch_size      = batch_size // ngpus_per_node
            shuffle         = False
        else:
            train_sampler   = None
            val_sampler     = None
            shuffle         = True

        gen             = DataLoader(train_dataset, shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True,
                                    drop_last=True, collate_fn=detr_dataset_collate, sampler=train_sampler,
                                    worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed))
        gen_val         = DataLoader(val_dataset  , shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True,
                                    drop_last=True, collate_fn=detr_dataset_collate, sampler=val_sampler,
                                    worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed))

        #----------------------#
        #   Record the eval mAP curve
        #----------------------#
        if local_rank == 0:
            eval_callback   = EvalCallback(model, input_shape[0], class_names, num_classes, val_lines, log_dir, Cuda, \
                                            eval_flag=eval_flag, period=eval_period)
        else:
            eval_callback   = None

        #---------------------------------------#
        #   Start training
        #---------------------------------------#
        for epoch in range(Init_Epoch, UnFreeze_Epoch):
            #---------------------------------------#
            #   If part of the model was frozen,
            #   unfreeze it and reset the parameters
            #---------------------------------------#
            if epoch >= Freeze_Epoch and not UnFreeze_flag and Freeze_Train:
                batch_size = Unfreeze_batch_size

                #-------------------------------------------------------------------#
                #   Adapt the learning rate to the current batch_size
                #-------------------------------------------------------------------#
                if optimizer_type in ['adam', 'adamw']:
                    Init_lr_fit = Init_lr
                    Min_lr_fit  = Min_lr
                else:
                    nbs             = 64
                    lr_limit_max    = 5e-2
                    lr_limit_min    = 5e-4
                    Init_lr_fit     = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max)
                    Min_lr_fit      = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2)
                #---------------------------------------#
                #   Get the learning-rate decay function
                #---------------------------------------#
                lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch)

                for param in model.backbone.parameters():
                    param.requires_grad = True
                # ------------------------------------#
                #   Freeze the BN layers
                # ------------------------------------#
                model.freeze_bn()

                epoch_step      = num_train // batch_size
                epoch_step_val  = num_val // batch_size

                if epoch_step == 0 or epoch_step_val == 0:
                    raise ValueError("The dataset is too small to continue training; please enlarge it.")

                if distributed:
                    batch_size = batch_size // ngpus_per_node

                gen             = DataLoader(train_dataset, shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True,
                                            drop_last=True, collate_fn=detr_dataset_collate, sampler=train_sampler,
                                            worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed))
                gen_val         = DataLoader(val_dataset  , shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True,
                                            drop_last=True, collate_fn=detr_dataset_collate, sampler=val_sampler,
                                            worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed))

                UnFreeze_flag = True

            if distributed:
                train_sampler.set_epoch(epoch)

            set_optimizer_lr(optimizer, lr_scheduler_func, epoch, lr_scale_ratio)

            fit_one_epoch(model_train, model, detr_loss, loss_history, eval_callback, optimizer, epoch,
                          epoch_step, epoch_step_val, gen, gen_val, UnFreeze_Epoch, Cuda, fp16, scaler, save_period, save_dir, local_rank)

            if distributed:
                dist.barrier()

        if local_rank == 0:
            loss_history.writer.close()

The model wrapper used for prediction (the Detection_Transformers class) is as follows:

import colorsys
import os
import timeimport numpy as np
import torch
import torch.nn as nn
from PIL import ImageDraw, ImageFontfrom nets.detr import DETR
from utils.utils import (cvtColor, get_classes, preprocess_input,resize_image, show_config)
from utils.utils_bbox import DecodeBox'''
训练自己的数据集必看注释！
'''
class Detection_Transformers(object):_defaults = {#--------------------------------------------------------------------------##   使用自己训练好的模型进行预测一定要修改model_path和classes_path！#   model_path指向logs文件夹下的权值文件，classes_path指向model_data下的txt##   训练好后logs文件夹下存在多个权值文件，选择验证集损失较低的即可。#   验证集损失较低不代表mAP较高，仅代表该权值在验证集上泛化性能较好。#   如果出现shape不匹配，同时要注意训练时的model_path和classes_path参数的修改#--------------------------------------------------------------------------#"model_path"        : 'logs/best_epoch_weights.pth',"classes_path"      : 'model_data/voc_classes.txt',#---------------------------------------------------------------------##   输入图片的大小#---------------------------------------------------------------------#"min_length"        : 800,#---------------------------------------------------------------------##   只有得分大于置信度的预测框会被保留下来#---------------------------------------------------------------------#"confidence"        : 0.5,#---------------------------------------------------------------------##   主干网络的种类#---------------------------------------------------------------------#"backbone"          : 'resnet50',#-------------------------------##   是否使用Cuda#   没有GPU可以设置成False#-------------------------------#"cuda"              : True,}@classmethoddef get_defaults(cls, n):if n in cls._defaults:return cls._defaults[n]else:return "Unrecognized attribute name '" + n + "'"#---------------------------------------------------##   初始化detr#---------------------------------------------------#def __init__(self, **kwargs):self.__dict__.update(self._defaults)for name, value in kwargs.items():setattr(self, name, value)self._defaults[name] = value #---------------------------------------------------##   获得种类和先验框的数量#---------------------------------------------------#self.class_names, self.num_classes  = get_classes(self.classes_path)self.bbox_util                      = DecodeBox()#---------------------------------------------------##   画框设置不同的颜色#---------------------------------------------------#hsv_tuples = 
[(x / self.num_classes, 1., 1.) for x in range(self.num_classes)]self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors))self.generate()show_config(**self._defaults)#---------------------------------------------------##   生成模型#---------------------------------------------------#def generate(self, onnx=False):#---------------------------------------------------##   建立detr模型，载入detr模型的权重#---------------------------------------------------#self.net    = DETR(self.backbone, 'sine', 256, self.num_classes, num_queries=100)device      = torch.device('cuda' if torch.cuda.is_available() else 'cpu')self.net.load_state_dict(torch.load(self.model_path, map_location=device))self.net    = self.net.eval()print('{} model, anchors, and classes loaded.'.format(self.model_path))if not onnx:if self.cuda:self.net = nn.DataParallel(self.net)self.net = self.net.cuda()#---------------------------------------------------##   检测图片#---------------------------------------------------#def detect_image(self, image, crop = False, count = False):image_shape = np.array(np.shape(image)[0:2])#---------------------------------------------------------##   在这里将图像转换成RGB图像，防止灰度图在预测时报错。#   代码仅仅支持RGB图像的预测，所有其它类型的图像都会转化成RGB#---------------------------------------------------------#image       = cvtColor(image)#---------------------------------------------------------##   给图像增加灰条，实现不失真的resize#   也可以直接resize进行识别#---------------------------------------------------------#image_data  = resize_image(image, self.min_length)#---------------------------------------------------------##   添加上batch_size维度#---------------------------------------------------------#image_data  = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)with torch.no_grad():images          = torch.from_numpy(image_data)images_shape    = torch.unsqueeze(torch.from_numpy(image_shape), 0)if 
self.cuda:
                images          = images.cuda()
                images_shape    = images_shape.cuda()
            #---------------------------------------------------------#
            #   Feed the image through the network for prediction
            #---------------------------------------------------------#
            outputs = self.net(images)
            results = self.bbox_util(outputs, images_shape, self.confidence)

            if results[0] is None:
                return image

            _results    = results[0].cpu().numpy()
            top_label   = np.array(_results[:, 5], dtype = 'int32')
            top_conf    = _results[:, 4]
            top_boxes   = _results[:, :4]

        #---------------------------------------------------------#
        #   Set the font and the box thickness
        #---------------------------------------------------------#
        font        = ImageFont.truetype(font='model_data/simhei.ttf', size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))
        thickness   = int(max((image.size[0] + image.size[1]) // self.min_length, 1))
        #---------------------------------------------------------#
        #   Count the detections per class
        #---------------------------------------------------------#
        if count:
            print("top_label:", top_label)
            classes_nums    = np.zeros([self.num_classes])
            for i in range(self.num_classes):
                num = np.sum(top_label == i)
                if num > 0:
                    print(self.class_names[i], " : ", num)
                classes_nums[i] = num
            print("classes_nums:", classes_nums)
        #---------------------------------------------------------#
        #   Optionally crop each detected object
        #---------------------------------------------------------#
        if crop:
            for i, c in list(enumerate(top_label)):
                top, left, bottom, right = top_boxes[i]
                top     = max(0, np.floor(top).astype('int32'))
                left    = max(0, np.floor(left).astype('int32'))
                bottom  = min(image.size[1], np.floor(bottom).astype('int32'))
                right   = min(image.size[0], np.floor(right).astype('int32'))

                dir_save_path = "img_crop"
                if not os.path.exists(dir_save_path):
                    os.makedirs(dir_save_path)
                crop_image = image.crop([left, top, right, bottom])
                crop_image.save(os.path.join(dir_save_path, "crop_" + str(i) + ".png"), quality=95, subsampling=0)
                print("save crop_" + str(i) + ".png to " + dir_save_path)
        #---------------------------------------------------------#
        #   Draw the boxes and labels on the image
        #---------------------------------------------------------#
        for i, c in list(enumerate(top_label)):
            predicted_class = self.class_names[int(c)]
            box             = top_boxes[i]
            score           = top_conf[i]

            top, left, bottom, right = box

            top     = max(0, np.floor(top).astype('int32'))
            left    = max(0, np.floor(left).astype('int32'))
            bottom  = min(image.size[1], np.floor(bottom).astype('int32'))
            right   = min(image.size[0], np.floor(right).astype('int32'))

            label = '{} {:.2f}'.format(predicted_class, score)
            draw = ImageDraw.Draw(image)
            # Note: ImageDraw.textsize was removed in Pillow 10.0, so this call needs Pillow < 10.
            label_size = draw.textsize(label, font)
            label = label.encode('utf-8')
            print(label, top, left, bottom, right)

            if top - label_size[1] >= 0:
                text_origin = np.array([left, top - label_size[1]])
            else:
                text_origin = np.array([left, top + 1])

            for i in range(thickness):
                draw.rectangle([left + i, top + i, right - i, bottom - i], outline=self.colors[c])
            draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=self.colors[c])
            draw.text(text_origin, str(label,'UTF-8'), fill=(0, 0, 0), font=font)
            del draw

        return image

    def get_FPS(self, image, test_interval):
        image_shape = np.array(np.shape(image)[0:2])
        image       = cvtColor(image)
        image_data  = resize_image(image, self.min_length)
        image_data  = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)

        # One warm-up pass before timing
        with torch.no_grad():
            images          = torch.from_numpy(image_data)
            images_shape    = torch.unsqueeze(torch.from_numpy(image_shape), 0)
            if self.cuda:
                images          = images.cuda()
                images_shape    = images_shape.cuda()
            outputs = self.net(images)
            results = self.bbox_util(outputs, images_shape, self.confidence)

        # Time test_interval full passes (tensor creation, forward pass, decoding)
        t1 = time.time()
        for _ in range(test_interval):
            with torch.no_grad():
                images          = torch.from_numpy(image_data)
                images_shape    = torch.unsqueeze(torch.from_numpy(image_shape), 0)
                if self.cuda:
                    images          = images.cuda()
                    images_shape    = images_shape.cuda()
                outputs = self.net(images)
                results = self.bbox_util(outputs, images_shape, self.confidence)
        t2 = time.time()
        tact_time = (t2 - t1) / test_interval
        return tact_time

    def convert_to_onnx(self, simplify, model_path):
        import onnx
        self.generate(onnx=True)

        im                  = torch.zeros(1, 3, *self.input_shape).to('cpu')  # image size(1, 3, 512, 512) BCHW
        input_layer_names   = ["images"]
        output_layer_names  = ["output"]

        # Export the model
        print(f'Starting export with onnx {onnx.__version__}.')
        torch.onnx.export(self.net,
                        im,
                        f               = model_path,
                        verbose         = False,
                        opset_version   = 12,
                        training        = torch.onnx.TrainingMode.EVAL,
                        do_constant_folding = True,
                        input_names     = input_layer_names,
                        output_names    = output_layer_names,
                        dynamic_axes    = None)

        # Checks
        model_onnx = onnx.load(model_path)  # load onnx model
        onnx.checker.check_model(model_onnx)  # check onnx model

        # Simplify onnx
        if simplify:
            import onnxsim
            print(f'Simplifying with onnx-simplifier {onnxsim.__version__}.')
            model_onnx, check = onnxsim.simplify(
                model_onnx,
                dynamic_input_shape=False,
                input_shapes=None)
            assert check, 'assert check failed'
            onnx.save(model_onnx, model_path)

        print('Onnx model save as {}'.format(model_path))

    def get_map_txt(self, image_id, image, class_names, map_out_path):
        f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"),"w")
        image_shape = np.array(np.shape(image)[0:2])
        #---------------------------------------------------------#
        #   Convert the image to RGB so that grayscale inputs do not
        #   fail at prediction time. The code only supports RGB
        #   images; every other image type is converted to RGB.
        #---------------------------------------------------------#
        image       = cvtColor(image)
        #---------------------------------------------------------#
        #   Pad the image with gray bars for a distortion-free resize.
        #   A plain resize can also be used for recognition.
        #---------------------------------------------------------#
        image_data  = resize_image(image, self.min_length)
        #---------------------------------------------------------#
        #   Add the batch_size dimension
        #---------------------------------------------------------#
        image_data  = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)

        with torch.no_grad():
            images          = torch.from_numpy(image_data)
            images_shape    = torch.unsqueeze(torch.from_numpy(image_shape), 0)
            if self.cuda:
                images          = images.cuda()
                images_shape    = images_shape.cuda()
            #---------------------------------------------------------#
            #   Feed the image through the network for prediction
            #---------------------------------------------------------#
            outputs = self.net(images)
            results = self.bbox_util(outputs, images_shape, self.confidence)

            if results[0] is None:
                return

            _results    = results[0].cpu().numpy()
            top_label   = np.array(_results[:, 5], dtype = 'int32')
            top_conf    = _results[:, 4]
            top_boxes   = _results[:, :4]

        for i, c in list(enumerate(top_label)):
            predicted_class = self.class_names[int(c)]
            box             = top_boxes[i]
            score           = str(top_conf[i])

            top, left, bottom, right = box
            if predicted_class not in class_names:
                continue

            f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom))))

        f.close()
        return
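
The cropping and drawing steps above both turn each predicted box, stored as `(top, left, bottom, right)` floats, into integer pixel coordinates clipped to the image. The following is a minimal sketch of that step in plain Python. `clamp_box` is a hypothetical helper name that does not appear in the original code, and it uses `math.floor` in place of `np.floor` so it runs without NumPy.

```python
import math


def clamp_box(box, width, height):
    """Clip a (top, left, bottom, right) box to a width x height image.

    Hypothetical helper mirroring the clamping done in detect_image:
    floor each coordinate, then keep it inside the image bounds.
    """
    top, left, bottom, right = box
    top = max(0, math.floor(top))
    left = max(0, math.floor(left))
    bottom = min(height, math.floor(bottom))
    right = min(width, math.floor(right))
    return top, left, bottom, right


if __name__ == "__main__":
    # A box that spills over the top, bottom and right edges of a 640x480 image.
    print(clamp_box((-3.2, 10.7, 480.9, 700.1), width=640, height=480))
    # prints (0, 10, 480, 640)
```

Note the ordering: the box is unpacked as `(top, left, bottom, right)`, while `image.crop` and `draw.rectangle` expect `(left, top, right, bottom)`, which is why the original code reorders the values before calling them.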

Load the trained model weights into the DETRs code, then run prediction with the following script:

import time

import cv2
import numpy as np
from PIL import Image

from detr import Detection_Transformers

if __name__ == "__main__":
    detr = Detection_Transformers()
    mode = "predict"
    #-------------------------------------------------------------------------#
    #   crop                whether to crop out the detected objects after predicting a single image
    #   count               whether to count the detected objects
    #   crop and count only take effect when mode='predict'
    #-------------------------------------------------------------------------#
    crop            = False
    count           = False
    #----------------------------------------------------------------------------------------------------------#
    #   video_path          path of the input video; video_path=0 means the camera is used
    #                       To run on a video, set e.g. video_path = "xxx.mp4" to read xxx.mp4 from the root directory.
    #   video_save_path     path where the output video is saved; video_save_path="" means it is not saved
    #                       To save the video, set e.g. video_save_path = "yyy.mp4" to write yyy.mp4 to the root directory.
    #   video_fps           fps of the saved video
    #
    #   video_path, video_save_path and video_fps only take effect when mode='video'
    #   When saving a video, the file is only finalized after exiting with ctrl+c or after the last frame has been processed.
    #----------------------------------------------------------------------------------------------------------#
    video_path      = 0
    video_save_path = ""
    video_fps       = 25.0
    #----------------------------------------------------------------------------------------------------------#
    #   test_interval       number of detection passes used when measuring fps. In theory, the larger test_interval is, the more accurate the fps.
    #   fps_image_path      image used for the fps test
    #
    #   test_interval and fps_image_path only take effect when mode='fps'
    #----------------------------------------------------------------------------------------------------------#
    test_interval   = 100
    fps_image_path  = "img/street.jpg"
    #-------------------------------------------------------------------------#
    #   dir_origin_path     folder containing the images to detect
    #   dir_save_path       folder where the detected images are saved
    #
    #   dir_origin_path and dir_save_path only take effect when mode='dir_predict'
    #-------------------------------------------------------------------------#
    dir_origin_path = "img/"
    dir_save_path   = "img/"
    #-------------------------------------------------------------------------#
    #   simplify            whether to run Simplify onnx
    #   onnx_save_path      path where the onnx model is saved
    #-------------------------------------------------------------------------#
    simplify        = True
    onnx_save_path  = "model_data/models.onnx"

    if mode == "predict":
        '''
        1. To save the detected image, call r_image.save("img.jpg"); edit predict.py directly to do so.
        2. To get the coordinates of the predicted boxes, go into detr.detect_image and read top, left, bottom and right in the drawing section.
        3. To crop the objects using the predicted boxes, go into detr.detect_image and, in the drawing section, slice the original image as an array with top, left, bottom and right.
        4. To write extra text on the result, such as the number of detections of a specific class, go into detr.detect_image and test predicted_class in the drawing section,
        e.g. if predicted_class == 'car': checks whether the current object is a car, so the count can be recorded. Use draw.text to write the text.
        '''
        while True:
            img = input('Input image filename:')
            try:
                image = Image.open(img)
            except:
                print('Open Error! Try again!')
                continue
            else:
                r_image = detr.detect_image(image, crop = crop, count=count)
                r_image.show()

    elif mode == "video":
        capture = cv2.VideoCapture(video_path)
        if video_save_path!="":
            fourcc  = cv2.VideoWriter_fourcc(*'XVID')
            size    = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
            out     = cv2.VideoWriter(video_save_path, fourcc, video_fps, size)

        ref, frame = capture.read()
        if not ref:
            raise ValueError("Could not read from the camera (video). Check that the camera is installed correctly (or that the video path is correct).")

        fps = 0.0
        while(True):
            t1 = time.time()
            # Read one frame
            ref, frame = capture.read()
            if not ref:
                break
            # Convert BGR to RGB
            frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)
            # Convert to a PIL Image
            frame = Image.fromarray(np.uint8(frame))
            # Run detection
            frame = np.array(detr.detect_image(frame))
            # Convert RGB back to BGR, the format OpenCV displays
            frame = cv2.cvtColor(frame,cv2.COLOR_RGB2BGR)

            fps  = ( fps + (1./(time.time()-t1)) ) / 2
            print("fps= %.2f"%(fps))
            frame = cv2.putText(frame, "fps= %.2f"%(fps), (0, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

            cv2.imshow("video",frame)
            c= cv2.waitKey(1) & 0xff
            if video_save_path!="":
                out.write(frame)

            # Esc key exits
            if c==27:
                capture.release()
                break

        print("Video Detection Done!")
        capture.release()
        if video_save_path!="":
            print("Save processed video to the path :" + video_save_path)
            out.release()
        cv2.destroyAllWindows()

    elif mode == "fps":
        img = Image.open(fps_image_path)
        tact_time = detr.get_FPS(img, test_interval)
        print(str(tact_time) + ' seconds, ' + str(1/tact_time) + 'FPS, @batch_size 1')

    elif mode == "dir_predict":
        import os

        from tqdm import tqdm

        img_names = os.listdir(dir_origin_path)
        for img_name in tqdm(img_names):
            if img_name.lower().endswith(('.bmp', '.dib', '.png', '.jpg', '.jpeg', '.pbm', '.pgm', '.ppm', '.tif', '.tiff')):
                image_path  = os.path.join(dir_origin_path, img_name)
                image       = Image.open(image_path)
                r_image     = detr.detect_image(image)
                if not os.path.exists(dir_save_path):
                    os.makedirs(dir_save_path)
                r_image.save(os.path.join(dir_save_path, img_name.replace(".jpg", ".png")), quality=95, subsampling=0)

    elif mode == "export_onnx":
        detr.convert_to_onnx(simplify, onnx_save_path)

    else:
        raise AssertionError("Please specify the correct mode: 'predict', 'video', 'fps', 'export_onnx', 'dir_predict'.")
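
The `video` mode above smooths its frame-rate readout by averaging the previous estimate with the instantaneous rate of the current frame, `fps = (fps + 1/dt) / 2`. Below is a minimal sketch of that update in isolation. The helper name `smooth_fps` and the frame times are assumptions made for illustration and are not part of the original script.

```python
def smooth_fps(prev_fps, frame_seconds):
    """Average the previous estimate with the instantaneous rate 1 / frame_seconds.

    Hypothetical helper reproducing the update used in the video loop.
    """
    return (prev_fps + 1.0 / frame_seconds) / 2


if __name__ == "__main__":
    fps = 0.0  # same starting value as the video loop
    for dt in (0.25, 0.25, 0.25):  # three frames that each take 0.25 s (4 FPS)
        fps = smooth_fps(fps, dt)
        print(fps)
    # prints 2.0, then 3.0, then 3.5
```

Each update halves the weight of older frames, so the readout starts low because of the 0.0 initial value and converges toward the steady rate within a few frames.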