
YOLO11 Improvement – Attention: Introducing CRA, a Channel-Reduction Self-Attention Mechanism

2025/2/25 19:37:12  Source: https://blog.csdn.net/qq_64693987/article/details/143328870

In semantic segmentation, applying the MetaFormer architecture is limited by the low computational efficiency of self-attention. To address this, the CRA (Channel Reduction Attention) module was proposed. By reducing the channel dimension of the queries and keys to one, CRA significantly lowers the computational cost of self-attention while still extracting global context, improving the network's computational efficiency. This article combines CRA with the C2PSA and C3k2 modules of YOLO11, reducing computational cost while improving accuracy.

1. Channel Reduction Attention (CRA) Structure

Channel Reduction Attention (CRA) is a self-attention mechanism based on channel compression, used mainly to capture global features in semantic segmentation. By compressing the query (Query) and key (Key) to a single channel per head, CRA significantly reduces computational complexity and thus resource consumption.

1. Multi-head self-attention: CRA is built on multi-head self-attention (MHSA). Each attention head computes global similarity independently, and the outputs of the heads are merged, strengthening the model's ability to represent different features. The input features are split across the heads, each with a fixed dimension, and channel compression reduces the query and key dimensions, lowering the computational complexity.

2. Query/key generation and channel compression: The input features pass through linear projections to produce the queries, keys, and values. The query is generated directly by a linear projection on the full-resolution features, while the key and value are generated from average-pooled features. After pooling and projection the three tensors have different shapes: the query keeps the full sequence length, while the key and value come from the shorter pooled sequence, which keeps the attention map compact without blocking the flow of information.

3. Attention weight computation: For numerical stability, the dot product between queries and keys is multiplied by a scaling factor (the inverse square root of the head dimension). The query-key matrix product yields similarity scores, and a softmax turns them into attention weights.

4. Global feature capture: The attention matrix is multiplied with the values to obtain the weighted feature representation: the query-key similarities weight a sum over the values, producing global context features as the final output.

5. Computational efficiency: Compared with conventional self-attention, CRA's innovation is compressing the query and key channels to one per head, which drastically cuts the cost of the query-key product; the total computation is nearly half that of conventional self-attention while global features are still captured effectively (see the shape walkthrough after this list).
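To make item 5 concrete, here is a minimal shape-and-FLOP walkthrough of the query-key product, using the same shapes as the self-test at the end of section 3 (256 channels, 8 heads, a 64x64 feature map, pool_ratio=2). All names in it are illustrative, and it counts only the q @ k^T multiply; the "nearly half" figure quoted above covers the whole module, including the projections that are unchanged.

import torch

B, C, H, W = 1, 256, 64, 64
heads, head_dim = 8, 256 // 8                # 8 heads of 32 channels each
N = H * W                                    # query length (full resolution)
pool_ratio = 2
Np = (H // pool_ratio) * (W // pool_ratio)   # key/value length after AvgPool2d

# Standard MHSA: q and k keep head_dim channels -> q @ k^T costs N * N * head_dim per head
flops_std = heads * N * N * head_dim

# CRA: q and k are compressed to 1 channel per head -> q @ k^T costs N * Np * 1 per head
q = torch.randn(B, heads, N, 1)
k = torch.randn(B, heads, Np, 1)
attn = q @ k.transpose(-2, -1)               # (B, heads, N, Np) attention map
flops_cra = heads * N * Np * 1

print(attn.shape)                            # torch.Size([1, 8, 4096, 1024])
print(f"q @ k^T FLOP ratio (CRA / standard): {flops_cra / flops_std:.4f}")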

2. Combining YOLOv11 with CRA

The advantage of the CRA module is that, by reducing the channel dimension of the queries and keys to one, it significantly lowers computational cost while still fully extracting global similarity, taking global information into account effectively and improving accuracy.

1. This article combines CRA with C2PSA, replacing the FFN module inside it with CRA to build the C2PSA_CRA module.

2. This article combines CRA with C3k2, replacing the second conv module of its bottleneck with CRA to build the C3k2_CRA module.

3. Channel Reduction Attention (CRA) Code

import torch.nn as nn
import math
from timm.models.layers import trunc_normal_
import torch
from .block import PSABlock, C2PSA, C2f, C3, Bottleneck
from .conv import Conv


# Channel Reduction Attention (CRA) module
class ChannelReductionAttention(nn.Module):
    def __init__(self, dim1, num_heads=8, qkv_bias=False, qk_scale=None,
                 attn_drop=0., proj_drop=0., pool_ratio=2):
        super().__init__()
        # dim1 must be divisible by the number of heads
        assert dim1 % num_heads == 0, f"dim {dim1} should be divided by num_heads {num_heads}."

        self.dim1 = dim1
        self.pool_ratio = pool_ratio  # pooling ratio for the key/value branch
        self.num_heads = num_heads  # number of attention heads
        head_dim = dim1 // num_heads  # channels per attention head
        # Scaling factor; defaults to head_dim ** -0.5 when qk_scale is not given
        self.scale = qk_scale or head_dim ** -0.5

        # Linear layers for query (q), key (k) and value (v);
        # q and k are compressed to a single channel per head
        self.q = nn.Linear(dim1, self.num_heads, bias=qkv_bias)
        self.k = nn.Linear(dim1, self.num_heads, bias=qkv_bias)
        self.v = nn.Linear(dim1, dim1, bias=qkv_bias)

        # Dropout for the attention map and the output projection
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim1, dim1)
        self.proj_drop = nn.Dropout(proj_drop)

        # Average pooling shrinks the spatial size; the 1x1 convolution keeps
        # the input and output channel counts identical
        self.pool = nn.AvgPool2d(pool_ratio, pool_ratio)
        self.sr = nn.Conv2d(dim1, dim1, kernel_size=1, stride=1)

        # LayerNorm and activation for the pooled branch
        self.norm = nn.LayerNorm(dim1)
        self.act = nn.GELU()

        # Initialize weights
        self.apply(self._init_weights)

    # Initialization for linear, LayerNorm and convolution layers
    def _init_weights(self, m):
        if isinstance(m, nn.Linear):
            trunc_normal_(m.weight, std=.02)  # truncated normal initialization
            if isinstance(m, nn.Linear) and m.bias is not None:
                nn.init.constant_(m.bias, 0)  # zero bias
        elif isinstance(m, nn.LayerNorm):
            nn.init.constant_(m.bias, 0)  # zero bias
            nn.init.constant_(m.weight, 1.0)  # unit weight
        elif isinstance(m, nn.Conv2d):
            # Kaiming-style initialization for convolution layers
            fan_out = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
            fan_out //= m.groups
            m.weight.data.normal_(0, math.sqrt(2.0 / fan_out))
            if m.bias is not None:
                m.bias.data.zero_()  # zero bias

    def forward(self, x):
        n_, _, h_, w_ = x.shape
        x = x.flatten(2).transpose(1, 2)
        B, N, C = x.shape  # batch size, sequence length, channels

        # Query: one channel per head -> (B, num_heads, N, 1)
        q = self.q(x).reshape(B, N, self.num_heads).permute(0, 2, 1).unsqueeze(-1)

        # Reshape back to an image, then pool and project for the key/value branch
        x_ = x.permute(0, 2, 1).reshape(B, C, h_, w_)
        x_ = self.sr(self.pool(x_)).reshape(B, C, -1).permute(0, 2, 1)
        # Normalize and activate the pooled features
        x_ = self.norm(x_)
        x_ = self.act(x_)

        # Key: one channel per head; value keeps C // num_heads channels per head
        k = self.k(x_).reshape(B, -1, self.num_heads).permute(0, 2, 1).unsqueeze(-1)
        v = self.v(x_).reshape(B, -1, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3)

        # Scaled attention scores followed by softmax
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)  # dropout on the attention map

        # Attention-weighted sum of the values
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)

        # Output projection with dropout
        x = self.proj(x)
        x = self.proj_drop(x)
        x = x.permute(0, 2, 1).reshape(n_, -1, h_, w_)
        return x


class PSABlock_CRA(PSABlock):
    def __init__(self, c, qk_dim=16, pdim=32, shortcut=True) -> None:
        """Initializes the PSABlock with CRA replacing the feed-forward layer."""
        super().__init__(c)
        # qk_dim and pdim are accepted for interface compatibility but unused here
        self.ffn = ChannelReductionAttention(c)


class C2PSA_CRA(C2PSA):
    def __init__(self, c1, c2, n=1, e=0.5):
        """Initializes the C2PSA module with specified input/output channels, number of layers, and expansion ratio."""
        super().__init__(c1, c2)
        assert c1 == c2
        self.c = int(c1 * e)
        self.m = nn.Sequential(*(PSABlock_CRA(self.c, qk_dim=16, pdim=32) for _ in range(n)))


class Bottleneck_CRA(nn.Module):
    """Standard bottleneck with CRA as its second stage."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a standard bottleneck module with optional shortcut connection and configurable parameters."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = ChannelReductionAttention(c_)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """Applies conv + CRA, with an optional residual connection."""
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class C3k(C3):
    """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck_CRA(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))


# When c3k=True, C3k blocks (built on Bottleneck_CRA) are used for feature
# fusion; when False, the plain Bottleneck extracts features
class C3k2_CRA(C2f):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            C3k(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck(self.c, self.c, shortcut, g)
            for _ in range(n)
        )


if __name__ == '__main__':
    CRA = ChannelReductionAttention(256)
    # Create an input tensor
    batch_size = 8
    input_tensor = torch.randn(batch_size, 256, 64, 64)
    # Run the module and print the input and output shapes
    output_tensor = CRA(input_tensor)
    print("Input shape:", input_tensor.shape)
    print("Output shape:", output_tensor.shape)
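Because the forward pass reshapes the attended tokens back to (B, C, H, W), CRA preserves both the channel count and the spatial size of its input, so the self-test should print:

Input shape: torch.Size([8, 256, 64, 64])
Output shape: torch.Size([8, 256, 64, 64])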

4. Integrating CRA into YOLOv11

Step 1: Copy the core code above into the D:\bilibili\model\YOLO11\ultralytics-main\ultralytics\nn directory (the original post illustrates this step with a screenshot, not reproduced here).

Step 2: Import the CRA package in task.py.
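The original post shows this step only as a screenshot, so here is a minimal sketch of the import. The file name CRA.py and its location are assumptions: the relative imports in the core code (from .block import ..., from .conv import ...) resolve only if the file sits in the same package as block.py and conv.py, which in stock ultralytics is ultralytics/nn/modules/, so adjust the module path to match where you actually saved it.

# In ultralytics/nn/tasks.py, next to the existing module imports.
# Assumes the core code above was saved as CRA.py alongside block.py/conv.py.
from ultralytics.nn.modules.CRA import C2PSA_CRA, C3k2_CRA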

Step 3: Register the new modules in the model-configuration (parse_model) section of task.py. There are two places to modify, listed below; a hedged sketch of both follows after them.

Place to modify for the first improvement (C2PSA_CRA).

Place to modify for the second improvement (C3k2_CRA).
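The two screenshots in the original are not reproduced here, so the following is only a hedged sketch of the kind of change they show. In recent ultralytics releases, parse_model() in tasks.py uses sets of module classes to decide how to scale channels and where to insert the repeat count; the exact membership of those sets varies between versions, so treat every name below as an assumption to check against your copy:

# Inside parse_model() in ultralytics/nn/tasks.py (sketch, not verbatim):
if m in {
    Classify, Conv, ConvTranspose, GhostConv, Bottleneck, SPPF,
    C1, C2, C2f, C3, C3k2, C2PSA,
    C3k2_CRA, C2PSA_CRA,  # <- register both new modules for channel scaling
}:
    c1, c2 = ch[f], args[0]
    # ... unchanged width/channel handling ...
    if m in {BottleneckCSP, C1, C2, C2f, C3, C3k2, C2PSA,
             C3k2_CRA, C2PSA_CRA}:  # <- modules that take a repeat count
        args.insert(2, n)  # insert the number of repeats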

Step 4: Copy the model configuration below into a YOLOv11 .yaml file (the training script in step 5 loads it as yolo11_CRA.yaml).

Configuration file for the first improvement (C2PSA_CRA):

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA_CRA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)

Configuration file for the second improvement (C3k2_CRA):

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2_CRA, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2_CRA, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2_CRA, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2_CRA, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2_CRA, [512, False]] # 13
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2_CRA, [256, False]] # 16 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2_CRA, [512, False]] # 19 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2_CRA, [1024, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)

Step 5: Run successfully. The script below builds the model from the new YAML file, loads the pretrained weights, and starts training.


from ultralytics.models import NAS, RTDETR, SAM, YOLO, FastSAM, YOLOWorld

if __name__ == "__main__":
    # Build the model from your own YOLOv11 .yaml file and load pretrained weights before training
    model = YOLO(r"D:\bilibili\model\YOLO11\ultralytics-main\ultralytics\cfg\models\11\yolo11_CRA.yaml") \
        .load(r'D:\bilibili\model\YOLO11\ultralytics-main\yolo11n.pt')  # build from YAML and transfer weights
    results = model.train(data=r'D:\bilibili\model\ultralytics-main\ultralytics\cfg\datasets\VOC_my.yaml',
                          epochs=100, imgsz=640, batch=4)
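If step 3 was done correctly, the model summary printed before training starts should list C2PSA_CRA layers (first improvement) or C3k2_CRA layers (second improvement) in place of the stock C2PSA / C3k2 blocks; if the summary still shows the originals, recheck the import and the parse_model registration.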

 
