YOLOv8 Improvements | Activation Functions | One-Step Replacement of More Than Ten Common Activation Functions [Complete Code]

💡💡💡 All programs in this column have been tested and run successfully 💡💡💡

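This article shows how to replace YOLOv8's default activation function with any of more than ten common alternatives. Each activation function has its own strengths, so it is worth trying several of them in your experiments. After introducing the main ideas behind these activation functions, the article walks step by step through the required code changes and gives the complete modified code at the end so that it can be run directly, which makes it easy for beginners to follow along with YOLO-series object detection.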

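1. Common activation functions in YOLO training

Activation functions are a key component of neural networks: they introduce nonlinearity, which lets the model learn complex patterns. The table below lists the formula, advantages and disadvantages of several commonly used activation functions: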

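Activation	Formula	Advantages	Disadvantages
SiLU	$\text{SiLU}(x) = x \cdot \sigma(x)$	Smooth nonlinearity; outperforms ReLU on some tasks	Slightly more expensive to compute than ReLU
ReLU	$\text{ReLU}(x) = \max(0, x)$	Cheap to compute; fast convergence	Dead-neuron (dying ReLU) problem
LeakyReLU	$\text{LeakyReLU}(x) = \max(0.01x, x)$	Mitigates the dead-neuron problem	Output is not zero-centered
Hardswish	$\text{Hardswish}(x) = x \cdot \frac{\max(0, \min(x+3, 6))}{6}$	Approximates Swish at lower cost	Slightly more expensive to compute than ReLU
Mish	$\text{Mish}(x) = x \cdot \tanh(\ln(1 + e^x))$	Smooth nonlinearity; outperforms ReLU and Swish on some tasks	Expensive to compute; longer training time
ELU	$\text{ELU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha (e^x - 1) & \text{if } x < 0 \end{cases}$	Mitigates the dead-neuron problem; nonzero output for negative inputs	Slightly more expensive to compute; α needs tuning
GELU	$\text{GELU}(x) = x \cdot \Phi(x)$ ($\Phi(x)$ is the cumulative distribution function of the standard normal distribution)	Smooth nonlinearity; outperforms ReLU on some tasks	Expensive to compute; longer training time
SELU	$\text{SELU}(x) = \lambda \begin{cases} x & \text{if } x \geq 0 \\ \alpha (e^x - 1) & \text{if } x < 0 \end{cases}$ ($\lambda \approx 1.0507$ and $\alpha \approx 1.6733$ are fixed constants)	Self-normalizing outputs; works well in deep networks	Expensive to compute; places requirements on initialization and network architecture
RReLU	$\text{RReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ r \cdot x & \text{if } x < 0 \end{cases}$ ($r$ is sampled at random from an interval during training)	Helps prevent overfitting; acts as a regularizer during training	$r$ has to be fixed at inference time; slightly more expensive to compute
PReLU	$\text{PReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha x & \text{if } x < 0 \end{cases}$	Mitigates the dead-neuron problem; the slope α is learned	Adds parameters to the model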

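Details:

SiLU (Swish-1):
- Advantages: smooth nonlinearity; outperforms ReLU on some tasks.
- Disadvantages: slightly more expensive to compute than ReLU.
- Formula: $\text{SiLU}(x) = x \cdot \sigma(x)$
ReLU (Rectified Linear Unit):
- Advantages: cheap to compute; fast convergence.
- Disadvantages: can cause dead neurons, i.e. some neurons stop activating during training and never recover.
- Formula: $\text{ReLU}(x) = \max(0, x)$
LeakyReLU:
- Advantages: mitigates ReLU's dead-neuron problem.
- Disadvantages: the output is not zero-centered, which can unbalance the gradients.
- Formula: $\text{LeakyReLU}(x) = \max(0.01x, x)$
Hardswish:
- Advantages: approximates Swish at lower cost.
- Disadvantages: slightly more expensive to compute than ReLU.
- Formula: $\text{Hardswish}(x) = x \cdot \frac{\max(0, \min(x+3, 6))}{6}$
Mish:
- Advantages: smooth nonlinearity; outperforms ReLU and Swish on some tasks.
- Disadvantages: expensive to compute; longer training time.
- Formula: $\text{Mish}(x) = x \cdot \tanh(\ln(1 + e^x))$
ELU (Exponential Linear Unit):
- Advantages: mitigates ReLU's dead-neuron problem; produces nonzero output for negative inputs.
- Disadvantages: slightly more expensive to compute; the parameter α needs tuning.
- Formula: $\text{ELU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha (e^x - 1) & \text{if } x < 0 \end{cases}$
GELU (Gaussian Error Linear Unit):
- Advantages: smooth nonlinearity; outperforms ReLU on some tasks.
- Disadvantages: expensive to compute; longer training time.
- Formula: $\text{GELU}(x) = x \cdot \Phi(x)$, where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution.
SELU (Scaled Exponential Linear Unit):
- Advantages: self-normalizing outputs; works well in deep networks.
- Disadvantages: expensive to compute; places requirements on initialization and network architecture.
- Formula: $\text{SELU}(x) = \lambda \begin{cases} x & \text{if } x \geq 0 \\ \alpha (e^x - 1) & \text{if } x < 0 \end{cases}$, where $\lambda \approx 1.0507$ and $\alpha \approx 1.6733$ are fixed constants.
RReLU (Randomized Leaky ReLU):
- Advantages: helps prevent overfitting; acts as a regularizer during training.
- Disadvantages: $r$ has to be fixed at inference time; slightly more expensive to compute.
- Formula: $\text{RReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ r \cdot x & \text{if } x < 0 \end{cases}$, where $r$ is sampled at random from an interval during training.
PReLU (Parametric ReLU):
- Advantages: mitigates ReLU's dead-neuron problem; the slope is learned.
- Disadvantages: adds parameters to the model.
- Formula: $\text{PReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha x & \text{if } x < 0 \end{cases}$, where α is a trainable parameter.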
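The formulas listed above can be written out directly as scalar functions. The following is a minimal sketch in plain Python for checking values by hand; the function names are illustrative, the SELU constants are the standard published values, and in a real model the corresponding `torch.nn` modules would be used instead.

```python
import math

# Standard SELU constants (assumed here; they are fixed, not tuned).
SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def silu(x: float) -> float:
    return x * sigmoid(x)


def relu(x: float) -> float:
    return max(0.0, x)


def leaky_relu(x: float, slope: float = 0.01) -> float:
    # Equals max(slope * x, x) whenever slope < 1.
    return x if x >= 0 else slope * x


def hardswish(x: float) -> float:
    return x * max(0.0, min(x + 3.0, 6.0)) / 6.0


def mish(x: float) -> float:
    # log1p(exp(x)) is softplus; exp overflows for very large x in this naive form.
    return x * math.tanh(math.log1p(math.exp(x)))


def elu(x: float, alpha: float = 1.0) -> float:
    return x if x >= 0 else alpha * math.expm1(x)


def gelu(x: float) -> float:
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) is the standard normal CDF.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def selu(x: float) -> float:
    return SELU_LAMBDA * (x if x >= 0 else SELU_ALPHA * math.expm1(x))


if __name__ == "__main__":
    functions = (silu, relu, leaky_relu, hardswish, mish, elu, gelu, selu)
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        row = ", ".join(f"{f.__name__}={f(x):+.4f}" for f in functions)
        print(f"x={x:+.1f}: {row}")
```

Printing the values side by side makes the differences visible: ReLU and Hardswish are exactly zero for sufficiently negative inputs, while SiLU, Mish, ELU, GELU and SELU keep a small negative response.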

The following activation functions are also widely known:

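Activation	Formula	Advantages	Disadvantages
Sigmoid	$\sigma(x) = \frac{1}{1 + e^{-x}}$	Smooth, with output in (0, 1); well suited to probabilities	Vanishing gradients; output is not zero-centered; relatively expensive to compute
Tanh	$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$	Zero-centered output; larger gradients than Sigmoid	Vanishing gradients; relatively expensive to compute
ReLU (Rectified Linear Unit)	$f(x) = \max(0, x)$	Simple and efficient; fast convergence	Unbounded positive output, which can contribute to exploding activations; dying ReLU (dead-neuron) problem
Leaky ReLU	$f(x) = \max(0.01x, x)$	Mitigates the dying ReLU problem; keeps the advantages of ReLU	Output is still unbounded
Parametric ReLU (PReLU)	$f(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha x & \text{if } x < 0 \end{cases}$	Improves on Leaky ReLU by learning the slope α; more flexible	Slightly more expensive to compute
ELU (Exponential Linear Unit)	$f(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha (e^x - 1) & \text{if } x < 0 \end{cases}$	Mitigates vanishing gradients; more robust	More complex to compute; α needs tuning
Swish	$f(x) = x \cdot \sigma(x)$	Trains better than ReLU on some tasks; smooth gradient	More complex to compute; needs extra compute resources
Softplus	$f(x) = \ln(1 + e^x)$	A smooth version of ReLU; no dying ReLU problem	Gradient vanishes for very negative inputs; relatively expensive to compute
GELU (Gaussian Error Linear Unit)	$f(x) = x \cdot \Phi(x)$, where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution	Performs better on some tasks; smooth gradient	More complex to compute; needs extra compute resources
Maxout	$f(x) = \max(w_1^T x + b_1, w_2^T x + b_2)$	Greater representational power; avoids the dying ReLU problem	Many parameters and high compute cost; prone to overfitting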
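The vanishing-gradient entries in this table can be checked numerically from the closed-form derivatives. The sketch below is plain Python with illustrative helper names (not taken from any library); it evaluates each derivative at a few inputs to show where the gradient saturates.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def d_sigmoid(x: float) -> float:
    # sigma'(x) = sigma(x) * (1 - sigma(x)), which is at most 0.25 (at x = 0).
    s = sigmoid(x)
    return s * (1.0 - s)


def d_tanh(x: float) -> float:
    # tanh'(x) = 1 - tanh(x)^2, which is at most 1 (at x = 0).
    return 1.0 - math.tanh(x) ** 2


def d_relu(x: float) -> float:
    # Gradient is 1 for positive inputs and 0 otherwise (the "dead" region).
    return 1.0 if x > 0 else 0.0


def d_softplus(x: float) -> float:
    # d/dx ln(1 + e^x) = sigma(x): close to 0 for very negative x.
    return sigmoid(x)


if __name__ == "__main__":
    for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
        print(
            f"x={x:+5.1f}  sigmoid'={d_sigmoid(x):.2e}  tanh'={d_tanh(x):.2e}  "
            f"relu'={d_relu(x):.0f}  softplus'={d_softplus(x):.2e}"
        )
```

Sigmoid and Tanh gradients shrink towards zero on both sides, Softplus only on the negative side, and ReLU keeps a gradient of exactly 1 for any positive input.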

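2. Modifying the activation function in YOLOv8

The default activation function in YOLOv8 is SiLU.

Changing it takes only one step, because YOLOv8 already wraps the activation for us.

The relevant code is as follows: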

class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation without batch normalization."""
        return self.act(self.conv(x))
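The chained conditional that sets `self.act` is easy to misread, so here is a minimal sketch of the same selection logic. It deliberately uses small stand-in classes instead of `torch.nn` (the names `Module`, `Identity`, `SiLU`, `ReLU` and `ConvSketch` are hypothetical placeholders, not the library's classes) so that it runs without PyTorch.

```python
class Module:
    """Stand-in for torch.nn.Module (illustrative only)."""


class Identity(Module):
    """Stand-in for nn.Identity."""


class SiLU(Module):
    """Stand-in for nn.SiLU."""


class ReLU(Module):
    """Stand-in for nn.ReLU."""


class ConvSketch:
    """Mirrors only the activation-selection part of Conv.__init__."""

    default_act = SiLU()  # class attribute: a single instance shared by all layers

    def __init__(self, act=True):
        if act is True:
            self.act = self.default_act  # use whatever the class default is right now
        elif isinstance(act, Module):
            self.act = act  # an explicit module overrides the default
        else:
            self.act = Identity()  # act=False (or anything else) disables activation


first, second = ConvSketch(), ConvSketch()
custom = ConvSketch(act=ReLU())
disabled = ConvSketch(act=False)

ConvSketch.default_act = ReLU()  # what a config-driven override effectively does
later = ConvSketch()

print(first.act is second.act)      # True: both hold the same default instance
print(type(custom.act).__name__)    # ReLU
print(type(disabled.act).__name__)  # Identity
print(type(first.act).__name__)     # SiLU: layers built earlier keep the old default
print(type(later.act).__name__)     # ReLU
```

Two consequences follow from this logic. The default has to be changed before the layers are constructed, and because every layer that uses the default holds the same instance, an activation with learnable parameters such as PReLU would share those parameters across all of those layers.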

From the code above, all we need to do is set default_act. Reading the full source shows that parse_model in tasks.py assigns default_act from the model configuration:

def parse_model(d, ch, verbose=True):  # model_dict, input_channels(3)
    """Parse a YOLO model.yaml dictionary into a PyTorch model."""
    import ast

    # Args
    max_channels = float("inf")
    nc, act, scales = (d.get(x) for x in ("nc", "activation", "scales"))
    depth, width, kpt_shape = (d.get(x, 1.0) for x in ("depth_multiple", "width_multiple", "kpt_shape"))
    if scales:
        scale = d.get("scale")
        if not scale:
            scale = tuple(scales.keys())[0]
            LOGGER.warning(f"WARNING ⚠️ no model scale passed. Assuming scale='{scale}'.")
        depth, width, max_channels = scales[scale]

    if act:
        Conv.default_act = eval(act)  # redefine default activation, i.e. Conv.default_act = nn.SiLU()
        if verbose:
            LOGGER.info(f"{colorstr('activation:')} {act}")  # print

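The comment in this code also makes it clear that we only need to redefine the activation in the yaml file.

So we add a single line such as activation: nn.SiLU() to the yaml file, choosing one of the following (note the YAML colon and the parentheses that instantiate the module):

activation: nn.SiLU()
activation: nn.ReLU()
activation: nn.LeakyReLU()
activation: nn.Hardswish()
activation: nn.Mish()
activation: nn.ELU()
activation: nn.GELU()
activation: nn.SELU()
activation: nn.RReLU()
activation: nn.PReLU()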
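Because the configuration value is a string that gets passed to `eval`, it helps to see what different spellings evaluate to. The sketch below reproduces that step with a stand-in `nn` namespace so it runs without PyTorch; `build_activation` and the stand-in classes are hypothetical and only mimic the relevant behavior.

```python
from types import SimpleNamespace


class SiLU:
    """Stand-in for nn.SiLU (illustrative only)."""


class LeakyReLU:
    """Stand-in for nn.LeakyReLU with its negative_slope argument."""

    def __init__(self, negative_slope: float = 0.01):
        self.negative_slope = negative_slope


# Stand-in for the `nn` name that is in scope where eval(act) runs.
nn = SimpleNamespace(SiLU=SiLU, LeakyReLU=LeakyReLU)


def build_activation(spec: str):
    """Evaluate an activation string such as 'nn.SiLU()' against the stand-in namespace."""
    return eval(spec, {"__builtins__": {}}, {"nn": nn})


config = {"nc": 80, "activation": "nn.LeakyReLU(0.1)"}  # as loaded from the yaml file

act = config.get("activation")
if act:
    default_act = build_activation(act)
    print(type(default_act).__name__, default_act.negative_slope)

# With parentheses the string yields an instance; without them it yields the class itself.
print(isinstance(build_activation("nn.SiLU()"), SiLU))
print(isinstance(build_activation("nn.SiLU"), type))
```

The string is ordinary Python, so constructor arguments can be written in the yaml value, and leaving out the parentheses would hand the layer a class rather than a module instance.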

The complete yaml file:

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

activation: nn.SiLU() # uncomment the one you need and keep exactly one activation line active
# activation: nn.ReLU()
# activation: nn.LeakyReLU()
# activation: nn.Hardswish()
# activation: nn.Mish()
# activation: nn.ELU()
# activation: nn.GELU()
# activation: nn.SELU()
# activation: nn.RReLU()
# activation: nn.PReLU()

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, C2f, [512]] # 12

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 3, C2f, [256]] # 15 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]] # cat head P4
  - [-1, 3, C2f, [512]] # 18 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]] # cat head P5
  - [-1, 3, C2f, [1024]] # 21 (P5/32-large)

  - [[15, 18, 21], 1, Detect, [nc]] # Detect(P3, P4, P5)
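To compare several activations without editing the file by hand each time, one option is to generate one yaml variant per activation. The following is a rough sketch using only the standard library; the helper names, the `ACTIVATIONS` list and the output file names such as `yolov8-ReLU.yaml` are assumptions made for illustration, not part of Ultralytics.

```python
import re
import tempfile
from pathlib import Path

# Activation strings taken from the list in this article.
ACTIVATIONS = [
    "nn.SiLU()", "nn.ReLU()", "nn.LeakyReLU()", "nn.Hardswish()", "nn.Mish()",
    "nn.ELU()", "nn.GELU()", "nn.SELU()", "nn.RReLU()", "nn.PReLU()",
]

# Matches an active or commented-out top-level `activation:` line.
ACTIVATION_LINE = re.compile(r"^\s*#?\s*activation\s*:")


def set_activation(yaml_text: str, activation: str) -> str:
    """Return yaml_text with exactly one active `activation:` line."""
    out, placed = [], False
    for line in yaml_text.splitlines():
        if ACTIVATION_LINE.match(line):
            if not placed:
                out.append(f"activation: {activation}")
                placed = True
            continue  # drop every other active or commented activation line
        out.append(line)
    if not placed:
        out.append(f"activation: {activation}")  # no such line yet: append one
    return "\n".join(out) + "\n"


def write_variants(base_text: str, out_dir: Path) -> list:
    """Write one yaml file per activation into out_dir and return the paths."""
    paths = []
    for activation in ACTIVATIONS:
        name = re.search(r"nn\.(\w+)", activation).group(1)
        path = out_dir / f"yolov8-{name}.yaml"  # hypothetical naming scheme
        path.write_text(set_activation(base_text, activation), encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    base = "nc: 80\nactivation: nn.SiLU()\n# activation: nn.ReLU()\nbackbone:\n  - [-1, 1, Conv, [64, 3, 2]]\n"
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_variants(base, Path(tmp)):
            print(path.name)
```

Each generated file should then be usable like the hand-edited one, for example by passing its path to `YOLO(...)` from the `ultralytics` package, though this sketch has not been checked against any particular Ultralytics version.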

3. Complete code

https://pan.baidu.com/s/1vSuA60RTZjfUQlVMy7HYrw?pwd=a8ji

Extraction code: a8ji
