
Neural Networks -- The Convolutional Layer


An Introduction to Convolutional Layers in Neural Networks

This post focuses on Conv2d.
In the simplest case, the output of a layer with input size $(N, C_{in}, H, W)$ and output size $(N, C_{out}, H_{out}, W_{out})$ can be described precisely as:

$$\text{out}(N_i, C_{out_j}) = \text{bias}(C_{out_j}) + \sum_{k=0}^{C_{in}-1} \text{weight}(C_{out_j}, k) \star \text{input}(N_i, k)$$

where $\star$ is the valid 2D cross-correlation operator, $N$ is the batch size, $C$ denotes the number of channels, $H$ is the height of the input planes in pixels, and $W$ is the width in pixels.
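As a quick sanity check of this formula, here is a minimal sketch (my own addition, not from the original post) that computes the valid 2D cross-correlation by hand and compares it with torch.nn.functional.conv2d:

import torch
import torch.nn.functional as F

N, C_in, C_out, H, W, K = 1, 2, 3, 5, 5, 3
x = torch.randn(N, C_in, H, W)
weight = torch.randn(C_out, C_in, K, K)
bias = torch.randn(C_out)

# Manual version of:
# out(N_i, C_out_j) = bias(C_out_j) + sum_k weight(C_out_j, k) * input(N_i, k)
H_out, W_out = H - K + 1, W - K + 1          # "valid" correlation: no padding, stride 1
out = torch.empty(N, C_out, H_out, W_out)
for j in range(C_out):
    for r in range(H_out):
        for c in range(W_out):
            patch = x[0, :, r:r + K, c:c + K]            # all input channels at once
            out[0, j, r, c] = bias[j] + (patch * weight[j]).sum()

print(torch.allclose(out, F.conv2d(x, weight, bias), atol=1e-5))  # True

The triple loop is only for illustration, of course; the real Conv2d is heavily optimized.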
The signature of Conv2d is:

class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0,
                      dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)

Each parameter is explained in detail below (a short usage sketch follows this list):

  • in_channels (int) – number of channels in the input image

  • out_channels (int) – number of channels produced by the convolution

  • kernel_size (int or tuple) – size of the convolution kernel

  • stride (int or tuple, optional) – stride of the convolution. Default: 1

  • padding (int, tuple or str, optional) – padding added to all four sides of the input. Default: 0

  • dilation (int or tuple, optional) – spacing between kernel elements. Default: 1

  • groups (int, optional) – number of blocked connections from input channels to output channels. Default: 1

  • bias (bool, optional) – if True, adds a learnable bias to the output. Default: True

  • padding_mode (str, optional) – padding mode, one of 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
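A short usage sketch (my own, with arbitrary example values): a single int applies to both spatial dimensions, while a tuple specifies (height, width) separately.

from torch import nn

conv_a = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)             # 3x3 kernel
conv_b = nn.Conv2d(3, 8, kernel_size=(3, 5), stride=(1, 2), padding=(0, 1))  # rectangular kernel

print(conv_a.weight.shape)  # torch.Size([8, 3, 3, 3]) -> (out_channels, in_channels/groups, kH, kW)
print(conv_b.weight.shape)  # torch.Size([8, 3, 3, 5])
print(conv_b.bias.shape)    # torch.Size([8])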

What does the output shape look like? It is determined by the following formulas.
Assume the input is:

  • $(N, C_{in}, H_{in}, W_{in})$ or $(C_{in}, H_{in}, W_{in})$

Then the output is:

  • $(N, C_{out}, H_{out}, W_{out})$ or $(C_{out}, H_{out}, W_{out})$, where

$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1 \right\rfloor$$

$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1 \right\rfloor$$
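These formulas are easy to wrap in a small helper. The following sketch is my own (the name conv_output_size is made up for illustration); it evaluates one spatial dimension per the formula and then verifies the result against a real Conv2d:

import torch
from torch import nn

def conv_output_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    # H_out / W_out for one spatial dimension, per the formula above.
    return (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

conv = nn.Conv2d(1, 3, kernel_size=3, stride=2, padding=1, dilation=2)
x = torch.randn(64, 1, 28, 28)

h = conv_output_size(28, kernel_size=3, stride=2, padding=1, dilation=2)
w = conv_output_size(28, kernel_size=3, stride=2, padding=1, dilation=2)
print((h, w))         # (13, 13)
print(conv(x).shape)  # torch.Size([64, 3, 13, 13])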

The weight and bias of the module are in fact obtained by sampling, as follows (a quick empirical check follows this list):

  • weight (Tensor) – the learnable weights of the module, of shape $(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size}[0], \text{kernel\_size}[1])$. These values are sampled from $\mathcal{U}(-k, k)$, where $k = \sqrt{\frac{\text{groups}}{C_{in} \times \prod_{i=0}^{1} \text{kernel\_size}[i]}}$

  • bias (Tensor) – the learnable bias of the module, of shape $(\text{out\_channels})$. If bias is True, these values are sampled from $\mathcal{U}(-k, k)$, where $k = \sqrt{\frac{\text{groups}}{C_{in} \times \prod_{i=0}^{1} \text{kernel\_size}[i]}}$
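A quick empirical check of this bound (my own sketch, with arbitrary layer sizes): every freshly initialized weight and bias value should fall inside $[-k, k]$.

import math
from torch import nn

conv = nn.Conv2d(in_channels=16, out_channels=8, kernel_size=3)  # groups=1 by default
k = math.sqrt(conv.groups / (conv.in_channels * conv.kernel_size[0] * conv.kernel_size[1]))

print(k)  # ~0.0833 = sqrt(1 / (16 * 3 * 3))
print(conv.weight.min().item() >= -k and conv.weight.max().item() <= k)  # True
print(conv.bias.min().item() >= -k and conv.bias.max().item() <= k)      # True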

Next, let's apply a convolution to the FashionMNIST dataset we downloaded earlier and inspect the output:

Note that FashionMNIST consists of grayscale images, i.e. the channel count is 1, so conv1 in my James class below takes 1 input channel. If you switch to another dataset, determine in_channels from the images' channel dimension, which you can check with print(img.shape); a tiny sketch follows.
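For instance (my own minimal check, separate from the main code below):

import torchvision

dataset = torchvision.datasets.FashionMNIST("./data", train=False,
                                            transform=torchvision.transforms.ToTensor(),
                                            download=True)
img, target = dataset[0]
print(img.shape)  # torch.Size([1, 28, 28]) -> 1 channel, so in_channels=1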

dataset = torchvision.datasets.FashionMNIST("./data", train=False,
                                            transform=torchvision.transforms.ToTensor(),
                                            download=True)
dataloader = DataLoader(dataset, batch_size=64, shuffle=False)  # batch size of 64


class James(nn.Module):
    def __init__(self):
        super(James, self).__init__()
        self.conv1 = Conv2d(in_channels=1, out_channels=3, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x


def run_error():
    james = James()
    writer = SummaryWriter("logs")
    epoch = 0
    step = 0
    for data in dataloader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , input_size = {}".format(epoch, imgs.shape))
        # Epoch : 0 , input_size = torch.Size([64, 1, 28, 28])
        print("Epoch : {} , output_size = {}".format(epoch, output.shape))
        # Epoch : 0 , output_size = torch.Size([64, 3, 26, 26])
        writer.add_images("in_e", imgs, step)
        writer.add_images("out_e", output, step)
        writer.add_images("out_e_plus", output, step)
        # The same convolution output is logged twice here, yet the images shown
        # in TensorBoard have inconsistent colors.
        # The reason: feature maps produced by a convolutional layer usually
        # contain negative values or values greater than 1, while image pixel
        # values must lie in [0, 1] or be integers in [0, 255].
        # Moreover, these feature maps do not directly correspond to a
        # displayable RGB color space, which causes the color inconsistency.
        epoch = epoch + 1
        step = step + 1
    writer.close()


def run_right():
    james = James()
    writer = SummaryWriter("logs")
    epoch = 0
    step = 0
    for data in dataloader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , input_size = {}".format(epoch, imgs.shape))
        # Epoch : 0 , input_size = torch.Size([64, 1, 28, 28])
        print("Epoch : {} , output_size = {}".format(epoch, output.shape))
        # Epoch : 0 , output_size = torch.Size([64, 3, 26, 26])
        writer.add_images("in_r", imgs, step)
        # Reshape torch.Size([64, 3, 26, 26]) -> torch.Size([xxx, 1, 26, 26]):
        # FashionMNIST is a grayscale dataset, so its images have 1 channel
        # rather than the usual 3 RGB channels.
        # When the first dimension is unknown, write -1 and it is inferred
        # from the remaining dimensions.
        output = torch.reshape(output, (-1, 1, 26, 26))
        print("Epoch : {} , reshape_output_size = {}".format(epoch, output.shape))
        writer.add_images("out_r", output, step)
        epoch = epoch + 1
        step = step + 1
    writer.close()

Detailed explanations are embedded in the code, so they are not repeated here. The main points are:

  • The code logs the convolution output twice; although the convolution is identical, the images displayed in TensorBoard have inconsistent colors, because the feature maps do not directly correspond to a displayable RGB color space (an alternative fix is sketched after this list)
  • If you don't know the size of one dimension, set it to -1 and it will be computed automatically, i.e. torch.Size([64, 3, 26, 26]) -> torch.Size([-1, 1, 26, 26])
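As an alternative to the reshape trick, each feature map can be min-max scaled into [0, 1] before logging, which sidesteps the color inconsistency. This is a sketch of my own (the helper name normalize_for_display is made up), not part of the original code:

import torch

def normalize_for_display(t: torch.Tensor) -> torch.Tensor:
    # Scale a (N, C, H, W) batch of feature maps to [0, 1] per image.
    flat = t.flatten(start_dim=1)
    lo = flat.min(dim=1).values.view(-1, 1, 1, 1)
    hi = flat.max(dim=1).values.view(-1, 1, 1, 1)
    return (t - lo) / (hi - lo + 1e-8)

# Hypothetical usage inside the loop above:
# writer.add_images("out_e_scaled", normalize_for_display(output), step)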

Some of the resulting images are shown here. [screenshots omitted]

Of course, for datasets downloaded with code like the following:

dataset = torchvision.datasets.XXX("./data", train=False, transform=torchvision.transforms.ToTensor(), download=True)

(here XXX stands for the dataset you want to download), this approach does solve most problems. But if we want to build a dataset of our own, the command above no longer helps, since no such ready-made dataset exists online. We therefore want to write code that defines our own dataset and performs a similar convolution on it.

Below we write the code, using the ants-and-bees dataset we used before (download link), define our own dataset, convolve it, and output the images to see the effect.
The code is as follows:

class James_bees_ants(nn.Module):
    def __init__(self):
        super(James_bees_ants, self).__init__()
        self.conv1 = Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x


class CustomAntsAndBeesImageDataset(Dataset):
    def __init__(self, root_dir, target_dir, transform=None):
        self.img_dir = os.path.join(root_dir, target_dir)
        self.transform = transform
        # Collect all image file names in the directory
        self.img_names = [f for f in os.listdir(self.img_dir) if
                          f.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif'))]

    def __len__(self):
        return len(self.img_names)

    def __getitem__(self, idx):
        img_name = self.img_names[idx]
        img_path = os.path.join(self.img_dir, img_name)
        image = Image.open(img_path).convert('RGB')
        if self.transform:
            image = self.transform(image)
        return image, img_name


def run_ants_and_bees():
    root_dir = "data/hymenoptera_data/train"
    ants_target_dir = "ants_image"
    bees_target_dir = "bees_image"
    # img_ants_dir = os.path.join(root_dir, ants_target_dir)
    # img_bees_dir = os.path.join(root_dir, bees_target_dir)
    trans_img = transforms.Compose([
        transforms.Resize((256, 256)),  # make sure every image is resized to 256x256
        transforms.ToTensor(),
        # if needed:
        # transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])
    ants_dataset = CustomAntsAndBeesImageDataset(root_dir=root_dir, target_dir=ants_target_dir, transform=trans_img)
    bees_dataset = CustomAntsAndBeesImageDataset(root_dir=root_dir, target_dir=bees_target_dir, transform=trans_img)
    ants_loader = DataLoader(dataset=ants_dataset, batch_size=64, shuffle=True)
    bees_loader = DataLoader(dataset=bees_dataset, batch_size=64, shuffle=True)

    james = James_bees_ants()
    writer = SummaryWriter("logs")
    epoch = 0
    step = 0
    # Log the original and the convolved images for the ants dataset
    for data in ants_loader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , ants_input_size = {}".format(epoch, imgs.shape))
        print("Epoch : {} , ants_output_size = {}".format(epoch, output.shape))
        writer.add_images("ants_in", imgs, step)
        # Fold the 6 channels back into 3 channels
        output = torch.reshape(output, (-1, 3, 254, 254))
        print("Epoch : {} , ants_reshape_output_size = {}".format(epoch, output.shape))
        writer.add_images("ants_out", output, step)
        epoch = epoch + 1
        step = step + 1
    for data in bees_loader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , bees_input_size = {}".format(epoch, imgs.shape))
        print("Epoch : {} , bees_output_size = {}".format(epoch, output.shape))
        writer.add_images("bees_in", imgs, step)
        # Fold the 6 channels back into 3 channels
        output = torch.reshape(output, (-1, 3, 254, 254))
        print("Epoch : {} , bees_reshape_output_size = {}".format(epoch, output.shape))
        writer.add_images("bees_out", output, step)
        epoch = epoch + 1
        step = step + 1
    writer.close()

One thing we did above deserves a note: the images differ in size, so we resize them all to a common size, as done in the Compose section of the code.

The resulting images are shown here. [screenshots omitted]

The above shows how to work with a dataset we defined ourselves; in principle, as long as the code is correct, this approach works for any images. That said, while browsing the PyTorch documentation I found that PyTorch also provides a built-in way to create a dataset from images you supply yourself.

The specific API is torchvision.datasets.ImageFolder.

However, it imposes certain requirements on how your files are laid out. A common practice is to organize images by class, with each class's images in its own subfolder. For example:

dataset/
    class_1/
        img1.jpg
        img2.jpg
        ...
    class_2/
        img1.jpg
        img2.jpg
        ...

This layout makes it easy for PyTorch to apply labels when loading the dataset later, as the sketch below shows.
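A small sketch of my own (the paths assume the hymenoptera layout used later in this post) showing how ImageFolder derives labels from the folder names:

from torchvision import datasets, transforms

ds = datasets.ImageFolder(root='data/hymenoptera_data/train',
                          transform=transforms.ToTensor())
print(ds.classes)       # e.g. ['ants_image', 'bees_image'] (subfolder names, sorted)
print(ds.class_to_idx)  # e.g. {'ants_image': 0, 'bees_image': 1}
img, label = ds[0]      # each sample is (image_tensor, class_index)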

I used this API as well, with my files laid out following the convention above. [screenshot of the layout omitted]

The implementation uses the following code:

def use_ImageFolder():
    transform = transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
    ])
    # Load the dataset
    ants_train_dataset = datasets.ImageFolder(root='data/hymenoptera_data/train', transform=transform)
    ants_train_loader = DataLoader(ants_train_dataset, batch_size=64, shuffle=True)

    james = James_bees_ants()
    writer = SummaryWriter("logs")
    epoch = 0
    step = 0
    # Log the original and the convolved images for the ants dataset
    for data in ants_train_loader:
        imgs, targets = data
        output = james(imgs)
        print("Epoch : {} , use_ImageFolder_ants_input_size = {}".format(epoch, imgs.shape))
        print("Epoch : {} , use_ImageFolder_ants_output_size = {}".format(epoch, output.shape))
        # writer.add_images("use_ImageFolder_ants_in", imgs, step)
        # Fold the 6 channels back into 3 channels
        output = torch.reshape(output, (-1, 3, 254, 254))
        print("Epoch : {} , use_ImageFolder_ants_reshape_output_size = {}".format(epoch, output.shape))
        writer.add_images("use_ImageFolder_ants_out", output, step)
        epoch = epoch + 1
        step = step + 1
    writer.close()

The generated results are much the same.

[screenshot omitted]

In practice, if you don't have very specific requirements for handling the images, you can simply call torchvision.datasets.ImageFolder to load a dataset quickly. That saves a lot of time, which can then go into optimizing or designing the algorithm instead.

Finally, the complete script simply assembles the snippets above; the imports and entry point are:

import os

import torch
import torchvision
from torchvision import transforms, datasets
from torch.nn import Conv2d
from torch.utils.data import DataLoader, Dataset
from torch import nn
from torch.utils.tensorboard import SummaryWriter
from PIL import Image

# The FashionMNIST setup, James, run_error and run_right are exactly as in the
# first snippet above.

# =====================================================================================================================
# Custom code: convolving the ants-and-bees dataset from before.
# James_bees_ants, CustomAntsAndBeesImageDataset and run_ants_and_bees are
# exactly as in the second snippet above.

# =====================================================================================================================
# use_ImageFolder, which quickly loads a dataset organized as described above,
# is exactly as in the third snippet above.

if __name__ == "__main__":
    # run_right()
    # run_error()
    # run_ants_and_bees()
    use_ImageFolder()
