EAST Text Detection

Original article: EAST Text Detection, Zhihu (zhihu.com)

I. Text Detection

Paper: https://arxiv.org/pdf/1704.03155.pdf

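Conventional text detection models are multi-stage: training has to split detection into several stages that are learned separately. Breaking a complete text line into pieces, detecting them, and then merging the results both hurts accuracy and costs a lot of time. In text detection, the more intermediate processing there is, the worse the result is likely to be.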

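EAST removes these intermediate steps and performs text detection end to end. The design is simple and elegant, and it improves both detection accuracy and speed.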

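In the pipeline comparison figure, (a), (b), (c), and (d) are several common text detection pipelines. A typical pipeline includes stages such as candidate box extraction, candidate box filtering, bounding box regression, and candidate box merging, so the intermediate process is lengthy. (e) is the EAST pipeline introduced in this article. As the figure shows, it is reduced to just two stages: an FCN (fully convolutional network) stage and an NMS (non-maximum suppression) stage. The intermediate steps are greatly reduced, and the output supports multi-oriented detection of both text lines and words, which makes the model efficient, accurate, and suitable for many natural scenes. (d) is the CTPN model. Its pipeline resembles that of EAST in (e), but it only supports horizontal text, so it applies to fewer scenarios than EAST.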
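The NMS stage can be illustrated with a small sketch. The code below is plain greedy NMS on axis-aligned boxes given as (x1, y1, x2, y2). It is a simplification for illustration only: the EAST paper works with rotated boxes and quadrangles and uses a locality-aware variant of NMS. The function names `iou` and `nms` and the 0.5 threshold are assumptions, not taken from the original code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that are kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping candidates for one text line, plus a separate one.
boxes = [(10, 10, 110, 40), (12, 12, 112, 42), (200, 50, 300, 80)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

With these hypothetical inputs the second box is suppressed by the first, and the third box survives because it does not overlap with anything.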

II. EAST Model Structure

The EAST network consists of three parts: a feature extraction layer, a feature merging layer, and an output layer.

class EAST(nn.Module):
    def __init__(self, pretrained=True):
        super(EAST, self).__init__()
        self.extractor = extractor(pretrained)
        self.merge     = merge()
        self.output    = output()

    def forward(self, x):
        return self.output(self.merge(self.extractor(x)))

1. Feature extraction layer

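The paper uses PVANet (an object detection model) as the backbone. The code here uses VGG as the backbone instead. Feature maps are taken from the convolutional layers of stage1, stage2, stage3, and stage4. The feature map size is halved from one stage to the next, while the number of convolution kernels increases (it doubles from stage to stage until it reaches 512 in the VGG configuration below). This follows the idea of a feature pyramid network (FPN). In this way feature maps at different scales are extracted, which allows text lines of different sizes to be detected (large feature maps are good at detecting small objects, and small feature maps are good at detecting large objects).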
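To make the halving concrete, the following sketch walks the VGG16 layout used in the code and reports the scale, spatial size, and channel count after each pooling layer. It assumes a square 512x512 input and that a feature map could be tapped right after each pooling layer. `stage_outputs` is a hypothetical helper, not part of the original code.

```python
# VGG16 layout from the article: integers are conv output channels,
# 'M' is a 2x2 max pooling layer with stride 2.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']


def stage_outputs(cfg, input_size):
    """Return (stride, spatial_size, channels) after each pooling layer."""
    stages = []
    stride, channels = 1, 3
    for v in cfg:
        if v == 'M':
            stride *= 2  # each pooling layer halves the resolution
            stages.append((stride, input_size // stride, channels))
        else:
            channels = v  # 3x3 convs with padding 1 keep the spatial size
    return stages


for stride, size, channels in stage_outputs(cfg, 512):
    print(f"1/{stride} scale: {size}x{size}, {channels} channels")
```

For a 512x512 input this gives maps of 256, 128, 64, 32, and 16 pixels per side with 64, 128, 256, 512, and 512 channels, so the channel count stops doubling at the last pooling layer.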

import torch
import torch.nn as nn
import torch.utils.model_zoo as model_zoo
import torch.nn.functional as F
import math

# VGG16 layout: integers are conv output channels, 'M' is a max pooling layer
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M']


def make_layers(cfg, batch_norm=False):
    layers = []
    in_channels = 3
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)


class VGG(nn.Module):
    def __init__(self, features):
        super(VGG, self).__init__()
        self.features = features
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 1000),
        )

        # Weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x


class extractor(nn.Module):
    def __init__(self, pretrained):
        super(extractor, self).__init__()
        vgg16_bn = VGG(make_layers(cfg, batch_norm=True))
        if pretrained:
            vgg16_bn.load_state_dict(torch.load('./pths/vgg16_bn-6c64b313.pth'))
        # Only the convolutional part of VGG is kept as the backbone
        self.features = vgg16_bn.features

    def forward(self, x):
        out = []
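The `forward` method of `extractor` shown above only initializes `out`. One plausible way to collect multi-scale feature maps from `self.features` is sketched below. It is an assumption, not the author's code: the sketch taps the output of every pooling layer and keeps the maps at 1/4 to 1/32 scale. `ExtractorSketch` and `feature_shapes` are hypothetical names, and the weights are random rather than loaded from a checkpoint.

```python
import torch
import torch.nn as nn

# VGG16 layout from the article: integers are conv channels, 'M' is max pooling.
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']


def make_layers(cfg, batch_norm=True):
    layers, in_channels = [], 3
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers.append(nn.Conv2d(in_channels, v, kernel_size=3, padding=1))
            if batch_norm:
                layers.append(nn.BatchNorm2d(v))
            layers.append(nn.ReLU(inplace=True))
            in_channels = v
    return nn.Sequential(*layers)


class ExtractorSketch(nn.Module):
    """Hypothetical stand-in for `extractor`, with randomly initialized weights."""

    def __init__(self):
        super().__init__()
        self.features = make_layers(cfg)

    def forward(self, x):
        out = []
        for layer in self.features:
            x = layer(x)
            # Assumption: tap the feature map right after every pooling layer.
            if isinstance(layer, nn.MaxPool2d):
                out.append(x)
        # Assumption: drop the 1/2-scale map and keep the 1/4 ... 1/32 maps.
        return out[1:]


def feature_shapes(height=256, width=256):
    """Run a dummy image through the sketch and return the feature map shapes."""
    model = ExtractorSketch().eval()
    with torch.no_grad():
        feats = model(torch.zeros(1, 3, height, width))
    return [tuple(f.shape) for f in feats]


print(feature_shapes(256, 256))
```

Under these assumptions the four returned maps have 128, 256, 512, and 512 channels, and each is half the spatial size of the previous one, which is the pyramid of scales the merging layer would then combine.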
