Linux networking subsystem: ARP (Address Resolution Protocol) source code analysis and the generic neighbour framework

2024/12/24 2:02:26 Source: https://blog.csdn.net/z20230508/article/details/143057200

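I. What ARP does

ARP (Address Resolution Protocol) resolves an IP address into an Ethernet MAC address (physical address). On a LAN, when a host or other network device has data to send to another host or device, it must know the peer's network-layer address (its IP address). The IP address alone is not enough, because an IP datagram must be encapsulated in a frame before it can cross the physical network. The sender therefore also needs the receiver's physical address, which requires a mapping from IP address to physical address.

1. ARP frame format

The meaning of each field is not explained here; it is easy to look up.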
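For orientation, here is a minimal sketch of the 28-byte ARP payload for Ethernet and IPv4 (hardware type, protocol type, address lengths, opcode, then sender and target addresses). The helper names `put_be16` and `arp_build_request` are hypothetical and exist only for this illustration; they are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARP_HTYPE_ETHER  1       /* hardware type: Ethernet */
#define ARP_PTYPE_IPV4   0x0800  /* protocol type: IPv4 */
#define ARP_OP_REQUEST   1
#define ARP_OP_REPLY     2
#define ARP_ETH_IPV4_LEN 28      /* 8-byte header + 2 * (6 + 4) address bytes */

/* Store a 16-bit value in network byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/*
 * Hypothetical helper: serialize an ARP request asking "who has tpa?".
 * sha/spa are the sender MAC and IPv4 address, tpa is the target IPv4
 * address (addresses are passed as raw bytes in wire order).
 * Returns the number of bytes written (always 28).
 */
static size_t arp_build_request(uint8_t out[ARP_ETH_IPV4_LEN],
                                const uint8_t sha[6], const uint8_t spa[4],
                                const uint8_t tpa[4])
{
    assert(out && sha && spa && tpa);
    put_be16(out + 0, ARP_HTYPE_ETHER);
    put_be16(out + 2, ARP_PTYPE_IPV4);
    out[4] = 6;                      /* hardware address length */
    out[5] = 4;                      /* protocol address length */
    put_be16(out + 6, ARP_OP_REQUEST);
    memcpy(out + 8, sha, 6);         /* sender hardware address */
    memcpy(out + 14, spa, 4);        /* sender protocol address */
    memset(out + 18, 0, 6);          /* target hardware address: unknown */
    memcpy(out + 24, tpa, 4);        /* target protocol address */
    return ARP_ETH_IPV4_LEN;
}
```

On the wire this payload follows an Ethernet header whose EtherType is 0x0806 (`ETH_P_ARP`), and a request is normally sent to the broadcast MAC address.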

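2. Querying with the arp command

[Screenshot: output of the arp command] The output shows the mapping between IP addresses and MAC addresses.

3. Where does ARP sit in the kernel network stack?

[Figure: ARP within the generic neighbour interface layer]

In the network stack, ARP necessarily sits before the call into the physical NIC driver: the destination IP address of the IP packet or fragment has to be translated into a MAC address before the driver can be asked to send the MAC frame. If the cache has a matching entry, the packet is sent immediately; if not, an ARP frame is broadcast first to find the MAC address that belongs to that IP address.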

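In IPv4 this protocol is called ARP. In IPv6, ARP is replaced by ND (Neighbor Discovery Protocol). ND provides the following functions on an IPv6 network:

  1. Address resolution: like ARP, ND resolves an IPv6 address to a MAC address.
  2. Neighbour reachability: a device can check whether other devices are still reachable.
  3. Router discovery: a device can find the routers on the network.
  4. Prefix discovery: a device can obtain the network prefix information.

ND communicates over ICMPv6 (Internet Control Message Protocol version 6) and offers a more complete mechanism for device discovery and neighbour management on IPv6 networks.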

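4. ARP cache entries time out.

5. Gratuitous ARP and address conflict detection

The purpose is to check whether the IP address a host is about to use is already in use on the same network. The detailed detection procedure is not covered here.

6. ARP can be used to set the IPv4 address of an embedded device

There are generally two ways to do this: DHCP, or configuration via ARP.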
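As a rough illustration of the idea, the sketch below shows the two checks in simplified form: a packet looks gratuitous when its sender IP equals its target IP, and a conflict exists when someone else announces our own IP address with a different MAC. Both function names are hypothetical, and the kernel's own check (`arp_is_garp`, called from `arp_process` later in this article) applies further conditions that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Simplified, hypothetical check: treat an ARP packet as gratuitous
 * when the sender IP and the target IP are the same address.
 */
static bool arp_looks_gratuitous(uint32_t sender_ip, uint32_t target_ip)
{
    return sender_ip == target_ip;
}

/*
 * Conflict detection sketch: the host owning my_ip/my_mac sees an ARP
 * packet whose sender IP is my_ip but whose sender MAC is not its own.
 */
static bool arp_address_conflict(uint32_t my_ip, const uint8_t my_mac[6],
                                 uint32_t sender_ip,
                                 const uint8_t sender_mac[6])
{
    assert(my_mac && sender_mac);
    if (sender_ip != my_ip)
        return false;
    return memcmp(my_mac, sender_mac, 6) != 0;
}
```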

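II. ARP implementation in the kernel source

2.1 Where ARP is called in the network stack

ARP revolves around a hash table data structure: adding, deleting, updating and looking up nodes, plus setting a handful of callbacks.

ARP is one part of the kernel's neighbour subsystem; its IPv6 counterpart is ND. Think of neighbour as the core, with ARP and ND as two drivers registered with that core. When the IP layer calls ip_output, whether the ARP driver or the ND driver is used depends on the values passed in. Let us first look at how a packet travels through the network stack, focusing only on the point where the IP layer calls into the neighbour subsystem.

[Figure: call flow through the network stack] The call flow shows that ip_neigh_for_gw is the common entry point into the neighbour layer; from there the IPv4 ARP driver is called, followed by dev_queue_xmit in the network device layer.

2.2 Before ARP itself: the generic neighbour framework of the network stack

[Figure: framework diagram, taken from a book]

Why is a neighbour subsystem needed?

When sending a packet on the network, knowing the destination IP address is not enough; the neighbour's L2 MAC address is needed as well. Why the neighbour's L2 address rather than the destination's? Because the destination may be on a different network segment from the source host, in which case the packet has to be relayed by a gateway closer to the destination, and that gateway is the neighbour. If the destination and the source host are on the same LAN, they are neighbours of each other. The core job of the neighbour subsystem is to map L3 addresses to L2 addresses (which is what ARP does for IPv4) and to provide the interface between the network layer and the underlying drivers.

Concretely, when data is sent, the neighbour table is searched for a neighbour entry. The lookup key is the next-hop L3 address together with the output device; the entry that is found holds the mapped MAC address. Once the entry is found, n->output (a callback) is called to send the packet. This raises several questions: what is a neighbour entry, how is it allocated, how are entries organized, and how are they managed? These are the problems the neighbour subsystem has to solve.

First, a neighbour entry is a structure that stores the information needed to reach a neighbour; it represents a network node attached to the same link: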

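// kernel/include/net/neighbour.h
struct neighbour {
	struct neighbour __rcu *next;       // next entry in the hash chain
	struct neigh_table *tbl;            // owning neighbour table
	struct neigh_parms *parms;          // neighbour protocol parameters
	unsigned long confirmed;            // time reachability was last confirmed
	unsigned long updated;              // time the state was last updated
	rwlock_t lock;                      // read-write lock
	refcount_t refcnt;                  // reference count
	unsigned int arp_queue_len_bytes;   // bytes currently held in arp_queue
	struct sk_buff_head arp_queue;      // packets waiting for address resolution
	struct timer_list timer;            // per-entry timer
	unsigned long used;                 // time the entry was last used
	atomic_t probes;                    // number of probes sent
	__u8 flags;
	__u8 nud_state;                     // neighbour state (NUD_*)
	__u8 type;                          // address type
	__u8 dead;                          // entry is no longer valid
	u8 protocol;                        // protocol id supplied via NDA_PROTOCOL (see neigh_add)
	seqlock_t ha_lock;                  // protects ha
	unsigned char ha[ALIGN(MAX_ADDR_LEN, sizeof(unsigned long))] __aligned(8);
	                                    // MAC (hardware) address
	struct hh_cache hh;                 // cached L2 frame header
	int (*output)(struct neighbour *, struct sk_buff *);
	                                    // transmit entry point offered to L3
	const struct neigh_ops *ops;        // method table
	struct list_head gc_list;
	struct rcu_head rcu;
	struct net_device *dev;             // network device
	u8 primary_key[0];                  // placeholder; the L3 address is stored here
} __randomize_layout;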

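The neighbour subsystem provides a generic framework for neighbour protocols; the protocols currently using it are ARP (IPv4) and ND (IPv6). Although the protocols differ, they share the same structures. Each protocol creates its own neighbour table (struct neigh_table): ARP uses arp_tbl and ND uses nd_tbl. The neigh_table structure looks like this: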

// kernel/include/net/neighbour.h
struct neigh_table {
	int family;                 // address family: AF_INET, AF_INET6
	unsigned int entry_size;    // size of a neighbour entry
	unsigned int key_len;       // L3 address length: 4 bytes for IPv4, 16 bytes for IPv6
	__be16 protocol;            // protocol: IPv4 or IPv6
	__u32 (*hash)(const void *pkey,
		      const struct net_device *dev,
		      __u32 *hash_rnd);  // hash function
	bool (*key_eq)(const struct neighbour *, const void *pkey);
	int (*constructor)(struct neighbour *);       // neighbour entry constructor
	int (*pconstructor)(struct pneigh_entry *);   // proxy entry constructor
	void (*pdestructor)(struct pneigh_entry *);   // proxy entry destructor
	void (*proxy_redo)(struct sk_buff *skb);      // reprocess a delayed proxy request
	int (*is_multicast)(const void *pkey);
	bool (*allow_add)(const struct net_device *dev,
			  struct netlink_ext_ack *extack);
	char *id;                   // table name: arp_cache for IPv4, ndisc_cache for IPv6
	struct neigh_parms parms;   // neighbour parameters
	struct list_head parms_list;
	int gc_interval;            // GC interval; not used directly by the neighbour core
	// The three thresholds below are entry-count thresholds. They decide when the
	// synchronous garbage collector is triggered and are also consulted by the
	// asynchronous handler neigh_periodic_work().
	int gc_thresh1;
	int gc_thresh2;
	int gc_thresh3;
	unsigned long last_flush;   // time neigh_forced_gc last ran
	struct delayed_work gc_work;      // asynchronous GC work item
	struct timer_list proxy_timer;    // proxy timer: a host acting as ARP proxy may
	                                  // answer a request after a delay instead of at once
	struct sk_buff_head proxy_queue;  // queue of SKBs waiting for proxy ARP handling
	atomic_t entries;           // number of neighbour entries
	atomic_t gc_entries;
	struct list_head gc_list;
	rwlock_t lock;
	unsigned long last_rand;
	struct neigh_statistics __percpu *stats;  // statistics
	struct neigh_hash_table __rcu *nht;       // hash table of neighbour entries
	struct pneigh_entry **phash_buckets;      // proxy neighbour entries
};

The nht member of neigh_table is a hash table with chained buckets; all neighbour entries of the same protocol hang off it.

Each neighbour object has a neigh_ops method set, consisting of an address-family member and four function pointers:

struct neigh_ops {
	// AF_INET for IPv4, AF_INET6 for IPv6
	int family;
	// sends neighbour solicitation requests
	void (*solicit)(struct neighbour *, struct sk_buff *);
	// called from neigh_invalidate() when the neighbour state is NUD_FAILED
	void (*error_report)(struct neighbour *, struct sk_buff *);
	// used when the next-hop L3 address is known but the L2 address is not yet resolved
	int (*output)(struct neighbour *, struct sk_buff *);
	// used once the entry is in a valid (resolved) state
	int (*connected_output)(struct neighbour *, struct sk_buff *);
};

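2.3 The relationship between the neighbour table and neighbour entries

[Figure: organization of neighbour entries within a neighbour table]

Neighbour entries have a life cycle and change state over time. The transitions are described below.

Before the network layer sends a packet it first looks up a route, and the output route is bound to a neighbour. After the route lookup, the output interface provided by the neighbour layer is called to send the packet. Which output function is used depends on the state of the neighbour entry. A freshly created entry does not know the neighbour's MAC address yet and cannot be used; because it has only just been initialized, its state is NONE. Sending a packet at this point requires the neighbour protocol to issue a solicit request first, and the entry state becomes INCOMPLETE. A timer is set up when the entry is created; when it expires, the entry state is checked and adjusted as needed. If the solicit request goes unanswered for a while, the timer expires and the current state determines whether to retransmit. The number of retransmissions is bounded, and the timer is restarted after each one. The timer interval is configurable: the retransmission interval is neigh->parms->retrans_time.

While the solicit request is outstanding the packet cannot be transmitted, so it is placed on the neigh->arp_queue queue. The queue is bounded so that it cannot consume unlimited memory; when it overflows, the oldest packets are dropped. The limit is configurable: older kernels defaulted to three packets, while the code quoted later in this article expresses the limit in bytes (QUEUE_LEN_BYTES).

Suppose a response arrives. The neighbour state then moves from INCOMPLETE to REACHABLE, meaning the neighbour can be reached; besides the state change, the packets held in the queue must be sent out.

The state cannot stay REACHABLE forever: the neighbour may go down, or our own device may fail, so the state has to change. Normally, if the entry is not used for a while, it moves from REACHABLE to STALE, at which point reachability needs to be confirmed again.

If a STALE entry is not used within gc_staletime, it is treated as expired and reclaimed by the periodic GC. If it is used within gc_staletime, it moves to DELAY, which postpones failure. If DELAY lasts for delay_probe_time without the entry being confirmed, the entry enters PROBE, in which probe packets are sent actively. The number of probes is limited; if they all time out, the entry fails and is discarded.

[Figure: neighbour state transition diagram]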
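The transitions described above can be sketched as a small state machine. This is a simplified model of the prose only, not the kernel's `neigh_timer_handler`: the enum values, the three events and the `probes_left` flag are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative states; the kernel uses NUD_* bit flags instead. */
enum toy_nud {
    TOY_NONE, TOY_INCOMPLETE, TOY_REACHABLE,
    TOY_STALE, TOY_DELAY, TOY_PROBE, TOY_FAILED
};

/* Illustrative events: a packet wants to use the entry, reachability
 * is confirmed (for example by an ARP reply), or the entry timer fires. */
enum toy_event { TOY_EV_SEND, TOY_EV_CONFIRM, TOY_EV_TIMEOUT };

/* Return the next state; probes_left tells whether retries remain. */
static enum toy_nud toy_nud_next(enum toy_nud s, enum toy_event ev,
                                 bool probes_left)
{
    assert(s <= TOY_FAILED);
    switch (s) {
    case TOY_NONE:          /* first use triggers a solicit request */
        return ev == TOY_EV_SEND ? TOY_INCOMPLETE : s;
    case TOY_INCOMPLETE:    /* waiting for the first answer */
    case TOY_PROBE:         /* actively re-probing a known neighbour */
        if (ev == TOY_EV_CONFIRM)
            return TOY_REACHABLE;
        if (ev == TOY_EV_TIMEOUT)
            return probes_left ? s : TOY_FAILED;
        return s;
    case TOY_REACHABLE:     /* unused for a while: needs re-confirmation */
        return ev == TOY_EV_TIMEOUT ? TOY_STALE : s;
    case TOY_STALE:         /* used again: wait briefly before probing */
        return ev == TOY_EV_SEND ? TOY_DELAY : s;
    case TOY_DELAY:
        if (ev == TOY_EV_CONFIRM)
            return TOY_REACHABLE;
        if (ev == TOY_EV_TIMEOUT)
            return TOY_PROBE;
        return s;
    default:                /* TOY_FAILED: left for the GC in this model */
        return s;
    }
}
```

Reclamation of unused STALE entries by the periodic GC is deliberately left out of the model, since it removes the entry rather than changing its state.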

2.4 Initialization of the neighbour subsystem

// kernel/net/core/neighbour.c
static int __init neigh_init(void)
{
	rtnl_register(PF_UNSPEC, RTM_NEWNEIGH, neigh_add, NULL, 0);
	rtnl_register(PF_UNSPEC, RTM_DELNEIGH, neigh_delete, NULL, 0);
	rtnl_register(PF_UNSPEC, RTM_GETNEIGH, neigh_get, neigh_dump_info, 0);
	rtnl_register(PF_UNSPEC, RTM_GETNEIGHTBL, NULL, neightbl_dump_info,
		      0);
	rtnl_register(PF_UNSPEC, RTM_SETNEIGHTBL, neightbl_set, NULL, 0);

	return 0;
}

subsys_initcall(neigh_init);

neigh_init only registers the handlers for requests coming from user space. For example, the command below adds a neighbour entry saying that 10.0.0.3 is reached through device eth0 at link-layer address 0:0:0:0:0:1:

ip neigh add 10.0.0.3 lladdr 0:0:0:0:0:1 dev eth0 nud perm

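The command is delivered to the kernel over netlink and ends up in neigh_add, the handler registered by the neighbour subsystem. The function first validates its arguments and, if they are acceptable, adds the entry to the matching neighbour table.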

static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh,
		     struct netlink_ext_ack *extack)
{
	int flags = NEIGH_UPDATE_F_ADMIN | NEIGH_UPDATE_F_OVERRIDE |
		    NEIGH_UPDATE_F_OVERRIDE_ISROUTER;
	struct net *net = sock_net(skb->sk);
	struct ndmsg *ndm;
	struct nlattr *tb[NDA_MAX+1];
	struct neigh_table *tbl;
	struct net_device *dev = NULL;
	struct neighbour *neigh;
	void *dst, *lladdr;
	u8 protocol = 0;
	int err;

	ASSERT_RTNL();
	// validate the netlink attributes
	err = nlmsg_parse_deprecated(nlh, sizeof(*ndm), tb, NDA_MAX,
				     nda_policy, extack);
	if (err < 0)
		goto out;

	err = -EINVAL;
	// without a destination address there is nothing to do:
	// the L3 address is the heart of a neighbour entry
	if (!tb[NDA_DST]) {
		NL_SET_ERR_MSG(extack, "Network address not specified");
		goto out;
	}

	ndm = nlmsg_data(nlh);
	if (ndm->ndm_ifindex) {
		// look up the output device; fail if it does not exist,
		// because a neighbour entry is bound to its output device
		dev = __dev_get_by_index(net, ndm->ndm_ifindex);
		if (dev == NULL) {
			err = -ENODEV;
			goto out;
		}

		// check that the neighbour's L2 address is long enough
		if (tb[NDA_LLADDR] && nla_len(tb[NDA_LLADDR]) < dev->addr_len) {
			NL_SET_ERR_MSG(extack, "Invalid link address");
			goto out;
		}
	}

	// find the neighbour table: arp_tbl for IPv4, nd_tbl for IPv6
	tbl = neigh_find_table(ndm->ndm_family);
	if (tbl == NULL)
		return -EAFNOSUPPORT;

	// check the key length: 4 bytes for IPv4, 16 bytes for IPv6
	if (nla_len(tb[NDA_DST]) < (int)tbl->key_len) {
		NL_SET_ERR_MSG(extack, "Invalid network address");
		goto out;
	}

	dst = nla_data(tb[NDA_DST]);
	lladdr = tb[NDA_LLADDR] ? nla_data(tb[NDA_LLADDR]) : NULL;

	if (tb[NDA_PROTOCOL])
		protocol = nla_get_u8(tb[NDA_PROTOCOL]);

	// adding a proxy entry
	if (ndm->ndm_flags & NTF_PROXY) {
		struct pneigh_entry *pn;

		err = -ENOBUFS;
		// look up the proxy table; update if present, otherwise create
		pn = pneigh_lookup(tbl, net, dst, dev, 1);
		if (pn) {
			pn->flags = ndm->ndm_flags;
			if (protocol)
				pn->protocol = protocol;
			err = 0;
		}
		goto out;
	}

	if (!dev) {
		NL_SET_ERR_MSG(extack, "Device not specified");
		goto out;
	}

	if (tbl->allow_add && !tbl->allow_add(dev, extack)) {
		err = -EINVAL;
		goto out;
	}

	// check whether the neighbour entry already exists
	neigh = neigh_lookup(tbl, dst, dev);
	if (neigh == NULL) {
		bool exempt_from_gc;

		// not found: without the create flag this is an error
		if (!(nlh->nlmsg_flags & NLM_F_CREATE)) {
			err = -ENOENT;
			goto out;
		}

		// not found and creation requested: create a new entry
		exempt_from_gc = ndm->ndm_state & NUD_PERMANENT ||
				 ndm->ndm_flags & NTF_EXT_LEARNED;
		neigh = ___neigh_create(tbl, dst, dev,
					ndm->ndm_flags & NTF_EXT_LEARNED,
					exempt_from_gc, true);
		if (IS_ERR(neigh)) {
			err = PTR_ERR(neigh);
			goto out;
		}
	} else {
		// the exclusive flag forbids touching an existing entry
		if (nlh->nlmsg_flags & NLM_F_EXCL) {
			err = -EEXIST;
			neigh_release(neigh);
			goto out;
		}

		// without the replace flag, do not override
		if (!(nlh->nlmsg_flags & NLM_F_REPLACE))
			flags &= ~(NEIGH_UPDATE_F_OVERRIDE |
				   NEIGH_UPDATE_F_OVERRIDE_ISROUTER);
	}

	if (protocol)
		neigh->protocol = protocol;

	if (ndm->ndm_flags & NTF_EXT_LEARNED)
		flags |= NEIGH_UPDATE_F_EXT_LEARNED;

	if (ndm->ndm_flags & NTF_ROUTER)
		flags |= NEIGH_UPDATE_F_ISROUTER;

	if (ndm->ndm_flags & NTF_USE)
		flags |= NEIGH_UPDATE_F_USE;

	// update the neighbour entry
	err = __neigh_update(neigh, lladdr, ndm->ndm_state, flags,
			     NETLINK_CB(skb).portid, extack);
	if (!err && ndm->ndm_flags & NTF_USE) {
		neigh_event_send(neigh, NULL);
		err = 0;
	}

	// drop the reference taken by the lookup (or creation); it is no longer needed
	neigh_release(neigh);
out:
	return err;
}

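2.5 Source code analysis of the ARP protocol (key section)

The kernel's neighbour subsystem defines a basic framework that lets different neighbour protocols share one body of code; here that means the IPv4 ARP driver and the IPv6 ND (Neighbour Discovery protocol) driver. The neighbour subsystem sits between the network layer and the traffic control subsystem and provides L3 with its downward transmit interface. The kernel source is analysed next.

L3: ip_finish_output2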

// kernel/net/ipv4/ip_output.c
static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	struct dst_entry *dst = skb_dst(skb);
	struct rtable *rt = (struct rtable *)dst;
	struct net_device *dev = dst->dev;
	unsigned int hh_len = LL_RESERVED_SPACE(dev);
	struct neighbour *neigh;
	bool is_v6gw = false;

	if (rt->rt_type == RTN_MULTICAST) {
		IP_UPD_PO_STATS(net, IPSTATS_MIB_OUTMCAST, skb->len);
	} else if (rt->rt_type == RTN_BROADCAST)
		IP_UPD_PO_STATS(net, IPSTATS_MIB_OUTBCAST, skb->len);

	/* Be paranoid, rather than too clever. */
	if (unlikely(skb_headroom(skb) < hh_len && dev->header_ops)) {
		struct sk_buff *skb2;

		skb2 = skb_realloc_headroom(skb, LL_RESERVED_SPACE(dev));
		if (!skb2) {
			kfree_skb(skb);
			return -ENOMEM;
		}
		if (skb->sk)
			skb_set_owner_w(skb2, skb->sk);
		consume_skb(skb);
		skb = skb2;
	}

	if (lwtunnel_xmit_redirect(dst->lwtstate)) {
		int res = lwtunnel_xmit(skb);

		if (res < 0 || res == LWTUNNEL_XMIT_DONE)
			return res;
	}

	rcu_read_lock_bh();
	// look up the neighbour entry; this helper lives in the route header
	// and wraps the neighbour subsystem interface
	neigh = ip_neigh_for_gw(rt, skb, &is_v6gw);
	if (!IS_ERR(neigh)) {
		int res;

		sock_confirm_neigh(skb, neigh);
		/* if crossing protocols, can not use the cached header */
		// call the neighbour layer to send the packet
		res = neigh_output(neigh, skb, is_v6gw);
		rcu_read_unlock_bh();
		return res;
	}
	rcu_read_unlock_bh();

	net_dbg_ratelimited("%s: No header cache and no neighbour!\n",
			    __func__);
	kfree_skb(skb);
	return -EINVAL;
}

First, the neighbour lookup (in the route header):

// kernel/include/net/route.h
static inline struct neighbour *ip_neigh_for_gw(struct rtable *rt,
						struct sk_buff *skb,
						bool *is_v6gw)
{
	struct net_device *dev = rt->dst.dev;  // output device taken from the route
	struct neighbour *neigh;

	if (likely(rt->rt_gw_family == AF_INET)) {          // IPv4 gateway
		neigh = ip_neigh_gw4(dev, rt->rt_gw4);
	} else if (rt->rt_gw_family == AF_INET6) {          // IPv6 gateway
		neigh = ip_neigh_gw6(dev, &rt->rt_gw6);
		*is_v6gw = true;
	} else {                                            // no gateway: use the destination itself
		neigh = ip_neigh_gw4(dev, ip_hdr(skb)->daddr);
	}
	return neigh;
}

Taking IPv4 as the example: ip_neigh_gw4 (in the route header)

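// kernel/include/net/route.h
static inline struct neighbour *ip_neigh_gw4(struct net_device *dev,
					     __be32 daddr)
{
	struct neighbour *neigh;

	// look up the neighbour entry for IP address daddr
	neigh = __ipv4_neigh_lookup_noref(dev, (__force u32)daddr);
	if (unlikely(!neigh))
		// not found: create it
		neigh = __neigh_create(&arp_tbl, &daddr, dev, false);

	return neigh;
}

Key point: ip_neigh_gw4 shows that a neighbour entry is bound to the output device net_device, which means the route and the neighbour entry are tied together. If the route's output interface has no neighbour entry for the next hop, an entry held by another interface cannot be reused, even when that interface already has a MAC mapping for the same IP address; the address has to be resolved on the output interface itself.

Next, the lookup itself: __ipv4_neigh_lookup_noref (in the ARP header)

// kernel/include/net/arp.h
static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev, u32 key)
{
	// loopback and point-to-point devices use a single wildcard key
	if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT))
		key = INADDR_ANY;

	// arp_tbl is the global ARP neighbour table
	return ___neigh_lookup_noref(&arp_tbl, neigh_key_eq32, arp_hashfn, &key, dev);
}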

___neigh_lookup_noref

// kernel/include/net/neighbour.h
static inline struct neighbour *___neigh_lookup_noref(
	struct neigh_table *tbl,
	bool (*key_eq)(const struct neighbour *n, const void *pkey),
	__u32 (*hash)(const void *pkey,
		      const struct net_device *dev,
		      __u32 *hash_rnd),
	const void *pkey,
	struct net_device *dev)
{
	// nht is the hash table mentioned earlier; it speeds up lookups when
	// there are many neighbours, and holds all entries of one protocol
	struct neigh_hash_table *nht = rcu_dereference_bh(tbl->nht);
	struct neighbour *n;
	u32 hash_val;

	// compute the bucket index
	hash_val = hash(pkey, dev, nht->hash_rnd) >> (32 - nht->hash_shift);
	for (n = rcu_dereference_bh(nht->hash_buckets[hash_val]);
	     n != NULL;
	     n = rcu_dereference_bh(n->next)) {
		// match on both dev and pkey; pkey is the IPv4 address being looked up
		if (n->dev == dev && key_eq(n, pkey))
			return n;
	}

	return NULL;
}

The hash function selects a bucket in hash_buckets, and the chain in that bucket is walked comparing dev and pkey. A match returns n; otherwise NULL is returned and the caller creates a new neigh.

Creating the neigh: __neigh_create
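The lookup-then-create pattern can be modelled in user space as follows. This is a toy sketch under stated assumptions: `toy_hash` is an arbitrary multiplicative hash (not the kernel's `arp_hashfn`), an integer `ifindex` stands in for `struct net_device *`, and there is no locking, RCU or table growth.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_HASH_SHIFT 3   /* 2^3 = 8 buckets, like neigh_hash_alloc(3) */

struct toy_neigh {
    struct toy_neigh *next;  /* chain within one bucket */
    int ifindex;             /* stands in for struct net_device * */
    uint32_t ip;             /* primary key: next-hop IPv4 address */
};

struct toy_table {
    struct toy_neigh *buckets[1u << TOY_HASH_SHIFT];
};

/* Illustrative hash only; the top TOY_HASH_SHIFT bits pick the bucket. */
static uint32_t toy_hash(uint32_t ip, int ifindex)
{
    return (ip ^ (uint32_t)ifindex) * 2654435761u;
}

static unsigned toy_bucket(uint32_t ip, int ifindex)
{
    return toy_hash(ip, ifindex) >> (32 - TOY_HASH_SHIFT);
}

/* Walk one bucket and match on both device and address. */
static struct toy_neigh *toy_lookup(struct toy_table *t, uint32_t ip,
                                    int ifindex)
{
    struct toy_neigh *n;

    assert(t);
    for (n = t->buckets[toy_bucket(ip, ifindex)]; n; n = n->next)
        if (n->ifindex == ifindex && n->ip == ip)
            return n;
    return NULL;
}

/* Return the existing entry, or insert a new one at the bucket head. */
static struct toy_neigh *toy_lookup_or_create(struct toy_table *t,
                                              uint32_t ip, int ifindex)
{
    struct toy_neigh *n = toy_lookup(t, ip, ifindex);
    unsigned b;

    if (n)
        return n;
    n = calloc(1, sizeof(*n));
    if (!n)
        return NULL;
    n->ip = ip;
    n->ifindex = ifindex;
    b = toy_bucket(ip, ifindex);
    n->next = t->buckets[b];
    t->buckets[b] = n;
    return n;
}
```

Because the device is part of the match, the same IP address looked up through two different interfaces yields two independent entries, which is the binding between route and neighbour described earlier.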

// kernel/net/core/neighbour.c
struct neighbour *__neigh_create(struct neigh_table *tbl, const void *pkey,
				 struct net_device *dev, bool want_ref)
{
	return ___neigh_create(tbl, pkey, dev, 0, false, want_ref);
}
EXPORT_SYMBOL(__neigh_create);

static struct neighbour *
___neigh_create(struct neigh_table *tbl, const void *pkey,
		struct net_device *dev, u8 flags,
		bool exempt_from_gc, bool want_ref)
{
	u32 hash_val, key_len = tbl->key_len;
	struct neighbour *n1, *rc, *n;
	struct neigh_hash_table *nht;
	int error;

	n = neigh_alloc(tbl, dev, flags, exempt_from_gc);  // allocate the neighbour entry
	trace_neigh_create(tbl, dev, pkey, n, exempt_from_gc);
	if (!n) {
		rc = ERR_PTR(-ENOBUFS);
		goto out;
	}

	memcpy(n->primary_key, pkey, key_len);
	n->dev = dev;
	dev_hold(dev);

	/* Protocol specific setup. */
	// for IPv4 this calls arp_constructor, which sets the output function
	if (tbl->constructor && (error = tbl->constructor(n)) < 0) {
		rc = ERR_PTR(error);
		goto out_neigh_release;
	}

	if (dev->netdev_ops->ndo_neigh_construct) {  // most devices do not set this
		error = dev->netdev_ops->ndo_neigh_construct(dev, n);
		if (error < 0) {
			rc = ERR_PTR(error);
			goto out_neigh_release;
		}
	}

	/* Device specific setup. */
	if (n->parms->neigh_setup &&
	    (error = n->parms->neigh_setup(n)) < 0) {  // not defined for IPv4
		rc = ERR_PTR(error);
		goto out_neigh_release;
	}

	n->confirmed = jiffies - (NEIGH_VAR(n->parms, BASE_REACHABLE_TIME) << 1);

	write_lock_bh(&tbl->lock);
	nht = rcu_dereference_protected(tbl->nht,
					lockdep_is_held(&tbl->lock));

	if (atomic_read(&tbl->entries) > (1 << nht->hash_shift))
		nht = neigh_hash_grow(tbl, nht->hash_shift + 1);

	// compute the bucket index; the hash function is defined by the table
	hash_val = tbl->hash(n->primary_key, dev, nht->hash_rnd) >> (32 - nht->hash_shift);

	if (n->parms->dead) {
		rc = ERR_PTR(-EINVAL);
		goto out_tbl_unlock;
	}

	// walk the chain in this bucket; if an equal entry already exists, return it
	for (n1 = rcu_dereference_protected(nht->hash_buckets[hash_val],
					    lockdep_is_held(&tbl->lock));
	     n1 != NULL;
	     n1 = rcu_dereference_protected(n1->next,
					    lockdep_is_held(&tbl->lock))) {
		if (dev == n1->dev && !memcmp(n1->primary_key, n->primary_key, key_len)) {
			if (want_ref)
				neigh_hold(n1);
			rc = n1;
			goto out_tbl_unlock;
		}
	}

	n->dead = 0;
	if (!exempt_from_gc)
		list_add_tail(&n->gc_list, &n->tbl->gc_list);

	if (want_ref)
		neigh_hold(n);
	// insert at the head of the chain using RCU
	rcu_assign_pointer(n->next,
			   rcu_dereference_protected(nht->hash_buckets[hash_val],
						     lockdep_is_held(&tbl->lock)));
	rcu_assign_pointer(nht->hash_buckets[hash_val], n);
	write_unlock_bh(&tbl->lock);
	neigh_dbg(2, "neigh %p is created\n", n);
	rc = n;
out:
	return rc;
out_tbl_unlock:
	write_unlock_bh(&tbl->lock);
out_neigh_release:
	if (!exempt_from_gc)
		atomic_dec(&tbl->gc_entries);
	neigh_release(n);
	goto out;
}

neigh_alloc:

// kernel/net/core/neighbour.c
static struct neighbour *neigh_alloc(struct neigh_table *tbl,
				     struct net_device *dev,
				     u8 flags, bool exempt_from_gc)
{
	struct neighbour *n = NULL;
	unsigned long now = jiffies;
	int entries;

	if (exempt_from_gc)
		goto do_alloc;

	entries = atomic_inc_return(&tbl->gc_entries) - 1;
	if (entries >= tbl->gc_thresh3 ||
	    (entries >= tbl->gc_thresh2 &&
	     time_after(now, tbl->last_flush + 5 * HZ))) {
		if (!neigh_forced_gc(tbl) &&
		    entries >= tbl->gc_thresh3) {
			net_info_ratelimited("%s: neighbor table overflow!\n",
					     tbl->id);
			NEIGH_CACHE_STAT_INC(tbl, table_fulls);
			goto out_entries;
		}
	}

do_alloc:
	n = kzalloc(tbl->entry_size + dev->neigh_priv_len, GFP_ATOMIC);
	if (!n)
		goto out_entries;

	__skb_queue_head_init(&n->arp_queue);  // initialize the arp_queue
	rwlock_init(&n->lock);
	seqlock_init(&n->ha_lock);
	n->updated	  = n->used = now;
	n->nud_state	  = NUD_NONE;          // not usable yet
	n->output	  = neigh_blackhole;   // drops packets
	n->flags	  = flags;
	seqlock_init(&n->hh.hh_lock);
	n->parms	  = neigh_parms_clone(&tbl->parms);  // start with the table's parms
	timer_setup(&n->timer, neigh_timer_handler, 0);      // set up the entry timer

	NEIGH_CACHE_STAT_INC(tbl, allocs);
	n->tbl		  = tbl;
	refcount_set(&n->refcnt, 1);
	n->dead		  = 1;
	INIT_LIST_HEAD(&n->gc_list);

	atomic_inc(&tbl->entries);
out:
	return n;

out_entries:
	if (!exempt_from_gc)
		atomic_dec(&tbl->gc_entries);
	goto out;
}
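The threshold test at the top of neigh_alloc can be isolated into a small predicate. The sketch below mirrors only that condition; the function name `toy_needs_forced_gc` and the explicit `hz` parameter are assumptions for the illustration, and the wrap-safe comparison imitates what `time_after` does with jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: would neigh_alloc try a forced GC before
 * allocating? True when the hard limit (thresh3) is reached, or when
 * the soft limit (thresh2) is reached and the last forced GC ran more
 * than 5 seconds (5 * hz ticks) ago.
 */
static bool toy_needs_forced_gc(int entries, int thresh2, int thresh3,
                                unsigned long now, unsigned long last_flush,
                                unsigned long hz)
{
    assert(thresh2 <= thresh3);
    if (entries >= thresh3)
        return true;
    /* wrap-safe "now is after last_flush + 5 * hz" */
    return entries >= thresh2 &&
           (long)(now - (last_flush + 5 * hz)) > 0;
}
```

With the ARP defaults quoted later (gc_thresh2 = 512, gc_thresh3 = 1024), a table holding 600 entries only forces a GC if the previous forced run was more than 5 seconds ago, while 1024 entries always do.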

arp_constructor:

// kernel/net/ipv4/arp.c
static int arp_constructor(struct neighbour *neigh)
{
	__be32 addr;
	struct net_device *dev = neigh->dev;
	struct in_device *in_dev;
	struct neigh_parms *parms;
	u32 inaddr_any = INADDR_ANY;

	if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT))
		memcpy(neigh->primary_key, &inaddr_any, arp_tbl.key_len);

	addr = *(__be32 *)neigh->primary_key;
	rcu_read_lock();
	in_dev = __in_dev_get_rcu(dev);  // get the in_device from the net_device
	if (!in_dev) {
		rcu_read_unlock();
		return -EINVAL;
	}

	neigh->type = inet_addr_type_dev_table(dev_net(dev), dev, addr);  // address type

	parms = in_dev->arp_parms;
	__neigh_parms_put(neigh->parms);
	neigh->parms = neigh_parms_clone(parms);
	rcu_read_unlock();

	if (!dev->header_ops) {  // almost every NIC sets header_ops
		neigh->nud_state = NUD_NOARP;
		neigh->ops = &arp_direct_ops;
		neigh->output = neigh_direct_output;
	} else {
		/* Good devices (checked by reading texts, but only Ethernet is
		   tested)

		   ARPHRD_ETHER: (ethernet, apfddi)
		   ARPHRD_FDDI: (fddi)
		   ARPHRD_IEEE802: (tr)
		   ARPHRD_METRICOM: (strip)
		   ARPHRD_ARCNET:
		   etc. etc. etc.

		   ARPHRD_IPDDP will also work, if author repairs it.
		   I did not it, because this driver does not work even
		   in old paradigm.
		 */

		if (neigh->type == RTN_MULTICAST) {  // multicast addresses need no ARP
			neigh->nud_state = NUD_NOARP;
			arp_mc_map(addr, neigh->ha, dev, 1);
		} else if (dev->flags & (IFF_NOARP | IFF_LOOPBACK)) {
			// the device explicitly needs no ARP, or is the loopback device
			neigh->nud_state = NUD_NOARP;
			memcpy(neigh->ha, dev->dev_addr, dev->addr_len);
		} else if (neigh->type == RTN_BROADCAST ||
			   (dev->flags & IFF_POINTOPOINT)) {
			// broadcast or point-to-point: no ARP needed
			neigh->nud_state = NUD_NOARP;
			memcpy(neigh->ha, dev->broadcast, dev->addr_len);
		}

		if (dev->header_ops->cache)  // eth_header_ops provides cache
			neigh->ops = &arp_hh_ops;
		else
			neigh->ops = &arp_generic_ops;

		if (neigh->nud_state & NUD_VALID)
			neigh->output = neigh->ops->connected_output;
		else
			// initial case: arp_hh_ops' output, i.e. neigh_resolve_output
			neigh->output = neigh->ops->output;
	}
	return 0;
}

After the entry is created, its output function is neigh_resolve_output. At this point the neighbour subsystem cannot send IP packets yet, because the destination MAC address is still unknown. Next comes the neigh_output() call in ip_finish_output2.

neigh_output: asks the neighbour subsystem to build the MAC header and calls the L2 transmit function to send the packet

// kernel/include/net/neighbour.h
static inline int neigh_output(struct neighbour *n, struct sk_buff *skb,
			       bool skip_cache)
{
	const struct hh_cache *hh = &n->hh;

	/* n->nud_state and hh->hh_len could be changed under us.
	 * neigh_hh_output() is taking care of the race later.
	 */
	if (!skip_cache &&
	    (READ_ONCE(n->nud_state) & NUD_CONNECTED) &&
	    READ_ONCE(hh->hh_len))  // neighbour is connected and the header cache is set
		return neigh_hh_output(hh, skb);

	// initial case: n->output is neigh_resolve_output at this point
	return n->output(n, skb);
}
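The fast-path/slow-path split in neigh_output is a plain function-pointer dispatch. The following user-space sketch shows the shape of it; every `toy_` name is hypothetical, packets are reduced to a string, and the two output functions only count calls instead of transmitting anything.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_entry;
typedef int (*toy_output_fn)(struct toy_entry *, const char *pkt);

struct toy_entry {
    bool connected;          /* models nud_state & NUD_CONNECTED */
    bool have_cached_header; /* models hh->hh_len != 0 */
    toy_output_fn output;    /* models n->output, swapped as state changes */
    int slow_calls;
    int fast_calls;
};

/* Slow path stand-in: in the kernel this would resolve or queue. */
static int toy_slow_output(struct toy_entry *e, const char *pkt)
{
    (void)pkt;
    e->slow_calls++;
    return 0;
}

/* Fast path stand-in: in the kernel this copies the cached L2 header. */
static int toy_fast_output(struct toy_entry *e, const char *pkt)
{
    (void)pkt;
    e->fast_calls++;
    return 0;
}

/* Same decision as neigh_output: cached header if allowed and valid,
 * otherwise whatever callback the entry currently carries. */
static int toy_neigh_output(struct toy_entry *e, const char *pkt,
                            bool skip_cache)
{
    assert(e && e->output);
    if (!skip_cache && e->connected && e->have_cached_header)
        return toy_fast_output(e, pkt);
    return e->output(e, pkt);
}
```

The `skip_cache` argument corresponds to `is_v6gw` in ip_finish_output2: when the gateway is an IPv6 address, the cached IPv4 header cannot be used and the callback path is taken even for a connected entry.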

neigh_resolve_output:

// kernel/net/core/neighbour.c
/* Slow and careful. */

int neigh_resolve_output(struct neighbour *neigh, struct sk_buff *skb)
{
	int rc = 0;

	// triggers the ARP request; returns non-zero the first time,
	// so the body below is skipped until the address is resolved
	if (!neigh_event_send(neigh, skb)) {
		int err;
		struct net_device *dev = neigh->dev;
		unsigned int seq;

		if (dev->header_ops->cache && !READ_ONCE(neigh->hh.hh_len))
			neigh_hh_init(neigh);  // initialize the cached MAC header to speed up later sends

		do {
			__skb_pull(skb, skb_network_offset(skb));  // point skb at the network header
			seq = read_seqbegin(&neigh->ha_lock);
			err = dev_hard_header(skb, dev, ntohs(skb->protocol),
					      neigh->ha, NULL, skb->len);  // build the MAC header
		} while (read_seqretry(&neigh->ha_lock, seq));

		if (err >= 0)
			rc = dev_queue_xmit(skb);  // L2 transmit
		else
			goto out_kfree_skb;
	}
out:
	return rc;
out_kfree_skb:
	rc = -EINVAL;
	kfree_skb(skb);
	goto out;
}
EXPORT_SYMBOL(neigh_resolve_output);

neigh_event_send:

static inline int neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
{
	unsigned long now = jiffies;

	if (READ_ONCE(neigh->used) != now)
		WRITE_ONCE(neigh->used, now);
	if (!(neigh->nud_state&(NUD_CONNECTED|NUD_DELAY|NUD_PROBE)))
		return __neigh_event_send(neigh, skb);  // start address resolution
	return 0;
}

__neigh_event_send:

int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
{
	int rc;
	bool immediate_probe = false;

	write_lock_bh(&neigh->lock);

	rc = 0;
	if (neigh->nud_state & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE))
		goto out_unlock_bh;
	if (neigh->dead)
		goto out_dead;

	if (!(neigh->nud_state & (NUD_STALE | NUD_INCOMPLETE))) {  // taken for a fresh entry
		if (NEIGH_VAR(neigh->parms, MCAST_PROBES) +
		    NEIGH_VAR(neigh->parms, APP_PROBES)) {
			unsigned long next, now = jiffies;

			atomic_set(&neigh->probes,
				   NEIGH_VAR(neigh->parms, UCAST_PROBES));
			neigh_del_timer(neigh);
			neigh->nud_state     = NUD_INCOMPLETE;  // entry state becomes INCOMPLETE
			neigh->updated = now;
			next = now + max(NEIGH_VAR(neigh->parms, RETRANS_TIME),
					 HZ/100);
			// arm the timer that later refreshes the state and the output
			// function; it fires after max(RETRANS_TIME, HZ/100)
			neigh_add_timer(neigh, next);
			immediate_probe = true;
		} else {
			neigh->nud_state = NUD_FAILED;
			neigh->updated = jiffies;
			write_unlock_bh(&neigh->lock);

			kfree_skb(skb);
			return 1;
		}
	} else if (neigh->nud_state & NUD_STALE) {
		neigh_dbg(2, "neigh %p is delayed\n", neigh);
		neigh_del_timer(neigh);
		neigh->nud_state = NUD_DELAY;
		neigh->updated = jiffies;
		neigh_add_timer(neigh, jiffies +
				NEIGH_VAR(neigh->parms, DELAY_PROBE_TIME));
	}

	if (neigh->nud_state == NUD_INCOMPLETE) {
		if (skb) {
			while (neigh->arp_queue_len_bytes + skb->truesize >
			       NEIGH_VAR(neigh->parms, QUEUE_LEN_BYTES)) {
				struct sk_buff *buff;

				// the queued bytes would exceed the limit: drop the oldest packet
				buff = __skb_dequeue(&neigh->arp_queue);
				if (!buff)
					break;
				neigh->arp_queue_len_bytes -= buff->truesize;
				kfree_skb(buff);
				NEIGH_CACHE_STAT_INC(neigh->tbl, unres_discards);
			}
			skb_dst_force(skb);
			__skb_queue_tail(&neigh->arp_queue, skb);  // queue the packet on arp_queue
			neigh->arp_queue_len_bytes += skb->truesize;
		}
		rc = 1;
	}
out_unlock_bh:
	// for a fresh entry the state was just set to INCOMPLETE and
	// immediate_probe is true
	if (immediate_probe)
		neigh_probe(neigh);  // probe the neighbour
	else
		write_unlock(&neigh->lock);
	local_bh_enable();
	trace_neigh_event_send_done(neigh, rc);
	return rc;

out_dead:
	if (neigh->nud_state & NUD_STALE)
		goto out_unlock_bh;
	write_unlock_bh(&neigh->lock);
	kfree_skb(skb);
	trace_neigh_event_send_dead(neigh, 1);
	return 1;
}
EXPORT_SYMBOL(__neigh_event_send);

neigh_probe:
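The queueing loop in __neigh_event_send keeps the newest packets and evicts from the head until the new packet fits within the byte limit. A minimal user-space model of that policy is sketched below; the ring buffer, the slot count and all `toy_` names are assumptions for the illustration, and packets are represented only by their sizes.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_SLOTS 16

/* Byte-limited FIFO of packet sizes; models arp_queue and
 * arp_queue_len_bytes. Zero-initialize, then set limit_bytes. */
struct toy_queue {
    size_t sizes[TOY_QUEUE_SLOTS];
    size_t head;         /* index of the oldest packet */
    size_t count;        /* packets currently queued */
    size_t bytes;        /* sum of queued packet sizes */
    size_t limit_bytes;  /* models QUEUE_LEN_BYTES */
    size_t dropped;      /* models the unres_discards counter */
};

/* Enqueue a packet, dropping the oldest ones while it does not fit.
 * Like the kernel loop, the new packet is queued even if the queue
 * had to be emptied completely to make room. */
static void toy_enqueue(struct toy_queue *q, size_t pkt_bytes)
{
    assert(q);
    while (q->count > 0 &&
           (q->bytes + pkt_bytes > q->limit_bytes ||
            q->count == TOY_QUEUE_SLOTS)) {
        q->bytes -= q->sizes[q->head];
        q->head = (q->head + 1) % TOY_QUEUE_SLOTS;
        q->count--;
        q->dropped++;
    }
    q->sizes[(q->head + q->count) % TOY_QUEUE_SLOTS] = pkt_bytes;
    q->count++;
    q->bytes += pkt_bytes;
}
```

The extra `count == TOY_QUEUE_SLOTS` condition exists only because this model uses a fixed-size ring; the kernel's sk_buff list has no such slot limit.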

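// kernel/net/core/neighbour.c
static void neigh_probe(struct neighbour *neigh)
	__releases(neigh->lock)
{
	struct sk_buff *skb = skb_peek_tail(&neigh->arp_queue);  // peek at the newest queued packet
	/* keep skb alive even if arp_queue overflows */
	if (skb)
		skb = skb_clone(skb, GFP_ATOMIC);  // clone the skb
	write_unlock(&neigh->lock);
	if (neigh->ops->solicit)
		neigh->ops->solicit(neigh, skb);  // for ARP this is arp_solicit, which sends the ARP request
	atomic_inc(&neigh->probes);
	consume_skb(skb);
}

Key point: the functions above do not send the packet itself. They do three things: 1) send the ARP request, 2) queue the packet, 3) arm the entry timer, which fires after max(RETRANS_TIME, HZ/100), that is 1 second with the ARP defaults shown below. Is the packet dropped? No. It is actually sent from neigh_update, one of whose callers is the ARP receive handler. After neigh_update runs, the neigh's output function is changed. Until then the output function is still neigh_resolve_output, and further packets to the same destination IP do not trigger another ARP request; they are simply queued.

Next, how the ARP receive handler is defined.

ARP module initialization: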

static struct packet_type arp_packet_type __read_mostly = {
	.type =	cpu_to_be16(ETH_P_ARP),
	.func =	arp_rcv,  // the ARP receive handler
};

static int arp_proc_init(void);

void __init arp_init(void)  // initialization
{
	neigh_table_init(NEIGH_ARP_TABLE, &arp_tbl);  // register the ARP table

	dev_add_pack(&arp_packet_type);  // register the ARP packet handler, the same way ip_rcv is registered for IP
	arp_proc_init();                 // register the /proc files
#ifdef CONFIG_SYSCTL
	neigh_sysctl_register(NULL, &arp_tbl.parms, NULL);  // register the sysctl entries
#endif
	// register a callback that receives device state and configuration change notifications
	register_netdevice_notifier(&arp_netdev_notifier);
}

neigh_table_init:

// kernel/net/core/neighbour.c
static struct lock_class_key neigh_table_proxy_queue_class;

static struct neigh_table *neigh_tables[NEIGH_NR_TABLES] __read_mostly;  // array of neighbour tables

// initialize a neighbour table: arp_tbl for IPv4, nd_tbl for IPv6
void neigh_table_init(int index, struct neigh_table *tbl)
{
	unsigned long now = jiffies;
	unsigned long phsize;

	INIT_LIST_HEAD(&tbl->parms_list);
	INIT_LIST_HEAD(&tbl->gc_list);
	list_add(&tbl->parms.list, &tbl->parms_list);
	write_pnet(&tbl->parms.net, &init_net);
	refcount_set(&tbl->parms.refcnt, 1);
	tbl->parms.reachable_time =
			  neigh_rand_reach_time(NEIGH_VAR(&tbl->parms, BASE_REACHABLE_TIME));

	// allocate the per-CPU statistics
	tbl->stats = alloc_percpu(struct neigh_statistics);
	if (!tbl->stats)
		panic("cannot create neighbour cache statistics");

#ifdef CONFIG_PROC_FS
	if (!proc_create_seq_data(tbl->id, 0, init_net.proc_net_stat,
			      &neigh_stat_seq_ops, tbl))
		panic("cannot create neighbour proc dir entry");
#endif

	RCU_INIT_POINTER(tbl->nht, neigh_hash_alloc(3));  // allocate the hash buckets

	// size of the proxy hash bucket array
	phsize = (PNEIGH_HASHMASK + 1) * sizeof(struct pneigh_entry *);
	// allocate the proxy buckets
	tbl->phash_buckets = kzalloc(phsize, GFP_KERNEL);

	if (!tbl->nht || !tbl->phash_buckets)
		panic("cannot allocate neighbour cache hashes");

	if (!tbl->entry_size)
		tbl->entry_size = ALIGN(offsetof(struct neighbour, primary_key) +
					tbl->key_len, NEIGH_PRIV_ALIGN);
	else
		WARN_ON(tbl->entry_size % NEIGH_PRIV_ALIGN);

	rwlock_init(&tbl->lock);  // initialize the read-write lock
	// periodic GC work
	INIT_DEFERRABLE_WORK(&tbl->gc_work, neigh_periodic_work);
	queue_delayed_work(system_power_efficient_wq, &tbl->gc_work,
			tbl->parms.reachable_time);
	// timer that drives proxy ARP processing
	timer_setup(&tbl->proxy_timer, neigh_proxy_process, 0);
	// initialize the proxy packet queue
	skb_queue_head_init_class(&tbl->proxy_queue,
			&neigh_table_proxy_queue_class);

	tbl->last_flush = now;
	tbl->last_rand	= now + tbl->parms.reachable_time * 20;

	neigh_tables[index] = tbl;
}
EXPORT_SYMBOL(neigh_table_init);

arp_tbl is configured as follows:

// kernel/net/ipv4/arp.c
struct neigh_table arp_tbl = {
	.family		= AF_INET,
	.key_len	= 4,
	.protocol	= cpu_to_be16(ETH_P_IP),
	.hash		= arp_hash,          // hash function
	.key_eq		= arp_key_eq,
	.constructor	= arp_constructor,   // neighbour entry constructor
	.proxy_redo	= parp_redo,         // proxy ARP handler
	.is_multicast	= arp_is_multicast,  // multicast check
	.id		= "arp_cache",       // table name
	.parms		= {
		.tbl			= &arp_tbl,
		.reachable_time		= 30 * HZ,  // REACHABLE only while confirmed within the last 30 seconds
		.data	= {
			[NEIGH_VAR_MCAST_PROBES] = 3,               // 3 multicast probes
			[NEIGH_VAR_UCAST_PROBES] = 3,               // 3 unicast probes
			[NEIGH_VAR_RETRANS_TIME] = 1 * HZ,          // solicit retransmission interval
			[NEIGH_VAR_BASE_REACHABLE_TIME] = 30 * HZ,  // base time an entry stays REACHABLE
			[NEIGH_VAR_DELAY_PROBE_TIME] = 5 * HZ,      // maximum time in DELAY
			[NEIGH_VAR_GC_STALETIME] = 60 * HZ,         // maximum time an unused entry stays STALE
			[NEIGH_VAR_QUEUE_LEN_BYTES] = SK_WMEM_MAX,  // byte limit of the pending-packet queue
			[NEIGH_VAR_PROXY_QLEN] = 64,
			[NEIGH_VAR_ANYCAST_DELAY] = 1 * HZ,
			[NEIGH_VAR_PROXY_DELAY]	= (8 * HZ) / 10,
			[NEIGH_VAR_LOCKTIME] = 1 * HZ,
		},
	},
	.gc_interval	= 30 * HZ,  // GC interval
	.gc_thresh1	= 128,      // lowest entry-count threshold
	.gc_thresh2	= 512,      // soft limit (see neigh_alloc)
	.gc_thresh3	= 1024,     // hard limit (see neigh_alloc)
};
EXPORT_SYMBOL(arp_tbl);

arp_rcv() is the ARP packet handler and plays the same role for ARP that ip_rcv plays for IP. Within the arp_rcv path, neigh_update is called to change the neighbour entry state; only after the output function has been changed can the queued packets be sent.

// kernel/net/ipv4/arp.c
/*
 *	Receive an arp request from the device layer.
 */

static int arp_rcv(struct sk_buff *skb, struct net_device *dev,
		   struct packet_type *pt, struct net_device *orig_dev)
{
	const struct arphdr *arp;

	/* do not tweak dropwatch on an ARP we will ignore */
	// the device does not do ARP, or the packet is for another host,
	// or it is a loopback packet: consume it without further processing
	if (dev->flags & IFF_NOARP ||
	    skb->pkt_type == PACKET_OTHERHOST ||
	    skb->pkt_type == PACKET_LOOPBACK)
		goto consumeskb;

	skb = skb_share_check(skb, GFP_ATOMIC);
	if (!skb)
		goto out_of_mem;

	/* ARP header, plus 2 device addresses, plus 2 IP addresses.  */
	if (!pskb_may_pull(skb, arp_hdr_len(dev)))
		goto freeskb;

	arp = arp_hdr(skb);  // fetch the ARP header and validate it
	if (arp->ar_hln != dev->addr_len || arp->ar_pln != 4)
		goto freeskb;

	memset(NEIGH_CB(skb), 0, sizeof(struct neighbour_cb));

	return NF_HOOK(NFPROTO_ARP, NF_ARP_IN,
		       dev_net(dev), NULL, skb, dev, NULL,
		       arp_process);  // arp_process does the real work

consumeskb:
	consume_skb(skb);
	return NET_RX_SUCCESS;
freeskb:
	kfree_skb(skb);
out_of_mem:
	return NET_RX_DROP;
}

arp_process() :  Process an arp request

// kernel/net/ipv4/arp.c
/*
 *	Process an arp request.
 */

static int arp_process(struct net *net, struct sock *sk, struct sk_buff *skb)
{
	struct net_device *dev = skb->dev;
	struct in_device *in_dev = __in_dev_get_rcu(dev);
	struct arphdr *arp;
	unsigned char *arp_ptr;
	struct rtable *rt;
	unsigned char *sha;
	unsigned char *tha = NULL;
	__be32 sip, tip;
	u16 dev_type = dev->type;
	int addr_type;
	struct neighbour *n;
	struct dst_entry *reply_dst = NULL;
	bool is_garp = false;

	....

	/* Update our ARP tables */

	n = __neigh_lookup(&arp_tbl, &sip, dev, 0);

	addr_type = -1;
	if (n || IN_DEV_ARP_ACCEPT(in_dev)) {
		is_garp = arp_is_garp(net, dev, &addr_type, arp->ar_op,
				      sip, tip, sha, tha);
	}

	if (IN_DEV_ARP_ACCEPT(in_dev)) {
		/* Unsolicited ARP is not accepted by default.
		   It is possible, that this option should be enabled for some
		   devices (strip is candidate)
		 */
		if (!n &&
		    (is_garp ||
		     (arp->ar_op == htons(ARPOP_REPLY) &&
		      (addr_type == RTN_UNICAST ||
		       (addr_type < 0 &&
			/* postpone calculation to as late as possible */
			inet_addr_type_dev_table(net, dev, sip) ==
				RTN_UNICAST)))))
			n = __neigh_lookup(&arp_tbl, &sip, dev, 1);
	}

	if (n) {
		int state = NUD_REACHABLE;  // default new state: REACHABLE
		int override;

		/* If several different ARP replies follows back-to-back,
		   use the FIRST one. It is possible, if several proxy
		   agents are active. Taking the first reply prevents
		   arp trashing and chooses the fastest router.
		 */
		override = time_after(jiffies,
				      n->updated +
				      NEIGH_VAR(n->parms, LOCKTIME)) ||
			   is_garp;

		/* Broadcast replies and request packets
		   do not assert neighbour reachability.
		 */
		if (arp->ar_op != htons(ARPOP_REPLY) ||
		    skb->pkt_type != PACKET_HOST)
			state = NUD_STALE;
		neigh_update(n, sha, state,
			     override ? NEIGH_UPDATE_F_OVERRIDE : 0, 0);  // update the neighbour entry
		neigh_release(n);
	}

out_consume_skb:
	consume_skb(skb);

out_free_dst:
	dst_release(reply_dst);
	return NET_RX_SUCCESS;

out_free_skb:
	kfree_skb(skb);
	return NET_RX_DROP;
}
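The two decisions at the end of arp_process (which state to apply, and whether an existing link-layer address may be overridden) can be restated as two small predicates. This sketch only mirrors the conditions visible in the code above; the function names, the boolean parameters and the reduced state enum are assumptions for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_arp_state { TOY_ARP_REACHABLE, TOY_ARP_STALE };

/*
 * Only a reply addressed to this host (PACKET_HOST) confirms
 * reachability; requests and broadcast replies yield STALE.
 */
static enum toy_arp_state toy_state_for_arp(bool is_reply,
                                            bool addressed_to_us)
{
    return (is_reply && addressed_to_us) ? TOY_ARP_REACHABLE
                                         : TOY_ARP_STALE;
}

/*
 * The cached address may be overridden when the packet is a
 * gratuitous ARP, or when the entry was last updated more than
 * locktime ticks ago (wrap-safe, in the spirit of time_after).
 */
static bool toy_may_override(unsigned long now, unsigned long updated,
                             unsigned long locktime, bool is_garp)
{
    return is_garp || (long)(now - (updated + locktime)) > 0;
}
```

The lock time is what makes the first of several back-to-back replies win: with LOCKTIME set to 1 * HZ in arp_tbl, a second, different reply arriving within one second of the update is not allowed to replace the address.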

neigh_update():

int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
		 u32 flags, u32 nlmsg_pid)
{
	return __neigh_update(neigh, lladdr, new, flags, nlmsg_pid, NULL);
}
EXPORT_SYMBOL(neigh_update);

/* Generic update routine.
   -- lladdr is new lladdr or NULL, if it is not supplied.
   -- new    is new state.
   -- flags
	NEIGH_UPDATE_F_OVERRIDE allows to override existing lladdr,
				if it is different.
	NEIGH_UPDATE_F_WEAK_OVERRIDE will suspect existing "connected"
				lladdr instead of overriding it
				if it is different.
	NEIGH_UPDATE_F_ADMIN	means that the change is administrative.
	NEIGH_UPDATE_F_USE	means that the entry is user triggered.
	NEIGH_UPDATE_F_OVERRIDE_ISROUTER allows to override existing
				NTF_ROUTER flag.
	NEIGH_UPDATE_F_ISROUTER	indicates if the neighbour is known as
				a router.

   Caller MUST hold reference count on the entry.
 */

static int __neigh_update(struct neighbour *neigh, const u8 *lladdr,
			  u8 new, u32 flags, u32 nlmsg_pid,
			  struct netlink_ext_ack *extack)
{
	bool ext_learn_change = false;
	u8 old;
	int err;
	int notify = 0;
	struct net_device *dev;
	int update_isrouter = 0;

	trace_neigh_update(neigh, lladdr, new, flags, nlmsg_pid);

	write_lock_bh(&neigh->lock);

	dev    = neigh->dev;
	old    = neigh->nud_state;
	err    = -EPERM;

	if (neigh->dead) {
		NL_SET_ERR_MSG(extack, "Neighbor entry is now dead");
		new = old;
		goto out;
	}
	if (!(flags & NEIGH_UPDATE_F_ADMIN) &&
	    (old & (NUD_NOARP | NUD_PERMANENT)))
		goto out;

	ext_learn_change = neigh_update_ext_learned(neigh, flags, &notify);
	if (flags & NEIGH_UPDATE_F_USE) {
		new = old & ~NUD_PERMANENT;
		neigh->nud_state = new;
		err = 0;
		goto out;
	}

	if (!(new & NUD_VALID)) {
		neigh_del_timer(neigh);
		if (old & NUD_CONNECTED)
			neigh_suspect(neigh);
		neigh->nud_state = new;
		err = 0;
		notify = old & NUD_VALID;
		if ((old & (NUD_INCOMPLETE | NUD_PROBE)) &&
		    (new & NUD_FAILED)) {
			neigh_invalidate(neigh);
			notify = 1;
		}
		goto out;
	}

	/* Compare new lladdr with cached one */
	if (!dev->addr_len) {
		/* First case: device needs no address. */
		lladdr = neigh->ha;
	} else if (lladdr) {
		/* The second case: if something is already cached
		   and a new address is proposed:
		   - compare new & old
		   - if they are different, check override flag
		 */
		if ((old & NUD_VALID) &&
		    !memcmp(lladdr, neigh->ha, dev->addr_len))
			lladdr = neigh->ha;
	} else {
		/* No address is supplied; if we know something,
		   use it, otherwise discard the request.
		 */
		err = -EINVAL;
		if (!(old & NUD_VALID)) {
			NL_SET_ERR_MSG(extack, "No link layer address given");
			goto out;
		}
		lladdr = neigh->ha;
	}

	/* Update confirmed timestamp for neighbour entry after we
	 * received ARP packet even if it doesn't change IP to MAC binding.
	 */
	if (new & NUD_CONNECTED)
		neigh->confirmed = jiffies;

	/* If entry was valid and address is not changed,
	   do not change entry state, if new one is STALE.
	 */
	err = 0;
	update_isrouter = flags & NEIGH_UPDATE_F_OVERRIDE_ISROUTER;
	if (old & NUD_VALID) {
		if (lladdr != neigh->ha && !(flags & NEIGH_UPDATE_F_OVERRIDE)) {
			update_isrouter = 0;
			if ((flags & NEIGH_UPDATE_F_WEAK_OVERRIDE) &&
			    (old & NUD_CONNECTED)) {
				lladdr = neigh->ha;
				new = NUD_STALE;
			} else
				goto out;
		} else {
			if (lladdr == neigh->ha && new == NUD_STALE &&
			    !(flags & NEIGH_UPDATE_F_ADMIN))
				new = old;
		}
	}

	/* Update timestamp only once we know we will make a change to the
	 * neighbour entry. Otherwise we risk to move the locktime window with
	 * noop updates and ignore relevant ARP updates.
	 */
	if (new != old || lladdr != neigh->ha)
		neigh->updated = jiffies;

	if (new != old) {
		neigh_del_timer(neigh);
		if (new & NUD_PROBE)
			atomic_set(&neigh->probes, 0);
		if (new & NUD_IN_TIMER)
			neigh_add_timer(neigh, (jiffies +
						((new & NUD_REACHABLE) ?
						 neigh->parms->reachable_time :
						 0)));
		neigh->nud_state = new;
		notify = 1;
	}

	if (lladdr != neigh->ha) {
		write_seqlock(&neigh->ha_lock);
		memcpy(&neigh->ha, lladdr, dev->addr_len);
		write_sequnlock(&neigh->ha_lock);
		neigh_update_hhs(neigh);
		if (!(new & NUD_CONNECTED))
			neigh->confirmed = jiffies -
				      (NEIGH_VAR(neigh->parms, BASE_REACHABLE_TIME) << 1);
		notify = 1;
	}
	if (new == old)
		goto out;
	if (new & NUD_CONNECTED)
		neigh_connect(neigh);  // switch the output function to ops->connected_output
	else
		neigh_suspect(neigh);
	if (!(old & NUD_VALID)) {  // the old state was not valid: send the queued skbs
		struct sk_buff *skb;

		/* Again: avoid dead loop if something went wrong */

		while (neigh->nud_state & NUD_VALID &&
		       (skb = __skb_dequeue(&neigh->arp_queue)) != NULL) {  // take a queued packet
			struct dst_entry *dst = skb_dst(skb);
			struct neighbour *n2, *n1 = neigh;
			write_unlock_bh(&neigh->lock);

			rcu_read_lock();

			/* Why not just use 'neigh' as-is?  The problem is that
			 * things such as shaper, eql, and sch_teql can end up
			 * using alternative, different, neigh objects to output
			 * the packet in the output path.  So what we need to do
			 * here is re-lookup the top-level neigh in the path so
			 * we can reinject the packet there.
			 */
			n2 = NULL;
			if (dst && dst->obsolete != DST_OBSOLETE_DEAD) {
				n2 = dst_neigh_lookup_skb(dst, skb);
				if (n2)
					n1 = n2;
			}
			n1->output(n1, skb);  // output has already been switched by neigh_connect
			if (n2)
				neigh_release(n2);
			rcu_read_unlock();

			write_lock_bh(&neigh->lock);
		}
		__skb_queue_purge(&neigh->arp_queue);  // empty the queue
		neigh->arp_queue_len_bytes = 0;
	}
out:
	if (update_isrouter)
		neigh_update_is_router(neigh, flags, &notify);
	write_unlock_bh(&neigh->lock);

	if (((new ^ old) & NUD_PERMANENT) || ext_learn_change)
		neigh_update_gc_list(neigh);

	if (notify)
		neigh_update_notify(neigh, nlmsg_pid);

	trace_neigh_update_done(neigh, err);

	return err;
}
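
The part of __neigh_update that is easiest to misread is the decision about whether a newly proposed MAC address may replace the cached one. Below is a minimal user-space sketch of just that decision, under my own simplifying assumptions: every `demo_`/`DEMO_` name is hypothetical, only the REACHABLE and STALE states are modelled, and locking, timers and notifications are left out. It is a reading aid, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAC_LEN 6

/* Hypothetical, reduced stand-ins for the NUD_* state bits. */
enum {
    DEMO_NUD_REACHABLE = 0x02,
    DEMO_NUD_STALE     = 0x04,
    DEMO_NUD_VALID     = DEMO_NUD_REACHABLE | DEMO_NUD_STALE,
    DEMO_NUD_CONNECTED = DEMO_NUD_REACHABLE
};

/* Hypothetical stand-ins for the NEIGH_UPDATE_F_* flags. */
enum {
    DEMO_F_OVERRIDE      = 0x1,
    DEMO_F_WEAK_OVERRIDE = 0x2,
    DEMO_F_ADMIN         = 0x4
};

struct demo_decision {
    bool    apply;        /* false: the update is ignored (the "goto out" path) */
    bool    replace_addr; /* true: the cached MAC gets overwritten */
    uint8_t new_state;    /* state the entry ends up in */
};

/* Decide what an update with a proposed MAC does to an entry. */
struct demo_decision demo_decide(uint8_t old_state, const uint8_t *cached,
                                 const uint8_t *proposed, uint8_t new_state,
                                 unsigned flags)
{
    struct demo_decision d = { true, false, new_state };
    bool differs;

    assert(cached != NULL && proposed != NULL);
    differs = memcmp(cached, proposed, DEMO_MAC_LEN) != 0;

    if (!(old_state & DEMO_NUD_VALID)) {
        /* Nothing valid is cached yet: take the proposed address. */
        d.replace_addr = true;
        return d;
    }
    if (differs && !(flags & DEMO_F_OVERRIDE)) {
        if ((flags & DEMO_F_WEAK_OVERRIDE) && (old_state & DEMO_NUD_CONNECTED)) {
            /* Keep the old address, but stop trusting it. */
            d.new_state = DEMO_NUD_STALE;
            return d;
        }
        /* A different address without permission to override: ignore. */
        d.apply = false;
        d.new_state = old_state;
        return d;
    }
    /* Same address and only a STALE hint: leave the state alone. */
    if (!differs && new_state == DEMO_NUD_STALE && !(flags & DEMO_F_ADMIN))
        d.new_state = old_state;
    d.replace_addr = differs;
    return d;
}
```

For example, a REACHABLE entry that receives a different MAC with no override flag is left untouched, while the same update carrying the weak-override flag keeps the old MAC but demotes the entry to STALE.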

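neigh_update (through __neigh_update) is where packets parked during address resolution actually get sent: once the entry moves from a non-valid state to a valid one, the while loop drains arp_queue through the output callback.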

In __neigh_update, the output function is set differently depending on the neighbour entry's state. Where are these functions defined?

// kernel/net/ipv4/arp.c
static const struct neigh_ops arp_generic_ops = {
	.family =		AF_INET,
	.solicit =		arp_solicit,
	.error_report =		arp_error_report,
	.output =		neigh_resolve_output,
	.connected_output =	neigh_connected_output,
};

static const struct neigh_ops arp_hh_ops = {
	.family =		AF_INET,
	.solicit =		arp_solicit,
	.error_report =		arp_error_report,
	.output =		neigh_resolve_output,
	.connected_output =	neigh_resolve_output,
};

// the device needs no L2 frame header
static const struct neigh_ops arp_direct_ops = {
	.family =		AF_INET,
	.output =		neigh_direct_output,
	.connected_output =	neigh_direct_output,
};

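When a neighbour entry is initialized, arp_constructor picks arp_hh_ops if the device provides a header_ops->cache callback (Ethernet does) and arp_generic_ops otherwise. A new entry starts in state NUD_NONE, so neigh->output is set to ops->output, which is neigh_resolve_output in both tables. That function first puts the packet on the entry's queue (arp_queue) and then sends a solicit probe, i.e. an ARP request. At that point the send call is finished, with the packet still waiting in the queue.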

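When the host receives an ARP packet, arp_rcv is called. It first sanity-checks the packet and then looks up the neighbour entry based on the packet's contents. Suppose the packet is an ARP reply: the entry's state is updated to reachable (NUD_REACHABLE) and the queue is checked, and any packets waiting there are sent. From then on neigh->output is the entry's connected_output function, which is neigh_connected_output in the case of arp_generic_ops.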
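
The queue-then-flush behaviour described above can be sketched in user space as follows. This is a toy model under my own assumptions: all `demo_` names are hypothetical, packets are plain integers, the queue simply rejects packets when full, and there are no timers or retries. The point is only to show the output callback being swapped when the reply arrives.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_QUEUE_MAX 4
#define DEMO_SENT_MAX  16

struct demo_neigh;
typedef int (*demo_output_fn)(struct demo_neigh *n, int pkt);

struct demo_neigh {
    int            resolved;               /* 0: MAC unknown, 1: MAC known */
    int            queue[DEMO_QUEUE_MAX];  /* stand-in for arp_queue */
    size_t         queue_len;
    int            probes_sent;            /* solicit probes sent so far */
    int            sent[DEMO_SENT_MAX];    /* packets handed to the "device" */
    size_t         sent_len;
    demo_output_fn output;                 /* stand-in for neigh->output */
};

/* Fast path: the MAC is known, hand the packet straight to the device. */
int demo_connected_output(struct demo_neigh *n, int pkt)
{
    assert(n->sent_len < DEMO_SENT_MAX);
    n->sent[n->sent_len++] = pkt;
    return 0;
}

/* Slow path: park the packet; the first parked packet triggers a probe. */
int demo_resolve_output(struct demo_neigh *n, int pkt)
{
    if (n->resolved)
        return demo_connected_output(n, pkt);
    if (n->queue_len == DEMO_QUEUE_MAX)
        return -1;                         /* queue full: reject (simplification) */
    n->queue[n->queue_len++] = pkt;
    if (n->queue_len == 1)
        n->probes_sent++;                  /* models sending the ARP request */
    return 1;                              /* 1: queued, not yet transmitted */
}

/* A fresh entry has no MAC, so it starts on the slow path. */
void demo_neigh_init(struct demo_neigh *n)
{
    memset(n, 0, sizeof(*n));
    n->output = demo_resolve_output;
}

/* Models the reply arriving: switch callbacks, then flush the queue. */
void demo_reply_received(struct demo_neigh *n)
{
    size_t i;

    n->resolved = 1;
    n->output = demo_connected_output;
    for (i = 0; i < n->queue_len; i++)
        n->output(n, n->queue[i]);
    n->queue_len = 0;
}
```

Callers only ever invoke `n->output(n, pkt)`. Whether the packet is queued or transmitted depends entirely on which function the pointer currently holds, which is the design choice the neigh_ops tables express.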

That concludes the source analysis of the neighbour subsystem and the ARP protocol.

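III. Why neighbour entries get created

1. L3 needs to send a packet.

2. User space adds an entry manually with the ip neigh command or the arp command.

3. An ARP packet is received and a neighbour entry is learned passively from it.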

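Let's walk through the first case. When the kernel sends a packet, it first looks up the route. The next hop of the egress route is then used to look up the neighbour cache, and if no entry exists a new one is created.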
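
The "look it up, create it if missing" step can be sketched as a small chained hash table keyed by the next-hop IPv4 address. This is only an illustration under my own assumptions: the `demo_` names, the modulo hash and the fixed bucket count are all hypothetical, and there is no locking, reference counting or garbage collection, all of which the kernel's neighbour table needs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_BUCKETS 8

struct demo_entry {
    uint32_t           ip;     /* key: next-hop IPv4 address */
    int                state;  /* 0 models a freshly created entry (NUD_NONE) */
    struct demo_entry *next;   /* chain within one bucket */
};

struct demo_table {
    struct demo_entry *bucket[DEMO_BUCKETS];
    unsigned           entries;
};

/* Deliberately trivial hash: the low bits of the address. */
static unsigned demo_hash(uint32_t ip)
{
    return ip % DEMO_BUCKETS;
}

/* Return the entry for ip, or NULL if none is cached. */
struct demo_entry *demo_lookup(struct demo_table *t, uint32_t ip)
{
    struct demo_entry *e;

    assert(t != NULL);
    for (e = t->bucket[demo_hash(ip)]; e != NULL; e = e->next)
        if (e->ip == ip)
            return e;
    return NULL;
}

/* Return the cached entry for ip, creating an unresolved one if needed. */
struct demo_entry *demo_lookup_or_create(struct demo_table *t, uint32_t ip)
{
    struct demo_entry *e = demo_lookup(t, ip);
    unsigned h;

    if (e != NULL)
        return e;
    e = calloc(1, sizeof(*e));             /* zeroed: state starts at 0 */
    if (e == NULL)
        return NULL;
    h = demo_hash(ip);
    e->ip = ip;
    e->next = t->bucket[h];                /* push onto the bucket's chain */
    t->bucket[h] = e;
    t->entries++;
    return e;
}
```

A second call with the same address returns the same entry rather than creating a duplicate, so every packet to that next hop shares one resolution state and one queue.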

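For now I am borrowing another user's rough flow chart for this path. Once I finish the routing source analysis, I will update it with the current kernel call stack.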
