BCC: Application Component Analysis

libbpf-tools/gethostlatency

Traces the latency of the glibc functions getaddrinfo, gethostbyname, and gethostbyname2.

# /usr/share/bcc/libbpf-tools/gethostlatency 
TIME     PID     COMM             LATms      HOST
14:58:32 8418    curl             313.635    www.taobao.com

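Take # curl www.taobao.com as an example: to reach a domain, the name must first be resolved to an IP address, and the connection is then made to that IP. Most application-layer programs resolve names by calling the relevant glibc functions.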

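How it works

The tool attaches uprobes to three glibc functions: getaddrinfo, gethostbyname, and gethostbyname2. It records a timestamp on entry to each function and computes the elapsed time when the function returns.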

static int probe_entry(struct pt_regs *ctx)
{
	__u64 pid_tgid = bpf_get_current_pid_tgid();
	__u32 tid = (__u32)pid_tgid;
	struct event event = {};

	/* remember the entry timestamp, keyed by thread id */
	event.time = bpf_ktime_get_ns();
	bpf_map_update_elem(&starts, &tid, &event, BPF_ANY);
	return 0;
}

static int probe_return(struct pt_regs *ctx)
{
	__u32 tid = (__u32)bpf_get_current_pid_tgid();
	struct event *eventp;

	eventp = bpf_map_lookup_elem(&starts, &tid);
	/* no matching entry was recorded for this thread */
	if (!eventp)
		return 0;

	/* update time from timestamp to delta */
	eventp->time = bpf_ktime_get_ns() - eventp->time;
	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, eventp, sizeof(*eventp));
	bpf_map_delete_elem(&starts, &tid);
	return 0;
}

SEC("kprobe/handle_entry")
int BPF_KPROBE(handle_entry)
{
	return probe_entry(ctx);
}

SEC("kretprobe/handle_return")
int BPF_KRETPROBE(handle_return)
{
	return probe_return(ctx);
}
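
The entry/return pairing can be mimicked in plain Python to make the bookkeeping explicit. This is only a user-space sketch under stated assumptions: the dictionary stands in for the starts BPF map, timestamps are passed in rather than read from bpf_ktime_get_ns(), and the Event class is a hypothetical stand-in for struct event, not part of gethostlatency.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Event:
    """Hypothetical stand-in for the BPF `struct event`."""
    time: int   # entry timestamp in ns, later overwritten with the delta
    host: str


# Stand-in for the `starts` BPF hash map, keyed by thread id.
starts: Dict[int, Event] = {}


def probe_entry(tid: int, host: str, now_ns: int) -> None:
    """On function entry: remember when this thread started the lookup."""
    starts[tid] = Event(time=now_ns, host=host)


def probe_return(tid: int, now_ns: int) -> Optional[Event]:
    """On function return: turn the stored timestamp into a latency."""
    event = starts.pop(tid, None)
    if event is None:          # return seen without a matching entry
        return None
    event.time = now_ns - event.time
    return event


if __name__ == "__main__":
    probe_entry(tid=8418, host="www.taobao.com", now_ns=1_000_000)
    done = probe_return(tid=8418, now_ns=314_635_000)
    print(f"{done.host} {done.time / 1e6:.3f} ms")
```

Keying by thread id rather than process id matters because several threads of one process may be resolving names at the same time.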

Use cases

References

github bcc/tools/gethostlatency_example.txt

libbpf-tools/syscount

Counts system calls.

# /usr/share/bcc/libbpf-tools/syscount 
Tracing syscalls, printing top 10... Ctrl+C to quit.
^C[16:21:23]
SYSCALL                   COUNT
epoll_pwait                 757
fcntl                       590
futex                       501
times                       488
epoll_ctl                   347
read                        337
nanosleep                   235
openat                      220
close                       202
poll                        112

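How it works

The tool counts system calls, and optionally their latency, through the raw syscall tracepoints (raw_syscalls:sys_enter and raw_syscalls:sys_exit). Every system call passes through these tracepoints, so all of them are counted.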
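
The counting itself amounts to a frequency table with a top-N cut. The sketch below shows that idea in user-space Python; it assumes the events arrive as a plain sequence of syscall names, whereas the real tool aggregates inside a BPF map in the kernel. The function name top_syscalls and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_syscalls(events: Iterable[str], top: int = 10) -> List[Tuple[str, int]]:
    """Count one hit per sys_enter event and return the `top` busiest syscalls."""
    counts = Counter(events)
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical event stream: one syscall name per sys_enter hit.
    sample = ["read", "futex", "read", "close", "futex", "read"]
    print(f"{'SYSCALL':<25} {'COUNT':>5}")
    for name, count in top_syscalls(sample, top=2):
        print(f"{name:<25} {count:>5}")
```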

Use cases

References

github bcc/tools/syscount_example.txt

libbpf-tools/bashreadline

Traces the command lines processed by bash processes.

# /usr/share/bcc/libbpf-tools/bashreadline 
TIME      PID     COMMAND
16:03:45  1844    pwd

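Note: the maximum string length is 80 characters; anything beyond that is not processed.

How it works

readline is a function bash calls while parsing the command line; its return value is the string for one line of input. bashreadline uses a uretprobe (user-space return probe) on this function and, when it returns, reads the command line directly through the returned character pointer.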

readline.c:391
/* Read a line of input.  Prompt with PROMPT.  An empty PROMPT means
   none.  A return value of NULL means that EOF was encountered. */
char *
readline (prompt)

#0  readline (prompt=0x557178e5ec30 "[root@localhost gitlab]# ") at readline.c:391
#1  0x00005571770309da in yy_readline_get () at ./parse.y:1457
#2  0x0000557177032d03 in yy_getc () at ./parse.y:2300
#3  shell_getc (remove_quoted_newline=1) at ./parse.y:2300
#4  shell_getc (remove_quoted_newline=1) at ./parse.y:2219
#5  0x00005571770360aa in read_token (command=<optimized out>) at ./parse.y:3117
#6  read_token (command=0) at ./parse.y:3067
#7  0x0000557177039ad8 in yylex () at ./parse.y:2676
#8  yyparse () at y.tab.c:1817
#9  0x000055717703014a in parse_command () at eval.c:262
#10 0x0000557177030258 in read_command () at eval.c:306
#11 0x00005571770304e0 in reader_loop () at eval.c:150
#12 0x000055717702ebdb in main (argc=1, argv=0x7ffc9dbd7a28, env=0x7ffc9dbd7a38) at shell.c:802

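Use cases

bash is the default shell on most Linux distributions. This tool shows the commands being executed in every bash session currently on the system.

If a user has specified a different shell, its commands cannot be captured.
Commands executed from high-level languages may not be handled by bash; even when the language wrapper is given shell=True, what it passes on is script content.

References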

github bcc/tools/bashreadline_example.txt

tools/killsnoop

Traces signals sent by tasks. PID is the sender and TPID is the recipient of the signal.

# /usr/share/bcc/tools/killsnoop 
TIME      PID    COMM             SIG  TPID   RESULT
14:19:54  5748   node             0    327    0

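A task is either a process or a thread; in Linux both are scheduled uniformly as tasks.
kill does not necessarily kill a task, despite its name; it only sends a signal to the specified task.
If a signal is sent to a process that has multiple threads, there is no guarantee which thread handles it unless the process arranges this itself.

How it works

This tool has not been maintained for a long time and only supports kprobe mode.

It places a kprobe on the kill system call to obtain the target process PID and the signal number.

Use cases

Sending signals through the kill system call has some typical uses, such as terminating other processes with the TERM and KILL signals or interacting with other processes through the USR signals. The system call is not triggered only by the kill command; other programs may also use it to exchange signals.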
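
A short Python sketch of two non-lethal uses of kill: signal 0 as an existence check (the sample output above shows a SIG value of 0) and SIGUSR1 as a notification. It assumes a Linux or other POSIX system and runs in the main thread; the helper names process_exists and notify_self are hypothetical.

```python
import os
import signal
import time
from typing import List

received: List[int] = []


def on_signal(signum: int, frame) -> None:
    """Record the signal instead of letting the default action run."""
    received.append(signum)


def process_exists(pid: int) -> bool:
    """Signal 0 delivers nothing; kill() only performs its checks."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True   # the process exists but belongs to another user
    return True


def notify_self() -> None:
    """Send SIGUSR1 to this process and wait briefly for the handler."""
    signal.signal(signal.SIGUSR1, on_signal)
    os.kill(os.getpid(), signal.SIGUSR1)
    deadline = time.monotonic() + 1.0
    while not received and time.monotonic() < deadline:
        time.sleep(0.01)


if __name__ == "__main__":
    print("self exists:", process_exists(os.getpid()))
    notify_self()
    print("received:", [signal.Signals(s).name for s in received])
```

Both calls go through the kill system call, so both would appear in killsnoop even though nothing is terminated.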

References

github bcc/tools/killsnoop_example.txt

tools/bpflist

Lists the processes that currently have BPF programs and maps loaded.

# /usr/share/bcc/tools/bpflist
PID    COMM             TYPE  COUNT
38284  execsnoop        map   4
38284  execsnoop        prog  2

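How it works

The tool walks the files under /proc/<pid>/fd and checks whether any of them is a link to anon_inode:bpf-*.

/proc is a virtual file system provided by the kernel that exposes process information. A running BPF program usually creates maps and other data structures, which are exposed to the program as file descriptors. /proc/<pid>/fd lists the file descriptors a process has open, each as a symbolic link to the underlying file.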
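
The traversal can be sketched in a few lines of Python. This is an illustration of the approach described above, not the tool's own code: the function name count_bpf_fds and the proc_root parameter are hypothetical, and the regular expression is an assumption about what follows the anon_inode:bpf- prefix.

```python
import os
import re
from collections import Counter

BPF_LINK = re.compile(r"anon_inode:bpf-([\w-]+)")


def count_bpf_fds(proc_root: str = "/proc") -> Counter:
    """Count BPF file descriptors per (pid, type) by reading fd symlinks."""
    counts: Counter = Counter()
    try:
        entries = os.listdir(proc_root)
    except OSError:                      # no procfs on this system
        return counts
    for entry in entries:
        if not entry.isdigit():          # only /proc/<pid> directories
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:                  # process exited or access denied
            continue
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            match = BPF_LINK.search(target)
            if match:
                counts[(int(entry), match.group(1))] += 1
    return counts


if __name__ == "__main__":
    for (pid, kind), count in sorted(count_bpf_fds().items()):
        print(f"{pid:<6} {kind:<5} {count}")
```

Reading other users' /proc/<pid>/fd entries requires root, so an unprivileged run only sees the caller's own processes.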

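Use cases

Because this tool merely walks /proc, it is efficient and can quickly determine whether any BPF-using processes are running.

References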

proc(5) — Linux manual page
proc_pid_fd(5) — Linux manual page
github bcc/tools/bpflist_example.txt

libbpf-tools/execsnoop

Traces process execution.

# /usr/share/bcc/libbpf-tools/execsnoop
PCOMM            PID    PPID   RET ARGS
ls               3438   2601     0 /usr/bin/ls

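How it works

On Linux, running a new program requires first creating a child process with clone/fork; the child then calls an exec system call, which replaces the child's address space with the new program.

Take running ls from the command line as an example. The parent bash process has PID 3589. Tracing its system calls with strace -p 3589 -f shows bash first calling clone to create child 3702, which then calls execve and becomes the ls process.

# ps
PID TTY          TIME CMD
3589 pts/1    00:00:00 bash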
# ls

# strace -p 3589 -f
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 3702 attached
[pid  3702] execve("/usr/bin/ls", ["ls", "--color=auto"], 0x556e71ce7510 /* 46 vars */) = 0

execsnoop captures its events at the execve system call tracepoints.
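
The same create-then-exec sequence can be reproduced from Python, which may help when generating test events for execsnoop. This is a minimal sketch assuming a POSIX system; run_via_fork_exec is a hypothetical helper name, and the interpreter path comes from sys.executable.

```python
import os
import sys
from typing import List


def run_via_fork_exec(argv: List[str]) -> int:
    """Fork a child, replace it with argv[0] via execv, return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: execv only returns (by raising) if the exec failed.
        try:
            os.execv(argv[0], argv)
        finally:
            os._exit(127)
    # Parent: wait for the child, as a shell does for a foreground command.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    code = run_via_fork_exec([sys.executable, "-c", "pass"])
    print("child exit code:", code)
```

Only the execv call in the child corresponds to an execve event; the fork on its own produces nothing for execsnoop to report.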

Use cases

Tracing the execution of new programs on the system.

References

github bcc/tools/execsnoop_example.txt

tools/threadsnoop

Traces thread creation through the pthread_create function in libc.so or libpthread.so.

# /usr/share/bcc/tools/threadsnoop
Attaching 2 probes...
TIME(ms)   PID    COMM             FUNC
1938       28549  dockerd          threadentry

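Programs that create threads without pthread_create cannot be traced this way. This may be the case for some language runtimes (for example Go, whose runtime can create threads through clone directly), for programs that use other threading libraries, and for programs with special designs.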

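How it works

The tool places a uprobe (user-space probe) on the pthread_create function in libc.so or libpthread.so, then collects and prints the information.

Use cases

References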

github bcc/tools/threadsnoop_example.txt

tools/sslsniff

# /usr/share/bcc/tools/sslsniff 
FUNC         TIME(s)            COMM             PID     LEN    
WRITE/SEND   0.000000000        curl             1968    77    
----- DATA -----
GET / HTTP/1.1
Host: www.baidu.com
User-Agent: curl/7.79.1
Accept: */*
----- END DATA -----

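Only works for application software that uses the openssl, gnutls, or nss shared libraries.

Use readelf -d <path_to_your_executable_program> to check which shared libraries a program links against.

How it works

The tool uses uprobes to trace the SSL-related functions of three user-space shared libraries: openssl, gnutls, and nss.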

bcc/tools/sslsniff.py: 280
def attach_openssl(lib):
    b.attach_uprobe(name=lib, sym="SSL_write",
                    fn_name="probe_SSL_rw_enter", pid=args.pid or -1)
    b.attach_uretprobe(name=lib, sym="SSL_write",
                       fn_name="probe_SSL_write_exit", pid=args.pid or -1)
    b.attach_uprobe(name=lib, sym="SSL_read",
                    fn_name="probe_SSL_rw_enter", pid=args.pid or -1)
    b.attach_uretprobe(name=lib, sym="SSL_read",
                       fn_name="probe_SSL_read_exit", pid=args.pid or -1)
    if args.latency and args.handshake:
        b.attach_uprobe(name="ssl", sym="SSL_do_handshake",
                        fn_name="probe_SSL_do_handshake_enter", pid=args.pid or -1)
        b.attach_uretprobe(name="ssl", sym="SSL_do_handshake",
                           fn_name="probe_SSL_do_handshake_exit", pid=args.pid or -1)

def attach_gnutls(lib):
    b.attach_uprobe(name=lib, sym="gnutls_record_send",
                    fn_name="probe_SSL_rw_enter", pid=args.pid or -1)
    b.attach_uretprobe(name=lib, sym="gnutls_record_send",
                       fn_name="probe_SSL_write_exit", pid=args.pid or -1)
    b.attach_uprobe(name=lib, sym="gnutls_record_recv",
                    fn_name="probe_SSL_rw_enter", pid=args.pid or -1)
    b.attach_uretprobe(name=lib, sym="gnutls_record_recv",
                       fn_name="probe_SSL_read_exit", pid=args.pid or -1)

def attach_nss(lib):
    b.attach_uprobe(name=lib, sym="PR_Write",
                    fn_name="probe_SSL_rw_enter", pid=args.pid or -1)
    b.attach_uretprobe(name=lib, sym="PR_Write",
                       fn_name="probe_SSL_write_exit", pid=args.pid or -1)
    b.attach_uprobe(name=lib, sym="PR_Send",
                    fn_name="probe_SSL_rw_enter", pid=args.pid or -1)
    b.attach_uretprobe(name=lib, sym="PR_Send",
                       fn_name="probe_SSL_write_exit", pid=args.pid or -1)
    b.attach_uprobe(name=lib, sym="PR_Read",
                    fn_name="probe_SSL_rw_enter", pid=args.pid or -1)
    b.attach_uretprobe(name=lib, sym="PR_Read",
                       fn_name="probe_SSL_read_exit", pid=args.pid or -1)
    b.attach_uprobe(name=lib, sym="PR_Recv",
                    fn_name="probe_SSL_rw_enter", pid=args.pid or -1)
    b.attach_uretprobe(name=lib, sym="PR_Recv",
                       fn_name="probe_SSL_read_exit", pid=args.pid or -1)

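Use cases

SSL is commonly used for HTTPS. Decoding HTTPS traffic with conventional packet capture tools is very cumbersome; this tool makes it easy to see what both sides exchange.

References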

github bcc/tools/sslsniff_example.txt

tools/ttysnoop

Traces terminal output.

# /usr/share/bcc/tools/ttysnoop  1
[root@localhost ~]# pwd
/root

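How it works

The tool attaches a kprobe to tty_write, which implements the write interface of file_operations in the driver layer. Because tty implements the file system write interface, every program that has the tty open holds a handle (file descriptor) to it, and every write to that tty passes through this function.

See 《文件系统模块梳理》 (an overview of the file system modules).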

linux-5.10.202/drivers/tty/tty_io.c: 474
static const struct file_operations tty_fops = {
	.write_iter	= tty_write,
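
From user space, such a write is an ordinary write on a file descriptor that refers to a tty. The sketch below uses a pseudo-terminal so that it runs without a real console; it assumes a Linux system where pseudo-terminals are available, and write_to_tty is a hypothetical helper name. Whether the kernel handles the write in tty_write, as in the 5.10 source cited above, depends on the kernel version.

```python
import os


def write_to_tty(data: bytes) -> bytes:
    """Write to the slave side of a fresh pseudo-terminal and read it back."""
    master_fd, slave_fd = os.openpty()
    try:
        os.write(slave_fd, data)          # a write on a tty file descriptor
        return os.read(master_fd, 1024)   # what a terminal emulator would see
    finally:
        os.close(slave_fd)
        os.close(master_fd)


if __name__ == "__main__":
    print(write_to_tty(b"pwd"))
```

The payload deliberately has no newline, because the default terminal settings rewrite a newline on output and the bytes read back would then differ from those written.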

Use cases

References

github bcc/tools/ttysnoop_example.txt

tools/capable

Traces capability checks.

#  /usr/share/bcc/tools/capable 
TIME      UID    PID    COMM             CAP  NAME                 AUDIT 
14:37:11  0      2490   modprobe         16   CAP_SYS_MODULE       1     
14:37:11  0      549    systemd-udevd    12   CAP_NET_ADMIN        1     
14:37:11  0      2489   tcpdump          13   CAP_NET_RAW          1

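How it works

The tool attaches a kprobe to cap_capable, the function the kernel uses to check whether a capability is permitted.

Location and description of cap_capable in the kernel: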

linux-5.10.202/security/commoncap.c: 50
/**
 * cap_capable - Determine whether a task has a particular effective capability
 * @cred: The credentials to use
 * @ns:  The user namespace in which we need the capability
 * @cap: The capability to check for
 * @opts: Bitmask of options defined in include/linux/security.h
 *
 * Determine whether the nominated task has the specified capability amongst
 * its effective set, returning 0 if it does, -ve if it does not.
 *
 * NOTE WELL: cap_has_capability() cannot be used like the kernel's capable()
 * and has_capability() functions.  That is, it has the reverse semantics:
 * cap_has_capability() returns 0 when a task has a capability, but the
 * kernel's capable() and has_capability() returns 1 for this case.
 */
int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
		int cap, unsigned int opts)
{
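
The "0 if it does, negative if it does not" convention can be illustrated with a bitmask check in Python. This is a deliberately simplified sketch: it ignores user namespaces and the opts argument, the helper names cap_check and read_effective_mask are hypothetical, and the capability numbers are the ones shown in the CAP column of the capable output above.

```python
import errno

# Capability numbers as shown in the CAP column of the capable output.
CAP_NET_ADMIN = 12
CAP_NET_RAW = 13
CAP_SYS_MODULE = 16


def cap_check(effective_mask: int, cap: int) -> int:
    """Return 0 if bit `cap` is set in the effective set, else -EPERM."""
    if effective_mask & (1 << cap):
        return 0
    return -errno.EPERM


def read_effective_mask(status_path: str = "/proc/self/status") -> int:
    """Parse the CapEff line of a /proc/<pid>/status file."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return int(line.split()[1], 16)
    raise ValueError("CapEff not found")


if __name__ == "__main__":
    try:
        mask = read_effective_mask()
    except (OSError, ValueError):
        mask = 0   # no procfs available; treat as no capabilities
    print("CAP_NET_RAW check:", cap_check(mask, CAP_NET_RAW))
```

Note the inverted sense compared with a boolean test: a result of 0 means the capability is present.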

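Use cases

See the following excerpt from the article 抓虫：chown失败 Couldn't change ownership of savefile ("Bug hunt: chown fails, Couldn't change ownership of savefile"). The problem there was caused precisely by a capability check failing inside cap_capable.

# perf ftrace -G setattr_prepare  --graph-opts depth=5 chown tcpdump:tcpdump a.pcap
chown: changing ownership of 'a.pcap': Operation not permitted
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0)               |  setattr_prepare() {
 0)               |    capable_wrt_inode_uidgid() {
 0)               |      ns_capable_common() {
 0)               |        security_capable() {
 0)   0.099 us    |          cap_capable();
 0)   0.433 us    |        }
 0)   0.651 us    |      }
 0)   0.854 us    |    }
 0) + 31.484 us   |  }

Capability problems are hard to trace because it is not easy to tell directly whether a capability is the cause. If the kernel appears to be refusing an operation, first check logs such as the audit log, and then trace the kernel with a tool such as ftrace.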

References

抓虫：chown失败 Couldn't change ownership of savefile
capabilities(7) — Linux manual page
github bcc/tools/capable_example.txt