从0开始的操作系统手搓教程29：实现用户进程的printf

所以，可变参数函数是。。。

实现printf的一个promise：sys_write

准备好实现printf

基本流程

处理我们的格式化字符串

思路正确，进一步完善printf

我们马上就准备实现一个大的——也就是实现用户进程的printf，很快用户进程就能自己说话，告诉我们当前进程的PID了，为此，我们需要提前做一部分工作。不然的话，到时候看可变参数列表等东西，可就要发懵了！，因为我们的printf实际上不同于一般的函数，他是参数可变的。所以，我们务必需要先理解的是——可变参数函数的实现的根本原理

所以，可变参数函数是。。。

不着急，你可能没有见过可变参数函数长啥样，我先把签名放在这里，给您看一眼

// Formats and prints a string based on a format specifier
uint32_t printf(const char* format_string, ...);

是的，这个就是我们马上准备搞的一个函数，就是我们熟悉的printf函数。你可以看到，我们的函数的末尾就是一个...，表达的含义就是参数未定。

你熟悉函数调用的本质——对于Typical的C调用，函数从右向左，参数依次压栈，那问题来了，那咋知道到底压多少个参数呢？

嘿，你不要着急！你仔细想象，你是咋调用我们的printf的呢？

printf("prog_a_pid:0x%x\n", getpid());

看懂了嘛？实际上，可变参数可变参数，本质上还是静态的。这里的可变，更多比较的是我们的“编译期”，也就说，我们在书写的时候很自由，可以决定到底压入多少个参数。但是实际上，你不可能运行的时候决定同一个代码调用处到底放多少个参数。换而言之，我们的printf的调用的参数是编译时确定的，而不是运行时确定的，在代码跑之前，我们就知道这里的printf的调用准备放入多少个参数了。比如说

printf("aaa%d", 1234);

这里对printf的调用就要放入两个参数压栈。

printf("aaaaaa%d%d", 12, 34)

这里则是三个，printf自身支持调用多个参数，但是你一写完，这就确定下来到底多少个了。

那我们作为库开发者，咋确定啊？不着急，我们一个个看。

知道过我们printf或者是其他可变参数函数实现原理的朋友估计已经按耐不住了，大喊：“很简单嘛！就是三个核心：va_start、va_end 和va_arg”

非常正确，笔者把简化的实现放到这里，之后我们也会直接使用这个简化的版本直接干。

#define va_start(ap, v) ap = (va_list) & v // the start of the vargs
#define va_arg(ap, t) *((t *)(ap += 4))    // let next
#define va_end(ap) ap = NULL               // clear varg

啥，看不懂，嘿嘿，Man手册一下：man 3 stdarg，笔者简单的翻译一下要点：

函数可以使用不同数量和不同类型的参数进行调用。包含文件 <stdarg.h> 声明了一个类型 va_list，并定义了三个宏，用于遍历被调用函数不知道其数量和类型的参数列表。

被调用函数必须声明一个 va_list 类型的对象，该对象由宏 va_start()、va_arg() 和 va_end() 使用。

va_start() va_start() 宏初始化 ap，以供 va_arg() 和 va_end() 随后使用，并且必须首先调用。参数 last 是变量参数列表之前的最后一个参数的名称，即调用函数知道其类型的最后一个参数。由于此参数的地址可能在 va_start() 宏中使用，因此不应将其声明为寄存器变量、或函数或数组类型。

va_arg() va_arg() 宏扩展为一个表达式，该表达式具有调用中下一个参数的类型和值。参数 ap 是由 va_start() 初始化的 va_list ap。每次调用 va_arg() 都会修改 ap，以便下一次调用返回下一个参数。参数类型是指定的类型名称，因此只需在类型中添加 * 即可获得指向具有指定类型的对象的指针的类型。

va_start() 宏之后的第一次使用 va_arg() 宏将返回最后一个参数。后续调用将返回剩余参数的值。如果没有下一个参数，或者类型与实际下一个参数的类型不兼容（根据默认参数提升进行提升），则会发生随机错误。如果将 ap 传递给使用 va_arg(ap,type) 的函数，则在该函数返回后，ap 的值未定义。

va_end() va_start() 的每次调用都必须与同一函数中 va_end() 的相应调用相匹配。在调用 va_end(ap) 之后，变量 ap 未定义。可以多次遍历列表，每次遍历都由 va_start() 和 va_end() 括起来。va_end() 可以是宏或函数。

好像还是看不太懂，我说吧：

这三个宏是C语言中用于访问可变参数（variadic arguments）的工具。它们的作用是让函数能够接受不定数量的参数，并在函数内部对这些参数进行读取。

va_list 是一个指针类型，用于遍历可变参数。说白了抓住这个，就是抓住了我们的可变参数列表的头，我们的va_list在笔者这里的实现，就是一个char*

typedef char* va_list;

va_start(ap, v)的作用是初始化参数访问。它的两个参数中，ap 是你定义的 va_list 类型变量，用来接收参数的起始地址，v 是函数中最后一个固定参数。宏的实现是将 ap 设置为 v 后面那个参数的地址，也就是 &v，然后通过强制类型转换变成 va_list 类型。由于C语言函数参数是顺序存放在栈上的，所以可变参数就紧跟在 v 后面。因此，把 ap 设为 &v，就能从这里开始访问所有的可变参数。

va_arg(ap, t)的作用是从参数列表中依次取出下一个参数，并将其解释为类型 t。实现上，ap += 4 表示将 ap 向后移动4个字节（也就是一个 int 大小，通常是32位机器上最常见的类型宽度），使其指向下一个参数，然后将这个地址强制转换为类型 t *，再通过 * 取得实际的值。这样每调用一次 va_arg，ap 就自动前进一个参数。这也就是直接按照指针来取东西进行解释。

va_end(ap)的作用是清理工作，它将 ap 设为 NULL，表示不再访问参数。这里的定义更加像是告诉大伙不能用了。

实现printf的一个promise：sys_write

write本身算是我们搓文件系统的时候才会用到的玩意，但是这里呢，笔者打算当作一个系统调用的联系，我们先实现一个这个再说。

uint32_t write(char* str){return _syscall1(SYS_WRITE, str);
}

uint32_t sys_write(char* str) {console_ccos_puts(str);return k_strlen(str);
}

/* Initialize the system call table */
void syscall_init(void) {verbose_ccputs("syscall_init start\n"); // Logging the start of syscall initialization/* Set the system call table entries for various system call numbers */syscall_table[SYS_GETPID] = sys_getpid; // Get process IDsyscall_table[SYS_WRITE] = sys_write;verbose_ccputs("syscall_init done\n"); // Logging the completion of syscall// initialization
}

啊哈，简单吧，没有了就是这样的。

准备好实现printf

基本流程

足够了！我们这就来实现printf

// Formats and prints a string based on a format specifier
uint32_t printf(const char* format_string, ...) { va_list args; va_start(args, format_string); // Initialize args to store all additional argumentschar buffer[PRINTF_IO_BUFFER_LENGTH] = {0};       // Buffer to hold the formatted stringvsprintf(buffer, format_string, args); va_end(args); return write(buffer); 
}

声明可变参数的handle，也就是args
声明一个准备用来实现printf的buffer，这就是我们准备用来承接分析输出的buffer
使用vsprintf来解析我们的%格式符
结束我们的可变参数，表达不可用
发起系统调用write来写参数

说完了，我们下面就来看看如何处理我们的格式化字符串

处理我们的格式化字符串

// Formats a string and stores the result in the provided buffer
uint32_t vsprintf(char* output_buffer, const char* format_string, va_list args) { char* buffer_ptr = output_buffer; const char* format_ptr = format_string; char current_char = *format_ptr; int32_t argument_integer; 
while (current_char) { if (current_char != '%') { *(buffer_ptr++) = current_char; current_char = *(++format_ptr); continue; } 
current_char = *(++format_ptr);  // Get the character after '%'
switch (current_char) { case 'x': argument_integer = va_arg(args, int); itoa(argument_integer, &buffer_ptr, 16); current_char = *(++format_ptr);  // Skip format specifier and update current_char break; } } 
return k_strlen(output_buffer); 
}

我们先不搞复杂来，一个一个慢慢实现，为了验证我们的vsprintf思路（啊，我还没说，不着急）是正确的，我们先一步一步慢慢说：

函数开始的时候呢，我们就声明一部分薄记处理变量：buffer_ptr 被设置为输出缓冲区 output_buffer 的起始位置，用于后续逐步写入字符。format_ptr 是格式字符串的当前处理位置，它一开始指向 format_string 的起点。current_char 是当前处理的格式字符，也就是 *format_ptr 的值。argument_integer 是临时变量，用来存储从可变参数中取出的整数。

最开始的部分是处理平凡的字符——也就是当我们没有遇到%的时候，说明这个时候打印的就是平凡的字符串。把它当作普通字符直接写入输出缓冲区当前的位置，然后让 buffer_ptr 和 format_ptr 都向后移动一个字符，再更新 current_char，开始下一轮循环。

遇到了%，说明要来活了，我们马上老实了跳过%，因为我们没必要输出它，这个current_char = *(++format_ptr);就是在做这个事情。

我们马上准备根据current_char来指定我们分类的想法。目前我们的代码中只处理了 %x 这一种格式。当发现是 %x 时，就从可变参数中取出一个 int 类型的整数，保存在 argument_integer 中。这个取值用的是 va_arg(args, int)。然后调用 itoa 函数，把这个整数转换成十六进制字符串，并把转换后的字符串写入 buffer_ptr 指向的位置。转换完成后，buffer_ptr 会自动前进到下一个空位。接着 format_ptr 再往后移动一位，准备处理下一个字符，并更新 current_char。整个格式字符串处理完之后，循环退出，函数最后调用 k_strlen 来计算输出缓冲区中字符串的长度，并把这个长度作为函数的返回值返回出去。

所以，现在的问题就变成了，我们如何把数字转成字符串呢？

// Converts an unsigned integer to a string representation in the specified base
static void itoa(uint32_t number, char **buffer_address, uint8_t base) {uint32_t remainder = number % base;  // Get the least significant digituint32_t quotient = number / base;   // Get the remaining part of the number
// Recursively process the higher digitsif (quotient) {itoa(quotient, buffer_address, base);}
// Convert remainder to a character and store it in the bufferif (remainder < 10) {*((*buffer_address)++) = remainder + '0';  // Convert 0-9 to '0'-'9'} else {*((*buffer_address)++) = remainder - 10 + 'A';  // Convert 10-15 to 'A'-'F' (for base > 10)}
}

我们的函数是递归处理的，每一次，我们都处理一个位。

第一步就是把整的部分和个位数分别领出来，我们的整的就是quotient，这个是0了，说明没有更高的位了，我们当前处理的就是最高位。如果不是零，就说明还有更高位的数字没有处理。这时候函数会递归调用自己，也就是再执行一次 itoa(quotient, buffer_address, base)。这一句会继续向上分解更高位的数字，直到所有的位都分解完了，quotient 为零为止。

然后开始回到当前层，处理当前层的 remainder。如果 remainder 小于 10，说明它可以直接用数字字符 '0' 到 '9' 表示，于是通过 remainder + '0' 的方式转换成字符并写入缓冲区当前的位置，然后把缓冲区地址 *buffer_address 往后移动一位，为下一个字符腾出空间。

如果 remainder 大于等于 10，说明在当前进制中，这个数位对应的是字母，比如十六进制中的 A 到 F。这时候用 remainder - 10 + 'A' 的方式，把 10 到 15 映射成字符 'A' 到 'F'，然后写入缓冲区，并同样把地址向后移一位。

因为递归的顺序是先处理高位再处理低位，所以写入缓冲区的字符顺序正好是从最高位到最低位，得到的是一个完整正确的字符串，不需要再进行反转

上电看看实力：

#include "include/device/console_tty.h"
#include "include/kernel/init.h"
#include "include/library/kernel_assert.h"
#include "include/thread/thread.h"
#include "include/user/process/process.h"
#include "include/user/syscall/syscall.h"
#include "include/user/syscall/syscall_init.h"
#include "include/user/stdio/stdio.h"
void thread_a(void *args);
void thread_b(void *args);
void u_prog_a(void);
void u_prog_b(void);
int prog_a_pid = 0, prog_b_pid = 0; 

int main(void)
{init_all();create_process(u_prog_a, "user_prog_a");create_process(u_prog_b, "user_prog_b");interrupt_enabled();console_ccos_puts("main_pid: 0x");console__ccos_display_int(sys_getpid());console__ccos_putchar('\n');thread_start("k_thread_a", 31, thread_a, "In thread A:");thread_start("k_thread_b", 31, thread_b, "In thread B:");while (1){}
}

void thread_a(void *arg)
{char* para = arg;(void)para;console_ccos_puts("thread a pid: 0x");console__ccos_display_int(sys_getpid());console__ccos_putchar('\n');while (1);
}
void thread_b(void *arg)
{char* para = arg;(void)para;console_ccos_puts("thread b pid: 0x");console__ccos_display_int(sys_getpid());console__ccos_putchar('\n');while (1);
}

void u_prog_a(void)
{printf("prog_a_pid:0x%x\n", getpid()); while (1);
}

void u_prog_b(void) { printf("prog_b_pid:0x%x\n", getpid()); while(1);
}

嗯，没有问题。

思路正确，进一步完善printf

实际上，就是接着添加，所以真没啥好说的：

// Formats a string and stores the result in the provided buffer
uint32_t vsprintf(char *output_buffer, const char *format_string, va_list args) {char *buffer_ptr = output_buffer;const char *format_ptr = format_string;char current_char = *format_ptr;int32_t integer_argument;char *string_argument;
while (current_char) {if (current_char != '%') {*(buffer_ptr++) = current_char;current_char = *(++format_ptr);continue;}
current_char = *(++format_ptr);  // Get the character after '%'
switch (current_char) {case 's':  // String argumentstring_argument = va_arg(args, char *);k_strcpy(buffer_ptr, string_argument);buffer_ptr += k_strlen(string_argument);current_char = *(++format_ptr);break;
case 'c':  // Character argument*(buffer_ptr++) = va_arg(args, int);current_char = *(++format_ptr);break;
case 'd':  // Decimal integer argumentinteger_argument = va_arg(args, int);if (integer_argument < 0) {integer_argument = -integer_argument;*buffer_ptr++ = '-';}itoa(integer_argument, &buffer_ptr, 10);current_char = *(++format_ptr);break;
case 'x':  // Hexadecimal integer argumentinteger_argument = va_arg(args, int);itoa(integer_argument, &buffer_ptr, 16);current_char = *(++format_ptr);break;}}return k_strlen(output_buffer);
}

我们新增了对字符串类型 %s 的处理。在 switch 语句中多了一个 case 's'，这表示格式字符串中如果遇到 %s，就会从可变参数中取出一个 char* 类型的字符串指针，也就是调用者传入的字符串内容。然后使用 k_strcpy 把这个字符串复制到输出缓冲区的当前位置。接着用 k_strlen 得到这个字符串的长度，把 buffer_ptr 向后移动。这样，咱们的东西就会完整的刚刚好放到缓存区中。

我们的格式字符串中如果遇到 %c，就会从可变参数中取出一个字符（本质上是 int 类型），然后直接写入输出缓冲区当前位置，并将 buffer_ptr 向后移动。

以及不用苦哈哈看十六位进制的数字了：我们的格式字符串中如果遇到 %d，说明要格式化一个有符号的十进制整数。它从参数中取出一个 int，先判断是否为负数。如果是负数，就先写入 '-' 字符，再把整数转成正数，以便后续调用 itoa。然后调用 itoa 把这个整数转换成十进制字符串，写入缓冲区。

还是上电，看看实力

#include "include/device/console_tty.h"
#include "include/kernel/init.h"
#include "include/library/kernel_assert.h"
#include "include/thread/thread.h"
#include "include/user/process/process.h"
#include "include/user/syscall/syscall.h"
#include "include/user/syscall/syscall_init.h"
#include "include/user/stdio/stdio.h"
void thread_a(void *args);
void thread_b(void *args);
void u_prog_a(void);
void u_prog_b(void);
int prog_a_pid = 0, prog_b_pid = 0; 

int main(void)
{init_all();create_process(u_prog_a, "user_prog_a");create_process(u_prog_b, "user_prog_b");interrupt_enabled();console_ccos_puts("main_pid: 0x");console__ccos_display_int(sys_getpid());console__ccos_putchar('\n');thread_start("k_thread_a", 31, thread_a, "In thread A:");thread_start("k_thread_b", 31, thread_b, "In thread B:");while (1){}
}

void thread_a(void *arg)
{char* para = arg;(void)para;console_ccos_puts("thread a pid: 0x");console__ccos_display_int(sys_getpid());console__ccos_putchar('\n');while (1);
}
void thread_b(void *arg)
{char* para = arg;(void)para;console_ccos_puts("thread b pid: 0x");console__ccos_display_int(sys_getpid());console__ccos_putchar('\n');while (1);
}

void u_prog_a(void)
{char* name = "user_prog_a"; printf("in proc %s, prog_a_pid:0x%x%c", name, getpid(), '\n'); while (1);
}

void u_prog_b(void) { char* name = "user_prog_b"; printf("in proc %s, prog_b_pid:0x%x%c", name, getpid(), '\n'); while(1);
}

代码

CCOperateSystem/Documentations/11_advanced_kernel/11.2_implement_printf_code at main · Charliechen114514/CCOperateSystemhttps://github.com/Charliechen114514/CCOperateSystem/tree/main/Documentations/11_advanced_kernel/11.2_implement_printf_code

CCOperateSystem/Documentations/11_advanced_kernel/11.3_better_printf_code at main · Charliechen114514/CCOperateSystemhttps://github.com/Charliechen114514/CCOperateSystem/tree/main/Documentations/11_advanced_kernel/11.3_better_printf_code