CSAPP 2e Shell Lab 笔记

通过完成一个简单的 Shell 程序（称为 tsh），来熟悉程序控制与信号的内容。

本笔记根据第二版材料写出，可能与第三版有一定差别。由于测试用例本身就是由浅入深的，所以本笔记也按照这个方式组织。

前置知识

建议提前阅读官方 Writeup（在这里下载），并学习课本 “异常控制流” 部分。

Writeup 的 Hints 部分包含了很多有用的提示，太多了就不一块翻译了。

在原作者所属的顶级大学 CMU，这个实验是分配给两个人做 7 天的，可以作为参考，决定自己的工作量。

关于 Shell

Shell 是一个交互式命令行解释器，就是执行你命令的程序。如果已经用得比较熟了，就没必要看这个部分了。

命令是一串空格分隔的字符，第一个单词是程序名或内置命令名。如果是程序名，Shell 会 fork 出一个进程执行它；如果是内置命令，就在当前进程中运行。

比如下面的命令：

        bash
        
    gcc     a.c    -o       a     -O2
程序名 | 参数 1 | 参数 2 | 参数 3 | 参数 4

1
2


gcc     a.c    -o       a     -O2
程序名 | 参数 1 | 参数 2 | 参数 3 | 参数 4

如果一个命令以 & 结尾，则任务应在后台运行。只有一个程序可以在前台运行，而后台程序的数量没有多少限制。

Unix Shell 支持任务控制，即使用 Ctrl+C 可以发送 SIGINT 信号停止任务，使用 Ctrl+Z 可以发送 SIGTSTP 暂停进程（也可能有其他行为）。此外，还应该支持 jobs 显示任务列表、bg <job> 让程序后台运行、fg <job> 让程序前台运行，以及 kill <job> 来杀掉进程。

特别地，在本 Lab 中，需要做到的有以下几点：

命令行 prompt 以 “tsh>” 开头。
不需要支持 IO 重定向（<>）和管道（|），这点大家在 Buffer Lab 里应该很熟悉。
Ctrl+C 和 Ctrl+Z 需要向前台程序 和其后代进程 对应的信号。
如果命令以 & 结尾，将其放到后台运行。
为每个进程或工作分配一个 PID 或 JID，JID 应以 % 开头。这一部分已被实现
实现 quit、jobs、bg 和 fg 内置命令。
实现终止所有后代僵尸进程。

关于测试

每进行一次修改，都要运行 make 命令 以重新编译，然后运行 ./tsh 来执行。

本实验提供一个参考程序 tshref，做出来的 tsh 与其输出相同，便代表你是正确的。如果要测试正确性，可以在 tsh 和 tshref 中手动输入命令对比，使用 sdriver.pl 来自动判断，或者结合使用。目录中还有 16 个测试文件，可以使用如下的命令来对比测试。其中的 - p 参数表示不输出命令行 tsh > 开头。

        bash
        
    unix> ./sdriver.pl -t trace01.txt -s ./tsh -a "-p"
unix> ./sdriver.pl -t trace01.txt -s ./tshref -a "-p"
# 上面两个与下面两个命令等效
unix> make test01
unix> make rtest01

1
2
3
4
5


unix> ./sdriver.pl -t trace01.txt -s ./tsh -a "-p"
unix> ./sdriver.pl -t trace01.txt -s ./tshref -a "-p"
# 上面两个与下面两个命令等效
unix> make test01
unix> make rtest01

但也不是所有情况都需要让两个输出相同。比如运行的 PID 很可能甚至一定不相同。另外，trace11-13 的 ps 输出可能次次不同，但重点是使进程状态相同。

此外，lab 文件还带有几个测试程序，用法和作用分别如下：

mystop <n> - 睡眠 n 秒，然后向自己发送 SIGTSTP。
mysplit <n> - fork 一个子进程，自旋 n 秒。
myspin <n> - 睡眠 n 秒。
myint <n> - 睡眠 n 秒，然后向自己发送 SIGINT。

如果你想要一次对比所有测试的输出，可以使用以下的方法（参考资料 4）：

在 Shell Lab 目录中放置这个文件和这个文件；
运行下面的命令：

        bash
        
    ./judge.sh > test.log
./rjudge.sh > rtest.log

1
2


./judge.sh > test.log
./rjudge.sh > rtest.log

将 rtest.log 中的 tshref 全部替换为 tsh，使用你喜欢的编辑器（如 vscode）的正则表达式替换，将两文件的 \(\d*\) 替换为 10000；
使用你喜欢的对比工具（仍然可用 vscode）对比 test.log 和 rtest.log，不同应该只有 ps 命令输出的 PID。

更多的用法请自行参见 Writeup 的 Checking Your Work 一节。

Tricks

个人建议使用 Git 进行版本控制，并进行相对密集的 Commit，以分清各阶段所做的工作，并方便地回退代码。

如果你测试过程中发现无法退出 tsh，可以打开另一个终端窗口，执行 killall tsh。

错误处理也可以先不做，在 trace14 处对照填写。

trace01

trace01.txt - 在 EOF 处正确地停止。

eval 函数是根据输入命令行进行对应操作的函数。在 CSAPP 2e 8.4.6 节（中文版 P502 / 英文版 P733），有 eval 函数的大致框架，先填上再说。语句作用我都填到注释里了：

        c
        
    void eval(char *cmdline)
{
    char *argv[MAXARGS];
    char buf[MAXLINE]; // command will be parsed and modified?
    int bg;            // whether it runs in background
    pid_t pid;
    // preprocess cmd line
    strcpy(buf, cmdline);
    bg = parseline(buf, argv); // convert the command into argv
    if (argv[0] == NULL)       // the line is empty, return
    {
        return;
    }
    // run external command
    if (!builtin_cmd(argv))
    {
        if ((pid = fork()) == 0) // this is child
        {
            if (execve(argv[0], argv, environ) <0) // execute command failed
            {
                printf("%s: Command not found\n", argv[0]);
                exit(0); // here only child exited
            }
        }
        if (!bg) // run the process in foreground:
        // wait for foreground job to terminate
        {
            int status;
            if (waitpid(pid, &status, 0) <0) // if return -1, then waiting failed
            {
                unix_error("waitfg: waitpid error");
            }
        }
        else
        {
            printf("%d %s", pid, cmdline);
        }
    }
    return;
}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40


void eval(char *cmdline)
{
    char *argv[MAXARGS];
    char buf[MAXLINE]; // command will be parsed and modified?
    int bg;            // whether it runs in background
    pid_t pid;
    // preprocess cmd line
    strcpy(buf, cmdline);
    bg = parseline(buf, argv); // convert the command into argv
    if (argv[0] == NULL)       // the line is empty, return
    {
        return;
    }
    // run external command
    if (!builtin_cmd(argv))
    {
        if ((pid = fork()) == 0) // this is child
        {
            if (execve(argv[0], argv, environ) <0) // execute command failed
            {
                printf("%s: Command not found\n", argv[0]);
                exit(0); // here only child exited
            }
        }
        if (!bg) // run the process in foreground:
        // wait for foreground job to terminate
        {
            int status;
            if (waitpid(pid, &status, 0) <0) // if return -1, then waiting failed
            {
                unix_error("waitfg: waitpid error");
            }
        }
        else
        {
            printf("%d %s", pid, cmdline);
        }
    }
    return;
}

这个框架做的事情如下：

使用 parseline 预处理命令行，将一整个字符串按空格分割为字符串数组；
使用 builtin_command 检测并处理内置命令，如果为内置命令，归 builtin_command 函数处理；
fork 出一个新进程；
在新进程中，execve 运行对应的程序，如果运行失败，提示错误信息，将子进程退出；
如果运行的程序是前台进程，那么就要等待前台进程结束，再继续运行 shell；
否则，如果是后台进程，直接打印命令信息。

trace01 只需要正确响应 EOF 就可以了。在课本的程序框架中，已经使用 parseline 函数将命令行解析为了参数，并判断了第一个参数是否为 NULL。显然，第一个参数是 NULL，后面当然更是 NULL，意味着没有输入。也就是说，我们什么额外的工作都不用干……

详细的 Git Commit 更改见此处。通过如图：

trace02

trace02.txt - 处理内置的 quit 命令。

在 tsh 中，内置的命令实在 builtin_cmd 函数中处理的。只需要在其中判断一下第一个参数是否为 quit，如果是的话退出即可。

在 builtin_cmd 中插入以下代码：

        c
        
        if (strcmp(argv[0], "quit") == 0) // process quit command
    {
        exit(0);
    }

1
2
3
4


    if (strcmp(argv[0], "quit") == 0) // process quit command
    {
        exit(0);
    }

详细的 Git Commit 更改见此处。通过如图：

trace03/04

trace03.txt - 运行一个前台任务。

trace04.txt - 运行一个后台任务。

在课本的 eval 框架中，首先使用 builtin_cmd 判断并处理内置函数，然后根据 bg 值（由 parseline 得到）判断是否在后台运行。这其中会先 fork 出一个子进程，然后使用 execve 执行目标程序。请注意，只有子进程会在运行失败时 exit。根据是否是前台任务，判断是直接打印执行详情还是等待前台进程结束。

        c
        
        if (!builtin_cmd(argv)) // built-in command is done in `builtin_cmd`
    {
        if ((pid = fork()) == 0) // this is child
        {
            if (execve(argv[0], argv, environ) <0) // execute command failed
            {
                printf("%s: Command not found\n", argv[0]);
                exit(0); // here only child exited
            }
        }
        if (!bg) // run the process in foreground:
        // wait for foreground job to terminate
        {
            int status;
            if (waitpid(pid, &status, 0) <0) // if return -1, then waiting failed
            {
                unix_error("waitfg: waitpid error");
            }
        }
        else
        {
            printf("%d %s", pid, cmdline);
        }
    }

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24


    if (!builtin_cmd(argv)) // built-in command is done in `builtin_cmd`
    {
        if ((pid = fork()) == 0) // this is child
        {
            if (execve(argv[0], argv, environ) <0) // execute command failed
            {
                printf("%s: Command not found\n", argv[0]);
                exit(0); // here only child exited
            }
        }
        if (!bg) // run the process in foreground:
        // wait for foreground job to terminate
        {
            int status;
            if (waitpid(pid, &status, 0) <0) // if return -1, then waiting failed
            {
                unix_error("waitfg: waitpid error");
            }
        }
        else
        {
            printf("%d %s", pid, cmdline);
        }
    }

说了这么多又是什么意思？这道题又什么都不用做。

此题没有 Commit，输出有一个格式问题，在 trace05 的 commit 内一起解决。通过如图。这里的 job 编号不同是正常现象，这个问题将在 trace05 中解决。

trace05

trace05.txt - 处理 jobs 内置命令。

先拿软柿子下手，将处理 jobs 命令的地方做好。tsh 已经为我们实现了 listjobs 函数，可以直接放到 builtin_cmd 中。注意 return 1 用来告诉 eval 已经找到了一个内置命令，否则会提示 “command not found”。

        c
        
        if (strcmp(argv[0], "jobs") == 0) // t5: process jobs command
    {
        listjobs(jobs);
        return 1; // this IS a builtin command, return 1 to notify
    }

1
2
3
4
5


    if (strcmp(argv[0], "jobs") == 0) // t5: process jobs command
    {
        listjobs(jobs);
        return 1; // this IS a builtin command, return 1 to notify
    }

但是不管怎么样，运行 jobs 都不会输出任何东西。毕竟在我们已有的代码里，既没有在任务开始时将其添加到任务列表，也没有在结束时将它移出。要做到将任务添加到任务列表，需要在 fork 后调用 addjob：

        c
        
                    printf("%s: Command not found\n", argv[0]);
                exit(0); // here only child exited
            }
        }
        addjob(jobs, pid, bg ? BG : FG, cmdline); // add the job to job list.
        // When bg=1, state=2; bg=0, state=1. this way it's just elegant
        if (!bg)                               // run the process in foreground:

1
2
3
4
5
6
7


                printf("%s: Command not found\n", argv[0]);
                exit(0); // here only child exited
            }
        }
        addjob(jobs, pid, bg ? BG : FG, cmdline); // add the job to job list.
        // When bg=1, state=2; bg=0, state=1. this way it's just elegant
        if (!bg)                               // run the process in foreground:

上述代码的 addjob 中第三个参数 state 有三个取值，FG=1、BG=2、ST=3。虽然直接使用 bg+1 也是可行的方案，但这样使用三元运算符会更优雅更容易理解。

至于删除任务的工作，需要在 sigchld_handler 函数中处理。这涉及到对 waitpid 函数的更深入理解，在 CSAPP 8.4.3 节（中文版 P496，英文版 P724），提到了这样一段话，介绍了 waitpid 函数的默认行为，以及退出状态的检查方法（中翻太烂了，这是我自己翻的）：

（更改默认行为）WNOHANG|WUNTRACED：立即返回。如果等待集里面没有子进程已经终止，那么返回 0；否则，返回其中一个已终止子进程的 PID。

（检查回收子进程的返回状态）WIFEXITED(status)：如果子进程正常退出，即通过调用 exit 或者 return，则返回真。

Writeup 中也提到 WNOHANG|WUNTRACED 或许会有用。如果没有 WNOHANG 参数，shell 会一直等待，直到有一个子进程退出。而如果没有 WUNTRACED 参数，那么程序无法捕捉到 STOP 的子进程情况，这样会卡在 trace16。

上述的前一个更改默认行为的部分对应了 waitpid 的第三个参数，后一个函数检查用于确定是否有个子进程真的退出了（而非没有子进程终止，waitpid 返回了 0）。有了这些知识，可以得到实现如下，如此便可以将退出的进程从任务列表删除：

        c
        
    void sigchld_handler(int sig) 
{
    pid_t pid;
    int status;
    while((pid=waitpid(-1,&status,WNOHANG|WUNTRACED))>0) // check if a child has become zombie, without wait
    {
        if(WIFEXITED(status))
        {
            deletejob(jobs,pid); // remove pid from job list
        }
    }
    return;
}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


void sigchld_handler(int sig) 
{
    pid_t pid;
    int status;
    while((pid=waitpid(-1,&status,WNOHANG|WUNTRACED))>0) // check if a child has become zombie, without wait
    {
        if(WIFEXITED(status))
        {
            deletejob(jobs,pid); // remove pid from job list
        }
    }
    return;
}

这并没有结束，我们需要干其他的亿些工作。我们知道如果 shell 要继续运行，需要等待前台任务（如果有）结束，比如在前面的通过截图中，只有 make test04 结束后，shell 才会再打印出 “cyp0633@cyp0633-R7000-Linux> ~/ 桌面…” 字样，在本实验中就是“tsh>”。但 tsh 有一个 waitfg 函数，看起来他们不想让我们把等待工作放到 eval 中。于是这一部分改成：

        c
        
            if (!bg) // run the process in foreground:
        // wait for foreground job to terminate
        {
            waitfg(pid); // the waiting stuff should be done in `waitfg`
        }

1
2
3
4
5


        if (!bg) // run the process in foreground:
        // wait for foreground job to terminate
        {
            waitfg(pid); // the waiting stuff should be done in `waitfg`
        }

至于 waitpid 函数的实现，Writeup 中已经给了提示：

实验有一个棘手的部分，是决定 waitfg 和 sigchld 处理函数之间的工作分配。我们推荐以下方法：

- 在 waitfg 中，用一个死循环包裹 sleep 函数。

- 在 sigchld_handler 中，调用且仅调用一次 waitpid。

所谓的工作分配，指的是 waitfg 和 sigchld_handler 都有等待进程结束的功能。waitfg 的作用前面已经说明，而 sigchld_handler 负责接收任何子进程结束的信号，并将其回收，注意它是一个 handler。waitfg 的实现如下：

        c
        
    void waitfg(pid_t pid)
{
    while (pid == fgpid(jobs))
    {
        sleep(0);
    }
    return;
}

1
2
3
4
5
6
7
8


void waitfg(pid_t pid)
{
    while (pid == fgpid(jobs))
    {
        sleep(0);
    }
    return;
}

也可以在这里调用 waitpid 等待前台进程退出，但它可能会抢走 sigchld_handler 的信号，而不会执行 deletejob。结果就是在 trace05 中会打印一堆 tsh。那当然可以在这里也加一个 deletejob，但 Writeup 中指出这种工作分配是不明确的。所以最好的办法就是在有前台进程的时候一直等待。

另外还要处理 eval 中的信号问题。Writeup 中提到：

在 eval 中，父进程在 fork 子进程之前，必须使用 sigprocmask 函数来阻断 SIGCHLD 信号，然后在使用 addjob 将子进程加入任务列表之后，再调用 sigprocmask 恢复 SIGCHLD 信号。因为子进程继承了父进程的中断向量，所以子进程必须在它执行新程序之前将 SIGCHILD 恢复。

父进程这样将 SIGCHLD 信号阻断，是为了避免子进程被 SIGCHLD 处理程序回收（然后被从任务列表中移除），_之后_父进程调用 addjob 时的竞态条件。

于是，我们在 eval 函数最前面声明变量的区域加入几行：

        c
        
        sigset_t mask;
    sigemptyset(&mask);

1
2


    sigset_t mask;
    sigemptyset(&mask);

然后在判断不是内置命令之后，阻断 SIGCHLD 信号，修改如下：

        c
        
        if (!builtin_cmd(argv)) // built-in command is done in `builtin_cmd`
    {
        sigaddset(&mask, SIGCHLD);
        sigprocmask(SIG_BLOCK, &mask, NULL); // 5. block SIGCHLD
        if ((pid = fork()) == 0)             // this is child

1
2
3
4
5


    if (!builtin_cmd(argv)) // built-in command is done in `builtin_cmd`
    {
        sigaddset(&mask, SIGCHLD);
        sigprocmask(SIG_BLOCK, &mask, NULL); // 5. block SIGCHLD
        if ((pid = fork()) == 0)             // this is child

然后，在子进程 execve 之前，恢复信号，修改如下：

        c
        
            if ((pid = fork()) == 0)             // this is child
        {
            sigprocmask(SIG_UNBLOCK, &mask, NULL);  // 5. unblock SIGCHLD
            if (execve(argv[0], argv, environ) <0) // execute command failed

1
2
3
4


        if ((pid = fork()) == 0)             // this is child
        {
            sigprocmask(SIG_UNBLOCK, &mask, NULL);  // 5. unblock SIGCHLD
            if (execve(argv[0], argv, environ) <0) // execute command failed

再然后，父进程 addjob 完毕后也要恢复：

        c
        
            addjob(jobs, pid, bg ? BG : FG, cmdline); // add the job to job list.
        // When bg=1, state=2; bg=0, state=1. this way it's just elegant
        sigprocmask(SIG_UNBLOCK, &mask, NULL); // 5. unblock SIGCHLD
        if (!bg)                               // run the process in foreground:

1
2
3
4


        addjob(jobs, pid, bg ? BG : FG, cmdline); // add the job to job list.
        // When bg=1, state=2; bg=0, state=1. this way it's just elegant
        sigprocmask(SIG_UNBLOCK, &mask, NULL); // 5. unblock SIGCHLD
        if (!bg)                               // run the process in foreground:

这样就可以通过 trace05 的测试了，Git Commit 见此处，通过截图如下。

trace06

trace06.txt - 将 SIGINT 信号发送到前台任务。

这个 trace 解决起来比较简单。最核心的，我们需要实现 SIGINT 信号的处理例程。这里使用 -pid 是为了将整个进程组的进程全部干掉。

        c
        
    void sigint_handler(int sig)
{
    pid_t pid = fgpid(jobs);    // get pid of foreground job
    if (kill(-pid, SIGINT) <0) // try to send SIGINT
    {
        unix_error("sigint error"); // failed
    }
    return;
}

1
2
3
4
5
6
7
8
9


void sigint_handler(int sig)
{
    pid_t pid = fgpid(jobs);    // get pid of foreground job
    if (kill(-pid, SIGINT) <0) // try to send SIGINT
    {
        unix_error("sigint error"); // failed
    }
    return;
}

在 tshref 中，终止进程后还会输出一行提示信息，由于这也算是子进程结束了，这部分也是在 sigchld_handler 中处理的。将函数修改为如下的样子即可。

        c
        
    void sigchld_handler(int sig) 
{
    pid_t pid;
    int status;
    while((pid=waitpid(-1,&status,WNOHANG|WUNTRACED))>0) // check if a child has become zombie, without wait
    {
        if(WIFEXITED(status))
        {
            deletejob(jobs,pid); // remove pid from job list
        }
        if (WIFSIGNALED(status))
        {
            printf("Job [%d] (%d) terminated by signal %d\n", pid2jid(pid), pid, WTERMSIG(status));
            deletejob(jobs, pid);
        }

    }
    if (pid < 0 && errno != ECHILD)
    {
        unix_error("waitpid error");
    }
    return;
}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23


void sigchld_handler(int sig) 
{
    pid_t pid;
    int status;
    while((pid=waitpid(-1,&status,WNOHANG|WUNTRACED))>0) // check if a child has become zombie, without wait
    {
        if(WIFEXITED(status))
        {
            deletejob(jobs,pid); // remove pid from job list
        }
        if (WIFSIGNALED(status))
        {
            printf("Job [%d] (%d) terminated by signal %d\n", pid2jid(pid), pid, WTERMSIG(status));
            deletejob(jobs, pid);
        }

    }
    if (pid < 0 && errno != ECHILD)
    {
        unix_error("waitpid error");
    }
    return;
}

还有一个坑，在 Writeup 中已经提到。

当你在标准 Unix shell 中运行你的 shell 时，你的 shell 处于前台进程组。如果你的 shell 创建一个子进程，那么它默认也会被加到前台进程组内。因为输入 Ctrl+C 会向前台进程组的所有进程发送 SIGINT 信号，所以你输入 Ctrl+C 也会向你的 shell 和你创建的子进程发送 SIGINT，这显然不对。

这里有个解决办法：在 fork 之后，execve 之前，子进程应该调用 setpgid(0, 0)，来将子进程放置到一个新的进程组内，组 ID 与子进程 PID 相同。这确保了只会有一个进程——即你的 shell——处于前台进程组内。当你按下 Ctrl+C，shell 应该捕获 SIGINT 信号，然后将其传递到正确的前台应用（或更准确地，包含前台进程的进程组）。

长话短说，就是使用 Ctrl+C 结束 tsh 中运行的前台进程，会把 shell 一起干掉。解决办法就是在 execve 之前设置进程组。两个 0 分别代表要加入的是当前进程，以及新建一个 GID=PID 的组。

        c
        
                sigprocmask(SIG_UNBLOCK, &mask, NULL);  // 5. unblock SIGCHLD
            setpgid(0, 0);                          // put the child process (0=current) into a new process group (0=current)
            if (execve(argv[0], argv, environ) <0) // execute command failed

1
2
3


            sigprocmask(SIG_UNBLOCK, &mask, NULL);  // 5. unblock SIGCHLD
            setpgid(0, 0);                          // put the child process (0=current) into a new process group (0=current)
            if (execve(argv[0], argv, environ) <0) // execute command failed

Git Commit 见此处，通过截图见下。

trace07

trace07.txt - 将 SIGINT 信号只发送到前台任务。

其实单论测试的话，上个 trace 的程序现在也可以直接用，能过。但测试用例没有测试没有前台任务的情况，为了让程序更完善，还是要做一处修改。

在 sigint_handler 中。需要判断是否存在前台任务，如果没有，就不需要做任何事。这样，在什么都没运行的时候按下 Ctrl+C，tsh 就不会直接挂掉，什么都输不进去。在没有前台任务的情况下，fgpid 会返回 0，我们可以利用这个特性。

        c
        
    void sigint_handler(int sig)
{
    pid_t pid = fgpid(jobs);    // get pid of foreground job
    if (pid != 0) // if no foreground job (PID=0), do nothing (ONLY send to foreground)
    {
        if (kill(-pid, SIGINT) <0) // try to send SIGINT
        {
            unix_error("sigint error"); // failed
        }
    }
    return;
}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


void sigint_handler(int sig)
{
    pid_t pid = fgpid(jobs);    // get pid of foreground job
    if (pid != 0) // if no foreground job (PID=0), do nothing (ONLY send to foreground)
    {
        if (kill(-pid, SIGINT) <0) // try to send SIGINT
        {
            unix_error("sigint error"); // failed
        }
    }
    return;
}

Git Commit 见此处，通过截图见下。

trace08

trace08.txt - 将 SIGTSTP 信号只发送给前台任务。

SIGTSTP 对应的是 Ctrl+Z。实现方法很像上两个 trace 的方法，只需改 sigtstp_handler 和 sigchld_handler 就行了。

首先是 sigtstp_handler：

        c
        
    void sigtstp_handler(int sig) 
{
    pid_t pid = fgpid(jobs);
    if (pid != 0)
    {
        if (kill(-pid, SIGTSTP) <0)
        {
            unix_error("sigtstp error");
        }
    }
    return;
}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


void sigtstp_handler(int sig) 
{
    pid_t pid = fgpid(jobs);
    if (pid != 0)
    {
        if (kill(-pid, SIGTSTP) <0)
        {
            unix_error("sigtstp error");
        }
    }
    return;
}

然后是 sigchld_handler。注意这里额外地要将工作的状态改为停止（对应上文 addjob 说明处的三种状态类型）：

        c
        
            if (WIFSIGNALED(status)) // SIGINT, etc.
        {
            printf("Job [%d] (%d) terminated by signal %d\n", pid2jid(pid), pid, WTERMSIG(status));
            deletejob(jobs, pid);
        }
// 插入下面的部分
        if (WIFSTOPPED(status)) // SIGTSTP, etc.
        {
            printf("Job [%d] (%d) stopped by signal %d\n", pid2jid(pid), pid, WSTOPSIG(status));
            struct job_t *job = getjobpid(jobs, pid);
            job->state = ST;
        }

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


        if (WIFSIGNALED(status)) // SIGINT, etc.
        {
            printf("Job [%d] (%d) terminated by signal %d\n", pid2jid(pid), pid, WTERMSIG(status));
            deletejob(jobs, pid);
        }
// 插入下面的部分
        if (WIFSTOPPED(status)) // SIGTSTP, etc.
        {
            printf("Job [%d] (%d) stopped by signal %d\n", pid2jid(pid), pid, WSTOPSIG(status));
            struct job_t *job = getjobpid(jobs, pid);
            job->state = ST;
        }

这样就通过了。Git Commit 见此处，通过截图见下。

trace09/10

trace09.txt - 处理 bg 内置命令

trace10.txt - 处理 fg 内置命令

bg 和 fg 命令是由 do_bgfg 函数处理的，我们需要在 builtin_cmd 里添加合适的调用。

        c
        
        if (strcmp(argv[0], "bg") == 0 || strcmp(argv[0], "fg") == 0) // judge bg & fg
    {
        do_bgfg(argv);
        return 1;
    }

1
2
3
4
5


    if (strcmp(argv[0], "bg") == 0 || strcmp(argv[0], "fg") == 0) // judge bg & fg
    {
        do_bgfg(argv);
        return 1;
    }

bg 和 fg 命令的参数 <job> 可以是 PID 或者 JID。在 Writeup 的 Specification 一节中，有这样一段话：

bg <job> 命令通过发送 SIGCONT 指令给工作来使它重新开始，然后让它运行在后台。

fg <job> 命令通过发送 SIGCONT 指令给工作来使它重新开始，然后让它运行在前台。

这么一来，用一个函数处理两个命令就显得很合理了。在 do_bgfg 函数中，要获取参数中的 PID 或者 JID，解析为合适的任务类型指针，发送 SIGCONT 信号，然后根据前台和后台决定所要做的事情。

当然，首先不能忘了定义变量。end 的作用将在后面说明。

        c
        
        char *id = argv[1], *end; // JID or PID
    struct job_t *job;
    int numid;

1
2
3


    char *id = argv[1], *end; // JID or PID
    struct job_t *job;
    int numid;

然后检查参数是否存在。

        c
        
        // extract job or process
    if (id == NULL) // not specified
    {
        printf("%s command requires PID or %%jobid argument\n", argv[0]);
        return;
    }

1
2
3
4
5
6


    // extract job or process
    if (id == NULL) // not specified
    {
        printf("%s command requires PID or %%jobid argument\n", argv[0]);
        return;
    }

对 JID 和 PID 的第一、二步处理是不尽相同的，斟酌再三还是分开处理为好。

首先是 JID 的情况。将 id 指针自增 1，是为了让指针指向第一个数字，然后使用 strtol 功能将其从字符串转为数字。在转换的过程中，end 会被设定为指向被转换的最后一个数字的下一个字符。正常情况下，JID/PID 并不应该包含除开头 % 号外的字符，所以 end 指向的应该是表示字符串结尾的 \0。具体可以参考这个链接，但他的判断条件只能保证不以字符开头，如果遇到类似于 1a23 的字符串，仍然不会提示异常，并返回 1。虽然我这么做只是为了避免多遍历一遍字符串判断是否有其他字符，稍快一点，代码也优雅一点……

然后就是调用 getjobjid 得到 job 了，再加一个是否存在的判断。

        c
        
        if (id[0] == '%') // this is a job
    {
        id++;                         // point to the number position
        numid = strtol(id, &end, 10); // convert id char[] to integer
        if (*end !='\0')             // contains non-digit characters
        {
            printf("%s: argument must be a PID or %%jobid\n", argv[0]);
            return;
        }
        job = getjobjid(jobs, numid); // try to get job
        if (job == NULL)
        {
            printf("%%%d: No such job\n", numid);
            return;
        }
    }

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16


    if (id[0] == '%') // this is a job
    {
        id++;                         // point to the number position
        numid = strtol(id, &end, 10); // convert id char[] to integer
        if (*end !='\0')             // contains non-digit characters
        {
            printf("%s: argument must be a PID or %%jobid\n", argv[0]);
            return;
        }
        job = getjobjid(jobs, numid); // try to get job
        if (job == NULL)
        {
            printf("%%%d: No such job\n", numid);
            return;
        }
    }

对于 PID 的情况，不同的地方只在于没有自增，换了适用于 PID 的函数，以及提示信息改变而已。

        c
        
        else // this is a process
    {
        numid = strtol(id, &end, 10);
        if (*end !='\0')
        {
            printf("%s: argument must be a PID or %%jobid\n", argv[0]);
            return;
        }
        job = getjobpid(jobs, numid); // try to get proc
        if (job == NULL)
        {
            printf("(%d): No such process\n", atoi(id));
            return;
        }
    }

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15


    else // this is a process
    {
        numid = strtol(id, &end, 10);
        if (*end !='\0')
        {
            printf("%s: argument must be a PID or %%jobid\n", argv[0]);
            return;
        }
        job = getjobpid(jobs, numid); // try to get proc
        if (job == NULL)
        {
            printf("(%d): No such process\n", atoi(id));
            return;
        }
    }

发送信号很简单，就一行。仍然是给全组发送。

        c
        
        kill(-(job->pid), SIGCONT);

1

    kill(-(job->pid), SIGCONT);

最后是根据前台或者后台的要求，做出相应的行为，这与 eval 最后的行为比较类似，将 waitfg 封装起来也是便于这里再利用。

        c
        
        if (strcmp(argv[0], "fg") == 0) // foreground
    {
        job->state = FG;
        waitfg(job->pid);
    }
    else // background
    {
        job->state = BG;
        printf("[%d] (%d) %s", job->jid, job->pid, job->cmdline);
    }

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


    if (strcmp(argv[0], "fg") == 0) // foreground
    {
        job->state = FG;
        waitfg(job->pid);
    }
    else // background
    {
        job->state = BG;
        printf("[%d] (%d) %s", job->jid, job->pid, job->cmdline);
    }

把这些都填上，就实现了 bg 和 fg 内置命令的处理。Git Commit 见此处，通过图如下。

trace11-16

trace11.txt - 将 SIGINT 信号发送给前台进程集里的每个进程

trace12.txt - 将 SIGTSTP 信号发送给前台进程集里的每个进程

trace13.txt - 将进程集里的每个停止的进程重启

trace14.txt - 简单的错误处理

trace15.txt - 全都混到一起

trace16.txt - 测试 shell 是否能够处理来自其他进程（而不是终端）的 SIGTSTP 和 SIGINT 信号

到上一节，我们所写的 Shell 应该也能够完美处理这些测试用例，你什么都不用做，应该就能看到正确的输出。
在 Trace 11-12 中，要做到 “发送给进程集中的每个进程”，重点在于 kill(-pid, signal) 中的负号。这表示对整个进程集发送。Trace 11 中 SIGINT 把整个进程集干掉了，所以 ps 的输出应该没有任何一个 mysplit；而 Trace 12 让整个进程集停止，所以在 ps 的输出中，你应该能看到两个停止状态的 mysplit。关于 ps 的输出，可以参见这里。

在 Trace 13 中，mysplit 先被停止，然后被转到前台运行，直到结束。所以可以观察到第一个 ps 的输出中有两个等几秒，然后第二个 ps 的输出没有 mysplit。

如果说在前面不知道错误处理的文字怎么写，可以参考 make rtest14 的输出。

Trace 16 的处理方法其实有点意思。在前面写的过程中，我们可以发现 SIGINT 和 SIGTSTP 的提示输出都放到了 sigchld_handler 中，而不是各自的 handler 函数中。因为如果放到各自的 handler 中，就不会在 Trace 16 的情况下被唤起（因为信号不是发给 Shell 的），而 sigchld_handler 却可以接收因任何原因造成的停止，恰巧可以使用 WIFSIGNALED 等函数判断停止原因，所以也能够实现分信号的处理。参考文献 3 的 Step 6 就是这方面的讲解，值得一看。

参考文献

https://zhuanlan.zhihu.com/p/89224358（只有代码，抄倒是可以抄）
https://hackmd.io/@KYWeng/SyK1Hk63S（讲得很详细，主要参考这篇；做到了高于实验标准）
https://www.jianshu.com/p/7f5e78e83a0e（分 step 的想法不错，而且 Step 6 讲得很好）
http://home.ustc.edu.cn/~liuly0322/blog/2022/03/19/csapp-shell/（结果测试方法值得学习，Trace 16 讲得也明白）

参考文献可能有各自的许可证。