tinyhttpd: Reading and Analysis

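tinyhttpd is a minimal HTTP server with CGI support. The code base is small and easy to read, which makes it a good study project for newcomers to network programming: small as it is, it has all the essential parts. From tinyhttpd you can learn how processes are created and pipes are used on Linux, the basics of socket programming on Linux, and the most basic structure of the HTTP protocol.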

1. Main functions and their roles


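  void accept_request(int);     // handle one HTTP request received on an accepted socket
  void bad_request(int);        // tell the client the request is malformed (status 400)
  void cat(int, FILE *);        // read a file on the server and write it to the socket
  void cannot_execute(int);     // report an error that occurred while running a CGI program
  void error_die(const char *); // print the error message with perror and exit
  void execute_cgi(int, const char *, const char *, const char *); // run a CGI script; the key function for dynamic content
  int get_line(int, char *, int);     // read one line of the HTTP message
  void headers(int, const char *);    // send the HTTP response headers
  void not_found(int);                // tell the client the requested file was not found (404)
  void serve_file(int, const char *); // call cat to send a server file's contents to the browser
  int startup(u_short *);             // start the HTTP service: create the socket, bind the port, listen
  void unimplemented(int);            // tell the browser the request method is not supported (501)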

Suggested reading order: main -> startup -> accept_request -> execute_cgi

2. How the program runs

2.1 Building and running tinyhttpd

The project does not compile and run on Linux as is, because it was originally written on Solaris. The following changes are needed:

Change line 33 to void *accept_request(void *); and change the definition further down to match:


        void *accept_request(void *tclient)
        {
        int client = *(int *)tclient;
        // also change both return statements in this function to return NULL;

Change the variable types on lines 438 and 483 to socklen_t.
Change line 497 to if (pthread_create(&newthread , NULL, accept_request, (void*)&client_sock) != 0)
In the Makefile, change the compile line to gcc -W -Wall -o httpd httpd.c -lpthread
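
One caveat about the line 497 change: it passes the address of client_sock, a variable that the accept loop overwrites on the next connection, so a thread that has not yet read the value may see the wrong descriptor. A minimal sketch of one common alternative is to pass the descriptor by value, packed into the pointer argument. The names handle_client and spawn_handler are hypothetical and not part of tinyhttpd; link with -lpthread as above.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical handler: receives the descriptor by value, packed into
 * the pointer argument, so it never reads a variable owned by main. */
static void *handle_client(void *arg)
{
    int client = (int)(intptr_t)arg;
    /* ... a real server would process the request on `client` here ... */
    return (void *)(intptr_t)client;  /* echoed back only so a caller can check it */
}

/* Start a handler thread for `client`; returns 0 on success. */
static int spawn_handler(int client, pthread_t *tid)
{
    return pthread_create(tid, NULL, handle_client, (void *)(intptr_t)client);
}
```

With this shape the accept loop can reuse its local variable immediately, because each thread already holds its own copy of the descriptor.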

2.2 Program flow

The program flow is shown in the figure below. [Figure: tinyhttpd flowchart]

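(1) The server starts and binds the httpd service to the specified port, or to a randomly chosen one.
(2) When an HTTP request arrives (that is, when accept returns on the listening port), a thread is spawned to run accept_request.
(3) The method (GET or POST) and the url are extracted from the HTTP request. For GET, if parameters are present, the query_string pointer is set to the GET parameters after the ? in the url.
(4) The url is formatted into the path array, which is the server-side path of the file the browser asked for; in tinyhttpd, served files live under the htdocs directory. If the url ends with /, or the url is a directory, index.html is appended to path so that the home page is served.
(5) If the file path is valid, a GET request without parameters is answered by sending the file to the browser directly, i.e. writing it to the socket in HTTP format, and the flow jumps to (10). In the other cases (GET with parameters, POST, or a url that names an executable file), execute_cgi is called to run the CGI script.
(6) The rest of the HTTP request headers are read and discarded; for POST, the Content-Length is extracted. The HTTP 200 status line is written to the socket.
(7) Two pipes, cgi_input and cgi_output, are created, and a process is forked.
(8) In the child, STDOUT is redirected to the write end of cgi_output and STDIN to the read end of cgi_input; the write end of cgi_input and the read end of cgi_output are closed. The REQUEST_METHOD environment variable is set, plus QUERY_STRING for GET or CONTENT_LENGTH for POST. These variables exist for the CGI script to read. Then execl runs the CGI program.
(9) In the parent, the read end of cgi_input and the write end of cgi_output are closed. For POST, the POST data is written to cgi_input, which is the child's STDIN. The parent then reads cgi_output, which carries the child's STDOUT, and sends it to the client. Finally it closes all pipe ends and waits for the child to exit.
(10) The connection to the browser is closed, completing one HTTP request and response; tinyhttpd treats HTTP as connectionless and serves one request per connection.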
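
The first step above (binding to a specified or a randomly chosen port) can be sketched as follows. This is an illustrative sketch, not tinyhttpd's own startup code: listen_on is a hypothetical name, and the backlog of 5 is an assumption. Passing port 0 asks the kernel for a free port, which is then read back with getsockname.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listen on *port. If *port is 0 the kernel picks a
 * free port, which is written back through the pointer.
 * Returns the listening descriptor, or -1 on error. */
int listen_on(unsigned short *port)
{
    struct sockaddr_in name;
    socklen_t namelen = sizeof(name);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&name, 0, sizeof(name));
    name.sin_family = AF_INET;
    name.sin_port = htons(*port);
    name.sin_addr.s_addr = htonl(INADDR_ANY);

    if (bind(fd, (struct sockaddr *)&name, sizeof(name)) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    if (*port == 0) {
        /* ask the kernel which port it actually assigned */
        if (getsockname(fd, (struct sockaddr *)&name, &namelen) < 0) {
            close(fd);
            return -1;
        }
        *port = ntohs(name.sin_port);
    }
    return fd;
}
```

A caller would then loop on accept(fd, ...) and hand each returned descriptor to a handler thread, as in step (2).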

2.3 Function analysis

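main
- Program entry point. Calls startup to bind and listen on the port; after accepting a client connection, creates a thread that calls accept_request to handle the HTTP request sent by the client.
startup
- Creates the socket, binds the port (bind), and starts listening (listen) for incoming client connections.
accept_request
- Processes the HTTP request line by line using get_line: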


    void *accept_request(void *tclient)
    {
        int client = *(int *)tclient;
        char buf[1024];
        int numchars;
        char method[255]; // request method from the request line (GET or POST)
        char url[255];    // url field from the request line
        char path[512];   // path of the requested file on the server
        size_t i, j;
        struct stat st;
        int cgi = 0;      /* becomes true if server decides this is a CGI
                           * program */
        char *query_string = NULL; // query parameters after the ? in a GET request

        numchars = get_line(client, buf, sizeof(buf)); // read the request line
        i = 0; j = 0;
        while (!ISspace(buf[j]) && (i < sizeof(method) - 1)) // copy the request method
        {
            method[i] = buf[j];
            i++; j++;
        }
        method[i] = '\0';

        if (strcasecmp(method, "GET") && strcasecmp(method, "POST")) // neither GET nor POST: reply 501
        {
            unimplemented(client);
            return NULL;
        }

        if (strcasecmp(method, "POST") == 0)
            cgi = 1;

        i = 0;
        while (ISspace(buf[j]) && (j < sizeof(buf))) // skip whitespace before the url
            j++;
        while (!ISspace(buf[j]) && (i < sizeof(url) - 1) && (j < sizeof(buf))) // copy the request url
        {
            url[i] = buf[j];
            i++; j++;
        }
        url[i] = '\0';

        if (strcasecmp(method, "GET") == 0)
        {
            query_string = url;
            // advance to the '?': the part before it is the path, the part after it the parameters
            while ((*query_string != '?') && (*query_string != '\0'))
                query_string++;
            if (*query_string == '?') // a '?' in the url marks a dynamic request
            {
                cgi = 1;
                *query_string = '\0';
                query_string++;
            }
        }

        // map the url to a path on the server
        sprintf(path, "htdocs%s", url);
        if (path[strlen(path) - 1] == '/') // url ends with '/': serve index.html in that directory
            strcat(path, "index.html");
        // look up the file the url points to
        if (stat(path, &st) == -1) { // file not found
            while ((numchars > 0) && strcmp("\n", buf))  /* read & discard headers */
                numchars = get_line(client, buf, sizeof(buf));
            not_found(client); // reply 404 to the client
        }
        else
        {
            if ((st.st_mode & S_IFMT) == S_IFDIR) // path is a directory: default to its index.html
                strcat(path, "/index.html");
            // an executable file is treated as a CGI program
            if ((st.st_mode & S_IXUSR) ||
                (st.st_mode & S_IXGRP) ||
                (st.st_mode & S_IXOTH)    )
                cgi = 1;
            if (!cgi) // static page
                serve_file(client, path); // send the file as is
            else // dynamic page
                execute_cgi(client, path, method, query_string); // run the CGI script
        }

        close(client); // close the client socket
        return NULL;
    }
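
The url handling in accept_request (splitting at the ? and mapping the rest under htdocs) can be isolated into a small function for experimentation. The sketch below is illustrative only: map_url is a hypothetical name, and unlike the original sprintf/strcat it uses snprintf and reports an error when the path would not fit.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split `url` in place at the first '?', store a
 * pointer to the query string (or NULL if there is none) in *query, and
 * build the file path under htdocs, appending index.html when the url
 * ends with '/'. Returns 0 on success, -1 if the path does not fit. */
int map_url(char *url, char *path, size_t pathsz, char **query)
{
    char *q = strchr(url, '?');
    if (q != NULL) {
        *q = '\0';  /* terminate the path part */
        q++;        /* the query string starts after the '?' */
    }
    *query = q;

    size_t ulen = strlen(url);
    const char *suffix = (ulen > 0 && url[ulen - 1] == '/') ? "index.html" : "";

    int n = snprintf(path, pathsz, "htdocs%s%s", url, suffix);
    return (n >= 0 && (size_t)n < pathsz) ? 0 : -1;
}
```

For the url /cgi/color.cgi?color=red this yields the path htdocs/cgi/color.cgi and the query string color=red; for / it yields htdocs/index.html and no query string.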

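execute_cgi
- Forks a child process to run the executable, then passes the result back to the parent through a pipe, and from there to the client.
- For GET, the whole HTTP header is read and discarded. For POST, the Content-Length value is also extracted from it.
- Two pipes, cgi_input and cgi_output, are created and a process is forked (the pipes only become useful once there is a child process). This sets up the communication channel between parent and child.
- In the child, the pipe ends are redirected to standard input and output, and the corresponding environment variables (REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH) are set for the CGI script to read. Then execl runs the CGI script. The script therefore runs in the child, and its result travels back to the parent through the redirected pipe.
- In the parent, the unused pipe ends are closed. For POST, the POST data is written to cgi_input, which is redirected to the child's STDIN. The parent then reads cgi_output and relays it to the client (the browser); see the diagram of the final pipe state above for the flow. Finally it closes all pipes and waits for the child to exit.
- The connection is closed, completing one HTTP request and response.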
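
The two-pipe pattern described above can be tried in isolation. The following is a minimal sketch under stated assumptions, not tinyhttpd's code: run_filter is a hypothetical name, the program is looked up on PATH with execlp, and the input is assumed small enough to fit in the pipe buffer (a larger input would need interleaved reads and writes to avoid blocking).

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `prog` with `input` on its stdin and collect up
 * to outsz-1 bytes of its stdout in `out` (NUL-terminated).
 * Returns the number of bytes collected, or -1 on error. */
ssize_t run_filter(const char *prog, const char *input, char *out, size_t outsz)
{
    int to_child[2], from_child[2];
    size_t len = strlen(input), total = 0;
    ssize_t n;

    if (outsz == 0 || pipe(to_child) < 0)
        return -1;
    if (pipe(from_child) < 0) {
        close(to_child[0]); close(to_child[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return -1;
    }
    if (pid == 0) {                          /* child */
        dup2(to_child[0], STDIN_FILENO);     /* stdin  <- parent */
        dup2(from_child[1], STDOUT_FILENO);  /* stdout -> parent */
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execlp(prog, prog, (char *)NULL);
        _exit(127);                          /* reached only if exec fails */
    }
    close(to_child[0]);                      /* parent keeps only the ends it uses */
    close(from_child[1]);
    if (write(to_child[1], input, len) != (ssize_t)len) {
        close(to_child[1]); close(from_child[0]);
        waitpid(pid, NULL, 0);
        return -1;
    }
    close(to_child[1]);                      /* child sees EOF on stdin */
    while (total < outsz - 1 &&
           (n = read(from_child[0], out + total, outsz - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(from_child[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

For example, on a system where cat is on PATH, run_filter("cat", "hello=world", out, sizeof(out)) copies the input back into out, which mirrors how POST data goes into the CGI script and its output comes back.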


void execute_cgi(int client, const char *path,
                 const char *method, const char *query_string)
{
 char buf[1024];
 int cgi_output[2];
 int cgi_input[2];
 pid_t pid;
 int status;
 int i;
 char c;
 int numchars = 1;
 int content_length = -1;

 buf[0] = 'A'; buf[1] = '\0';
 if (strcasecmp(method, "GET") == 0)
  while ((numchars > 0) && strcmp("\n", buf))  /* read & discard headers */
   numchars = get_line(client, buf, sizeof(buf));
 else    /* POST */
 {
  numchars = get_line(client, buf, sizeof(buf));
  // find the length of the HTTP message body
  while ((numchars > 0) && strcmp("\n", buf))
  {
   buf[15] = '\0';
   if (strcasecmp(buf, "Content-Length:") == 0) // is this the Content-Length header?
    content_length = atoi(&(buf[16])); // read Content-Length (the length of the message body)
   numchars = get_line(client, buf, sizeof(buf));
  }
  if (content_length == -1) {
   bad_request(client); // POST without Content-Length: reply 400
   return;
  }
 }

 sprintf(buf, "HTTP/1.0 200 OK\r\n"); // request succeeded
 send(client, buf, strlen(buf), 0);

 // failed to create a pipe
 if (pipe(cgi_output) < 0) {
  cannot_execute(client);
  return;
 }
 if (pipe(cgi_input) < 0) {
  cannot_execute(client);
  return;
 }

 if ( (pid = fork()) < 0 ) {
  cannot_execute(client);
  return;
 }
 /*
 The child inherits both pipes from the parent. The child closes the read end
 of the output pipe and the write end of the input pipe; the parent closes the
 write end of the output pipe and the read end of the input pipe.
 */
 if (pid == 0)  /* child: CGI script */
 {
  char meth_env[255];
  char query_env[255];
  char length_env[255];

  // duplicate the descriptors to redirect the process's standard input and output
  // dup2 first closes its second argument (the target descriptor) if it is open
  dup2(cgi_output[1], 1); // redirect stdout to the write end of the output pipe
  dup2(cgi_input[0], 0); // redirect stdin to the read end of the input pipe
  close(cgi_output[0]); // close the read end of the output pipe
  close(cgi_input[1]); // close the write end of the input pipe
  sprintf(meth_env, "REQUEST_METHOD=%s", method);
  putenv(meth_env);
  if (strcasecmp(method, "GET") == 0) {
   // set the QUERY_STRING environment variable
   sprintf(query_env, "QUERY_STRING=%s", query_string);
   putenv(query_env);
  }
  else {   /* POST */
   // set the CONTENT_LENGTH environment variable
   sprintf(length_env, "CONTENT_LENGTH=%d", content_length);
   putenv(length_env);
  }
  execl(path, path, (char *)NULL); // exec family: run the CGI script; its stdout becomes the response sent to the client
  // because of the dup2 redirection, stdout goes into the write end of the output pipe

  exit(0); // child exits (reached only if execl fails)

 } else {    /* parent */
  close(cgi_output[1]); // close the unused ends so each pipe is a one-way channel between parent and child
  close(cgi_input[0]);
  /*
  With the unused ends closed and the child's ends redirected, each pipe is a
  one-way channel: cgi_input carries data from parent to child, cgi_output
  from child to parent. Together the two pipes give two-way communication.
  */
  if (strcasecmp(method, "POST") == 0)
   for (i = 0; i < content_length; i++) {
    recv(client, &c, 1, 0); // read one byte of the POST body from the client
    write(cgi_input[1], &c, 1); // write it to the input pipe, which is the child's stdin
    /*
    Data flow:
    input[1] (parent) -> input[0] (child) = STDIN -> [CGI program runs] -> STDOUT
    = output[1] (child) -> output[0] (parent) [result is sent to the client]
     */
   }
  while (read(cgi_output[0], &c, 1) > 0) // relay the output pipe to the client; it carries the CGI script's output
   send(client, &c, 1, 0);

  close(cgi_output[0]); // close the remaining pipe ends; the child closed its unused ends after dup2
  close(cgi_input[1]);
  waitpid(pid, &status, 0); // wait for the child to terminate
 }
}
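
The Content-Length detection above truncates the line with buf[15] = '\0' and then calls atoi on &buf[16], which silently skips one character after the colon. So it relies on the usual single space being there. A more defensive variant might look like the sketch below; parse_content_length is a hypothetical name, and returning -1 for "not this header or not a valid number" is an assumption chosen to match the content_length == -1 check in the listing.

```c
#include <errno.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical helper: return the value of a "Content-Length:" header
 * line, or -1 if the line is another header or the value is malformed. */
long parse_content_length(const char *line)
{
    static const char name[] = "Content-Length:";
    size_t namelen = sizeof(name) - 1;
    char *end;

    if (strncasecmp(line, name, namelen) != 0)
        return -1;                     /* some other header */

    const char *p = line + namelen;
    errno = 0;
    long v = strtol(p, &end, 10);      /* strtol skips leading whitespace */
    if (end == p || errno == ERANGE || v < 0)
        return -1;                     /* no digits, overflow, or negative */
    return v;
}
```

Unlike the truncation approach, this leaves the header line intact and works whether or not a space follows the colon.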

3. Areas for improvement

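An HTTP server in only 502 lines of code is quite compact. The logic is clear, error handling lives in its own set of functions, and concurrency is handled with a thread-per-connection model. These strengths deserve recognition.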

However, the following points clearly need improvement:

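- The working directory is never changed. If the server is not started from its own directory, it cannot find the resource directory; the relative paths used in the code suffer from the same problem.
- Nothing about runs or traffic is written to a log, so there is no way to review how the server behaved or what it exchanged in the past.
- HTTP request handling is very inefficient; for example, the CGI path shown above transfers data one byte per recv, write, read, or send call.
- There is no defense against denial-of-service attacks. Even without an attack, a large number of connections makes the server create too many threads and consume too many resources; switching to an I/O multiplexing model would improve this to some degree.
- Signals are not handled, and not even EINTR errors are handled.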
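
As an illustration of the last point, a write wrapper that retries when a system call is interrupted by a signal could look like this. It is a sketch only: write_all is a hypothetical name and nothing like it exists in tinyhttpd. It also copes with short writes, which the byte-at-a-time loops in the original avoid only by being slow.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes to `fd`, retrying after
 * EINTR and continuing after short writes.
 * Returns 0 on success, -1 on any other error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: try again */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The same retry shape applies to read, recv, and send; a server that installs signal handlers would want it around every blocking call.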
