Approaches to Zero Down-Time Service Updates

Published: 2020-07-21 07:17:09; Source: Internet; Author: lee哥說架構; Category: Programming Languages

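Starting from a question

To get the discussion going: for a statically compiled application, such as a program written in C, C++, Golang, or a similar language, how do we update the application after fixing a bug or adding a feature, without taking the service offline?

That is the question, and it is an ordinary one. Some people think a question through thoroughly. Newton, for example, was hit by an apple, thought hard about it, and eventually discovered the law of universal gravitation.

What would you do if an apple hit you?

Just joking: if an apple hit one of us, would it simply knock us senseless?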

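So, back to the question:

After fixing a bug or adding a new requirement, how do we upgrade a server application silky-smoothly, without interrupting service?

This question implies two things:

C, C++, and Go are statically compiled languages, so all instructions are compiled into the executable file. Upgrading means building a new executable that replaces the old one. How, then, does a process that is already running load and execute the new image (the executable file)?

Business logic that is in progress must not be interrupted, and connections that are being served must not be dropped by force.

We call this kind of silky-smooth application upgrade a hot update.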

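A vivid analogy:

You are riding in a truck that is doing 150 km/h.

Then one of the tires blows.

Then the driver says: just change it, I am not stopping. Be careful while you do it.

"Oh, Lee, I get it: in situations like this we cannot fall back on the universal fix of restarting."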

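Solution 1: lessons from gray release and A/B testing

Gray release (also known as canary release) is a release method that moves smoothly between black and white. A/B testing can be run on top of it: some users keep using product feature A while others start using product feature B, and if users raise no objections to B, the scope is widened step by step until all users have been migrated to B. Gray release keeps the overall system stable, because problems can be found and corrected during the initial rollout, which limits their impact. The figure below shows a gray release scheme built on nginx:

nginx is a reverse proxy that forwards requests from the external network to business servers on the internal network. In a layered system design nginx is usually placed in the access layer; LVS, F5, Apache, and others can forward user requests as well. Consider the following nginx configuration: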

http {
    upstream cluster {
        ip_hash;
        server 192.168.2.128:8086 weight=1 fail_timeout=15 max_fails=3;
        server 192.168.2.130:8086 weight=2 fail_timeout=15 max_fails=3;
    }
    server {
        listen 8080;
        location / {
            proxy_pass http://cluster;
        }
    }
}

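Requests to port 8080 are forwarded to the upstream named cluster, which uses the IP hash policy to distribute them to the service on port 8086 of 192.168.2.128 and 192.168.2.130. ip_hash is configured here; nginx supports other balancing policies as well.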
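To make the selection policy concrete, here is a minimal Python sketch of a weighted IP-hash pick. It is not nginx's actual algorithm: the function name pick_server and the use of CRC32 are assumptions made for illustration. It only mirrors two documented behaviours of ip_hash: for IPv4 the first three octets form the hashing key, and a server with weight=2 receives roughly twice the share of keys.

```python
import zlib

# Upstream servers and weights copied from the configuration above.
SERVERS = [("192.168.2.128:8086", 1), ("192.168.2.130:8086", 2)]


def pick_server(client_ip, servers=SERVERS):
    """Map a client IPv4 address to one upstream server.

    Illustrative only: nginx uses its own hash function. As with ip_hash,
    only the first three octets are hashed, so clients in the same /24
    network always land on the same server.
    """
    key = ".".join(client_ip.split(".")[:3]).encode()
    total_weight = sum(weight for _, weight in servers)
    slot = zlib.crc32(key) % total_weight
    for address, weight in servers:
        if slot < weight:
            return address
        slot -= weight
    raise RuntimeError("unreachable: slot is always below total_weight")
```

Because the mapping depends only on the client address, a given client keeps reaching the same backend for as long as the server list is unchanged, which is what makes the policy usable for session stickiness.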

So how do we use nginx to upgrade a service smoothly?

Take this nginx configuration as an example:

http {
    upstream cluster {
        ip_hash;
        server 192.168.2.128:8086 weight=1 fail_timeout=15 max_fails=3;
        server 192.168.2.130:8086 weight=2 fail_timeout=15 max_fails=3;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://cluster;
        }
    }
}

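Suppose our service is deployed on 192.168.2.128. After fixing a bug or adding a feature, we deploy the new version on another machine (say 192.168.2.130), change the nginx configuration as shown above, and run nginx -s reload to load it. Existing connections and services are not dropped, while the new service instance starts taking traffic. This is a gray release done through nginx, and testing carried out this way is called A/B testing. So how do we take the old service out of rotation completely?

Modify the nginx configuration as follows, adding the down parameter to the corresponding server entry in the upstream: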

http {
    upstream cluster {
        ip_hash;
        server 192.168.2.128:8086 weight=1 fail_timeout=15 max_fails=3 down;
        server 192.168.2.130:8086 weight=2 fail_timeout=15 max_fails=3;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://cluster;
        }
    }
}

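After some time has passed, the service on 192.168.2.128 can be stopped.

This is a silky-smooth approach implemented at the access layer with nginx. The same idea applies to other components such as LVS and Apache, and it can also be done through DNS, ZooKeeper, etcd, and so on: the point is to route all traffic to the new system.

Gray release solves the problem of forwarding traffic to a new system. But what about an application like nginx itself, or a case where the image must be upgraded on this very machine? That requires a hot update. The questions to consider are: what if the old service has cached data, and what if it is in the middle of processing business logic?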

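Solution 2: nginx's hot update mechanism

nginx uses a master/worker multi-process model. The master process manages the nginx processes as a whole: shutdown, log reopening, hot update, and so on. The worker processes handle user requests.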

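As above, every listening port in the nginx configuration is first set up in the master process, which creates the socket (sfd), then calls bind and listen. When the master creates the worker processes with fork, the workers inherit these listening sockets; the Unix domain socket pair between the master and each worker carries control messages rather than the listening sockets themselves. Each worker adds the listening sockets to epoll, and when a client connects, a worker accepts and handles the connection. In other words, user requests are processed by the worker processes.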
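A minimal Python sketch of this pattern follows, assuming plain fork() inheritance of the listening descriptor (POSIX only). The function name start_master is hypothetical, and each demo worker serves exactly one connection and then exits, which a real worker would not do.

```python
import os
import socket


def start_master(worker_count=2):
    """Create the listening socket once, then fork workers that inherit it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(16)

    worker_pids = []
    for _ in range(worker_count):
        pid = os.fork()
        if pid == 0:
            # Worker: the inherited descriptor is already bound and listening.
            conn, _addr = listener.accept()
            conn.sendall(b"served by worker %d\n" % os.getpid())
            conn.close()
            os._exit(0)
        worker_pids.append(pid)
    # The master keeps the socket open but never calls accept() itself.
    return listener, worker_pids
```

The master only owns the socket; all accept() calls happen in the workers, so the master can replace workers without the port ever being closed.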

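With the background on nginx's I/O model covered, let us look at its hot update procedure.

Upgrade steps:

Step 1: upgrade the nginx binary. First replace the old nginx executable with the new one, then send the USR2 signal to the nginx master process to tell it to start upgrading the executable. The master renames its pid file by appending the .oldbin suffix, then starts the new executable through exec in a forked child (which is why the new master, pid 23692 in the listing below, has the old master 4584 as its parent). The new master starts its own workers, and its pid is written to a new pid file.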

UID        PID  PPID  C STIME TTY          TIME CMD
root      4584     1  0 Oct17 ?        00:00:00 nginx: master process /usr/local/apigw/apigw_nginx/nginx
root     12936  4584  0 Oct26 ?        00:03:24 nginx: worker process
root     12937  4584  0 Oct26 ?        00:00:04 nginx: worker process
root     12938  4584  0 Oct26 ?        00:00:04 nginx: worker process
root     23692  4584  0 21:28 ?        00:00:00 nginx: master process /usr/local/apigw/apigw_nginx/nginx
root     23693 23692  3 21:28 ?        00:00:00 nginx: worker process
root     23694 23692  3 21:28 ?        00:00:00 nginx: worker process
root     23695 23692  3 21:28 ?        00:00:00 nginx: worker process

The exec family of functions is described in the manual page excerpt below:

NAME
       execl, execlp, execle, execv, execvp, execvpe - execute a file

SYNOPSIS
       #include <unistd.h>

       extern char **environ;

       int execl(const char *path, const char *arg, ...
                       /* (char  *) NULL */);
       int execlp(const char *file, const char *arg, ...
                       /* (char  *) NULL */);
       int execle(const char *path, const char *arg, ...
                       /*, (char *) NULL, char * const envp[] */);
       int execv(const char *path, char *const argv[]);
       int execvp(const char *file, char *const argv[]);
       int execvpe(const char *file, char *const argv[],
                       char *const envp[]);

   Feature Test Macro Requirements for glibc (see feature_test_macros(7)):

       execvpe(): _GNU_SOURCE

DESCRIPTION
       The exec() family of functions replaces the current process image with a new process image. The functions described in this manual page are front-ends for execve(2). (See the manual page for execve(2) for further details about the replacement of the current process image.)

       The initial argument for these functions is the name of a file that is to be executed.

       The const char *arg and subsequent ellipses in the execl(), execlp(), and execle() functions can be thought of as arg0, arg1, ..., argn. Together they describe a list of one or more pointers to null-terminated strings that represent the argument list available to the executed program. The first argument, by convention, should point to the filename associated with the file being executed. The list of arguments must be terminated by a null pointer, and, since these are variadic functions, this pointer must be cast (char *) NULL.

       The execv(), execvp(), and execvpe() functions provide an array of pointers to null-terminated strings that represent the argument list available to the new program. The first argument, by convention, should point to the filename associated with the file being executed. The array of pointers must be terminated by a null pointer.

       The execle() and execvpe() functions allow the caller to specify the environment of the executed program via the argument envp. The envp argument is an array of pointers to null-terminated strings and must be terminated by a null pointer. The other functions take the environment for the new process image from the external variable environ in the calling process.
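For step 1 to work, the new image must end up with the same listening sockets as the old one. The Python sketch below shows one way to do that: keep the descriptors open across fork + exec and tell the new image their numbers through an environment variable. nginx hands its listening descriptors to the new binary in a broadly similar way, but the variable name DEMO_LISTEN_FDS, its format, and both function names here are assumptions made for this sketch (POSIX only).

```python
import os
import socket
import subprocess

ENV_NAME = "DEMO_LISTEN_FDS"  # hypothetical variable name for this sketch


def exec_new_binary(argv, listeners):
    """Start a new program image that inherits the listening sockets.

    subprocess.Popen performs fork + exec. pass_fds keeps the listed
    descriptors open (with the same numbers) across the exec, and the
    environment variable tells the new image which numbers to adopt.
    """
    fds = [sock.fileno() for sock in listeners]
    env = dict(os.environ)
    env[ENV_NAME] = ";".join(str(fd) for fd in fds)
    return subprocess.Popen(argv, env=env, pass_fds=fds)


def adopt_listeners(environ=os.environ):
    """Run inside the new image: rebuild sockets from inherited descriptors."""
    raw = environ.get(ENV_NAME, "")
    return [socket.socket(fileno=int(fd)) for fd in raw.split(";") if fd]
```

Because the sockets are never closed and reopened, the kernel keeps queueing incoming connections on them during the handover, so clients do not see a refused connection.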

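Step 2: after this, all worker processes (old and new) keep accepting requests. At this point, send the WINCH signal to the old nginx master process. The master tells its worker processes to perform a graceful shutdown, and each worker exits once it has finished handling its connections.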


UID        PID  PPID  C STIME TTY          TIME CMD
root      4584     1  0 Oct17 ?        00:00:00 nginx: master process /usr/local/apigw/apigw_nginx/nginx
root     12936  4584  0 Oct26 ?        00:03:24 nginx: worker process
root     12937  4584  0 Oct26 ?        00:00:04 nginx: worker process
root     12938  4584  0 Oct26 ?        00:00:04 nginx: worker process
root     23692  4584  0 21:28 ?        00:00:00 nginx: master process /usr/local/apigw/apigw_nginx/nginx

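If an old worker process still has connections to serve, it does not exit immediately; it exits only after the outstanding requests have been handled.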
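The bookkeeping behind such a graceful shutdown can be sketched in a few lines of Python. This is not nginx code: the class GracefulWorker and its method names are hypothetical, and a real worker would also stop polling its listening sockets when shutdown is requested.

```python
class GracefulWorker:
    """Tracks what a worker must finish before it is allowed to exit."""

    def __init__(self):
        self.accepting = True
        self.active = set()  # ids of connections that are still being served

    def on_accept(self, conn_id):
        if not self.accepting:
            raise RuntimeError("worker is shutting down; no new connections")
        self.active.add(conn_id)

    def on_finished(self, conn_id):
        self.active.discard(conn_id)

    def request_graceful_shutdown(self):
        # Corresponds to the master's "shut down gracefully" message:
        # stop taking new connections, keep serving the existing ones.
        self.accepting = False

    def may_exit(self):
        return not self.accepting and not self.active
```

A worker built this way checks may_exit() after every finished connection, so a long-lived connection delays the exit of that one worker without affecting the new workers that already serve fresh traffic.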

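Step 3: after a while, only the new worker processes are handling new connections.

Note that the old master process does not close its listen sockets. If something goes wrong and a rollback is needed, the old master must still be able to restart its worker processes.

Step 4: if the upgrade succeeds, send the QUIT signal to the old master process to stop it. If the new master process exits (unexpectedly), the old master removes the .oldbin suffix from its pid file.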

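The key steps and commands are summarized below:

Commands

Signals for the master process

USR2: upgrade the executable
WINCH: gracefully stop the worker processes
QUIT: gracefully stop the master process

Signals for worker processes

TERM, INT: exit quickly
QUIT: gracefully stop the process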
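The signal sequence can be scripted. The Python sketch below is a rough outline under stated assumptions: the helper names are hypothetical, the pid file location depends on the local nginx configuration, and a real script would wait and verify the new master between steps instead of sending the signals back to back.

```python
import os
import signal


def read_pid(pid_file):
    """Read the process id stored in a pid file."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def signal_master(pid_file, signum):
    """Send one signal to the master whose pid is stored in pid_file."""
    pid = read_pid(pid_file)
    os.kill(pid, signum)
    return pid


def upgrade_plan(pid_file):
    """Return the (pid file, signal) pairs for the upgrade, in order."""
    # After USR2 the old master's pid lives in the .oldbin file.
    old_pid_file = pid_file + ".oldbin"
    return [
        (pid_file, signal.SIGUSR2),       # step 1: start the new executable
        (old_pid_file, signal.SIGWINCH),  # step 2: old workers stop gracefully
        (old_pid_file, signal.SIGQUIT),   # step 4: old master exits
    ]
```

Keeping the plan as data makes it easy to pause after the WINCH step, observe the new workers, and only then decide between sending QUIT to the old master and rolling back.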

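nginx itself is a proxy component (proxying HTTP, TCP, and UDP) with essentially no business logic of its own, and therefore no state data to speak of; even if it did carry business logic, this scheme would still work.

How does nginx shut down gracefully? In other words, how are in-flight HTTP requests and long-lived connections handled?

How the new image is started:

Those are some approaches to zero down-time updates. If anything is still unclear, see the video below.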
https://www.bilibili.com/video/av57429199
