How Packets Are Received in the Linux Network Stack

Published: 2022-01-07 16:31:46 | Source: Yisu Cloud (億速云) | Views: 142 | Author: 柒染 | Category: System Operations

This article walks through how Linux receives a network packet, following it step by step from the NIC to the process that reads it.

If you are comfortable reading English, the two articles listed in the references are strongly recommended; they cover the topic in more detail.

The discussion is limited to physical Ethernet NICs and does not cover virtual devices. The receive path of a single UDP packet serves as the running example.

The function call chains shown in the examples come from kernel 3.13.0. On other kernel versions the function names and code paths may differ, but the underlying principles should be the same, or differ only in small details.

From the NIC to memory

A NIC needs a driver to work. The driver is a module loaded into the kernel that connects the NIC to the kernel's network module. When the driver is loaded it registers itself with the network module, and when the corresponding NIC receives a packet, the network module calls that driver to handle the data.

The diagram below shows how a packet gets into memory and how the kernel's network module starts processing it:

                   +-----+
                   |     |                            Memory
+--------+   1     |     |  2  DMA     +--------+--------+--------+--------+
| Packet |-------->| NIC |------------>| Packet | Packet | Packet | ...... |
+--------+         |     |             +--------+--------+--------+--------+
                   |     |<--------+
                   +-----+         |
                      |            +---------------+
                      |                            |
                    3 | Raise IRQ                  | Disable IRQ
                      |                          5 |
                      |                            |
                      ↓                            |
                   +-----+                   +------------+
                   |     |  Run IRQ handler  |            |
                   | CPU |------------------>| NIC Driver |
                   |     |       4           |            |
                   +-----+                   +------------+
                                                   |
                                                6  | Raise soft IRQ
                                                   |
                                                   ↓

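1: A packet arrives at the physical NIC from the external network. If it is not addressed to this NIC and the NIC is not in promiscuous mode, the NIC drops it.

2: The NIC writes the packet via DMA to a memory address that the NIC driver allocated and initialized. Note: very old NICs may not support DMA, but newer ones generally do.

3: The NIC raises a hardware interrupt (IRQ) to tell the CPU that data has arrived.

4: The CPU looks up the interrupt table and calls the registered interrupt handler, which in turn calls the corresponding function in the NIC driver.

5: The driver first disables the NIC's interrupts. This signals that the driver already knows there is data in memory, so the NIC can write later packets straight to memory without notifying the CPU again. This improves efficiency because the CPU is not interrupted over and over.

6: The driver raises a softirq, after which the hardware interrupt handler returns. A hard-IRQ handler cannot itself be interrupted while it runs, so if it takes too long the CPU cannot respond to interrupts from other hardware. The kernel therefore provides softirqs, which move the time-consuming part of the work out of the hard-IRQ handler and into a softirq handler, where it can be done later.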
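Steps 1 to 6 can be sketched as a small toy model. This is not kernel code: `ToyNic`, its fields, and the use of a `deque` as the DMA memory region are all hypothetical names invented for illustration, and the model ignores broadcast and multicast frames. It only aims to show why disabling the IRQ in step 5 means a burst of packets costs a single hardware interrupt.

```python
from collections import deque


class ToyNic:
    """Toy stand-in for a NIC plus its driver's hard-IRQ handler (steps 1-6)."""

    def __init__(self, mac, promiscuous=False):
        self.mac = mac
        self.promiscuous = promiscuous
        self.ring = deque()           # memory region the "DMA" writes into
        self.irq_enabled = True       # step 5 switches this off
        self.softirq_pending = False  # step 6 switches this on
        self.hard_irqs = 0            # how often the CPU was interrupted

    def receive(self, dst_mac, payload):
        # Step 1: drop frames not addressed to us unless promiscuous.
        if dst_mac != self.mac and not self.promiscuous:
            return False
        # Step 2: "DMA" the packet into memory.
        self.ring.append(payload)
        # Step 3: raise a hardware interrupt only while IRQs are enabled.
        if self.irq_enabled:
            self._irq_handler()
        return True

    def _irq_handler(self):
        # Step 4: the CPU runs the driver's registered handler.
        self.hard_irqs += 1
        # Step 5: mask further NIC interrupts.
        self.irq_enabled = False
        # Step 6: defer the real work to a softirq and return quickly.
        self.softirq_pending = True
```

Feeding three matching frames into one `ToyNic` leaves three entries in `ring` but increments `hard_irqs` only once, which is the effect step 5 is after.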

The kernel's network module

The softirq triggers the softirq handler in the kernel's network module. The flow continues as follows:

                                            +-----+
                                    14      |     |
                               +----------->| NIC |
                               |            |     |
                               |Enable IRQ  +-----+
                               |
                               |
                         +------------+                                      Memory
                         |            |        Read           +--------+--------+--------+--------+
        +--------------->| NIC Driver |<--------------------- | Packet | Packet | Packet | ...... |
        |                |            |          9            +--------+--------+--------+--------+
        |                +------------+
        |                      |    |        skb
   Poll | 8      Raise softIRQ | 6  +-----------------+
        |                      |             10       |
        |                      ↓                      ↓
+---------------+  Call  +-----------+        +------------------+
| net_rx_action |<-------| ksoftirqd |        | napi_gro_receive |
+---------------+   7    +-----------+        +------------------+
                                                      |
                                                      | 11
                                                      ↓
                                           +--------------------------+    12      +------------------------+
                                           | __netif_receive_skb_core |----------->| packet taps(AF_PACKET) |
                                           +--------------------------+            +------------------------+
                                                      |
                                                      | 13
                                                      ↓
                                             +-----------------+
                                             | protocol layers |
                                             +-----------------+

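7: The kernel's ksoftirqd threads are dedicated to handling softirqs (the kernel may also run pending softirqs directly on return from a hard interrupt). When ksoftirqd picks up a softirq, it calls the handler registered for that softirq. For the softirq raised by the NIC driver in step 6, ksoftirqd calls the network module's net_rx_action function.

8: net_rx_action calls the poll function in the NIC driver to process the packets one at a time.

9: Inside the poll function, the driver reads the packets that the NIC wrote to memory, one after another. Only the driver knows the format of the packets in memory.

10: The driver converts each packet in memory into the skb format that the kernel's network module understands, then calls napi_gro_receive.

11: napi_gro_receive handles GRO, that is, it merges packets that can be merged so that the protocol stack only needs to be called once for them. It then calls __netif_receive_skb_core.

12: __netif_receive_skb_core checks whether any AF_PACKET sockets exist (what is commonly called a raw socket) and, if so, gives each one a copy of the data. This is the point where tcpdump captures packets.

13: The corresponding protocol stack function is called and the packet is handed to the protocol stack.

14: Once every packet in memory has been processed (that is, once the poll function has finished), the driver re-enables the NIC's hardware interrupt so that the NIC notifies the CPU the next time it receives data.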
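The polling loop in steps 8 to 14 can be sketched the same way. The function below is a toy, not the kernel's `net_rx_action`: the `ring`, `taps`, and `protocol_rx` parameters and the dict used as an skb are hypothetical, GRO (step 11) is skipped entirely, and the `budget` cap is an addition that the steps above do not mention (the real kernel bounds how much work one poll may do, which is what this parameter imitates).

```python
from collections import deque


def net_rx_action(ring, taps, protocol_rx, budget=64):
    """Toy version of steps 8-14: drain packets from the ring within a budget.

    Returns True when the ring is empty afterwards, i.e. when the driver
    would re-enable the NIC's hardware interrupt (step 14).
    """
    processed = 0
    while ring and processed < budget:
        raw = ring.popleft()        # step 9: driver reads a packet from memory
        skb = {"data": raw}         # step 10: wrap it in an skb-like object
        for tap in taps:            # step 12: every packet tap gets its own copy
            tap.append(dict(skb))
        protocol_rx(skb)            # step 13: hand the skb to the protocol layers
        processed += 1
    return not ring                 # step 14 applies only once the ring is drained


if __name__ == "__main__":
    ring = deque([b"a", b"b", b"c"])
    tap, delivered = [], []
    drained = net_rx_action(ring, [tap], delivered.append, budget=2)
    print(drained, len(delivered), len(ring))
```

With a budget of 2 and three queued packets, the first call leaves one packet behind and reports that the ring is not yet drained, so in this model the interrupt stays disabled until a second call finishes the work.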

The protocol stack

IP layer

Because this is a UDP packet, it first enters the IP layer and then passes down through a chain of function calls:

          |
          |
          ↓         promiscuous mode &&
      +--------+    PACKET_OTHERHOST (set by driver)   +-----------------+
      | ip_rcv |-------------------------------------->| drop this packet|
      +--------+                                       +-----------------+
          |
          |
          ↓
+---------------------+
| NF_INET_PRE_ROUTING |
+---------------------+
          |
          |
          ↓
      +---------+
      |         | enabled ip forward  +------------+        +-----------------+
      | routing |-------------------->| ip_forward |------->| NF_INET_FORWARD |
      |         |                     +------------+        +-----------------+
      +---------+                                                   |
          |                                                         |
          | destination IP is local                                 ↓
          ↓                                                 +---------------+
 +------------------+                                       | dst_output_sk |
 | ip_local_deliver |                                       +---------------+
 +------------------+
          |
          |
          ↓
 +------------------+
 | NF_INET_LOCAL_IN |
 +------------------+
          |
          |
          ↓
    +-----------+
    | UDP layer |
    +-----------+
  • ip_rcv: ip_rcv is the entry point of the IP module. The first thing it does is discard junk packets, meaning packets whose destination MAC address is not this NIC's but which were accepted because the NIC is in promiscuous mode. It then calls the functions registered on NF_INET_PRE_ROUTING.

  • NF_INET_PRE_ROUTING: a hook that netfilter places in the protocol stack. Packet-handling functions can be injected here through iptables to modify or drop packets. A packet that is not dropped continues down the path.

  • routing: the routing decision. If the destination IP is not a local IP and IP forwarding is not enabled, the packet is dropped. If IP forwarding is enabled, the packet goes to ip_forward.

  • ip_forward: ip_forward first calls the functions netfilter has registered on NF_INET_FORWARD. If the packet is not dropped, it goes on to dst_output_sk.

  • dst_output_sk: this function calls the appropriate IP-layer function to send the packet out. From here on the path is the same as the second half of the packet send path, which the next article covers.

  • ip_local_deliver: called when routing finds that the destination IP is a local IP. It first runs the hooks registered on NF_INET_LOCAL_IN. If the packet passes them, it is sent down to the UDP layer.
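The branching described in the bullets above can be written out as a small decision function. This is only a sketch of the control flow, not kernel code: `ip_receive_path` and all of its parameters are hypothetical, the netfilter hooks are reduced to a single boolean for PRE_ROUTING, and the routing lookup is reduced to a set-membership test.

```python
def ip_receive_path(pkt_type, dst_ip, local_ips, ip_forward_enabled,
                    prerouting_accepts=True):
    """Return the list of stages a packet visits in the toy IP-layer model."""
    path = ["ip_rcv"]
    # ip_rcv discards frames the driver marked as addressed to another host.
    if pkt_type == "PACKET_OTHERHOST":
        return path + ["drop"]
    path.append("NF_INET_PRE_ROUTING")
    # A netfilter rule on PRE_ROUTING may drop the packet.
    if not prerouting_accepts:
        return path + ["drop"]
    path.append("routing")
    # Local destination: deliver upward through LOCAL_IN to the UDP layer.
    if dst_ip in local_ips:
        return path + ["ip_local_deliver", "NF_INET_LOCAL_IN", "UDP layer"]
    # Non-local destination: forward only when IP forwarding is enabled.
    if ip_forward_enabled:
        return path + ["ip_forward", "NF_INET_FORWARD", "dst_output_sk"]
    return path + ["drop"]


if __name__ == "__main__":
    local = {"192.0.2.10"}
    print(ip_receive_path("PACKET_HOST", "192.0.2.10", local, False))
    print(ip_receive_path("PACKET_HOST", "198.51.100.7", local, False))
```

The addresses come from the documentation-only ranges and the string "PACKET_HOST" is just a placeholder for "addressed to this host"; only the branching order is meant to mirror the text.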

UDP layer

          |
          |
          ↓
     +---------+            +-----------------------+
     | udp_rcv |----------->| __udp4_lib_lookup_skb |
     +---------+            +-----------------------+
          |
          |
          ↓
+--------------------+      +-----------+
| sock_queue_rcv_skb |----->| sk_filter |
+--------------------+      +-----------+
          |
          |
          ↓
+------------------+
| __skb_queue_tail |
+------------------+
          |
          |
          ↓
 +---------------+
 | sk_data_ready |
 +---------------+
  • udp_rcv: udp_rcv is the entry point of the UDP module. It calls other functions, mostly to perform necessary checks. One important call is __udp4_lib_lookup_skb, which looks up the socket matching the destination IP and port. If no matching socket is found the packet is dropped; otherwise processing continues.

  • sock_queue_rcv_skb: does two main things. First, it checks whether the socket's receive buffer is full and drops the packet if it is. Second, it calls sk_filter to check whether the packet is acceptable: if a filter is set on the socket and the packet does not match it, the packet is also dropped. (On Linux, a filter can be attached to any socket, just like the filters used in tcpdump, and packets that do not match are dropped.)

  • __skb_queue_tail: appends the packet to the tail of the socket's receive queue.

  • sk_data_ready: notifies the socket that a packet is ready.

Once sk_data_ready has been called, processing of the packet is complete and it waits for the application to read it. All of the functions above run in softirq context.
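The UDP-layer checks can be modeled in a few lines as well. Everything here is a hypothetical toy: `ToySocket`, the exact-match `(ip, port)` dictionary standing in for the socket lookup, and the byte-count accounting (the real kernel charges the whole skb, not just the payload length) are simplifications chosen for illustration.

```python
from collections import deque


class ToySocket:
    """Toy socket: a bounded receive queue plus an optional filter."""

    def __init__(self, rcvbuf_bytes, packet_filter=None):
        self.rcvbuf_bytes = rcvbuf_bytes
        self.used = 0
        self.packet_filter = packet_filter  # callable(payload) -> bool, or None
        self.queue = deque()
        self.data_ready = False


def udp_rcv(sockets, dst_ip, dst_port, payload):
    """Run one datagram through the toy UDP layer and report what happened."""
    # Socket lookup by destination IP and port (exact match only in this toy).
    sk = sockets.get((dst_ip, dst_port))
    if sk is None:
        return "drop: no socket"
    # sock_queue_rcv_skb, part 1: is there room in the receive buffer?
    if sk.used + len(payload) > sk.rcvbuf_bytes:
        return "drop: receive buffer full"
    # sock_queue_rcv_skb, part 2: apply the socket filter, if one is set.
    if sk.packet_filter is not None and not sk.packet_filter(payload):
        return "drop: filtered"
    # __skb_queue_tail: append to the tail of the receive queue.
    sk.queue.append(payload)
    sk.used += len(payload)
    # sk_data_ready: mark the socket so that a waiting reader can be woken.
    sk.data_ready = True
    return "queued"
```

A reader that never drains `queue` eventually causes "drop: receive buffer full", which in this model is the same way a slow UDP receiver loses datagrams.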

socket

Applications generally receive data in one of two ways. In the first, recvfrom blocks until data arrives: when the socket is notified, recvfrom wakes up and reads from the receive queue. In the second, the application monitors the socket with epoll or select and, once notified, calls recvfrom to read from the receive queue. Either way, the packet is received correctly.
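Both receive styles can be tried from user space with Python's standard `socket` and `select` modules. The sketch below sends two datagrams to itself over loopback; the helper names (`recv_blocking`, `recv_with_select`, `loopback_demo`), the 2048-byte read size, and the timeouts are arbitrary choices for illustration. It uses `select` rather than epoll only because `select` is available on more platforms.

```python
import select
import socket


def recv_blocking(sock):
    """Style 1: block in recvfrom until a datagram is queued on the socket."""
    return sock.recvfrom(2048)


def recv_with_select(sock, timeout=1.0):
    """Style 2: wait for readability with select, then read with recvfrom."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # nothing arrived within the timeout
    return sock.recvfrom(2048)


def loopback_demo():
    """Send two datagrams to ourselves and read one with each style."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a port
        receiver.settimeout(2.0)         # safety net so the demo cannot hang
        addr = receiver.getsockname()

        sender.sendto(b"first", addr)
        data1, _ = recv_blocking(receiver)

        sender.sendto(b"second", addr)
        result = recv_with_select(receiver)
        data2 = result[0] if result else None
        return data1, data2
    finally:
        receiver.close()
        sender.close()


if __name__ == "__main__":
    print(loopback_demo())
```

In both cases the datagram is already sitting in the socket's receive queue by the time recvfrom returns it; the two styles differ only in how the application waits for the notification.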
