What to Do About Stalls Caused by GCD in iOS Apps

Published: 2021-07-13 15:14:48 | Source: 億速云 | Views: 198 | Author: 小新 | Category: Mobile development

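I have recently been investigating the various kinds of stalls that occur in iOS apps and how to fix them.

Stalls in iOS apps are probably more common than most people imagine, especially in the flagship apps of large companies. One reason is that business features keep piling up: product teams coordinate poorly, everyone is busy adding features, and system resources become a bottleneck. The other is that old devices are replaced slowly. iOS devices are extremely durable: quite a few iPhone 4S units are still in service, the iPhone 6, a known problem device, is still widely owned, and by one estimate devices older than the iPhone 6s account for as much as 40% of the installed base.

So if you add a stall-detection tool to a production app, you will find that stalls occur startlingly often. Detecting and fixing them is not easy, though, mainly because they are hard to reproduce on development devices.

I previously wrote an article about monitoring main-thread stalls. The mainstream approach now seems to be to observe run loop event callbacks and check whether the interval between callbacks exceeds a threshold; if it does, the call stacks of all threads in the app are recorded.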
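The threshold check described above can be sketched in a few lines of portable C. This is only an illustration under stated assumptions: the type and function names (`stall_monitor`, `stall_monitor_on_event`, `stall_monitor_should_dump`) are hypothetical, timestamps are passed in as plain milliseconds, and the run loop itself is not modeled. In a real app the event timestamps would come from a run loop observer callback on the main thread, the check would run on a separate watcher thread, and access to the shared state would need synchronization.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; timestamps are plain milliseconds so the
 * logic can be followed without a real run loop. */
typedef struct {
    uint64_t last_event_ms;  /* time of the most recent run loop callback */
    uint64_t threshold_ms;   /* gap that counts as a stall */
    bool     reported;       /* avoid reporting the same stall repeatedly */
} stall_monitor;

void stall_monitor_init(stall_monitor *m, uint64_t now_ms, uint64_t threshold_ms) {
    assert(m != NULL && threshold_ms > 0);
    m->last_event_ms = now_ms;
    m->threshold_ms = threshold_ms;
    m->reported = false;
}

/* Would be called from the run loop observer callback on the monitored thread. */
void stall_monitor_on_event(stall_monitor *m, uint64_t now_ms) {
    m->last_event_ms = now_ms;
    m->reported = false;
}

/* Would be called periodically from a watcher thread. Returns true once per
 * stall; that is the moment to capture the call stacks of all threads.
 * Assumes now_ms never runs behind last_event_ms (monotonic clock). */
bool stall_monitor_should_dump(stall_monitor *m, uint64_t now_ms) {
    if (m->reported || now_ms - m->last_event_ms <= m->threshold_ms) {
        return false;
    }
    m->reported = true;
    return true;
}
```

The `reported` flag is a design choice of this sketch: without it, a single long stall would trigger a dump on every poll.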

Some time ago I saw a call stack like this in the stall logs reported to our backend:

> 0 libsystem_kernel.dylib __workq_kernreturn
> 1 libsystem_pthread.dylib _pthread_workqueue_addthreads
> 2 libdispatch.dylib _dispatch_queue_wakeup_global_slow
> 3 libdispatch.dylib _dispatch_queue_wakeup_with_qos_slow
> 4 libdispatch.dylib dispatch_async

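In other words, the stall occurred inside dispatch_async. As far as I understood GCD at the time, dispatch_async could not possibly stall: its main job is to take a worker thread from the system thread pool and hand the block to that thread for execution.

Yet this call stack really did show up, and in a fair number of samples. The innermost frame (frame 0) is clearly a kernel call. Judging by the function names, GCD may have tried to get a thread from the pool, found all existing threads busy, and asked the kernel to create a new one. But is the kernel call that creates a thread slow? Slow enough to stall the main thread? With these questions in mind I searched through a lot of material; the most relevant thing I found was this article: http://newosxbook.com/articles/GCD.html

It contains this passage: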

This isn't due to 10.9's GCD being different - rather, it demonstrates the true asynchronous nature of GCD: The main thread has yet to return from requesting the worker (which it does by pthread_workqueue_addthreads_np, as I'll describe later), and already the worker thread has spawned and is mid execution, possibly on another CPU core. The exact state of the main thread with respect to the worker is largely unpredictable.

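As I read it, the author is saying that the thread GCD obtains may be one that is busy with another task, and that the main thread has to wait for that busy thread to return before it can continue. I have doubts about this claim.

With nowhere else to turn, I decided to spend one of my precious TSIs (Technical Support Incidents) and ask Apple's engineers directly. It is worth mentioning that requesting technical support from Apple is a valuable and entirely practical option: every developer account gets two incidents per year, and it would be a pity to let them go unused.

After I sent the question, I got a reply from an engineer on Apple's kernel team. I am sharing a condensed version of the reply here in question-and-answer form: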

Q: looks like even if it's async dispatching, the main thread still has to wait for the other thread to return, during which time, the other thread happen to be in mid execution of sth. this confuses me, what exactly is the main thread waiting for?

Why does the main thread have to wait for dispatch_async to return, and what exactly is it waiting for?

A: It's hard to say with just a user space backtrace. Frame 0 has clearly sent the current thread into the kernel, and this specific kernel call is /way/ too complex to analyse from outside [1].

A user-space call stack alone cannot answer this; the possible kernel states are too complex.

Q: I know it's suggested that we create limited amount of serial queue, and use target queue probably. but what could happen if we don't follow that rule?

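Apple has always recommended that when you create your own serial GCD queues you keep their number under control and preferably set a target queue, or else problems will arise. I had always wondered what those problems were, so I took this opportunity to ask.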

* On macOS, where the system is happier to over commit, you end up with a thread explosion. That in turn can lead to problems running out of memory, running out of Mach ports, and so on.

* On iOS, which is not happy about over committing, you find that the latency between a block being queued and it running can skyrocket. This can, in turn, have knock-on effects. For example, the last time I looked at a problem like this I found that `NSOperationQueue` was dispatching blocks to the global queue for internal maintenance tasks, so when one subsystem within the app consumed all the dispatch worker threads other subsystems would just stall horribly.

Note: In the context of dispatch, an “over commit” is where the system had to allocate more threads to a queue then there are CPU cores. In theory this should never be necessary because work you dispatch to a queue should never block waiting for resources. In practice it's unavoidable because, at a minimum, the work you queue can end up blocking on the VM subsystem.

Despite this, it's still best to structure your code to avoid the need for over committing, especially when the over commit doesn't buy you anything. For example, code like this:

group = dispatch_group_create();
for (url in urlsToFetch) {
  dispatch_group_enter(group);
  dispatch_async(dispatch_get_global_queue(…), ^{
    … fetch `url` synchronously …
    dispatch_group_leave(group);
  });
}
dispatch_group_wait(group, …);

is horrible because it ties up 10 dispatch worker threads for a very long time without any benefit. And while this is an extreme example — from dispatch's perspective, networking is /really/ slow — there are less extreme examples that are similarly problematic. From dispatch's perspective, even the disk drive is slow (-:

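This reply is interesting. Anyone who has read the GCD source knows that every default GCD queue has a priority, and that each priority is actually backed by two root queues: for example, default-priority and default-priority-overcommit. Roughly speaking, the regular root queue limits how many worker threads it uses, while the overcommit variant, which custom serial queues target by default, lets the system bring up threads beyond that limit.

On macOS, GCD is more willing to overcommit, so it keeps creating new threads when the existing workers are blocked. Once the system is overcommitted, these excess threads compete for CPU time according to their priority.

On iOS, by contrast, GCD keeps overcommit under control: if a priority level is already overcommitted, the tasks queued behind it simply wait. CPU resources on mobile devices are tight, so this design makes sense.

So if you create too many serial queues on iOS, tasks submitted later may be left waiting for a long time. This is why the number and hierarchy of queues need to be strictly controlled. Ideally, each subsystem in the app is allotted a fixed number of queues at fixed priorities, which avoids the delayed execution that comes with thread explosion.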
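To make the waiting concrete, here is a small portable C simulation. It does not use GCD at all. It models a pool with a fixed number of workers, which is a deliberate simplification, since the real thread limit is chosen by the system and is not a constant. The function name `last_task_start_ms` and the `MAX_WORKERS` cap are hypothetical. Given tasks that are all submitted at time 0, it reports when the last one actually gets to start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WORKERS 64  /* arbitrary cap for this sketch */

/* Simulates a pool with a fixed number of worker threads (an assumption made
 * for illustration). All tasks are submitted at time 0 in array order, and
 * task i occupies a worker for durations_ms[i]. Returns the time at which
 * the last task in the array starts running. */
uint64_t last_task_start_ms(const uint64_t *durations_ms, size_t n_tasks,
                            size_t n_workers) {
    assert(durations_ms != NULL && n_tasks > 0);
    assert(n_workers > 0 && n_workers <= MAX_WORKERS);

    uint64_t free_at[MAX_WORKERS] = {0};  /* when each worker becomes idle */
    uint64_t start = 0;

    for (size_t i = 0; i < n_tasks; i++) {
        /* Pick the worker that becomes idle first. */
        size_t best = 0;
        for (size_t w = 1; w < n_workers; w++) {
            if (free_at[w] < free_at[best]) best = w;
        }
        start = free_at[best];
        free_at[best] = start + durations_ms[i];
    }
    return start;
}
```

With ten blocking tasks of 2000 ms each followed by one 1 ms task, the short task starts at 4000 ms on a four-worker pool and at 0 ms on a sixteen-worker pool. The short task itself is cheap; its latency comes entirely from the work queued ahead of it, which is the knock-on effect described in the reply.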

Q: I know the system watchdog can kill an app if the main thread is taking too long to respond. I also heard rumors that there are two other cases that may gets your app killed by watchdog. the first is too many new threads are being created like by random usage of dispatching work to global concurrent queue? the second case is if CPU has been kept too busy like 100% for too long, watchdog kills app too?

I also took the chance to ask why the system watchdog kills apps. Rumor has long had it that, besides an unresponsive main thread, creating too many threads and running the CPU at full load for too long can also get an app killed.

A: I'm not aware of any specific watchdog check along those lines, but it's not hard to imagine that the above-mentioned knock-on effects might jam up your app sufficiently for the watchdog to kill it for other reasons. Running the CPU for too long generates a crash report but it doesn't actually kill the app. It's essentially a 'warning' crash report about the problem.

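Creating too many threads does not directly trigger a watchdog kill, but it can keep the main thread from being serviced in time, so the app gets killed for other reasons. Sustained CPU overload does not get the app killed either, but the system generates a report to warn the developer. I have indeed seen quite a few of these 'this is not a crash' crash logs.

There were a few more questions and answers that are not directly related to my question, so I have left them out. To finish, here is one more interesting reply. Before reading it, think about the following yourself: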

dispatch_async(myQueue, ^{
 // line A
});
// line B

Which runs first, line A or line B?

Consider a snippet like this:

dispatch_async(myQueue, ^{
 // line A
});
// line B

there's clearly a race condition between lines A and B, that is, between the `dispatch_async` returning and the block running on the queue. This can pan out in multiple ways, including:

* If `myQueue` (which we're assuming is a serial queue) is busy, A has to wait so B will definitely run before A.

* If `myQueue` is empty, there's no idle CPU, and `myQueue` has a higher priority then the thread that called `dispatch_async`, you could imagine the kernel switching the CPU to `myQueue` so that it can run A.

* The thread that called `dispatch_async` could run out of its time quantum after scheduling B on `myQueue` but before returning from `dispatch_async`, which again results in A running before B.

* If `myQueue` is empty and there's an idle CPU, A and B could end up running simultaneously.

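The answer

In the end I did not get the precise answer I wanted. As the reply says, there are many possible situations, they are too complex, and the kernel's state cannot be inferred from a user-space call stack alone. Still, a few valuable points became reasonably clear:

Takeaway 1

iOS is itself a system for scheduling and allocating resources. CPU, disk I/O, VM and so on are all scarce, and they affect one another. A main-thread stall may look like a CPU bottleneck, but the kernel may equally be busy scheduling other resources: heavy disk reads and writes, or large amounts of memory allocation and reclamation, can make even this simple thread-creation kernel call stall: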

libsystem_kernel.dylib __workq_kernreturn

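So the only remedy is to analyze each thread's call stack yourself and work out, from the user scenario, which system resources are being consumed at that moment. By going through recently committed code, I did later find that some very time-consuming disk I/O tasks had been added (even though they ran on background threads), and that was what produced this seemingly unrelated call stack. After reverting them, the stall alerts disappeared.

Takeaway 2

Existing stall-detection tools can only dump call stacks once a timeout has occurred. But the timeout may be the combined effect of tasks A, B and C: A and B may be the ones that actually take the time, while C is cheap but happens to be last, so it gets treated as the culprit, and A and B never show up in the reported logs. I have not yet thought of a particularly good solution. Clearly, libsystem_kernel.dylib __workq_kernreturn is one of those cheap C tasks.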
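The misattribution is easy to reproduce with a toy model. The sketch below is plain C with hypothetical function names (`sampled_task`, `slowest_task`). It assumes the tasks run back to back on one thread and that the detector dumps the stack at the instant the elapsed time first exceeds the threshold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the task that is running when the elapsed time first exceeds
 * threshold_ms, i.e. the task whose stack a timeout-triggered dump would
 * capture. Returns -1 if all tasks finish within the threshold. */
int sampled_task(const uint64_t *durations_ms, size_t n, uint64_t threshold_ms) {
    assert(durations_ms != NULL);
    uint64_t elapsed = 0;
    for (size_t i = 0; i < n; i++) {
        elapsed += durations_ms[i];
        if (elapsed > threshold_ms) return (int)i;
    }
    return -1;
}

/* Index of the task that actually took the longest (first one on ties). */
int slowest_task(const uint64_t *durations_ms, size_t n) {
    assert(durations_ms != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (durations_ms[i] > durations_ms[best]) best = i;
    }
    return (int)best;
}
```

For durations of 300 ms (A), 250 ms (B) and 20 ms (C) with a 560 ms threshold, `sampled_task` returns 2 (task C) while `slowest_task` returns 0 (task A): the dump lands in the cheapest task simply because it was running when the threshold was crossed.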

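Takeaway 3

When creating queues with GCD, or more generally when an app uses GCD to run work off the main thread, it is best to have queue-usage rules that every team working on the app follows, so that you do not create too many threads and end up with an unexpected shortage of threads and code that cannot run in time. This is hard, especially in large companies where teams often run to a hundred people or more.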
