soft lockup
<h2>Overview</h2>
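<ol>
<li>A soft lockup is detected per CPU, not for the system as a whole.</li>
<li>A soft lockup means that the affected CPU has not switched tasks (rescheduled) for more than 20 seconds (the default threshold).</li>
</ol>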
<p>Possible causes:</p>
<ul>
<li>Interrupt handlers that run for a long time</li>
<li>Spinlocks that are held for a long time</li>
<li>Preemption that is disabled for a long time</li>
</ul>
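<p><code>local_bh_disable</code> increments the softirq count in <code>preempt_count</code>. On non-RT kernels a non-zero <code>preempt_count</code> makes the current task non-preemptible, so a long section with bottom halves disabled behaves like a long preemption-disabled section and can also cause a soft lockup.</p>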
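<p>The effect of <code>local_bh_disable</code> on preemption can be illustrated with a small user-space model. This is a sketch under stated assumptions. The bit layout (8 preemption bits, then 8 softirq bits, with <code>SOFTIRQ_DISABLE_OFFSET</code> = 2 × <code>SOFTIRQ_OFFSET</code>) follows <code>include/linux/preempt.h</code> on non-RT kernels. The <code>model_*</code> helpers are hypothetical stand-ins rather than kernel functions, and the IRQ-disabled part of the real preemptibility check is left out.</p>
<pre><code class="language-c">#include &lt;stdio.h&gt;

/* Bit layout of preempt_count as in include/linux/preempt.h (non-RT kernels). */
#define PREEMPT_BITS   8
#define SOFTIRQ_BITS   8
#define PREEMPT_SHIFT  0
#define SOFTIRQ_SHIFT  (PREEMPT_SHIFT + PREEMPT_BITS)
#define SOFTIRQ_OFFSET (1U &lt;&lt; SOFTIRQ_SHIFT)
#define SOFTIRQ_MASK   (((1U &lt;&lt; SOFTIRQ_BITS) - 1) &lt;&lt; SOFTIRQ_SHIFT)
/* local_bh_disable() adds 2 * SOFTIRQ_OFFSET: the SOFTIRQ_OFFSET bit itself
 * means "serving a softirq", the bits above it count BH-disable nesting. */
#define SOFTIRQ_DISABLE_OFFSET (2 * SOFTIRQ_OFFSET)

/* Hypothetical stand-in for one CPU's preempt_count. */
static unsigned int model_preempt_count;

static void model_local_bh_disable(void) { model_preempt_count += SOFTIRQ_DISABLE_OFFSET; }
static void model_local_bh_enable(void)  { model_preempt_count -= SOFTIRQ_DISABLE_OFFSET; }

static unsigned int model_softirq_count(void)
{
    return model_preempt_count &amp; SOFTIRQ_MASK;
}

/* Simplified: the kernel's check additionally requires IRQs to be enabled. */
static int model_preemptible(void)
{
    return model_preempt_count == 0;
}

int main(void)
{
    printf("before: softirq_count=%#x preemptible=%d\n",
           model_softirq_count(), model_preemptible());
    model_local_bh_disable();
    printf("BH off: softirq_count=%#x preemptible=%d\n",
           model_softirq_count(), model_preemptible());
    model_local_bh_enable();
    printf("after:  softirq_count=%#x preemptible=%d\n",
           model_softirq_count(), model_preemptible());
    return 0;
}
</code></pre>
<p>While bottom halves are disabled, the softirq count reads 0x200 and the model reports the task as not preemptible. A task that stays in that state long enough starves the watchdog just as an explicit <code>preempt_disable</code> would.</p>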
<h2>Analysis</h2>
<p>Symptom:</p>
<pre><code class="language-c">BUG: soft lockup - CPU#3 stuck for 23s! [kworker/3:0:32]</code></pre>
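<p>The fields of this line can be extracted programmatically when going through a log. The following is a minimal sketch, not an existing tool: <code>struct softlockup_report</code> and <code>parse_softlockup_line</code> are hypothetical names. The parser assumes the format string used by the <code>pr_emerg</code> call in <code>watchdog_timer_fn</code> shown later. A task name such as <code>kworker/3:0</code> can itself contain a colon, so the PID is taken from the last colon inside the brackets.</p>
<pre><code class="language-c">#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Hypothetical container for the fields of one report line. */
struct softlockup_report {
    int cpu;
    unsigned int seconds;
    char comm[64];
    int pid;
};

/* Hypothetical helper: parse
 * "BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]" anywhere in a line.
 * Returns 0 on success, -1 if the line does not match. */
static int parse_softlockup_line(const char *line, struct softlockup_report *r)
{
    const char *p = strstr(line, "BUG: soft lockup - CPU#");
    const char *lb, *rb, *colon = NULL;
    size_t len;

    if (!p)
        return -1;
    if (sscanf(p, "BUG: soft lockup - CPU#%d stuck for %us!",
               &amp;r-&gt;cpu, &amp;r-&gt;seconds) != 2)
        return -1;

    lb = strchr(p, '[');
    rb = lb ? strrchr(lb, ']') : NULL;
    if (!rb)
        return -1;

    /* The comm may contain ':' (e.g. "kworker/3:0"), so split at the last one. */
    for (const char *q = lb + 1; q &lt; rb; q++)
        if (*q == ':')
            colon = q;
    if (!colon)
        return -1;

    len = (size_t)(colon - (lb + 1));
    if (len &gt;= sizeof(r-&gt;comm))
        len = sizeof(r-&gt;comm) - 1;
    memcpy(r-&gt;comm, lb + 1, len);
    r-&gt;comm[len] = '\0';
    r-&gt;pid = atoi(colon + 1);
    return 0;
}

int main(void)
{
    const char *line = "BUG: soft lockup - CPU#3 stuck for 23s! [kworker/3:0:32]";
    struct softlockup_report r;

    if (parse_softlockup_line(line, &amp;r) != 0) {
        fprintf(stderr, "no soft lockup report found\n");
        return 1;
    }
    printf("cpu=%d stuck=%us comm=%s pid=%d\n", r.cpu, r.seconds, r.comm, r.pid);
    return 0;
}
</code></pre>
<p>For the sample line above, it prints <code>cpu=3 stuck=23s comm=kworker/3:0 pid=32</code>. Using <code>strstr</code> lets it skip any timestamp or prefix that precedes <code>BUG:</code> in a real log line.</p>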
<h2>Introduction to the watchdog mechanism</h2>
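<p>The Linux kernel starts one watchdog kernel thread per CPU. Each thread runs under the real-time SCHED_FIFO policy at the highest priority, and the threads can be seen with ps:</p>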
<pre><code class="language-sh">[root@localhost kernel]# ps -e|grep watchdog
18 ? 00:00:00 watchdog/0
19 ? 00:00:00 watchdog/1
24 ? 00:00:00 watchdog/2
29 ? 00:00:00 watchdog/3</code></pre>
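<p>The scheduling policy of one of these threads can be checked with <code>sched_getscheduler(2)</code> and <code>sched_getparam(2)</code>. This is a minimal sketch. The PID is passed on the command line (for example 18 from the listing above), and <code>policy_name</code> is a hypothetical helper. The <code>chrt -p &lt;pid&gt;</code> command from util-linux reports the same information without any code.</p>
<pre><code class="language-c">#define _GNU_SOURCE
#include &lt;sched.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;sys/types.h&gt;

/* Hypothetical helper: map a scheduling policy constant to its name. */
static const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    default:          return "other";
    }
}

int main(int argc, char **argv)
{
    /* PID to inspect, e.g. 18 for watchdog/0 in the listing above; 0 means this process. */
    pid_t pid = argc &gt; 1 ? (pid_t)atoi(argv[1]) : 0;
    struct sched_param param;
    int policy = sched_getscheduler(pid);

    if (policy == -1 || sched_getparam(pid, &amp;param) == -1) {
        perror("sched_getscheduler/sched_getparam");
        return 1;
    }
#ifdef SCHED_RESET_ON_FORK
    policy &amp;= ~SCHED_RESET_ON_FORK; /* strip the reset-on-fork flag if present */
#endif
    printf("pid %d: %s, priority %d (SCHED_FIFO maximum: %d)\n", (int)pid,
           policy_name(policy), param.sched_priority,
           sched_get_priority_max(SCHED_FIFO));
    return 0;
}
</code></pre>
<p>For a watchdog thread, the expectation from the text above is SCHED_FIFO with a priority equal to the reported maximum. On kernels without <code>watchdog/N</code> threads, the lookup simply fails for a PID that does not exist.</p>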
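<p>By default the check runs every 4 seconds. A per-CPU hrtimer fires once per sample period (2 × watchdog_thresh / 5 seconds). Each time it fires, it wakes the high-priority watchdog thread of that CPU, which "feeds the dog" by recording the current time in a per-CPU timestamp. The hrtimer callback also checks whether that timestamp is older than 2 × watchdog_thresh seconds. If it is, not even the highest-priority thread has been able to run on that CPU for that long. (In newer kernels, including the one the code below is taken from, the watchdog/N threads were removed and the timestamp is updated by work queued to the CPU stopper with stop_one_cpu_nowait().) watchdog_thresh is a tunable kernel parameter that defaults to 10:</p>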
<pre><code class="language-sh">kernel.watchdog_thresh = 10</code></pre>
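<p>The two derived intervals follow directly from watchdog_thresh. The sketch below mirrors the arithmetic in <code>kernel/watchdog.c</code>, where the soft-lockup threshold is 2 × watchdog_thresh and the sample period is that threshold divided by 5. The helper names are hypothetical. Reading the live value from <code>/proc/sys/kernel/watchdog_thresh</code> is a convenience added here, with a fallback to the default of 10.</p>
<pre><code class="language-c">#include &lt;stdint.h&gt;
#include &lt;stdio.h&gt;

#define NSEC_PER_SEC 1000000000ULL

/* Soft-lockup threshold in seconds: twice watchdog_thresh. */
static unsigned int softlockup_thresh(unsigned int watchdog_thresh)
{
    return watchdog_thresh * 2;
}

/* hrtimer period in nanoseconds: one fifth of the soft-lockup threshold. */
static uint64_t sample_period_ns(unsigned int watchdog_thresh)
{
    return softlockup_thresh(watchdog_thresh) * (NSEC_PER_SEC / 5);
}

/* Read the running value; fall back to the default of 10 if unavailable. */
static unsigned int read_watchdog_thresh(void)
{
    unsigned int v = 10;
    FILE *f = fopen("/proc/sys/kernel/watchdog_thresh", "r");

    if (f) {
        if (fscanf(f, "%u", &amp;v) != 1)
            v = 10;
        fclose(f);
    }
    return v;
}

int main(void)
{
    unsigned int t = read_watchdog_thresh();

    printf("watchdog_thresh=%u s, soft lockup reported after more than %u s, timer every %.1f s\n",
           t, softlockup_thresh(t), sample_period_ns(t) / 1e9);
    return 0;
}
</code></pre>
<p>With the default of 10, this gives a 20 s threshold and a 4 s timer period. By the same arithmetic, <code>sysctl -w kernel.watchdog_thresh=20</code> would move the report threshold to 40 s.</p>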
<p>When a soft lockup is detected, the following unwelcome message is printed:</p>
<pre><code class="language-sh">BUG: soft lockup - CPU#3 stuck for 23s! [kworker/3:0:32]</code></pre>
<p>The kernel uses this hrtimer to check periodically whether a soft lockup has occurred. The corresponding handler is:</p>
<pre><code class="language-c">// file: kernel/watchdog.c
/* watchdog kicker functions */
static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
    unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts); // time of the last watchdog touch
    struct pt_regs *regs = get_irq_regs();
    int duration;
    int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;

    if (!watchdog_enabled)
        return HRTIMER_NORESTART;

    /* kick the hardlockup detector */
    watchdog_interrupt_count();

    /* kick the softlockup detector */
    if (completion_done(this_cpu_ptr(&amp;softlockup_completion))) {
        reinit_completion(this_cpu_ptr(&amp;softlockup_completion));
        stop_one_cpu_nowait(smp_processor_id(),
                softlockup_fn, NULL,
                this_cpu_ptr(&amp;softlockup_stop_work));
    }

    /* .. and repeat */
    hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));

    if (touch_ts == SOFTLOCKUP_RESET) {
        if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
            /*
             * If the time stamp was touched atomically
             * make sure the scheduler tick is up to date.
             */
            __this_cpu_write(softlockup_touch_sync, false);
            sched_clock_tick();
        }

        /* Clear the guest paused flag on watchdog reset */
        kvm_check_and_clear_guest_paused();
        __touch_watchdog();
        return HRTIMER_RESTART;
    }

    /* check for a softlockup
     * This is done by making sure a high priority task is
     * being scheduled. The task touches the watchdog to
     * indicate it is getting cpu time. If it hasn't then
     * this is a good indication some task is hogging the cpu
     */
    duration = is_softlockup(touch_ts); // compare against the threshold, which is watchdog_thresh * 2
    if (unlikely(duration)) {
        /*
         * If a virtual machine is stopped by the host it can look to
         * the watchdog like a soft lockup, check to see if the host
         * stopped the vm before we issue the warning
         */
        if (kvm_check_and_clear_guest_paused())
            return HRTIMER_RESTART;

        /* only warn once */
        if (__this_cpu_read(soft_watchdog_warn) == true) {
            /*
             * When multiple processes are causing softlockups the
             * softlockup detector only warns on the first one
             * because the code relies on a full quiet cycle to
             * re-arm. The second process prevents the quiet cycle
             * and never gets reported. Use task pointers to detect
             * this.
             */
            if (__this_cpu_read(softlockup_task_ptr_saved) !=
                current) {
                __this_cpu_write(soft_watchdog_warn, false);
                __touch_watchdog();
            }
            return HRTIMER_RESTART;
        }

        if (softlockup_all_cpu_backtrace) {
            /* Prevent multiple soft-lockup reports if one cpu is already
             * engaged in dumping cpu back traces
             */
            if (test_and_set_bit(0, &amp;soft_lockup_nmi_warn)) {
                /* Someone else will report us. Let's give up */
                __this_cpu_write(soft_watchdog_warn, true);
                return HRTIMER_RESTART;
            }
        }

        pr_emerg(&quot;BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n&quot;,
            smp_processor_id(), duration,
            current-&gt;comm, task_pid_nr(current)); // the warning message
        __this_cpu_write(softlockup_task_ptr_saved, current);
        print_modules();
        print_irqtrace_events(current);
        if (regs)
            show_regs(regs);
        else
            dump_stack();

        if (softlockup_all_cpu_backtrace) {
            /* Avoid generating two back traces for current
             * given that one is already made above
             */
            trigger_allbutself_cpu_backtrace();

            clear_bit(0, &amp;soft_lockup_nmi_warn);
            /* Barrier to sync with other cpus */
            smp_mb__after_atomic();
        }

        add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
        if (softlockup_panic)
            panic(&quot;softlockup: hung tasks&quot;);
        __this_cpu_write(soft_watchdog_warn, true);
    } else
        __this_cpu_write(soft_watchdog_warn, false);

    return HRTIMER_RESTART;
}
</code></pre>
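<p>The detection and warn-once logic of <code>watchdog_timer_fn</code> can be reduced to a small user-space simulation. This is a simplified model, not the kernel code. Time is counted in whole seconds and the timer fires every 4 s. The work queued through <code>stop_one_cpu_nowait</code> is assumed to refresh the timestamp whenever the CPU is not hogged. The KVM, <code>softlockup_touch_sync</code>, task-pointer, and all-CPU backtrace paths are left out. <code>struct sim_cpu</code>, <code>cpu_hogged</code>, <code>sim_is_softlockup</code>, and <code>sim_timer_fn</code> are hypothetical names.</p>
<pre><code class="language-c">#include &lt;stdbool.h&gt;
#include &lt;stdio.h&gt;

#define WATCHDOG_THRESH   10                       /* default kernel.watchdog_thresh */
#define SOFTLOCKUP_THRESH (2 * WATCHDOG_THRESH)    /* 20 s */
#define SAMPLE_PERIOD     (SOFTLOCKUP_THRESH / 5)  /* hrtimer period: 4 s */

/* Hypothetical per-CPU state for the simulation. */
struct sim_cpu {
    unsigned long touch_ts; /* plays the role of watchdog_touch_ts */
    bool warned;            /* plays the role of soft_watchdog_warn */
    int reports;            /* number of reports printed */
};

/* Hypothetical workload: a task spins without scheduling from t = 10 s to t = 40 s. */
static bool cpu_hogged(unsigned long t)
{
    return t &gt;= 10 &amp;&amp; t &lt; 40;
}

/* Like is_softlockup(): stall length once it exceeds the threshold, else 0. */
static unsigned long sim_is_softlockup(unsigned long now, unsigned long touch_ts)
{
    if (now &gt; touch_ts + SOFTLOCKUP_THRESH)
        return now - touch_ts;
    return 0;
}

/* One hrtimer expiry at time now, reduced to the warn-once logic. */
static void sim_timer_fn(struct sim_cpu *c, unsigned long now)
{
    unsigned long duration = sim_is_softlockup(now, c-&gt;touch_ts);

    if (duration) {
        if (c-&gt;warned)      /* only warn once per stall */
            return;
        printf("BUG: soft lockup - CPU#0 stuck for %lus!\n", duration);
        c-&gt;reports++;
        c-&gt;warned = true;
    } else {
        c-&gt;warned = false;  /* a quiet cycle re-arms the warning */
    }
}

int main(void)
{
    struct sim_cpu c = { 0, false, 0 };

    for (unsigned long now = 0; now &lt;= 60; now += SAMPLE_PERIOD) {
        sim_timer_fn(&amp;c, now);
        /* The work queued by the timer only gets to run if the CPU can schedule. */
        if (!cpu_hogged(now))
            c.touch_ts = now;
    }
    printf("reports: %d\n", c.reports);
    return 0;
}
</code></pre>
<p>In this run the last touch before the stall happens at t = 8 s. The stall is first reported at t = 32 s, with a duration of 24 s. The remaining ticks of the stall stay silent because of the warn-once flag. Because detection only happens on a timer tick, the printed duration overshoots 20 s by up to one sample period. This is consistent with real reports such as the 23 s above.</p>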