Public Study Notes



soft lockup

<h2>Overview</h2>
<p>1. A soft lockup is a per-CPU condition, not a system-wide one. 2. A soft lockup means that no scheduling switch has occurred on the affected CPU for 20 seconds (the default).</p>
<p>Possible causes:</p>
<ul>
<li>Long-running interrupt handlers</li>
<li>Spinlocks held for a long time</li>
<li>Preemption disabled for a long time</li>
</ul>
<p><code>local_bh_disable</code> increments the softirq count in <code>preempt_count</code>, which also keeps the current task from being preempted until the matching <code>local_bh_enable</code>.</p>
<h2>Analysis</h2>
<p>Symptom:</p>
<pre><code class="language-c">BUG: soft lockup - CPU#3 stuck for 23s! [kworker/3:0:32]</code></pre>
<h2>A brief introduction to the watchdog mechanism</h2>
<p>The Linux kernel starts one real-time kernel thread per CPU, named watchdog, running under the FIFO policy at the highest priority. These threads are visible in ps:</p>
<pre><code class="language-sh">[root@localhost kernel]# ps -e|grep watchdog
   18 ?        00:00:00 watchdog/0
   19 ?        00:00:00 watchdog/1
   24 ?        00:00:00 watchdog/2
   29 ?        00:00:00 watchdog/3</code></pre>
<p>By default, each of these real-time threads feeds the watchdog for its own CPU once every 4 seconds, that is, it refreshes a per-CPU timestamp. The 4-second rhythm comes from a per-CPU hrtimer whose period is 2 × watchdog_thresh / 5 seconds. Each time the hrtimer fires it also checks the timestamp, and a soft lockup is reported once the timestamp is more than 2 × watchdog_thresh seconds old. watchdog_thresh is a tunable kernel parameter with a default of 10:</p>
<pre><code class="language-sh">kernel.watchdog_thresh = 10</code></pre>
<p>When that check fails, the kernel prints the following unfortunate message:</p>
<pre><code class="language-sh">BUG: soft lockup - CPU#3 stuck for 23s! [kworker/3:0:32]</code></pre>
<p>The kernel arms an hrtimer to check periodically for a soft lockup. Its handler is shown below. Note that in the version listed here the handler feeds the watchdog by queuing <code>softlockup_fn</code> on the local CPU through <code>stop_one_cpu_nowait</code>, rather than by waking a dedicated <code>watchdog/N</code> thread:</p>
<pre><code class="language-c">// file: kernel/watchdog.c
/* watchdog kicker functions */
static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
{
    unsigned long touch_ts = __this_cpu_read(watchdog_touch_ts); // read the last feed timestamp
    struct pt_regs *regs = get_irq_regs();
    int duration;
    int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;

    if (!watchdog_enabled)
        return HRTIMER_NORESTART;

    /* kick the hardlockup detector */
    watchdog_interrupt_count();

    /* kick the softlockup detector */
    if (completion_done(this_cpu_ptr(&amp;softlockup_completion))) {
        reinit_completion(this_cpu_ptr(&amp;softlockup_completion));
        stop_one_cpu_nowait(smp_processor_id(),
                softlockup_fn, NULL,
                this_cpu_ptr(&amp;softlockup_stop_work));
    }

    /* .. and repeat */
    hrtimer_forward_now(hrtimer, ns_to_ktime(sample_period));

    if (touch_ts == SOFTLOCKUP_RESET) {
        if (unlikely(__this_cpu_read(softlockup_touch_sync))) {
            /*
             * If the time stamp was touched atomically
             * make sure the scheduler tick is up to date.
             */
            __this_cpu_write(softlockup_touch_sync, false);
            sched_clock_tick();
        }

        /* Clear the guest paused flag on watchdog reset */
        kvm_check_and_clear_guest_paused();
        __touch_watchdog();
        return HRTIMER_RESTART;
    }

    /* check for a softlockup
     * This is done by making sure a high priority task is
     * being scheduled.  The task touches the watchdog to
     * indicate it is getting cpu time.  If it hasn't then
     * this is a good indication some task is hogging the cpu
     */
    duration = is_softlockup(touch_ts); // compare against the threshold; note that it is watchdog_thresh * 2
    if (unlikely(duration)) {
        /*
         * If a virtual machine is stopped by the host it can look to
         * the watchdog like a soft lockup, check to see if the host
         * stopped the vm before we issue the warning
         */
        if (kvm_check_and_clear_guest_paused())
            return HRTIMER_RESTART;

        /* only warn once */
        if (__this_cpu_read(soft_watchdog_warn) == true) {
            /*
             * When multiple processes are causing softlockups the
             * softlockup detector only warns on the first one
             * because the code relies on a full quiet cycle to
             * re-arm.  The second process prevents the quiet cycle
             * and never gets reported.  Use task pointers to detect
             * this.
             */
            if (__this_cpu_read(softlockup_task_ptr_saved) !=
                current) {
                __this_cpu_write(soft_watchdog_warn, false);
                __touch_watchdog();
            }
            return HRTIMER_RESTART;
        }

        if (softlockup_all_cpu_backtrace) {
            /* Prevent multiple soft-lockup reports if one cpu is already
             * engaged in dumping cpu back traces
             */
            if (test_and_set_bit(0, &amp;soft_lockup_nmi_warn)) {
                /* Someone else will report us. Let's give up */
                __this_cpu_write(soft_watchdog_warn, true);
                return HRTIMER_RESTART;
            }
        }

        pr_emerg(&quot;BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n&quot;,
            smp_processor_id(), duration,
            current-&gt;comm, task_pid_nr(current)); // the warning line seen in the log
        __this_cpu_write(softlockup_task_ptr_saved, current);
        print_modules();
        print_irqtrace_events(current);
        if (regs)
            show_regs(regs);
        else
            dump_stack();

        if (softlockup_all_cpu_backtrace) {
            /* Avoid generating two back traces for current
             * given that one is already made above
             */
            trigger_allbutself_cpu_backtrace();

            clear_bit(0, &amp;soft_lockup_nmi_warn);
            /* Barrier to sync with other cpus */
            smp_mb__after_atomic();
        }

        add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
        if (softlockup_panic)
            panic(&quot;softlockup: hung tasks&quot;);
        __this_cpu_write(soft_watchdog_warn, true);
    } else
        __this_cpu_write(soft_watchdog_warn, false);

    return HRTIMER_RESTART;
}
</code></pre>
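<p>The timing relationship between watchdog_thresh, the timer period, and the staleness check in the handler can be sketched in ordinary user-space C. This is only an illustration under stated assumptions, not kernel code: the helper names <code>softlockup_thresh</code>, <code>sample_period_ns</code>, and <code>stall_duration</code> are hypothetical, timestamps are plain seconds, and the comparison ignores the wraparound handling a real kernel would need.</p>

```c
/* User-space sketch of the soft lockup timing arithmetic (not kernel code). */
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: the soft lockup threshold is twice watchdog_thresh. */
static unsigned int softlockup_thresh(unsigned int watchdog_thresh)
{
    return watchdog_thresh * 2;
}

/*
 * Hypothetical helper: the timer period in nanoseconds, assumed to be one
 * fifth of the soft lockup threshold, which gives 4 s for the default of 10.
 */
static uint64_t sample_period_ns(unsigned int watchdog_thresh)
{
    return softlockup_thresh(watchdog_thresh) * (NSEC_PER_SEC / 5);
}

/*
 * Hypothetical stand-in for the is_softlockup() call in the handler:
 * returns how many seconds the timestamp has been stale once it exceeds
 * the threshold, and 0 while the CPU still counts as healthy.
 */
static unsigned long stall_duration(unsigned long now_sec,
                                    unsigned long touch_ts_sec,
                                    unsigned int watchdog_thresh)
{
    if (now_sec > touch_ts_sec + softlockup_thresh(watchdog_thresh))
        return now_sec - touch_ts_sec;
    return 0;
}
```

<p>With the default watchdog_thresh of 10, the sketch gives a 20-second threshold and a 4-second timer period. A timestamp last refreshed 23 seconds ago yields a duration of 23, which corresponds to the "stuck for 23s" figure in the log line above.</p>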
