Steven Rostedt
016a8d5be6
rcu: Don't call wakeup() with rcu_node structure ->lock held
This commit fixes a lockdep-detected deadlock by moving a wake_up()
call out from a rnp->lock critical section. Please see below for
the long version of this story.
On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:
> [12572.705832] ======================================================
> [12572.750317] [ INFO: possible circular locking dependency detected ]
> [12572.796978] 3.10.0-rc3+ #39 Not tainted
> [12572.833381] -------------------------------------------------------
> [12572.862233] trinity-child17/31341 is trying to acquire lock:
> [12572.870390] (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12572.878859]
> but task is already holding lock:
> [12572.894894] (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> [12572.903381]
> which lock already depends on the new lock.
>
> [12572.927541]
> the existing dependency chain (in reverse order) is:
> [12572.943736]
> -> #4 (&ctx->lock){-.-...}:
> [12572.960032] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12572.968337] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12572.976633] [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0
> [12572.984969] [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0
> [12572.993326] [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0
> [12573.001652] [<ffffffff816eacfe>] schedule_user+0x2e/0x70
> [12573.009998] [<ffffffff816ecd64>] retint_careful+0x12/0x2e
> [12573.018321]
> -> #3 (&rq->lock){-.-.-.}:
> [12573.034628] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.042930] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.051248] [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260
> [12573.059579] [<ffffffff810492f5>] do_fork+0x105/0x470
> [12573.067880] [<ffffffff81049686>] kernel_thread+0x26/0x30
> [12573.076202] [<ffffffff816cee63>] rest_init+0x23/0x140
> [12573.084508] [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe
> [12573.092852] [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c
> [12573.101233] [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf
> [12573.109528]
> -> #2 (&p->pi_lock){-.-.-.}:
> [12573.125675] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.133829] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.141964] [<ffffffff8108e881>] try_to_wake_up+0x31/0x320
> [12573.150065] [<ffffffff8108ebe2>] default_wake_function+0x12/0x20
> [12573.158151] [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40
> [12573.166195] [<ffffffff81085398>] __wake_up_common+0x58/0x90
> [12573.174215] [<ffffffff81086909>] __wake_up+0x39/0x50
> [12573.182146] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.190119] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> [12573.198023] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> [12573.205860] [<ffffffff8107a91d>] kthread+0xed/0x100
> [12573.213656] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
> [12573.221379]
> -> #1 (&rsp->gp_wq){..-.-.}:
> [12573.236329] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.243783] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
> [12573.251178] [<ffffffff810868f3>] __wake_up+0x23/0x50
> [12573.258505] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
> [12573.265891] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
> [12573.273248] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
> [12573.280564] [<ffffffff8107a91d>] kthread+0xed/0x100
> [12573.287807] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
Notice the above call chain.
rcu_start_future_gp() is called with the rnp->lock held. Then it calls
rcu_start_gp_advance, which does a wakeup.
You can't do wakeups while holding the rnp->lock, as that would mean
that you could not do a rcu_read_unlock() while holding the rq lock, or
any lock that was taken while holding the rq lock. This is because...
(See below).
> [12573.295067]
> -> #0 (rcu_node_0){..-.-.}:
> [12573.309293] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> [12573.316568] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.323825] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.331081] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12573.338377] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> [12573.345648] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> [12573.352942] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> [12573.360211] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> [12573.367514] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> [12573.374816] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
Notice the above trace.
perf took its own ctx->lock, which can be taken while holding the rq
lock. While holding this lock, it did a rcu_read_unlock(). The
perf_lock_task_context() basically looks like:
rcu_read_lock();
raw_spin_lock(ctx->lock);
rcu_read_unlock();
Now, what looks to have happened, is that we scheduled after taking that
first rcu_read_lock() but before taking the spin lock. When we scheduled
back in and took the ctx->lock, the following rcu_read_unlock()
triggered the "special" code.
The rcu_read_unlock_special() takes the rnp->lock, which gives us a
possible deadlock scenario.
CPU0 CPU1 CPU2
---- ---- ----
rcu_nocb_kthread()
lock(rq->lock);
lock(ctx->lock);
lock(rnp->lock);
wake_up();
lock(rq->lock);
rcu_read_unlock();
rcu_read_unlock_special();
lock(rnp->lock);
lock(ctx->lock);
**** DEADLOCK ****
> [12573.382068]
> other info that might help us debug this:
>
> [12573.403229] Chain exists of:
> rcu_node_0 --> &rq->lock --> &ctx->lock
>
> [12573.424471] Possible unsafe locking scenario:
>
> [12573.438499] CPU0 CPU1
> [12573.445599] ---- ----
> [12573.452691] lock(&ctx->lock);
> [12573.459799] lock(&rq->lock);
> [12573.467010] lock(&ctx->lock);
> [12573.474192] lock(rcu_node_0);
> [12573.481262]
> *** DEADLOCK ***
>
> [12573.501931] 1 lock held by trinity-child17/31341:
> [12573.508990] #0: (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
> [12573.516475]
> stack backtrace:
> [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
> [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
> [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
> [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
> [12573.567856] Call Trace:
> [12573.575011] [<ffffffff816e375b>] dump_stack+0x19/0x1b
> [12573.582284] [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f
> [12573.589637] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
> [12573.596982] [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100
> [12573.604344] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
> [12573.611652] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.619030] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
> [12573.626331] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
> [12573.633671] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
> [12573.640992] [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0
> [12573.648330] [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40
> [12573.655662] [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0
> [12573.662964] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
> [12573.670276] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
> [12573.677622] [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370
> [12573.684981] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
> [12573.692358] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
> [12573.699753] [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50
> [12573.707135] [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
> [12573.714599] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
> [12573.721996] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
This commit delays the wakeup via irq_work(), which is what
perf and ftrace use to perform wakeups in critical sections.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-06-10 13:37:11 -07:00
..
2013-05-14 17:43:29 +02:00
2013-04-30 17:04:10 -07:00
2013-05-07 13:17:29 +02:00
2013-01-11 11:39:33 -08:00
2013-06-08 21:15:09 +01:00
2013-05-02 19:40:34 -07:00
2013-05-05 13:23:27 -07:00
2013-05-29 09:55:01 +02:00
2013-06-06 12:35:30 -04:00
2013-04-22 07:09:06 -07:00
2013-05-04 14:57:58 -04:00
2013-03-12 13:59:14 -07:00
2013-04-29 15:54:26 -07:00
2013-01-11 14:54:55 -08:00
2013-05-11 14:29:11 -07:00
2013-05-11 14:29:11 -07:00
2013-05-24 16:22:52 -07:00
2013-05-07 22:27:15 -04:00
2013-04-14 10:06:31 -07:00
2012-11-19 08:13:38 -08:00
2013-05-29 07:59:39 +09:00
2013-05-01 07:21:43 -07:00
2013-05-01 17:29:18 -04:00
2013-02-19 18:19:48 -08:00
2013-02-19 19:04:55 -08:00
2013-05-01 17:51:54 -07:00
2012-12-18 10:55:28 -08:00
2013-01-27 19:23:31 +01:00
2013-05-01 17:51:54 -07:00
2013-04-15 13:25:16 +02:00
2013-05-08 11:51:05 -07:00
2012-10-26 14:27:49 -07:00
2013-02-23 18:50:11 -08:00
2013-03-12 20:42:10 -07:00
2013-05-05 13:23:27 -07:00
2013-02-05 00:48:46 +01:00
2012-08-06 19:00:35 +03:00
2013-04-15 15:17:26 +09:30
2012-12-20 17:40:19 -08:00
2012-09-13 17:56:13 +02:00
2013-04-30 17:04:07 -07:00
2013-05-16 12:01:11 -07:00
2013-04-18 08:58:38 -07:00
2012-12-11 18:10:49 -08:00
2013-04-30 17:04:02 -07:00
2012-10-24 12:39:09 +02:00
2013-05-08 11:51:05 -07:00
2013-05-05 10:58:06 -07:00
2013-03-15 15:09:43 +10:30
2012-12-20 17:40:21 -08:00
2012-12-05 11:27:24 +10:30
2012-10-19 17:30:40 -07:00
2013-05-17 09:53:36 +01:00
2013-04-19 09:33:36 +02:00
2013-05-01 17:29:39 -04:00
2012-12-06 17:16:23 +08:00
2013-04-30 17:04:02 -07:00
2013-03-18 11:40:21 +00:00
2013-05-01 17:51:54 -07:00
2013-05-01 17:51:54 -07:00
2013-04-22 19:59:25 +02:00
2013-04-18 12:51:19 +02:00
2013-05-07 20:16:25 -07:00
2013-05-01 17:29:18 -04:00
2013-05-07 20:16:25 -07:00
2013-05-17 11:49:10 -07:00
2013-01-28 22:06:21 -08:00
2013-01-28 22:25:21 -08:00
2013-01-28 22:06:21 -08:00
2013-01-28 22:25:21 -08:00
2013-02-04 12:18:20 -08:00
2013-05-15 10:41:12 -07:00
2013-05-05 00:16:35 -04:00
2013-06-10 13:37:11 -07:00
2013-06-10 13:37:11 -07:00
2013-05-08 10:13:35 -07:00
2012-12-18 15:02:12 -08:00
2013-04-29 15:54:40 -07:00
2013-02-07 20:51:08 +01:00
2013-04-10 14:48:37 +02:00
2013-02-07 20:51:08 +01:00
2013-03-23 15:53:52 -07:00
2013-03-26 11:07:19 +11:00
2013-04-30 17:04:08 -07:00
2013-05-01 14:08:52 -07:00
2013-04-30 17:04:03 -07:00
2013-04-12 14:18:43 +02:00
2012-08-13 17:01:07 +02:00
2013-05-05 13:23:27 -07:00
2013-02-07 15:19:36 -08:00
2013-02-26 22:25:17 +01:00
2013-05-09 13:46:38 -04:00
2013-05-01 07:21:43 -07:00
2013-05-09 14:53:20 -04:00
2013-04-29 15:54:36 -07:00
2012-09-13 16:47:34 +02:00
2012-10-06 03:05:31 +09:00
2013-04-29 18:28:42 -07:00
2013-03-15 16:50:20 -07:00
2013-02-16 23:17:25 +01:00
2013-05-15 14:05:17 -07:00
2013-04-29 13:55:38 -07:00
2013-01-27 19:23:31 +01:00
2013-03-03 22:58:33 -05:00
2013-05-01 17:51:54 -07:00
2013-02-27 19:10:24 -08:00
2013-05-01 17:51:54 -07:00
2013-02-27 19:10:22 -08:00
2013-05-01 17:29:39 -04:00
2012-12-06 10:39:54 +01:00
2013-03-14 08:24:05 +01:00
2013-04-30 17:04:02 -07:00
2013-05-15 14:24:24 -07:00