Has anybody seen an error of "scheduling while atomic" with the jffs2 fs? It looks to be in the gcd kernel thread. What's the exact fix? It's on 2.6.39 kernel. If preemption is not configured, the problem won't happen. Thanks in advance.
[72940.050000] BUG: scheduling while atomic: sync_supers/74/0x00000002
[72940.060000] 4 locks held by sync_supers/74:
[72940.060000] #0: (&type->s_umount_key#14){......}, at: [<c00ae9b0>] sync_supers+0x68/0xf0
[72940.070000] #1: (&type->s_lock_key){......}, at: [<c010e9a8>] jffs2_write_super+0x14/0x40
[72940.080000] #2: (&(&c->erase_completion_lock)->rlock){......}, at: [<c010baa0>] jffs2_garbage_collect_pass+0x1fc/0x790
[72940.090000] #3: (&c->alloc_sem){......}, at: [<c010baac>] jffs2_garbage_collect_pass+0x208/0x790
[72940.100000] Modules linked in:
[72940.110000] [<c0038e78>] (unwind_backtrace+0x0/0xe4) from [<c0214940>] (schedule+0x54/0x3c0)
[72940.210000] BUG: spinlock cpu recursion on CPU#0, rm/5927
[72940.210000] lock: c306e560, .magic: dead4ead, .owner: sync_supers/74, .owner_cpu: 0
[72940.220000] [<c0038e78>] (unwind_backtrace+0x0/0xe4) from [<c0152d48>] (do_raw_spin_lock+0x60/0x120)
[72940.230000] [<c0152d48>] (do_raw_spin_lock+0x60/0x120) from [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90)
[72940.240000] [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90) from [<c0110514>] (jffs2_flash_writev+0x200/0x400)
[72940.250000] [<c0110514>] (jffs2_flash_writev+0x200/0x400) from [<c0107a14>] (jffs2_write_dirent+0x274/0x484)
[72940.260000] [<c0107a14>] (jffs2_write_dirent+0x274/0x484) from [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac)
[72940.270000] [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac) from [<c0101448>] (jffs2_unlink+0x44/0x84)
[72940.280000] [<c0101448>] (jffs2_unlink+0x44/0x84) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[72940.290000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c011a010>] (call_unlink+0x90/0xe0)
[72940.290000] [<c011a010>] (call_unlink+0x90/0xe0) from [<c011a090>] (vfsub_unlink+0x30/0x50)
[72940.300000] [<c011a090>] (vfsub_unlink+0x30/0x50) from [<c0129dac>] (aufs_unlink+0x154/0x22c)
[72940.310000] [<c0129dac>] (aufs_unlink+0x154/0x22c) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[72940.320000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c00b8c84>] (do_unlinkat+0xac/0x158)
[72940.330000] [<c00b8c84>] (do_unlinkat+0xac/0x158) from [<c00334e0>] (ret_fast_syscall+0x0/0x2c)
[72945.070000] BUG: spinlock lockup on CPU#0, rm/5927, c306e560
[72945.080000] [<c0038e78>] (unwind_backtrace+0x0/0xe4) from [<c0152dc8>] (do_raw_spin_lock+0xe0/0x120)
[72945.090000] [<c0152dc8>] (do_raw_spin_lock+0xe0/0x120) from [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90)
[72945.100000] [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90) from [<c0110514>] (jffs2_flash_writev+0x200/0x400)
[72945.110000] [<c0110514>] (jffs2_flash_writev+0x200/0x400) from [<c0107a14>] (jffs2_write_dirent+0x274/0x484)
[72945.120000] [<c0107a14>] (jffs2_write_dirent+0x274/0x484) from [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac)
[72945.130000] [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac) from [<c0101448>] (jffs2_unlink+0x44/0x84)
[72945.140000] [<c0101448>] (jffs2_unlink+0x44/0x84) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[72945.150000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c011a010>] (call_unlink+0x90/0xe0)
[72945.150000] [<c011a010>] (call_unlink+0x90/0xe0) from [<c011a090>] (vfsub_unlink+0x30/0x50)
[72945.160000] [<c011a090>] (vfsub_unlink+0x30/0x50) from [<c0129dac>] (aufs_unlink+0x154/0x22c)
[72945.170000] [<c0129dac>] (aufs_unlink+0x154/0x22c) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[72945.180000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c00b8c84>] (do_unlinkat+0xac/0x158)
[72945.190000] [<c00b8c84>] (do_unlinkat+0xac/0x158) from [<c00334e0>] (ret_fast_syscall+0x0/0x2c)
[73016.460000] BUG: soft lockup - CPU#0 stuck for 63s! [rm:5927]
[73016.460000] Modules linked in:
[73016.460000]
[73016.460000] Pid: 5927, comm: rm
[73016.460000] CPU: 0 Tainted: G W (2.6.39.2 #506)
[73016.460000] PC is at do_raw_spin_lock+0x90/0x120
[73016.460000] LR is at do_raw_spin_lock+0xac/0x120
[73016.460000] pc : [<c0152d78>] lr : [<c0152d94>] psr: 80000013
[73016.460000] sp : c36f9cb0 ip : c00334e0 fp : 00da47f0
[73016.460000] r10: 00000028 r9 : c3701384 r8 : 03144800
[73016.460000] r7 : 00000000 r6 : c36f8000 r5 : 001b16c7 r4 : c306e560
[73016.460000] r3 : 00000000 r2 : 00000000 r1 : 00000001 r0 : 00000001
[73016.460000] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[73016.460000] Control: 0005317f Table: 82b90000 DAC: 00000015
[73016.460000] Kernel panic - not syncing: softlockup: hung tasks
[73016.460000] [<c0038e78>] (unwind_backtrace+0x0/0xe4) from [<c0048740>] (panic+0x5c/0x19c)
[73016.460000] [<c0048740>] (panic+0x5c/0x19c) from [<c0078420>] (watchdog_timer_fn+0x11c/0x15c)
[73016.460000] [<c0078420>] (watchdog_timer_fn+0x11c/0x15c) from [<c006467c>] (hrtimer_run_queues+0x11c/0x1e8)
[73016.460000] [<c006467c>] (hrtimer_run_queues+0x11c/0x1e8) from [<c00538dc>] (run_local_timers+0xc/0x1c)
[73016.460000] [<c00538dc>] (run_local_timers+0xc/0x1c) from [<c0053b30>] (update_process_times+0x2c/0x4c)
[73016.460000] [<c0053b30>] (update_process_times+0x2c/0x4c) from [<c006bd70>] (tick_handle_periodic+0x18/0x84)
[73016.460000] [<c006bd70>] (tick_handle_periodic+0x18/0x84) from [<c003d890>] (lpc32xx_timer_interrupt+0x7c/0xa0)
[73016.460000] [<c003d890>] (lpc32xx_timer_interrupt+0x7c/0xa0) from [<c0078780>] (handle_irq_event_percpu+0x2c/0x19c)
[73016.460000] [<c0078780>] (handle_irq_event_percpu+0x2c/0x19c) from [<c007892c>] (handle_irq_event+0x3c/0x5c)
[73016.460000] [<c007892c>] (handle_irq_event+0x3c/0x5c) from [<c007aadc>] (handle_level_irq+0xbc/0xf0)
[73016.460000] [<c007aadc>] (handle_level_irq+0xbc/0xf0) from [<c0028078>] (asm_do_IRQ+0x78/0xa0)
[73016.460000] [<c0028078>] (asm_do_IRQ+0x78/0xa0) from [<c00330e8>] (__irq_svc+0x48/0x90)
[73016.460000] Exception stack(0xc36f9c68 to 0xc36f9cb0)
[73016.460000] 9c60: 00000001 00000001 00000000 00000000 c306e560 001b16c7
[73016.460000] 9c80: c36f8000 00000000 03144800 c3701384 00000028 00da47f0 c00334e0 c36f9cb0
[73016.460000] 9ca0: c0152d94 c0152d78 80000013 ffffffff
[73016.460000] [<c00330e8>] (__irq_svc+0x48/0x90) from [<c0152d78>] (do_raw_spin_lock+0x90/0x120)
[73016.460000] [<c0152d78>] (do_raw_spin_lock+0x90/0x120) from [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90)
[73016.460000] [<c011016c>] (__jffs2_flush_wbuf+0xae8/0xc90) from [<c0110514>] (jffs2_flash_writev+0x200/0x400)
[73016.460000] [<c0110514>] (jffs2_flash_writev+0x200/0x400) from [<c0107a14>] (jffs2_write_dirent+0x274/0x484)
[73016.460000] [<c0107a14>] (jffs2_write_dirent+0x274/0x484) from [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac)
[73016.460000] [<c0108c78>] (jffs2_do_unlink+0x1ec/0x3ac) from [<c0101448>] (jffs2_unlink+0x44/0x84)
[73016.460000] [<c0101448>] (jffs2_unlink+0x44/0x84) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[73016.460000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c011a010>] (call_unlink+0x90/0xe0)
[73016.460000] [<c011a010>] (call_unlink+0x90/0xe0) from [<c011a090>] (vfsub_unlink+0x30/0x50)
[73016.460000] [<c011a090>] (vfsub_unlink+0x30/0x50) from [<c0129dac>] (aufs_unlink+0x154/0x22c)
[73016.460000] [<c0129dac>] (aufs_unlink+0x154/0x22c) from [<c00b6364>] (vfs_unlink+0x68/0xec)
[73016.460000] [<c00b6364>] (vfs_unlink+0x68/0xec) from [<c00b8c84>] (do_unlinkat+0xac/0x158)
[73016.460000] [<c00b8c84>] (do_unlinkat+0xac/0x158) from [<c00334e0>] (ret_fast_syscall+0x0/0x2c)
A note: Another trace without enabling debug in kernel shows:
BUG: scheduling while atomic: jffs2_gcd_mtd7/288/0x00000002
[<c002e4e4>] (unwind_backtrace+0x0/0xe4) from [<c0200a64>] (schedule+0x54/0x3ec)
...
BUG: scheduling while atomic: sync_supers/72/0x00000002
[<c002e4e4>] (unwind_backtrace+0x0/0xe4) from [<c0200a64>] (schedule+0x54/0x3ec)
The sem is acquired when holding spinlock. Once the order is swapped, the problem goes away. The problem has been fixed in late kernels.
#2: (&(&c->erase_completion_lock)->rlock), jffs2_garbage_collect_pass+0x1fc/0x790
#3: (&c->alloc_sem), jffs2_garbage_collect_pass+0x208/0x790
The code is in fs/jffs2/gc.c. Swap line 228 and 229 to solve the problem.
227 D1(printk(KERN_DEBUG "No progress from erasing blocks; doing GC anyway\n"));
228 spin_lock(&c->erase_completion_lock);
229 mutex_lock(&c->alloc_sem);
230 }