-
NeilBrown authored
lli_trunc_sem can lead to a deadlock. vvp_io_read_start takes lli_trunc_sem, and can take mmap sem in the direct i/o case, via generic_file_read_iter->ll_direct_IO->get_user_pages_unlocked vvp_io_fault_start is called with mmap_sem held (taken in the kernel page fault code), and takes lli_trunc_sem. These aren't necessarily the same mmap_sem, but can be if you mmap a lustre file, then read into that mapped memory from the file. These are both 'down_read' calls on lli_trunc_sem so they don't directly conflict, but if vvp_io_setattr_start() is called to truncate the file between these, it does 'down_write' on lli_trunc_sem. As semaphores are queued, this down_write blocks subsequent reads. This means if the page fault has taken the mmap_sem, but not yet the lli_trunc_sem in vvp_io_fault_start, it will wait behind the lli_trunc_sem down_write from vvp_io_setattr_start. At the same time, vvp_io_read_start is holding the lli_trunc_sem and waiting for the mmap_sem, which will not be released because vvp_io_fault_start cannot get the lli_trunc_sem because the setattr 'down_write' operation is queued in front of it. Solve this by replacing with a hand-coded semaphore, using atomic counters and wait_var_event(). This allows a special down_read_nowait which ignores waiting down_write operations. This combined with waking up all waiters at once guarantees that down_read_nowait can always 'join' another down_read, guaranteeing our ability to take the semaphore twice for read and avoiding the deadlock. I'd like there to be a better way to fix this, but I haven't found it yet. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com> Change-Id: Ibd3abf4df1f1f6f45e440733a364999bd608b191 Reviewed-on: https://review.whamcloud.com/35271 Reviewed-by: Neil Brown <neilb@suse.de> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
e5914a61