    LU-12460 llite: replace lli_trunc_sem · e5914a61
    NeilBrown authored
    lli_trunc_sem can lead to a deadlock.
    vvp_io_read_start takes lli_trunc_sem, and can take
    mmap_sem in the direct I/O case, via
    vvp_io_fault_start is called with mmap_sem held (taken in
    the kernel page fault code), and takes lli_trunc_sem.
    These aren't necessarily the same mmap_sem, but can be if
    you mmap a lustre file, then read into that mapped memory
    from the file.
    These are both 'down_read' calls on lli_trunc_sem so they
    don't directly conflict, but if vvp_io_setattr_start() is
    called to truncate the file between these, it does
    'down_write' on lli_trunc_sem.  As semaphores are queued,
    this down_write blocks subsequent reads.
    This means if the page fault has taken the mmap_sem,
    but not yet the lli_trunc_sem in vvp_io_fault_start,
    it will wait behind the lli_trunc_sem down_write from
    At the same time, vvp_io_read_start is holding the
    lli_trunc_sem and waiting for the mmap_sem, which will not
    be released because vvp_io_fault_start cannot get the
    lli_trunc_sem because the setattr 'down_write' operation is
    queued in front of it.
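
The interleaving described above can be replayed in a small single-threaded model. This is only a userspace sketch, not Lustre or kernel code: `fair_rwsem`, its fields, and `replay_interleaving()` are hypothetical names, the queueing rule is reduced to "a queued writer blocks new readers", and mmap_sem is modeled as a plain "held" flag, which is how the description above treats it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a fair (queued) reader/writer semaphore. Hypothetical. */
struct fair_rwsem {
	int  active_readers;   /* readers currently holding the lock */
	bool writer_active;    /* true while a writer holds the lock */
	int  queued_writers;   /* writers waiting for the lock */
};

/* A new reader gets in only if no writer holds the lock or is queued. */
static inline bool fair_try_down_read(struct fair_rwsem *s)
{
	if (s->writer_active || s->queued_writers > 0)
		return false;
	s->active_readers++;
	return true;
}

/* Drop one read hold. */
static inline void fair_up_read(struct fair_rwsem *s)
{
	assert(s->active_readers > 0);
	s->active_readers--;
}

/* A writer that cannot get in immediately is queued. */
static inline bool fair_try_down_write(struct fair_rwsem *s)
{
	if (s->writer_active || s->active_readers > 0) {
		s->queued_writers++;
		return false;
	}
	s->writer_active = true;
	return true;
}

/* Replay the interleaving; returns true if both paths end up stuck. */
static inline bool replay_interleaving(void)
{
	struct fair_rwsem trunc_sem = { 0 };
	bool mmap_sem_held = false;

	/* 1. Read path: down_read on lli_trunc_sem succeeds. */
	bool read_has_trunc = fair_try_down_read(&trunc_sem);

	/* 2. Page fault path: mmap_sem is taken by the fault code. */
	mmap_sem_held = true;

	/* 3. Truncate: down_write cannot proceed, so it queues. */
	bool write_got_in = fair_try_down_write(&trunc_sem);

	/* 4. Fault path: down_read is stuck behind the queued writer. */
	bool fault_has_trunc = fair_try_down_read(&trunc_sem);

	/* 5. Read path now needs mmap_sem, still held by the fault path. */
	bool read_blocked_on_mmap = mmap_sem_held;

	return read_has_trunc && !write_got_in &&
	       !fault_has_trunc && read_blocked_on_mmap;
}
```

In this model `replay_interleaving()` reports the cycle: the read path holds lli_trunc_sem and waits for mmap_sem, while the fault path holds mmap_sem and waits for lli_trunc_sem behind the queued writer.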
    Solve this by replacing lli_trunc_sem with a hand-coded
    semaphore, using atomic counters and wait_var_event().  This
    allows a
    special down_read_nowait which ignores waiting down_write
    operations.  This combined with waking up all waiters at
    once guarantees that down_read_nowait can always 'join'
    another down_read, guaranteeing our ability to take the
    semaphore twice for read and avoiding the deadlock.
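
A minimal userspace sketch of the scheme described above, using C11 atomics. It is an illustration under stated assumptions, not the code in this patch: the struct, field, and function names are hypothetical, and each `try_` helper only evaluates the condition that a kernel version would presumably pass to wait_var_event(); the sleeping and the wake-all step are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the hand-coded semaphore. */
struct trunc_sem {
	atomic_int readers;  /* > 0: reader count; -1: held by a writer */
	atomic_int waiters;  /* number of writers waiting for the lock */
};

static inline void trunc_sem_init(struct trunc_sem *sem)
{
	atomic_init(&sem->readers, 0);
	atomic_init(&sem->waiters, 0);
}

/* Increment *v unless it is negative; returns true on success. */
static inline bool inc_unless_negative(atomic_int *v)
{
	int old = atomic_load(v);

	while (old >= 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Ordinary read lock: defers to any waiting writer. */
static inline bool trunc_sem_try_down_read(struct trunc_sem *sem)
{
	return atomic_load(&sem->waiters) == 0 &&
	       inc_unless_negative(&sem->readers);
}

/* "nowait" read lock: ignores waiting writers, so it can always join
 * another reader; it fails only while a writer holds the lock. */
static inline bool trunc_sem_try_down_read_nowait(struct trunc_sem *sem)
{
	return inc_unless_negative(&sem->readers);
}

static inline void trunc_sem_up_read(struct trunc_sem *sem)
{
	int prev = atomic_fetch_sub(&sem->readers, 1);

	assert(prev > 0);
	(void)prev;
	/* A kernel version would wake all waiters once the count hits 0. */
}

/* A writer announces itself before it starts waiting. */
static inline void trunc_sem_write_wait_begin(struct trunc_sem *sem)
{
	atomic_fetch_add(&sem->waiters, 1);
}

/* The writer gets in only when there are no readers (0 -> -1). */
static inline bool trunc_sem_try_down_write(struct trunc_sem *sem)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&sem->readers, &expected, -1))
		return false;
	atomic_fetch_sub(&sem->waiters, 1);
	return true;
}

static inline void trunc_sem_up_write(struct trunc_sem *sem)
{
	atomic_store(&sem->readers, 0);
	/* A kernel version would wake all waiters here. */
}
```

With one reader inside and a truncate waiting, `trunc_sem_try_down_read()` is refused but `trunc_sem_try_down_read_nowait()` still succeeds, which is the property the fix relies on: a second read hold can always join the first.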
    I'd like there to be a better way to fix this, but I
    haven't found it yet.
    Signed-off-by: NeilBrown <neilb@suse.com>
    Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
    Change-Id: Ibd3abf4df1f1f6f45e440733a364999bd608b191
    Reviewed-on: https://review.whamcloud.com/35271
    Reviewed-by: Neil Brown <neilb@suse.de>
    Tested-by: jenkins <devops@whamcloud.com>
    Tested-by: Maloo <maloo@whamcloud.com>
    Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
    Reviewed-by: Oleg Drokin <green@whamcloud.com>