b=17310
r=johann,shadow

- fixes ptlrpcd blocking on a very long wait for reply unlink. To do so, a new rpc phase, RQ_PHASE_UNREGISTERING, is introduced; a request stays in it until lnet calls reply_in_callback() to signal that the reply is unlinked. ptlrpcd skips all requests in this phase instead of waiting n * 300s on each of them. This allows ptlrpcd to process the other rpcs in the set;
- make sure that the inflight count is coherent with presence on the sending or delay list. That is, if we see inflight != 0, the rpc must be on one of these lists. This is very helpful in ptlrpc_invalidate_import() to show all rpcs still waiting after invalidating the import;
- in ptlrpc_invalidate_import(), wait for the maximum of rq_deadline - now over all inflight rpcs instead of obd_timeout, which may be much longer. If the calculated timeout is 0, obd_timeout is used. This fixes the case where rq_deadline - now > obd_timeout (very easy to see in logs), which led to the inflight != 0 assert because inflight rpcs timed out after our wait period had finished;
- in ptlrpc_invalidate_import(), wait forever for rpcs in the UNREGISTERING phase. When the wait times out, assert inflight == 0 only if no rpcs are in the UNREGISTERING phase. Only rpcs in the UNREGISTERING phase are allowed to stay longer than obd_timeout;
- added the ptlrpc_move_rqphase() function. All phase changes go through it. Add debug_req() there to track down all phase changes;
- conf-sanity.sh test_45 added to emulate a very long reply unlink and also the situation where rq_deadline - now > obd_timeout;
- do not wait forever in ptlrpc_unregister_reply() in the async case (when it is used from sets). The sync case is left unchanged;
- make sure that ptlrpc_set_next_timeout() yields a 1s timeout (instead of 0s) for a set with rpcs in the "unregistering" stage, to prevent ptlrpcd from sleeping forever and hanging in test_45;
- in ptlrpcd(), make sure that we do not sleep on a 0 timeout.
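The wait calculation described above for ptlrpc_invalidate_import() can be illustrated with a minimal sketch. This is not the code from the patch: `struct sketch_req` and `sketch_inflight_timeout()` are hypothetical names, the request is reduced to a single deadline field standing in for rq_deadline, and the `fallback` parameter stands in for obd_timeout.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified stand-in for an in-flight request. */
struct sketch_req {
	time_t deadline;	/* absolute deadline, analogous to rq_deadline */
};

/*
 * Return how long to wait for the in-flight requests: the largest
 * remaining (deadline - now) over all of them. If that comes out as 0
 * (no requests, or all deadlines already passed), return the fallback,
 * which plays the role of obd_timeout in this sketch.
 */
static time_t sketch_inflight_timeout(const struct sketch_req *reqs, size_t n,
				      time_t now, time_t fallback)
{
	time_t timeout = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		time_t left = reqs[i].deadline - now;

		if (left > timeout)
			timeout = left;
	}

	return timeout != 0 ? timeout : fallback;
}
```

With deadlines of 1030, 1090 and 990, now = 1000 and a fallback of 100, the sketch returns 90, the longest remaining deadline. With no pending deadlines it falls back to 100.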
Changed files:
- lustre/include/lustre_import.h: 1 addition, 0 deletions
- lustre/include/lustre_net.h: 98 additions, 29 deletions
- lustre/include/obd_support.h: 1 addition, 0 deletions
- lustre/obdclass/genops.c: 1 addition, 0 deletions
- lustre/obdclass/lprocfs_status.c: 2 additions, 0 deletions
- lustre/ptlrpc/client.c: 147 additions, 86 deletions
- lustre/ptlrpc/events.c: 3 additions, 3 deletions
- lustre/ptlrpc/import.c: 113 additions, 24 deletions
- lustre/ptlrpc/niobuf.c: 2 additions, 2 deletions
- lustre/ptlrpc/pack_generic.c: 1 addition, 1 deletion
- lustre/ptlrpc/pinger.c: 4 additions, 4 deletions
- lustre/ptlrpc/ptlrpc_internal.h: 1 addition, 1 deletion
- lustre/ptlrpc/ptlrpcd.c: 4 additions, 3 deletions
- lustre/ptlrpc/recover.c: 1 addition, 1 deletion
- lustre/ptlrpc/service.c: 2 additions, 2 deletions
- lustre/tests/conf-sanity.sh: 22 additions, 1 deletion