Mike Shaver authored
Highlights:
- b=324: MDS recovery must replay transactions in strict transno sequence
- b=325: getattr after OST failure returns -EIO
- b=326: unlink after OST failure returns -EIO
- b=400: new client can't join cluster after OST failure
- b=403: multi-client access failure when OST fails
- b=410: after an OST failure, lfind incorrectly displays file information
- b=417: freeing unreplayable requests twice (aed's fix from b_md)
- b=402: (partial) give error for lstripe request that exceeds configured OSTs
- much better support for reconnecting to MDS after network partition (still some lock-repeating issues to be resolved for some requests)
- better support for connecting to multiple MDSes on one host (xid, transno and request_list are all per-import now)
- track disconnecting clients in last_rcvd, for more reliable recovery
- also, sync last_rcvd after connect/disconnect
- reduced syslog/CERROR output for recovery (hi, Terry!)
- server (DLM) timeout is half the system-wide timeout, to avoid cascading failure in the face of a dead client
- don't wait for recovery to finish in order to send disconnect messages
- removal of c_dying_head
- don't wait for timeout to trigger recovery after ptl_send_rpc error
- strict MDS transno ordering via mds_transno_sem (non-optimal, but correct)
- many !handle -> IS_ERR(handle) fixes around mds_fs_start callers
- turn on client eviction for bulk timeouts in OST and MDS