Commit 31b264e0 authored by Peter Braam

- mds failover code

- connection and recovd subsystem
- refined handling of replies/timeout with levels:
  - requests are delayed until the request level is lower than or
    equal to the connection level
- much updated network documentation
- updated file system recovery documentation
- server maintains lists of open files and handles "re-opening"; the
  lists are kept in the metadata client info structures.
- flags on requests to indicate their disposition after a reply,
  e.g. retain until commit, retain until explicitly canceled, etc.
- new failure instrumentation to drop a reply while still executing
  the request.
- handling of re-sent creation requests
- move file attribute updates on the mds from write to close
- reconnection routine in llight
- work through the recovery list in a more orderly way:
  - retain the list in sent order
  - handle each request according to its disposition
  - return integers, not void
  - add direct (0-copy) I/O support -- doesn't compile on 2.4.9
- failure handling in client reintegration code
- replay handling in server reintegration code
- add names to client systems so that debugging/tracing output is
  easier to understand
- remove most lists from the client structure: the multiple lists
  introduced request reordering.  We now use one list and flag the
  requests.
- re-addressing of connections: invoked by the client recovery scripts
- don't reallocate reply buffers if they were already there and not
  consumed in case of re-sending requests.
- introduce a request replay function: I want this to be merged with
  ptlrpc_queue_wait soon.
- small support routines for continuing delayed requests, restarting
  requests for which replies were lost, etc.
- try to return negative error codes even when Portals calls report
  errors as positive values.
- make the last-committed and last-received values 64-bit in network
  packets.
- write test programs that:
  - keep files open
  - do I/O every second
- include 5 basic regression cases for failover recovery:
  runfailure-client-mds.sh
- simplify ha_assist.sh -- the secondary ha_assist program does the
  work
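The level rule for replies and timeouts can be pictured with a minimal C sketch. The enum values, structures and function names below are hypothetical (not the code in this commit), and the sketch assumes a higher number means a more available connection: a request is sent only if its level is lower than or equal to the connection level; otherwise it waits on a delayed list until the level is raised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical connection levels, ordered from least to most available. */
enum conn_level {
        LEVEL_NONE = 0,   /* not connected */
        LEVEL_RECOVD = 1, /* reconnected, recovery still in progress */
        LEVEL_FULL = 2    /* recovered, normal operation */
};

struct request {
        int level;            /* level this request needs before it may be sent */
        int sent;             /* set once the request is handed to the network */
        struct request *next; /* link on the connection's delayed list */
};

struct connection {
        int level;               /* current connection level */
        struct request *delayed; /* waiting requests, oldest first */
};

/* The rule: send only if request level <= connection level. */
static int request_may_send(const struct connection *c, const struct request *r)
{
        return r->level <= c->level;
}

/* Send the request now, or append it to the delayed list (keeps submission order). */
static void submit_request(struct connection *c, struct request *r)
{
        assert(c != NULL && r != NULL);
        if (request_may_send(c, r)) {
                r->sent = 1;
                return;
        }
        r->next = NULL;
        struct request **tail = &c->delayed;
        while (*tail != NULL)
                tail = &(*tail)->next;
        *tail = r;
}

/* Change the connection level and send every delayed request that now
 * qualifies. Returns the number of requests sent. */
static int set_connection_level(struct connection *c, int level)
{
        struct request **pp = &c->delayed;
        int nsent = 0;

        c->level = level;
        while (*pp != NULL) {
                struct request *r = *pp;
                if (request_may_send(c, r)) {
                        *pp = r->next; /* unlink, then "send" */
                        r->next = NULL;
                        r->sent = 1;
                        nsent++;
                } else {
                        pp = &r->next;
                }
        }
        return nsent;
}
```

Scanning the delayed list on every level change is linear, which is acceptable for the small number of requests outstanding during recovery.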
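The request disposition flags, the single list kept in sent order, and the "retain until commit" / "retain until explicitly canceled" rules fit together roughly as in the sketch below. All flag names, fields and functions are hypothetical; it assumes each reply carries a transaction number and that the server reports a last-committed value, and it leaves freeing dropped requests to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical disposition flags set on a request. */
#define REQ_RETAIN_UNTIL_COMMIT 0x1 /* keep for replay until the server commits it */
#define REQ_RETAIN_UNTIL_CANCEL 0x2 /* keep until explicitly canceled */
#define REQ_REPLIED             0x4 /* a reply has been received */
#define REQ_CANCELED            0x8 /* explicitly canceled */

struct request {
        uint64_t xid;         /* request id, increasing in send order */
        uint64_t transno;     /* transaction number from the reply, 0 if none */
        unsigned flags;
        struct request *next;
};

/* One list per client, in sent order (new requests appended at the tail). */
struct client {
        struct request *head, *tail;
};

static void client_add_sent(struct client *cl, struct request *r)
{
        assert(cl != NULL && r != NULL);
        r->next = NULL;
        if (cl->tail != NULL)
                cl->tail->next = r;
        else
                cl->head = r;
        cl->tail = r;
}

/* Must this request stay on the list, given the server's last committed transno? */
static int request_retained(const struct request *r, uint64_t last_committed)
{
        if (!(r->flags & REQ_REPLIED))
                return 1; /* no reply yet: may need to be resent */
        if ((r->flags & REQ_RETAIN_UNTIL_CANCEL) && !(r->flags & REQ_CANCELED))
                return 1;
        if ((r->flags & REQ_RETAIN_UNTIL_COMMIT) && r->transno > last_committed)
                return 1;
        return 0;
}

/* Unlink requests that no longer need to be kept; returns how many were unlinked. */
static int client_prune(struct client *cl, uint64_t last_committed)
{
        struct request **pp = &cl->head;
        int dropped = 0;

        cl->tail = NULL;
        while (*pp != NULL) {
                struct request *r = *pp;
                if (request_retained(r, last_committed)) {
                        cl->tail = r;
                        pp = &r->next;
                } else {
                        *pp = r->next;
                        dropped++;
                }
        }
        return dropped;
}

/* Collect the xids of retained requests in sent order, i.e. replay order. */
static int client_replay(const struct client *cl, uint64_t *xids, int max)
{
        int n = 0;
        for (const struct request *r = cl->head; r != NULL && n < max; r = r->next)
                xids[n++] = r->xid;
        return n;
}
```

Because there is only one list, replay order is simply list order, which avoids the request reordering that multiple lists introduced.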
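Not reallocating reply buffers that already exist and were not consumed, when a request is re-sent, might look like the following. The structure and `prep_reply_buf()` are hypothetical, and the size check is an added assumption so that a buffer that is too small is never reused.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct request {
        void *rep_buf;    /* reply buffer, NULL until first allocated */
        size_t rep_len;   /* size of rep_buf in bytes */
        int rep_consumed; /* nonzero once a reply in rep_buf has been processed */
};

/* Prepare the reply buffer before a (re)send. A buffer that already exists,
 * has not been consumed and is large enough is reused as-is; otherwise a
 * fresh one is allocated. Returns 0 or -ENOMEM. */
static int prep_reply_buf(struct request *req, size_t len)
{
        assert(req != NULL);
        if (req->rep_buf != NULL && !req->rep_consumed && req->rep_len >= len)
                return 0;

        free(req->rep_buf);
        req->rep_buf = malloc(len);
        if (req->rep_buf == NULL) {
                req->rep_len = 0;
                return -ENOMEM;
        }
        req->rep_len = len;
        req->rep_consumed = 0;
        return 0;
}
```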
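Returning negative errors from a layer that reports problems as positive numbers usually comes down to one mapping function at the boundary. The status codes below are hypothetical stand-ins, not the actual Portals return values, and the errno chosen for each is illustrative.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical positive status codes returned by the network layer. */
enum net_status {
        NET_OK = 0,
        NET_NOSPACE = 1,
        NET_INV_HANDLE = 2
};

/* Map a network-layer status to the kernel convention: 0 on success,
 * a negative errno value on failure. Values that are already zero or
 * negative pass through unchanged; unknown positive codes become -EIO. */
static int net_status_to_errno(int status)
{
        if (status <= 0)
                return status;
        switch (status) {
        case NET_NOSPACE:
                return -ENOMEM;
        case NET_INV_HANDLE:
                return -EINVAL;
        default:
                return -EIO;
        }
}
```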
parent 293a4936
570 additions and 163 deletions