* userspace (catamount) ptllnd changes
- Error handling
  Ensure that all communications complete in finite time. Ensure that errors cause a clean teardown of peer state, so that communications can be re-established after a peer crashes. Note that this does NOT handle reconnection to a failed LNET router, which is required for routed configurations.
- Environment tunables
  PTLLND_DEBUG (boolean, dflt 0) is a global switch to enable/disable debug features.
  PTLLND_TX_HISTORY (int, dflt debug?1024:0) sets the size of the history buffer.
  PTLLND_ABORT_ON_PROTOCOL_MISMATCH (boolean, dflt 1) calls abort when connecting to a peer that runs a different version of the ptllnd protocol.
  PTLLND_ABORT_ON_NAK (boolean, dflt 0) calls abort when a peer sends a NAK (e.g. because the peer has timed out this node).
  PTLLND_DUMP_ON_NAK (boolean, dflt debug?1:0) dumps peer debug information and the history on receiving a NAK.
  PTLLND_WATCHDOG_INTERVAL (int, dflt 1) sets how often to check some peers for timed-out communications while the application is blocked waiting for communications to complete.
  PTLLND_TIMEOUT (int, dflt 50) is the communications timeout in seconds.
  PTLLND_LONG_WAIT (int, dflt debug?5:PTLLND_TIMEOUT) is the time in seconds after which the ptllnd prints a warning if it is still blocked during connection establishment, cleanup after an error, or cleanup during shutdown.
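Several of the defaults above depend on other tunables: PTLLND_TX_HISTORY, PTLLND_DUMP_ON_NAK and PTLLND_LONG_WAIT depend on PTLLND_DEBUG, and the non-debug default of PTLLND_LONG_WAIT is PTLLND_TIMEOUT. The C sketch below shows one way such dependent defaults could be resolved from the environment. It is an illustration under stated assumptions, not the actual ptllnd code: the struct, its fields and the helpers ptllnd_get_tunable() and ptllnd_load_tunables() are hypothetical, and an unset or malformed variable simply falls back to its default.

```c
#include <stdlib.h>

/* Hypothetical helper (not the real ptllnd API): read an integer tunable
 * from the environment, returning dflt if the variable is unset, empty,
 * or not a plain decimal integer. */
static int ptllnd_get_tunable(const char *name, int dflt)
{
    const char *str = getenv(name);
    char *end;
    long val;

    if (str == NULL || *str == '\0')
        return dflt;

    val = strtol(str, &end, 10);
    if (*end != '\0')           /* trailing garbage: ignore the setting */
        return dflt;

    return (int)val;
}

/* Hypothetical container for the tunables listed above. */
struct ptllnd_tunables {
    int debug;
    int tx_history;
    int abort_on_protocol_mismatch;
    int abort_on_nak;
    int dump_on_nak;
    int watchdog_interval;
    int timeout;                /* seconds */
    int long_wait;              /* seconds */
};

static void ptllnd_load_tunables(struct ptllnd_tunables *t)
{
    /* PTLLND_DEBUG is read first because several defaults depend on it. */
    t->debug = ptllnd_get_tunable("PTLLND_DEBUG", 0);

    t->tx_history = ptllnd_get_tunable("PTLLND_TX_HISTORY",
                                       t->debug ? 1024 : 0);
    t->abort_on_protocol_mismatch =
        ptllnd_get_tunable("PTLLND_ABORT_ON_PROTOCOL_MISMATCH", 1);
    t->abort_on_nak = ptllnd_get_tunable("PTLLND_ABORT_ON_NAK", 0);
    t->dump_on_nak = ptllnd_get_tunable("PTLLND_DUMP_ON_NAK",
                                        t->debug ? 1 : 0);
    t->watchdog_interval = ptllnd_get_tunable("PTLLND_WATCHDOG_INTERVAL", 1);

    /* PTLLND_TIMEOUT must be resolved before PTLLND_LONG_WAIT, whose
     * non-debug default is the communications timeout. */
    t->timeout = ptllnd_get_tunable("PTLLND_TIMEOUT", 50);
    t->long_wait = ptllnd_get_tunable("PTLLND_LONG_WAIT",
                                      t->debug ? 5 : t->timeout);
}
```

Resolving the tunables in dependency order keeps each default a single expression; with no PTLLND_* variables set, this yields a 50-second timeout and a 50-second long-wait threshold, while setting only PTLLND_DEBUG=1 lowers the long-wait threshold to 5 seconds and enables a 1024-entry history.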