Commit 4e78f6f5 authored by Eric Barton

* userspace (catamount) ptllnd changes

  - Error handling

    Ensure all communications complete in finite time.  Ensure errors cause
    clean peer state teardown so that communications can be re-established
    after a peer crash.

    Note that this does NOT handle reconnection to a failed LNET router, which
    is required for routed configurations.

  - Environment tunables

    PTLLND_DEBUG (boolean, dflt 0) is a global switch to enable/disable debug
    features.

    PTLLND_TX_HISTORY (int, dflt debug?1024:0) sets the size of the history
    buffer.

    PTLLND_ABORT_ON_PROTOCOL_MISMATCH (boolean, dflt 1) calls abort on
    connecting to a peer running a different version of the ptllnd protocol.

    PTLLND_ABORT_ON_NAK (boolean, dflt 0) calls abort when a peer sends a NAK
    (e.g. because it has timed out this node).

    PTLLND_DUMP_ON_NAK (boolean, dflt debug?1:0) dumps peer debug information
    and the history on receiving a NAK.

    PTLLND_WATCHDOG_INTERVAL (int, dflt 1) sets how often to check a subset of
    peers for timed-out communications while the application is blocked
    waiting for communications to complete.

    PTLLND_TIMEOUT (int, dflt 50) is the communications timeout in seconds.

    PTLLND_LONG_WAIT (int, dflt debug?5:PTLLND_TIMEOUT) is the time in seconds
    after which the ptllnd prints a warning if it is still blocked during
    connection establishment, cleanup after an error or cleanup during shutdown.
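The tunables above, and the way some defaults depend on others, might be read along the lines of the C sketch below. It is illustrative only: the struct, its field names and the helpers env_lookup, env_int and ptllnd_tunables_load are hypothetical, not the ptllnd's actual code, and parsing booleans as base-10 integers is an assumption. The point it shows is ordering: PTLLND_DEBUG must be read first because three defaults depend on it, and PTLLND_TIMEOUT must be read before PTLLND_LONG_WAIT.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the tunables; field names are illustrative. */
struct ptllnd_tunables {
    int debug;                      /* PTLLND_DEBUG */
    int tx_history;                 /* PTLLND_TX_HISTORY */
    int abort_on_protocol_mismatch; /* PTLLND_ABORT_ON_PROTOCOL_MISMATCH */
    int abort_on_nak;               /* PTLLND_ABORT_ON_NAK */
    int dump_on_nak;                /* PTLLND_DUMP_ON_NAK */
    int watchdog_interval;          /* PTLLND_WATCHDOG_INTERVAL */
    int timeout;                    /* PTLLND_TIMEOUT, seconds */
    int long_wait;                  /* PTLLND_LONG_WAIT, seconds */
};

/* Hypothetical helper: look NAME up in a NULL-terminated array of
 * "NAME=value" strings, or in the real environment if envp is NULL. */
static const char *env_lookup(const char *const *envp, const char *name)
{
    size_t len = strlen(name);

    if (envp == NULL)
        return getenv(name);

    for (; *envp != NULL; envp++)
        if (strncmp(*envp, name, len) == 0 && (*envp)[len] == '=')
            return *envp + len + 1;
    return NULL;
}

/* Hypothetical helper: parse an integer tunable, falling back to dflt
 * when it is unset, empty or not a plain decimal number. */
static int env_int(const char *const *envp, const char *name, int dflt)
{
    const char *str = env_lookup(envp, name);
    char *end;
    long val;

    if (str == NULL || *str == '\0')
        return dflt;

    val = strtol(str, &end, 10);
    if (*end != '\0') {
        fprintf(stderr, "%s: bad value '%s', using %d\n", name, str, dflt);
        return dflt;
    }
    return (int)val;
}

/* Read PTLLND_DEBUG first since other defaults depend on it, and
 * PTLLND_TIMEOUT before PTLLND_LONG_WAIT, whose non-debug default it is. */
static void ptllnd_tunables_load(const char *const *envp,
                                 struct ptllnd_tunables *t)
{
    t->debug = env_int(envp, "PTLLND_DEBUG", 0);
    t->tx_history = env_int(envp, "PTLLND_TX_HISTORY", t->debug ? 1024 : 0);
    t->abort_on_protocol_mismatch =
        env_int(envp, "PTLLND_ABORT_ON_PROTOCOL_MISMATCH", 1);
    t->abort_on_nak = env_int(envp, "PTLLND_ABORT_ON_NAK", 0);
    t->dump_on_nak = env_int(envp, "PTLLND_DUMP_ON_NAK", t->debug ? 1 : 0);
    t->watchdog_interval = env_int(envp, "PTLLND_WATCHDOG_INTERVAL", 1);
    t->timeout = env_int(envp, "PTLLND_TIMEOUT", 50);
    t->long_wait = env_int(envp, "PTLLND_LONG_WAIT",
                           t->debug ? 5 : t->timeout);
}
```

Calling ptllnd_tunables_load(NULL, &t) reads the process environment; passing an explicit "NAME=value" array instead makes it easy to check, for example, that setting only PTLLND_TIMEOUT=30 also moves the non-debug PTLLND_LONG_WAIT default to 30.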
parent dca8e8c0