Commit 4e78f6f5 authored by Eric Barton

* userspace (catamount) ptllnd changes

  - Error handling

    Ensure all communications complete in finite time.  Ensure errors cause
    clean peer state teardown so that communications can be re-established
    after a peer crash.

    Note that this does NOT handle reconnection to a failed LNET router, which
    is required for routed configurations.

  - Environment tunables

    PTLLND_DEBUG (boolean, dflt 0) is a global switch to enable/disable debug
    features.

    PTLLND_TX_HISTORY (int, dflt debug?1024:0) sets the size of the history
    buffer.

    PTLLND_ABORT_ON_PROTOCOL_MISMATCH (boolean, dflt 1) calls abort on
    connecting to a peer running a different version of the ptllnd protocol.

    PTLLND_ABORT_ON_NAK (boolean, dflt 0) aborts when a peer sends a NAK
    (e.g. because it has timed out this node).

    PTLLND_DUMP_ON_NAK (boolean, dflt debug?1:0) dumps peer debug info and
    the history buffer on receiving a NAK.

    PTLLND_WATCHDOG_INTERVAL (int, dflt 1) sets how often to check some peers
    for timed-out communications while the application blocks for
    communications to complete.

    PTLLND_TIMEOUT (int, dflt 50) is the communications timeout in seconds.

    PTLLND_LONG_WAIT (int, dflt debug?5:PTLLND_TIMEOUT) is the time in seconds
    after which the ptllnd prints a warning if it is still blocked during
    connection establishment, cleanup after an error, or cleanup during
    shutdown.
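
    The defaults above depend on each other (several follow PTLLND_DEBUG, and
    PTLLND_LONG_WAIT follows PTLLND_TIMEOUT), so the order of evaluation
    matters.  The following is a minimal sketch of how such tunables could be
    read from the environment, not the code in this commit.  The struct, its
    field names and the helpers env_int(), env_bool() and load_tunables() are
    hypothetical, and falling back to the default on an unparsable value is
    an assumption.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical container for the tunables; names are illustrative only. */
struct ptllnd_tunables {
    int debug;
    int tx_history;
    int abort_on_protocol_mismatch;
    int abort_on_nak;
    int dump_on_nak;
    int watchdog_interval;
    int timeout;
    int long_wait;
};

/* Return the integer value of environment variable `name`, or `dflt` if it
 * is unset, empty, not a complete integer, or out of range for int. */
static int env_int(const char *name, int dflt)
{
    const char *s = getenv(name);
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return dflt;

    errno = 0;
    v = strtol(s, &end, 0);
    if (errno != 0 || *end != '\0' || v < INT_MIN || v > INT_MAX)
        return dflt;

    return (int)v;
}

/* Booleans are read as integers: zero is false, anything else is true. */
static int env_bool(const char *name, int dflt)
{
    return env_int(name, dflt) != 0;
}

/* Read PTLLND_DEBUG and PTLLND_TIMEOUT before the tunables whose defaults
 * depend on them. */
static void load_tunables(struct ptllnd_tunables *t)
{
    t->debug = env_bool("PTLLND_DEBUG", 0);
    t->tx_history = env_int("PTLLND_TX_HISTORY", t->debug ? 1024 : 0);
    t->abort_on_protocol_mismatch =
        env_bool("PTLLND_ABORT_ON_PROTOCOL_MISMATCH", 1);
    t->abort_on_nak = env_bool("PTLLND_ABORT_ON_NAK", 0);
    t->dump_on_nak = env_bool("PTLLND_DUMP_ON_NAK", t->debug ? 1 : 0);
    t->watchdog_interval = env_int("PTLLND_WATCHDOG_INTERVAL", 1);
    t->timeout = env_int("PTLLND_TIMEOUT", 50);
    t->long_wait = env_int("PTLLND_LONG_WAIT",
                           t->debug ? 5 : t->timeout);
}
```

    With this ordering, setting only PTLLND_DEBUG=1 also turns on the history
    buffer and the dump on NAK, and shortens the long-wait warning to 5
    seconds, matching the defaults listed above.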
parent dca8e8c0