Commit 3f3c839f authored by Olaf Weber's avatar Olaf Weber Committed by Oleg Drokin
Browse files

LU-9971 lnet: use after free in lnet_discover_peer_locked()



When the lnet_net_lock is unlocked, the peer attached to an
lnet_peer_ni (found via lnet_peer_ni::lpni_peer_net->lpn_peer)
can change, and the old peer deallocated. If we are really
unlucky, then all the churn could give us a new, different,
peer at the same address in memory.

Change the reference counting on the lnet_peer lp so that it
is guaranteed to be alive when we relock the lnet_net_lock for
the cpt. When the reference count is dropped lp may go away if
it was unlinked, but the new peer is guaranteed to have a
different address, so we can still correctly determine whether
the peer changed and discovery should be redone.

LU-9971 lnet: fix peer ref counting

Exit from the loop after peer ref count has been incremented
to avoid wrong ref count.

The code makes sure that a peer is queued for discovery at most
once if discovery is disabled. This is done to use discovery
as a standard ping for gateways which do not have discovery feature
or discovery is disabled.
Signed-off-by: default avatarOlaf Weber <olaf.weber@hpe.com>
Change-Id: Ia44dce20074b27ec0e77d7c1908c6a44ec73d326
Reviewed-on: https://review.whamcloud.com/28944

Reviewed-by: default avatarAmir Shehata <ashehata@whamcloud.com>
Tested-by: Jenkins
Tested-by: default avatarMaloo <maloo@whamcloud.com>
Reviewed-by: default avatarJames Simmons <uja.ornl@yahoo.com>
Reviewed-by: default avatarOleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/38891

Tested-by: default avatarjenkins <devops@whamcloud.com>
Reviewed-by: default avatarChris Horn <chris.horn@hpe.com>
Reviewed-by: default avatarJames Simmons <jsimmons@infradead.org>
parent 678772a1
......@@ -2047,6 +2047,7 @@ lnet_discover_peer_locked(struct lnet_peer_ni *lpni, int cpt, bool block)
DEFINE_WAIT(wait);
struct lnet_peer *lp;
int rc = 0;
int count = 0;
again:
lnet_net_unlock(cpt);
......@@ -2059,27 +2060,38 @@ again:
* zombie if we race with DLC, so we must check for that.
*/
for (;;) {
/* Keep lp alive when the lnet_net_lock is unlocked */
lnet_peer_addref_locked(lp);
prepare_to_wait(&lp->lp_dc_waitq, &wait, TASK_INTERRUPTIBLE);
if (signal_pending(current))
break;
if (the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING)
break;
/*
* Don't repeat discovery if discovery is disabled. This is
* done to ensure we can use discovery as a standard ping as
* well for backwards compatibility with routers which do not
* have discovery or have discovery disabled
*/
if (lnet_is_discovery_disabled(lp) && count > 0)
break;
if (lp->lp_dc_error)
break;
if (lnet_peer_is_uptodate(lp))
break;
lnet_peer_queue_for_discovery(lp);
count++;
CDEBUG(D_NET, "Discovery attempt # %d\n", count);
/*
* if caller requested a non-blocking operation then
* return immediately. Once discovery is complete then the
* peer ref will be decremented and any pending messages
* that were stopped due to discovery will be transmitted.
* If caller requested a non-blocking operation then
* return immediately. Once discovery is complete any
* pending messages that were stopped due to discovery
* will be transmitted.
*/
if (!block)
break;
lnet_peer_addref_locked(lp);
lnet_net_unlock(LNET_LOCK_EX);
schedule();
finish_wait(&lp->lp_dc_waitq, &wait);
......@@ -2087,26 +2099,18 @@ again:
lnet_peer_decref_locked(lp);
/* Peer may have changed */
lp = lpni->lpni_peer_net->lpn_peer;
/*
* Wait for discovery to complete, but don't repeat if
* discovery is disabled. This is done to ensure we can
* use discovery as a standard ping as well for backwards
* compatibility with routers which do not have discovery
* or have discovery disabled
*/
if (lnet_is_discovery_disabled(lp))
break;
}
finish_wait(&lp->lp_dc_waitq, &wait);
lnet_net_unlock(LNET_LOCK_EX);
lnet_net_lock(cpt);
lnet_peer_decref_locked(lp);
/*
* If the peer has changed after we've discovered the older peer,
* then we need to discovery the new peer to make sure the
* interface information is up to date
* The peer may have changed, so re-check and rediscover if that turns
* out to have been the case. The reference count on lp ensured that
* even if it was unlinked from lpni the memory could not be recycled.
* Thus the check below is sufficient to determine whether the peer
* changed. If the peer changed, then lp must not be dereferenced.
*/
if (lp != lpni->lpni_peer_net->lpn_peer)
goto again;
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment