- May 03, 2018
-
-
Amir Shehata authored
In 2.10 the entire set of changes to use the ktime functions is not present, creating a situation where routers consider peers dead, essentially breaking routing functionality. Revert the changes that were made to LNet as part of LU-6245. Signed-off-by:
Amir Shehata <amir.shehata@intel.com> Change-Id: Id0765e8332a8b97167d3a602e6410f5bb6a48137 Reviewed-on: https://review.whamcloud.com/32082 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
James Simmons <uja.ornl@yahoo.com> Reviewed-by:
Dmitry Eremin <dmitry.eremin@intel.com> Reviewed-by:
Sebastien Buisson <sbuisson@ddn.com> Reviewed-by:
John L. Hammond <john.hammond@intel.com>
-
James Simmons authored
It was found that routers which have LNet enabled on both Ethernet and InfiniBand couldn't communicate properly with each other. This was due to ko2iblnd still using jiffies and the socklnd driver using time64_t. After some discussion it was decided that the best thing to do is roll back instead of making additional changes. Revert "LU-9397 ksocklnd: move remaining time handling to 64 bits" This reverts commit 59c25356. Change-Id: I42fd5620ab8131cd5fdf4bf0ef15553f6eabe550 Signed-off-by:
James Simmons <uja.ornl@yahoo.com> Reviewed-on: https://review.whamcloud.com/32015 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Dmitry Eremin <dmitry.eremin@intel.com> Reviewed-by:
Sebastien Buisson <sbuisson@ddn.com> Reviewed-by:
John L. Hammond <john.hammond@intel.com>
-
- Apr 16, 2018
-
-
Emoly Liu authored
In cfs_ip_min_max(), (nidrange->nr_all == 1) means this nid range is a full IP address range(*.*.*.*). In this case, we don't need to compare it to any other nid range, but set min_nid to 0.0.0.0 and max_nid to 255.255.255.255 directly. Also, test_10d is added to sanity-sec.sh to verify this patch and some code cleanup is done for jt_nodemap_add/del_range(). Change minimum MGS version to 2.10.1 Lustre-change: https://review.whamcloud.com/31684 Lustre-commit: 23026632 Change-Id: I72c546b060f9e123204a566a3bd373b4f017502d Signed-off-by:
Emoly Liu <emoly.liu@intel.com> Reviewed-by:
Sebastien Buisson <sbuisson@ddn.com> Reviewed-by:
Fan Yong <fan.yong@intel.com> Signed-off-by:
Minh Diep <minh.diep@intel.com> Reviewed-on: https://review.whamcloud.com/31950 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
John L. Hammond <john.hammond@intel.com>
-
- Apr 05, 2018
-
-
Kit Westneat authored
This patch fixes the contiguous range check to allow the addition of multiple "full" ([0-255]) ranges. As part of this change, is_contiguous and find_min_max are combined as they were always called together and the logic is fairly similar. This also removes the multiple range expression support, since it was broken. Also, sanity-sec.sh test_10c is added to verify this patch. Lustre-change: https://review.whamcloud.com/24397 Lustre-commit: eac95a65 Signed-off-by:
Kit Westneat <kit.westneat@gmail.com> Signed-off-by:
Emoly Liu <emoly.liu@intel.com> Change-Id: I3c49a077039327fcbde87196f82db140f67a74d0 Reviewed-by:
Sebastien Buisson <sbuisson@ddn.com> Reviewed-by:
Andreas Dilger <andreas.dilger@intel.com> Signed-off-by:
Minh Diep <minh.diep@intel.com> Reviewed-on: https://review.whamcloud.com/31521 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
John L. Hammond <john.hammond@intel.com>
-
Amir Shehata authored
list_add was being used erroneously. The logic should be to move the txs on ibp_tx_queue onto a local list which is then processed. The code, however, did the reverse, which would result in the pending txs not being processed and thus being dropped silently. This in turn would lead to peer reference counts at the LNet layer not being decremented, since lnet_finalize() might not be called for a message. Initialize the local list and use list_splice_init() to move transmits on the ibp_tx_queue to the local list. Lustre-change: https://review.whamcloud.com/31374 Lustre-commit: f5c6228f Signed-off-by:
Amir Shehata <amir.shehata@intel.com> Change-Id: I6b36f709db2c89e53e0b3354883a8a1b1052a1dd Reviewed-by:
Doug Oucharek <dougso@me.com> Reviewed-by:
Sonia Sharma <sonia.sharma@intel.com> Signed-off-by:
Minh Diep <minh.diep@intel.com> Reviewed-on: https://review.whamcloud.com/31520 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
John L. Hammond <john.hammond@intel.com>
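The splice described in this commit can be illustrated in user space. This is a minimal sketch, assuming a hand-written stand-in for the kernel's circular doubly linked list; the struct and helper names below (init_list_head, list_append, splice_init, list_count) are hypothetical and are not the ko2iblnd code. It shows the intended direction: entries leave the queue, land on the local list in order, and the queue ends up empty.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (illustrative only). */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Append 'item' at the tail of the list headed by 'h'. */
static void list_append(struct list_head *item, struct list_head *h)
{
	item->prev = h->prev;
	item->next = h;
	h->prev->next = item;
	h->prev = item;
}

/*
 * Move every entry of 'src' to the front of 'dst' and reinitialize 'src',
 * mirroring what the kernel's list_splice_init() is documented to do.
 */
static void splice_init(struct list_head *src, struct list_head *dst)
{
	struct list_head *first, *last;

	if (list_is_empty(src))
		return;

	first = src->next;
	last = src->prev;

	first->prev = dst;
	last->next = dst->next;
	dst->next->prev = last;
	dst->next = first;

	init_list_head(src);
}

/* Count the entries on a list (helper for checking the result). */
static size_t list_count(const struct list_head *h)
{
	size_t n = 0;
	const struct list_head *p;

	for (p = h->next; p != h; p = p->next)
		n++;
	return n;
}
```

After splice_init(&queue, &local), the caller can walk the local list without holding whatever lock protects the queue, which is the usual reason for this pattern.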
-
- Nov 27, 2017
-
-
John L. Hammond authored
Request dynamic minor allocation when registering /dev/lnet and /dev/obd. Remove the obsolete create-device-if-not-found code from register_ioc_dev(). Lustre-change: https://review.whamcloud.com/29741 Lustre-commit: e446c166 Signed-off-by:
John L. Hammond <john.hammond@intel.com> Change-Id: I59c70912b4729f58a76dc6107b3e1d7379c6d7a3 Reviewed-by:
Andreas Dilger <andreas.dilger@intel.com> Reviewed-by:
Jian Yu <jian.yu@intel.com> Signed-off-by:
Minh Diep <minh.diep@intel.com> Reviewed-on: https://review.whamcloud.com/29945 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com>
-
- Nov 17, 2017
-
-
Alexandr Boyko authored
The cmid will be destroyed by OFED if kiblnd_cm_callback() returns an error. If the error happens before the end of kiblnd_connect_peer(), it will touch the destroyed cmid and fail with (o2iblnd_cb.c:1315:kiblnd_connect_peer()) ASSERTION( cmid->device != ((void *)0) ) failed: Lustre-change: https://review.whamcloud.com/29134 Lustre-commit: 576551cb Seagate-bug-id: MRP-4592 Signed-off-by:
Alexander Boyko <alexander.boyko@seagate.com> Change-Id: I83eb5bceeb567acef0316498b936d25d6c6ccd95 Reviewed-by:
Alexey Lyashkov <c17817@cray.com> Reviewed-by:
Doug Oucharek <dougso@me.com> Signed-off-by:
Minh Diep <minh.diep@intel.com> Reviewed-on: https://review.whamcloud.com/29881 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
John L. Hammond <john.hammond@intel.com>
-
- Sep 06, 2017
-
-
Colin Ian King authored
The arguments args->lstio_ses_force and args->lstio_ses_timeout are in the incorrect order. Fix this by swapping them around. Detected by CoverityScan, CID#1226833 ("Arguments in wrong order") Test-Parameters: trivial testlist=lnet-selftest Lustre-change: https://review.whamcloud.com/28487 Lustre-commit: 0a80067c Change-Id: If11c574655425db5bbf21ba2264be8d83a7e8bf8 Signed-off-by:
Colin Ian King <colin.king@canonical.com> Signed-off-by:
James Simmons <uja.ornl@yahoo.com> Reviewed-by:
Sonia Sharma <sonia.sharma@intel.com> Reviewed-by:
Olaf Weber <olaf.weber@hpe.com> Signed-off-by:
Minh Diep <minh.diep@intel.com> Reviewed-on: https://review.whamcloud.com/28763 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
John L. Hammond <john.hammond@intel.com>
-
- Aug 16, 2017
-
-
James Simmons authored
During the development of the Linux 4.11 kernel it was discovered that the kernel socket layer could get into a lockdep situation. To handle this a new bool argument was added to the accept member of struct socket. For LNet we can always pass false. Lustre-commit: 15045c90 Lustre-change: https://review.whamcloud.com/27642 Change-Id: I420cda95b70cf927b1a6e3493b631bc5a3585d74 Signed-off-by:
James Simmons <uja.ornl@yahoo.com> Reviewed-on: https://review.whamcloud.com/27642 Reviewed-by:
Doug Oucharek <doug@cadentcomputing.com> Reviewed-by:
Bob Glossman <bob.glossman@intel.com> Reviewed-by:
Dmitry Eremin <dmitry.eremin@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com> Reviewed-on: https://review.whamcloud.com/28463 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
John L. Hammond <john.hammond@intel.com>
-
Amir Shehata authored
When tx credits are returned, any pending messages need to be sent. Messages could have different tx_cpts, so the correct one needs to be locked. After lnet_post_send_locked(), if we locked a different CPT then we need to relock the correct one. However, as part of lnet_post_send_locked(), lnet_finalize() can be called, which can free the message. Therefore, the cpt of the message being passed must be cached in order to prevent access to freed memory. Lustre-change: https://review.whamcloud.com/28308 Lustre-commit: 7af6307b Signed-off-by:
Amir Shehata <amir.shehata@intel.com> Change-Id: I959fdc30daf87b5575d8371da20d5cf6f64e7d3c Reviewed-by:
Olaf Weber <olaf.weber@hpe.com> Reviewed-by:
Sonia Sharma <sonia.sharma@intel.com> Reviewed-by:
Dmitry Eremin <dmitry.eremin@intel.com> Signed-off-by:
Minh Diep <minh.diep@intel.com> Reviewed-on: https://review.whamcloud.com/28439 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
John L. Hammond <john.hammond@intel.com>
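The caching step in this commit follows a general pattern: read what you need from an object before calling something that may free it. The sketch below is a simplified user-space illustration under that assumption; struct msg, post_send_may_free() and send_and_get_cpt() are hypothetical names, not the LNet functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified message: only the field the sketch needs. */
struct msg {
	int tx_cpt;
};

/* Stand-in for a send routine that may finalize (free) the message. */
static void post_send_may_free(struct msg *m)
{
	free(m);
}

/*
 * Return the CPT that must be relocked after the send.
 * The value is read before the call, because 'm' may be gone afterwards.
 */
static int send_and_get_cpt(struct msg *m)
{
	int cached_cpt;

	assert(m != NULL);
	cached_cpt = m->tx_cpt;		/* cache first */

	post_send_may_free(m);		/* 'm' must not be touched after this */
	return cached_cpt;
}
```

Reading m->tx_cpt after the call instead would be a use-after-free, which is the bug class the commit message describes.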
-
- Aug 10, 2017
-
-
Amir Shehata authored
The intent of this function is to get the cpt nearest to the memory described by the MD. There are three scenarios that must be handled:
1. The memory is described by an lnet_kiov_t structure -> this describes kernel pages
2. The memory is described by a struct kvec -> this describes kernel logical addresses
3. The memory is a contiguous buffer allocated via vmalloc
For cases 1 and 2 we look at the first vector which contains the data to be DMAed, taking into consideration the msg offset. For case 2 we have to take the extra step of translating the kernel logical address to a physical page using the virt_to_page() macro. For case 3 we need to use is_vmalloc_addr() and vmalloc_to_page() to get the associated page to be able to identify the CPT. o2iblnd uses the same strategy when it's mapping the memory into a scatter/gather list. Therefore, the lnet_kvaddr_to_page() common function was created to be used by both the o2iblnd and lnet_cpt_of_md().
kmap_to_page() performs the high memory check which lnet_kvaddr_to_page() does. However, unlike the latter it handles the highmem case properly instead of calling LBUG. It's not 100% clear why the code was written that way. Since the legacy code will need to still be maintained, adding kmap_to_page() will not simplify the code. Furthermore, the behavior for kernels which export kmap_to_page() will be different from kernels which do not. At worst calling kmap_to_page() might mask some problems which would've been caught by the LBUG earlier on. However, at the time of this fix, that LBUG has never been observed. Lustre-change: https://review.whamcloud.com/28165 Lustre-commit: 43b0e632 Signed-off-by:
Amir Shehata <amir.shehata@intel.com> Change-Id: I2c67e5df77d60112bf27f900e0325d189f193aed Reviewed-by:
Dmitry Eremin <dmitry.eremin@intel.com> Reviewed-by:
Sonia Sharma <sonia.sharma@intel.com> Reviewed-by:
Olaf Weber <olaf.weber@hpe.com> Signed-off-by:
Minh Diep <minh.diep@intel.com> Reviewed-on: https://review.whamcloud.com/28400 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
John L. Hammond <john.hammond@intel.com>
-
Andreas Dilger authored
When the obdfilter code was split into separate OFD and OSD modules, the bulk IO page allocation was implemented to use GFP_NOFS to avoid allocations recursing into the filesystem and causing deadlocks. However, this is only possible if the RPC is coming from a local client, as we might end up waiting on a page sent in the request we're serving. Local RPCs use __GFP_HIGHMEM so that the pages can use all of the available memory on the OSS on 32-bit machines. It is possible to use more aggressive GFP_HIGHUSER flags for non-local clients to be able to generate more memory pressure on the OSS and allow inactive pages to be reclaimed, since the OSS doesn't have any other processes or allocations that generate memory reclaim pressure. See also b=17576 (bdf50dc9) and b=19529 (3dcf18d3) for details. The patch also implements an LNet function to determine if a client NID is local or not. This becomes more complex in the LNet Multi-Rail world and it is really LNet's job to handle NIDs, not that of Lustre. Lustre-change: https://review.whamcloud.com/27908 Lustre-commit: b0ab95d6 Signed-off-by:
Andreas Dilger <andreas.dilger@intel.com> Change-Id: I2806c9c5c2fe269669eafdafaf2310924c3ebbe5 Reviewed-by:
Dmitry Eremin <dmitry.eremin@intel.com> Reviewed-by:
Patrick Farrell <paf@cray.com> Signed-off-by:
Minh Diep <minh.diep@intel.com> Reviewed-on: https://review.whamcloud.com/28318 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
John L. Hammond <john.hammond@intel.com>
-
- Aug 07, 2017
-
-
Doug Oucharek authored
Trying to page align the remote_addr for IB_RDMA_WRITE work requests is triggering "dump_cqe" errors from MOFED 4.x + mlx5. This patch removes the address masking we were doing with FastReg which was trying to page align remote_addr values. I am also removing the setting of "mr->iova" with FastReg as this is being done in the call to ib_map_mr_sg() and could cause problems. Lustre-change: https://review.whamcloud.com/27149 Lustre-commit: 6c634180 Signed-off-by:
Doug Oucharek <doug.s.oucharek@intel.com> Change-Id: If35baa467d8d60866f709b5feea7f619063c6da4 Reviewed-by:
Gu Zheng <gzheng@ddn.com> Reviewed-by:
Dmitry Eremin <dmitry.eremin@intel.com> Reviewed-by:
Amir Shehata <amir.shehata@intel.com> Signed-off-by:
Minh Diep <minh.diep@intel.com> Reviewed-on: https://review.whamcloud.com/28237 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
John L. Hammond <john.hammond@intel.com>
-
Amir Shehata authored
Make sure to unlock the api mutex properly in lnet_dyn_add_net() Lustre-change: https://review.whamcloud.com/27907 Lustre-commit: 65326ab2 Test-Parameters: trivial Signed-off-by:
Amir Shehata <amir.shehata@intel.com> Change-Id: I786545de690ea5966771be3e84d3561b794d55ec Reviewed-by:
Olaf Weber <olaf.weber@hpe.com> Reviewed-by:
Sonia Sharma <sonia.sharma@intel.com> Signed-off-by:
Minh Diep <minh.diep@intel.com> Reviewed-on: https://review.whamcloud.com/28236 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
John L. Hammond <john.hammond@intel.com>
-
- Jun 16, 2017
-
-
Dmitry Eremin authored
Unlock lnet_net_lock in case of error in function lnet_select_pathway(). Change-Id: Ib48fb3aebdc60bafff80f5c52b90301830ca4afa Signed-off-by:
Dmitry Eremin <dmitry.eremin@intel.com> Reviewed-on: https://review.whamcloud.com/27455 Tested-by: Jenkins Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Andreas Dilger <andreas.dilger@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
- Jun 10, 2017
-
-
Dmitry Eremin authored
Rework the CPU partition code to make it more tolerant of offline CPUs and empty nodes. For example, on KNL:
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221
node 0 size: 24472 MB
node 0 free: 12409 MB
node 1 cpus: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239
node 1 size: 24576 MB
node 1 free: 20388 MB
node 2 cpus: 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255
node 2 size: 24576 MB
node 2 free: 20621 MB
node 3 cpus: 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271
node 3 size: 24576 MB
node 3 free: 21183 MB
node 4 cpus:
node 4 size: 4096 MB
node 4 free: 3982 MB
node 5 cpus:
node 5 size: 4096 MB
node 5 free: 3982 MB
node 6 cpus:
node 6 size: 4096 MB
node 6 free: 3982 MB
node 7 cpus:
node 7 size: 4096 MB
node 7 free: 3981 MB
node distances:
node 0 1 2 3 4 5 6 7
0: 10 21 21 21 31 41 41 41
1: 21 10 21 21 41 31 41 41
2: 21 21 10 21 41 41 31 41
3: 21 21 21 10 41 41 41 31
4: 31 41 41 41 10 41 41 41
5: 41 31 41 41 41 10 41 41
6: 41 41 31 41 41 41 10 41
7: 41 41 41 31 41 41 41 10
Contains the fix for LU-8492 ptlrpc: Correctly calculate hrp->hrp_nthrs. Fixes an error code return which was introduced in commit def25e9c. Change-Id: I7f64a20ee009a88e836f592ce044400f07ffbcdd Signed-off-by:
Dmitry Eremin <dmitry.eremin@intel.com> Reviewed-on: https://review.whamcloud.com/23222 Reviewed-by:
Amir Shehata <amir.shehata@intel.com> Reviewed-by:
James Simmons <uja.ornl@yahoo.com> Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
- Jun 03, 2017
-
-
Amir Shehata authored
Increment the per NI stats for messages being routed. This will give a better view of the traffic distribution over multiple peer interfaces. Added extra trace messages to track the messages sent and received. Test-Parameters: trivial Signed-off-by:
Amir Shehata <amir.shehata@intel.com> Change-Id: I213b1b36e9787d25705d91d091ad9e9c6a5b2ae8 Reviewed-on: https://review.whamcloud.com/26907 Tested-by: Jenkins Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Sonia Sharma <sonia.sharma@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
Amir Shehata authored
Make sure to set all NIs to the proper LND tunables specified. Add ntx tunable to dynamic configuration. This way all tunables required to tune OPA performance can be configured via lnetctl, allowing the ability to tune OPA network and IB network differently Test-Parameters: trivial Signed-off-by:
Amir Shehata <amir.shehata@intel.com> Change-Id: I015f3f959bd46784d4607bd4259b4640303dc362 Reviewed-on: https://review.whamcloud.com/27263 Tested-by: Jenkins Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Sonia Sharma <sonia.sharma@intel.com> Reviewed-by:
Olaf Weber <olaf.weber@hpe.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
Amir Shehata authored
lnet_peer_primary_nid() is called from lnet_parse(). It checks ln_state outside the net lock, causing a race condition during shutdown where the code expects the state to be running, but it's stopping or shutdown. Fixed the issue by renaming lnet_peer_primary_nid() to lnet_peer_primary_nid_locked(). This function is now called when lnet_net_lock is held in lnet_parse(). In lnet_create_reply_msg() we already have access to the msg_txpeer, so we look up the primary_nid directly. Signed-off-by:
Amir Shehata <amir.shehata@intel.com> Change-Id: I0518cdbec95b38bd8690517320b601676ae259f0 Reviewed-on: https://review.whamcloud.com/27262 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Reviewed-by:
Sonia Sharma <sonia.sharma@intel.com> Reviewed-by:
Olaf Weber <olaf.weber@hpe.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
James Simmons authored
Examination of the ksocklnd time handling revealed that the code only requires second-level precision. Since this is the case we can move away from using jiffies to time64_t. This allows us to be independent of the HZ settings in addition to making it clear what is time handling, using time64_t versus unsigned long. In the process we remove many of the various libcfs time wrappers as well. Change-Id: I968630ef94febd4bff703fb633e677996939f95b Signed-off-by:
James Simmons <uja.ornl@yahoo.com> Reviewed-on: https://review.whamcloud.com/26813 Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Amir Shehata <amir.shehata@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
Amir Shehata authored
Make the LND code handle empty CPTs. If a scheduler is associated with an empty CPT it will have no threads created. If a NID hashes to that CPT, then pick the next scheduler which does have at least 1 started thread. Associate the connection with the CPT of the selected scheduler Test-Parameters: trivial Signed-off-by:
Amir Shehata <amir.shehata@intel.com> Change-Id: I8ed83f295fe9852537d4bb063a4d8271c6a45c2c Reviewed-on: https://review.whamcloud.com/27145 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Reviewed-by:
Dmitry Eremin <dmitry.eremin@intel.com> Reviewed-by:
Olaf Weber <olaf.weber@hpe.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
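The scheduler fallback described in this commit can be sketched as a simple scan. This is an illustration only: struct sched_info and pick_scheduler() are hypothetical names, and wrapping around to the start of the array is an assumption of this sketch, since the commit message only says to pick the next scheduler that has at least one started thread.

```c
#include <assert.h>

/* Hypothetical per-CPT scheduler info: number of started threads. */
struct sched_info {
	int nthreads;
};

/*
 * Return the index of the first scheduler at or after 'cpt' (wrapping
 * around, an assumption of this sketch) that has at least one started
 * thread, or -1 if no scheduler has any threads.
 */
static int pick_scheduler(const struct sched_info *scheds, int ncpts, int cpt)
{
	int i;

	assert(ncpts > 0 && cpt >= 0 && cpt < ncpts);

	for (i = 0; i < ncpts; i++) {
		int idx = (cpt + i) % ncpts;

		if (scheds[idx].nthreads > 0)
			return idx;
	}
	return -1;
}
```

The returned index, rather than the originally hashed CPT, is what the connection would be associated with, matching the last sentence of the commit message.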
-
- May 29, 2017
-
-
Arnd Bergmann authored
The CFS_TIME_T macro serves no real purpose as we stopped using time_t and changed over to time64_t, so we can remove the last remaining uses of this. Two uses of this macro are incorrect and refer to jiffies values rather than time_t, and one refers to an inode timespec that gets changed separately. Linux-commit: 93d3a405a168fba4450bdda793149e3cd4174736 Change-Id: I548ec8fffc9c46b8b2025b094f1e5d9cd469e3b7 Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Oleg Drokin <green@linuxhacker.ru> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
James Simmons <uja.ornl@yahoo.com> Reviewed-on: https://review.whamcloud.com/27025 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Andreas Dilger <andreas.dilger@intel.com> Reviewed-by:
Mike Pershin <mike.pershin@intel.com>
-
Doug Oucharek authored
LU-8943 activated the ability to have multiple connections between peers. If any of those connections need to be reconnected (i.e. parameter renegotiation), we were getting an assert from kiblnd_reconnect_peer() which was not changed to allow for having multiple connections ongoing at the same time. This patch gets rid of the assert from kiblnd_reconnect_peer() which is no longer valid after LU-8943. Signed-off-by:
Doug Oucharek <doug.s.oucharek@intel.com> Change-Id: I9cc7fae8836f2648603018fac38a88e3f90ec190 Reviewed-on: https://review.whamcloud.com/27139 Reviewed-by:
Amir Shehata <amir.shehata@intel.com> Tested-by: Jenkins Reviewed-by:
James Simmons <uja.ornl@yahoo.com> Reviewed-by:
Dmitry Eremin <dmitry.eremin@intel.com> Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
- May 20, 2017
-
-
Arnd Bergmann authored
The cfs_duration_usec() function has a timeval as its output, which we want to avoid in general because of the y2038 problem. There are only two locations remaining in lustre, so we can for now replace one with jiffies_to_timeval(), which is a generic kernel function that does the same thing; the other can just use jiffies_to_usecs() and completely avoid the timeval. This is not a full solution yet, but it's a small step that lets us build a larger portion of lustre without this reference to timeval in a header file, and avoid triggering automated checking tools that want to warn about timeval. Linux-commit: 70513c5d17b9812cc218e8b4c7826ebb5f375d9a Test-Parameters: trivial testlist=lnet-selftest Change-Id: If39f4d4857a2b3210bb0dc634b8bb42530df83dc Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Oleg Drokin <green@linuxhacker.ru> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by:
James Simmons <uja.ornl@yahoo.com> Reviewed-on: https://review.whamcloud.com/27019 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Reviewed-by:
Olaf Weber <olaf.weber@hpe.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
Doug Oucharek authored
The FastReg support in ko2iblnd was not unmapping pool items, causing the items to leak. In addition, the mapping code is not growing the pool like we do with FMR. This patch makes sure we are unmapping FastReg pool elements when we are done with them. It also makes sure the pool will grow when we deplete the pool. Signed-off-by:
Doug Oucharek <doug.s.oucharek@intel.com> Change-Id: I4b4ba4de72941b38c4115a00a992cfd1e78e9e49 Reviewed-on: https://review.whamcloud.com/27015 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Andrew Perepechko <andrew.perepechko@seagate.com> Reviewed-by:
Dmitry Eremin <dmitry.eremin@intel.com> Reviewed-by:
James Simmons <uja.ornl@yahoo.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
- May 16, 2017
-
-
Sonia Sharma authored
Deleting peers with the lnetctl command "import --del < config.yaml" throws an error. The above command deletes prim_nid first and then tries deleting the other NIDs, which results in an error, since deleting the primary_nid deletes the entire peer and after that we try to delete non-existent NIDs. The behavior should be: if the primary_nid is present in the list of NIDs then delete the entire peer, otherwise delete only the NIDs specified within the peer. Test-Parameters: trivial Signed-off-by:
Sonia Sharma <sonia.sharma@intel.com> Signed-off-by:
Amir Shehata <amir.shehata@intel.com> Change-Id: I55114fca4d332c950872bd446e02e4f0904ee716 Reviewed-on: https://review.whamcloud.com/27001 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
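The intended deletion rule from this commit reduces to one membership test. The sketch below assumes NIDs can be compared as 64-bit integers; enum del_action and choose_delete() are hypothetical names introduced for illustration and do not come from lnetctl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of a peer delete request. */
enum del_action {
	DEL_WHOLE_PEER,		/* primary NID was listed: drop the peer */
	DEL_LISTED_NIDS,	/* only remove the NIDs that were listed */
};

/*
 * Decide what to delete: if the primary NID appears anywhere in the
 * requested list, the whole peer goes; otherwise only the listed NIDs.
 */
static enum del_action choose_delete(uint64_t primary_nid,
				     const uint64_t *nids, size_t n)
{
	size_t i;

	assert(nids != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (nids[i] == primary_nid)
			return DEL_WHOLE_PEER;
	}
	return DEL_LISTED_NIDS;
}
```

Making the decision once, before any NID is removed, avoids the failure mode in the commit message where the peer is already gone by the time the remaining NIDs are processed.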
-
Doug Oucharek authored
Currently, the fix in LU-5718 which allows for multiple sges to deal with RDMA fragmentation is turned off by default (set to 1). This patch changes the default to 2 so RDMA fragmentation is fixed by default. Test-Parameters: trivial Signed-off-by:
Doug Oucharek <doug.s.oucharek@intel.com> Change-Id: I8a29a7b32ababd37cbc471664083362bc7253d97 Reviewed-on: https://review.whamcloud.com/26911 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Ned Bass <bass6@llnl.gov> Reviewed-by:
James Simmons <uja.ornl@yahoo.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
- May 12, 2017
-
-
Doug Oucharek authored
Changing all calls in the ksocklnd from sock_create() to sock_create_kern(). Signed-off-by:
Doug Oucharek <doug.s.oucharek@intel.com> Change-Id: Ib8b175e73478b1edfb5e8cd3491e589e8267f52a Reviewed-on: https://review.whamcloud.com/26958 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Andreas Dilger <andreas.dilger@intel.com> Reviewed-by:
James Simmons <uja.ornl@yahoo.com> Reviewed-by:
Olaf Weber <olaf.weber@hpe.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
Bobi Jam authored
So that it can be included in other projects. Signed-off-by:
Bobi Jam <bobijam.xu@intel.com> Change-Id: I980578742fd194e2464870f1ab8d6a9ae8deb9e2 Reviewed-on: https://review.whamcloud.com/26859 Reviewed-by:
Andreas Dilger <andreas.dilger@intel.com> Reviewed-by:
Niu Yawei <yawei.niu@intel.com> Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
Doug Oucharek authored
OPA driver optimizations are based on the MPI model where it is expected to have multiple endpoints between two given nodes. To enable this optimization for Lustre, we need to make it possible, via an LND-specific tuneable, to create multiple endpoints and to balance the traffic over them. Both sides of a connection must have this patch for it to work. Only the active side of the connection (usually the client) needs to have the new tuneable set > 1. Signed-off-by:
Doug Oucharek <doug.s.oucharek@intel.com> Change-Id: Iaf3b49bf0aecf79cb67eb1bacba1940cd811b2fb Reviewed-on: https://review.whamcloud.com/25168 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Amir Shehata <amir.shehata@intel.com> Reviewed-by:
Dmitry Eremin <dmitry.eremin@intel.com> Reviewed-by:
James Simmons <uja.ornl@yahoo.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
- May 09, 2017
-
-
Chris Horn authored
A flags argument was added to ib_alloc_pd() in Linux 4.9 commit ed082d36a7b2c27d1cda55fdfb28af18040c4a89. The fix for LU-9026, Lustre commit e4297ef3, accounted for this change by checking for the removal of ib_get_dma_mr() (which happened separately). However, SLES 12 SP3 beta 1 adopted the extra argument to ib_alloc_pd(), but retains the ib_get_dma_mr() function. As a result, we need an explicit check for the two argument version of ib_alloc_pd(). Signed-off-by:
Chris Horn <hornc@cray.com> Change-Id: Iecde347e9f18149cac63e243082a2686de260ba7 Reviewed-on: https://review.whamcloud.com/26934 Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Tested-by: Jenkins Reviewed-by:
James Simmons <uja.ornl@yahoo.com> Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
Amir Shehata authored
selftest always responded to the primary NID of the peer rather than to the source of the message, as it should. Test-Parameters: trivial Signed-off-by:
Amir Shehata <amir.shehata@intel.com> Signed-off-by:
Olaf Weber <olaf@sgi.com> Change-Id: I14a4b6ffc5882cb23298429d8a4bd0bcb0a8a5be Reviewed-on: https://review.whamcloud.com/26723 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
- May 05, 2017
-
-
Olaf Weber authored
LNET_MAX_INTERFACES is the number of interfaces supported by interface bonding in the ksocknal LND. It shows up in LNet because a number of data structures are shared between LNDs. Rename it to LNET_NUM_INTERFACES to reduce the confusion of what it does. Test-Parameters: trivial Signed-off-by:
Olaf Weber <olaf@sgi.com> Change-Id: Ibc1d85a379d6616eb1db2fcb54aaffc835ffa9f4 Reviewed-on: https://review.whamcloud.com/26693 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
Olaf Weber authored
In lnet_select_pathway() sending to the loopback NID is handled as a special case, because there are no credits involved. (The loopback NID doesn't use credits, and therefore does not have any credits. If a message goes through the credit-managing code it therefore ends up waiting indefinitely for credits to become available.) The check whether we're sending over the loopback NID must be done after we've completed choosing the NI to send over. In its present location it only handles the case where the loopback NID was explicitly passed in as the source NID. (Lustre does not exercise this code path during normal operation, the bug was encountered while testing code for the peer discovery feature.) Test-Parameters: trivial Signed-off-by:
Olaf Weber <olaf@sgi.com> Change-Id: Ifa25abf508214ae363a2f1bb04ffeab1891a2564 Reviewed-on: https://review.whamcloud.com/26692 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
Olaf Weber authored
When an attempt to send a message fails, for example because no connection could be established with the remote address, socklnd drops the message. For a PUT or REPLY message with non-zero payload, ksocknal_tx_done() calls lnet_finalize() with -EIO as the error code. But for an ACK or GET message there is no payload, and lnet_finalize() is called with 0 (no error) as the error code. This leaves upper layers to rely on other means to determine that sending the message did actually fail, and that (for example) no REPLY will ever answer a failed GET. Add an error code parameter to ksocknal_tx_done(). In ksocknal_txlist_done() change the 0/1 'error' indicator to be an actual error code that is passed on the ksocknal_tx_done(). Update the callers of ksocknal_txlist_done() to pass in the error code if they have encountered an error. Test-Parameters: trivial Signed-off-by:
Olaf Weber <olaf@sgi.com> Change-Id: I66b897a31e537e70dcc2622ffdfcc6e96fa93193 Reviewed-on: https://review.whamcloud.com/26691 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
Olaf Weber authored
The locking changes for the lnet_net_lock made for Multi-Rail introduce a race in the LNet shutdown path. The code keeps two states in the_lnet.ln_shutdown: 0 means LNet is either up and running or shut down, while 1 means lnet is shutting down. In lnet_select_pathway() if we need to restart and drop and relock the lnet_net_lock we can find that LNet went from running to stopped, and not be able to tell the difference. Replace ln_shutdown with a three-state ln_state patterned on ln_rc_state: states are LNET_STATE_SHUTDOWN, LNET_STATE_RUNNING, and LNET_STATE_STOPPING. Most checks against ln_shutdown now test ln_state against LNET_STATE_RUNNING. LNet moves to RUNNING state in lnet_startup_lndnets(). Test-Parameters: trivial Signed-off-by:
Olaf Weber <olaf@sgi.com> Change-Id: I7afcbeb793dfa4d0a361e421ae06a99b7d4db903 Reviewed-on: https://review.whamcloud.com/26690 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
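The ambiguity this commit removes is easy to show with the three state names it introduces. In the sketch below the enumerator names come from the commit message, while the numeric values and the two helper functions are assumptions made for illustration.

```c
#include <assert.h>

/*
 * State names are taken from the commit message; the numeric values and
 * the helpers below are assumptions of this sketch.
 */
enum lnet_state {
	LNET_STATE_SHUTDOWN = 0,
	LNET_STATE_RUNNING,
	LNET_STATE_STOPPING,
};

/* Old scheme: a single flag, so "running" and "shut down" both read as 0. */
static int old_shutdown_flag(enum lnet_state s)
{
	return s == LNET_STATE_STOPPING;
}

/* New scheme: callers test explicitly for the running state. */
static int lnet_is_running(enum lnet_state s)
{
	return s == LNET_STATE_RUNNING;
}
```

A thread that drops and retakes the lock can therefore detect that LNet went from running to shut down, which the single flag could not express.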
-
- May 01, 2017
-
-
Doug Oucharek authored
When the patch for LU-5710 landed, a check for message size was landed that should not have been. This check was part of a patch in LU-7650 which was later pulled because it broke things. LU-5718 picked up this code via its many rebases (it took forever to land LU-5718, which is the core problem here). This patch removes that message size check. Test-Parameters: trivial Signed-off-by:
Doug Oucharek <doug.s.oucharek@intel.com> Change-Id: I3d114ec16cfbfd994efd9aee55e28a09159597be Reviewed-on: https://review.whamcloud.com/26891 Tested-by: Jenkins Reviewed-by:
Andreas Dilger <andreas.dilger@intel.com> Reviewed-by:
Sonia Sharma <sonia.sharma@intel.com> Reviewed-by:
James Simmons <uja.ornl@yahoo.com> Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Amir Shehata <amir.shehata@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
Amir Shehata authored
To avoid backwards compatibility issues between base MR and Dynamic Discovery, standardize the ioctl interface by bringing in changes to the interface required by Dynamic Discovery now. Test-Parameters: trivial Signed-off-by:
Amir Shehata <amir.shehata@intel.com> Change-Id: I014d74943b893ec24e3d42e1eb6824d755460c2b Reviewed-on: https://review.whamcloud.com/26689 Tested-by: Jenkins Reviewed-by:
Olaf Weber <olaf@sgi.com> Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
Olaf Weber authored
Remove a debug ioctl that was added to allow for debug messages from user space. However, the code is currently not being used. Test-Parameters: trivial Signed-off-by:
Amir Shehata <amir.shehata@intel.com> Signed-off-by:
Olaf Weber <olaf@sgi.com> Change-Id: Ifd2bee73ef507bd07296af76dac1caf08ded9e64 Reviewed-on: https://review.whamcloud.com/26688 Tested-by: Jenkins Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-
Amir Shehata authored
Fixed a set of issues found while running static analysis. Test-Parameters: trivial Signed-off-by:
Amir Shehata <amir.shehata@intel.com> Change-Id: I22ddfdda86c979c7a300ab9df777efbdd5973ac5 Reviewed-on: https://review.whamcloud.com/26687 Tested-by: Jenkins Reviewed-by:
Olaf Weber <olaf@sgi.com> Tested-by:
Maloo <hpdd-maloo@intel.com> Reviewed-by:
Doug Oucharek <doug.s.oucharek@intel.com> Reviewed-by:
Oleg Drokin <oleg.drokin@intel.com>
-