1. 07 Aug, 2020 1 commit
  2. 20 Jul, 2020 1 commit
    • LU-12275 sec: atomicity of encryption context getting/setting · 40d91eaf
      Sebastien Buisson authored
      
      
      The encryption layer needs to set an encryption context on files and
      dirs that are encrypted. This context is stored as an extended
      attribute, which then needs to be fetched upon metadata ops like
      lookup, getattr, open, truncate, and layout.
      
      With this patch we send encryption context to the MDT along with
      create RPCs. This closes the insecure window between creation and
      setting of the encryption context, and saves a setxattr request.
      
      This patch also introduces a way to have the MDT return encryption
      context upon granted lock reply, making the encryption context
      retrieval atomic, and sparing the client an additional getxattr
      request.
      
      Test-Parameters: testlist=sanity-sec envdefinitions=ONLY="36 37 38 39 40 41 42 43 44 45 46 47 48 49" clientdistro=el8.1 fstype=ldiskfs mdscount=2 mdtcount=4
      Test-Parameters: testlist=sanity-sec envdefinitions=ONLY="36 37 38 39 40 41 42 43 44 45 46 47 48 49" clientdistro=el8.1 fstype=zfs mdscount=2 mdtcount=4
      Test-Parameters: clientversion=2.12 env=SANITY_EXCEPT="27M 56ra 151 156 802"
      Test-Parameters: serverversion=2.12 env=SANITY_EXCEPT="56oc 56od 165a 165b 165d 205b"
      Test-Parameters: serverversion=2.12 clientdistro=el8.1 env=SANITYN_EXCEPT=106,SANITY_EXCEPT="56oc 56od 165a 165b 165d 205b"
      Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
      Change-Id: I45599cdff13d5587103aff6edd699abcda6cb8f4
      Reviewed-on: https://review.whamcloud.com/38430
      Tested-by: jenkins <devops@whamcloud.com>
      Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
      Tested-by: Maloo <maloo@whamcloud.com>
      Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
      Reviewed-by: Oleg Drokin <green@whamcloud.com>
  3. 10 Jul, 2020 1 commit
  4. 04 Jul, 2020 2 commits
  5. 23 Jun, 2020 1 commit
  6. 19 Jun, 2020 1 commit
  7. 16 Jun, 2020 1 commit
  8. 10 Jun, 2020 2 commits
  9. 06 Jun, 2020 1 commit
  10. 27 May, 2020 2 commits
    • LU-13258 obdclass: bind zombie export cleanup workqueue · 76b602c2
      James Simmons authored
      
      
      Lustre uses a workqueue to clear out stale exports. Bind this
      workqueue to the cores that Lustre uses, as defined by the CPT setup.
      
      Move the code handling workqueue binding to libcfs so it can be
      used by everyone.
      
      Rename CONFIG_LUSTRE_PINGER to CONFIG_LUSTRE_FS_PINGER to match
      the Linux client.
      
      Change-Id: Ifa109f6a93e6ec6bbdef5e91fe8ca1cde0eaea3e
      Signed-off-by: James Simmons <jsimmons@infradead.org>
      Reviewed-on: https://review.whamcloud.com/38212
      Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
      Reviewed-by: Wang Shilong <wshilong@ddn.com>
      Tested-by: jenkins <devops@whamcloud.com>
      Tested-by: Maloo <maloo@whamcloud.com>
      Reviewed-by: Oleg Drokin <green@whamcloud.com>
    • LU-10934 llite: integrate statx() API with Lustre · 3f7853b3
      Qian Yingjin authored
      
      
      The statx() system call lets the caller specify a bitmask to fetch
      specific attributes of a file (e.g. st_uid, st_gid, st_mode, and
      st_btime = file creation time), rather than fetching all of the
      normal stat() attributes (such as st_size and st_blocks). It also
      has an AT_STATX_DONT_SYNC mode which allows the kernel to return
      cached attributes without flushing all of the client data and
      fetching an accurate result from the server.
      The conditions for adding statx() API support to Lustre are mature:
      1. statx() was added in Linux 4.11;
      2. glibc supports statx() (glibc 2.28+ -> RHEL 8, Ubuntu 18.10+);
      3. Support for stat(1) and ls(1) to use statx(3) to fetch
         only the required attributes has landed in the upstream GNU
         coreutils package.
      
      This patch integrates statx() API with Lustre so that we can take
      advantage of the efficiencies available:
      - Only fetch MDS attributes if STATX_SIZE, STATX_BLOCKS and
        STATX_MTIME are not requested, and avoid OSS glimpse RPCs
        completely;
      - Hook this into statahead to avoid async glimpse locks (AGL) if
        OST information is not needed;
      - Enhance the MDS RPC interface to return the file creation time
        stored in both ldiskfs and ZFS already, and enable STATX_BTIME;
      - Better support for AT_STATX_DONT_SYNC mode: return the "lazy"
        attributes or cached attributes (even if stale) on a client if
        available, without any RPCs to servers (MDS and OSS);
      - statx (lustre/test/statx): port coreutils ls/stat to use the
        statx(3) system call if the OS supports it;
      - Test scripts that use statx() to verify the btime attribute and
        the advantages described above.
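
As a rough illustration of the kind of call this patch is meant to make cheap, the sketch below requests only attributes that (per the description above) do not require OST information, and passes AT_STATX_DONT_SYNC so cached values may be returned. The helper name fetch_mds_only_attrs() is hypothetical and not part of Lustre; the code assumes Linux with glibc 2.28 or later.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <fcntl.h>      /* AT_FDCWD, AT_STATX_DONT_SYNC */
#include <sys/stat.h>   /* statx(), struct statx, STATX_* */

/*
 * Hypothetical helper (not a Lustre API): ask only for uid, gid, mode and
 * creation time. STATX_SIZE, STATX_BLOCKS and STATX_MTIME are deliberately
 * left out of the mask, and AT_STATX_DONT_SYNC allows cached attributes.
 * Returns 0 on success, -1 on error with errno set, like statx() itself.
 */
static int fetch_mds_only_attrs(const char *path, struct statx *stx)
{
	unsigned int mask = STATX_UID | STATX_GID | STATX_MODE | STATX_BTIME;

	assert(path != NULL && stx != NULL);
	return statx(AT_FDCWD, path, AT_STATX_DONT_SYNC, mask, stx);
}
```

A caller should check stx_mask in the result before trusting a field, since a filesystem may not report every requested attribute (for example, STATX_BTIME may be absent).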
      
      Test-Parameters: clientdistro=el8
      Test-Parameters: clientdistro=ubuntu1804
      Signed-off-by: Qian Yingjin <qian@ddn.com>
      Change-Id: I8432c9029bad9dea3e1ebc13a0d6978131d9b929
      Reviewed-on: https://review.whamcloud.com/36674
      Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
      Reviewed-by: James Simmons <jsimmons@infradead.org>
      Tested-by: jenkins <devops@whamcloud.com>
      Tested-by: Maloo <maloo@whamcloud.com>
  11. 14 May, 2020 1 commit
  12. 08 May, 2020 1 commit
  13. 07 May, 2020 1 commit
  14. 23 Apr, 2020 1 commit
  15. 07 Apr, 2020 1 commit
  16. 06 Apr, 2020 1 commit
  17. 31 Mar, 2020 1 commit
    • LU-11025 dne: introduce new directory hash type: "crush" · 0a1cf8da
      Lai Siyao authored
      The current directory striping strategy is fixed striping, i.e., it
      computes (hash(filename) % stripe_count) to decide the stripe for a
      file. The problem with this approach is that if stripe_count changes,
      most of the files will need to be relocated between MDTs. This
      makes directory split/merge quite expensive.
      
      This patch introduces a consistent hash striping strategy:
      it computes (hash(filename) % LMV_CRUSH_PG_COUNT) to locate PG_ID
      (placement group index), and then calls
      crush_hash(PG_ID, stripe_index) to get a straw for each stripe;
      the stripe with the highest straw is used to place the file.
      
      As we can see, it uses the CRUSH algorithm, but only to map
      placement groups pseudo-randomly among all stripes; it does not
      use it to choose MDTs when the MDT is not specified. The latter
      is done by MDT object QoS allocation in LMV and LOD (LMV decides
      the starting stripe MDT, while LOD decides the remaining stripes).
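
The difference between the two strategies can be sketched in userspace C as follows. This is only an illustration under stated assumptions: name_hash() (FNV-1a) and straw_hash() are stand-ins for Lustre's filename hash and crush_hash(), and SKETCH_PG_COUNT is an assumed value standing in for LMV_CRUSH_PG_COUNT.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: illustrative value standing in for LMV_CRUSH_PG_COUNT. */
#define SKETCH_PG_COUNT 4096u

/* FNV-1a, used here only as a stand-in for the filename hash. */
static uint32_t name_hash(const char *name)
{
	uint32_t h = 2166136261u;

	for (; *name != '\0'; name++) {
		h ^= (unsigned char)*name;
		h *= 16777619u;
	}
	return h;
}

/* Stand-in for crush_hash(): mixes a PG id with a stripe index. */
static uint32_t straw_hash(uint32_t pg_id, uint32_t stripe_index)
{
	uint32_t h = (pg_id * 0x9e3779b1u) ^ (stripe_index + 0x7f4a7c15u);

	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/* Fixed striping: the result depends directly on stripe_count. */
static uint32_t fixed_stripe(const char *name, uint32_t stripe_count)
{
	assert(stripe_count > 0);
	return name_hash(name) % stripe_count;
}

/* Consistent striping: draw one straw per stripe, highest straw wins. */
static uint32_t crush_stripe(const char *name, uint32_t stripe_count)
{
	uint32_t pg_id = name_hash(name) % SKETCH_PG_COUNT;
	uint32_t best = 0;
	uint32_t best_straw;
	uint32_t i;

	assert(stripe_count > 0);
	best_straw = straw_hash(pg_id, 0);
	for (i = 1; i < stripe_count; i++) {
		uint32_t straw = straw_hash(pg_id, i);

		if (straw > best_straw) {	/* ties keep the lower index */
			best_straw = straw;
			best = i;
		}
	}
	return best;
}
```

Because the straws of existing stripes do not depend on stripe_count, growing a directory from N to N+1 stripes can only move a name onto the new stripe; every other name keeps its place. With fixed_stripe(), by contrast, changing the modulus reshuffles most names.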
      
      This implementation contains below changes:
      ...
  18. 24 Mar, 2020 1 commit
    • LU-12748 readahead: limit async ra requests · 1427a720
      Wang Shilong authored
      
      
      Currently async readahead is limited by the following factors:
      
      1) @ra_max_pages_per_file
      2) @ra_max_read_ahead_whole_pages
      3) @ra_async_pages_per_file_threshold
      
      If an admin sets @ra_max_read_ahead_whole_pages to a large value
      such as 4G, then with 16M RPCs we could have 256 async readahead
      requests in flight at the same time, which could consume all CPU
      resources for readahead, without any limit.
      
      Even though we could set @max_active for the workqueue, so that
      RA requests are held in the workqueue pool rather than keeping
      the CPUs busy, the problem is that RA will still try to use the
      CPU later, and we might still submit too many requests to the
      workqueue. So instead of limiting it in the workqueue, we limit
      it earlier: if there are already too many async RA requests in
      the system (by default, 1/2 of the CPU cores), we just fall back
      to sync RA, which stops read threads from using all CPU resources.
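
The arithmetic and the fallback policy described above can be modelled in a few lines of userspace C. This is a sketch only: struct ra_limiter and its functions are hypothetical names, not the llite implementation, and the limit of half the CPU cores is taken from the description above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Number of RPC-sized requests needed to cover a readahead window. */
static unsigned long long ra_request_count(unsigned long long window_bytes,
					   unsigned long long rpc_bytes)
{
	assert(rpc_bytes > 0);
	return window_bytes / rpc_bytes;
}

/* Hypothetical limiter: caps async readahead requests in flight. */
struct ra_limiter {
	atomic_int inflight;
	int max_inflight;
};

static void ra_limiter_init(struct ra_limiter *ra, int cpu_cores)
{
	assert(cpu_cores > 0);
	atomic_init(&ra->inflight, 0);
	/* Default described above: half of the CPU cores, at least one. */
	ra->max_inflight = cpu_cores / 2 > 0 ? cpu_cores / 2 : 1;
}

/* Returns true to submit async readahead, false to fall back to sync. */
static bool ra_try_async(struct ra_limiter *ra)
{
	if (atomic_fetch_add(&ra->inflight, 1) >= ra->max_inflight) {
		atomic_fetch_sub(&ra->inflight, 1);	/* undo our reservation */
		return false;
	}
	return true;
}

/* Called when an async readahead request completes. */
static void ra_async_done(struct ra_limiter *ra)
{
	atomic_fetch_sub(&ra->inflight, 1);
}
```

With a 4 GiB window and 16 MiB RPCs, ra_request_count() gives the 256 requests mentioned above; on a 4-core client the limiter would admit two of them as async and push the rest to the sync path until one completes.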
      
      Change-Id: I370c04e014f24c795c1a28effca9c51b1db2a417
      Signed-off-by: Wang Shilong <wshilong@ddn.com>
      Reviewed-on: https://review.whamcloud.com/37927
      Tested-by: jenkins <devops@whamcloud.com>
      Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
      Reviewed-by: James Simmons <jsimmons@infradead.org>
      Tested-by: Maloo <maloo@whamcloud.com>
      Reviewed-by: Oleg Drokin <green@whamcloud.com>
  19. 01 Mar, 2020 1 commit
  20. 20 Feb, 2020 3 commits
  21. 14 Feb, 2020 1 commit
    • LU-12930 various: use schedule_timeout_*interruptible · 5c883ea2
      Mr NeilBrown authored
      
      
      The construct:
      
        set_current_state(TASK_UNINTERRUPTIBLE);
        schedule_timeout(time);
      
      Is more clearly expressed as
      
        schedule_timeout_uninterruptible(time);
      
      And similarly with TASK_INTERRUPTIBLE /
      schedule_timeout_interruptible()
      
      Establishing this practice makes it harder to forget to call
      set_current_state() as has happened a couple of times - in
      lnet_peer_discovery and mdd_changelog_fini().
      
      Also, there is no need to call set_current_state(TASK_RUNNING) after
      calling schedule*().  That state is guaranteed to have been set.
      
      In mdd_changelog_fini() there was an attempt to sleep for
      10 microseconds.  This will always round up to 1 jiffy, so
      just make it schedule_timeout_uninterruptible(1).
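
The rounding claim can be checked with a simplified model of a round-up microsecond-to-tick conversion. The function below is a userspace sketch with a hypothetical name, not the kernel's own conversion helper, and the HZ values used in the note after it are merely common configurations.

```c
#include <assert.h>

/*
 * Simplified model: convert microseconds to scheduler ticks, rounding up,
 * for a tick rate of hz ticks per second.
 */
static unsigned long long usecs_to_ticks_sketch(unsigned long long usecs,
						unsigned int hz)
{
	assert(hz > 0);
	return (usecs * hz + 999999ULL) / 1000000ULL;
}
```

For HZ values of 100, 250 or 1000, a 10 microsecond request yields exactly one tick, which is why a literal timeout of 1 is equivalent.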
      
      Finally a few places where the number of seconds was multiplied
      by 1, have had the '1 *' removed.
      
      Test-Parameters: trivial
      Signed-off-by: Mr NeilBrown <neilb@suse.de>
      Change-Id: I01b37039de0bf7e07480de372c1a4cfe78a8cdd8
      Reviewed-on: https://review.whamcloud.com/3665...
  22. 28 Jan, 2020 1 commit
  23. 23 Jan, 2020 3 commits
    • LU-13121 llite: fix deadlock in ll_update_lsm_md() · 37465502
      Lai Siyao authored
      
      
      Deadlock may happen in the following scenario: a lookup process calls
      ll_update_lsm_md(), finds that lli->lli_lsm_md is NULL, and then calls
      down_write(&lli->lli_lsm_sem). But another lookup process initialized
      lli->lli_lsm_md after this check and before the write lock was taken,
      so the first lookup process calls up_read(&lli->lli_lsm_sem) and
      returns. The write lock is therefore never released, which causes
      subsequent lookups to deadlock.
      
      Rearrange the code to simplify the locking:
      1. take read lock.
      2. if lsm was initialized and unchanged, release read lock and return.
      3. otherwise release read lock and take write lock.
      4. free current lsm and initialize with new lsm.
      5. release write lock.
      6. initialize stripes with read lock.
      
      Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
      Change-Id: Ifcc25a957983512db6f29105b5ca5b6ec914cb4b
      Reviewed-on: https://review.whamcloud.com/37182
      Tested-by: jenkins <devops@whamcloud.com>
      Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
      Tested-by: Maloo <maloo@whamcloud.com>
      Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
      Reviewed-by: Oleg Drokin <green@whamcloud.com>
    • LU-12521 llapi: add separate fsname and instance API · 00d14521
      Andreas Dilger authored
      The llapi_getname() function returns the combined fsname and client
      instance as one string, which is fine when using the entire string,
      but the output cannot be safely parsed into separate fsname and
      instance strings in all cases.
      
      Introduce new llapi_get_fsname() and llapi_get_instance() functions
      that return only the fsname and instance strings, since the source
      string returned from the kernel can be unambiguously separated before
      it is returned in a combined string via llapi_getname().
      
      Fix the lfs_getname() '-n' and '-i' options to use the new routines
      rather than parsing the output from llapi_getname().
      
      Add man pages for these functions.
      
      Fixes: 2a4821b8 ("LU-12159 utils: improve lfs getname functionality")
      Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
      Change-Id: Iaf5846a0ae147a428f66ec8a1d0251e7e12540e5
      Reviewed-on: https://review.whamcloud.com/35451
      Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
      Reviewed-by: James Simmons <jsimmons@infradead.org>
      Tested-by: jenkins <devops@whamcloud.com>
      Tested-by: Maloo <maloo@whamcloud.com>
      Reviewed-by: Oleg Drokin <green@whamcloud.com>
    • LU-12460 llite: replace lli_trunc_sem · e5914a61
      NeilBrown authored
      
      
      lli_trunc_sem can lead to a deadlock.
      
      vvp_io_read_start takes lli_trunc_sem, and can take
      mmap_sem in the direct i/o case, via
      generic_file_read_iter->ll_direct_IO->get_user_pages_unlocked
      
      vvp_io_fault_start is called with mmap_sem held (taken in
      the kernel page fault code), and takes lli_trunc_sem.
      
      These aren't necessarily the same mmap_sem, but can be if
      you mmap a lustre file, then read into that mapped memory
      from the file.
      
      These are both 'down_read' calls on lli_trunc_sem so they
      don't directly conflict, but if vvp_io_setattr_start() is
      called to truncate the file between these, it does
      'down_write' on lli_trunc_sem.  As semaphores are queued,
      this down_write blocks subsequent reads.
      
      This means if the page fault has taken the mmap_sem,
      but not yet the lli_trunc_sem in vvp_io_fault_start,
      it will wait behind the lli_trunc_sem down_write from
      vvp_io_setattr_start.
      
      At the same time, vvp_io_read_start is holding the
      lli_trunc_sem and waiting for the mmap_sem, which will not
      be released because vvp_io_fault_start cannot get the
      lli_trunc_sem because the setattr 'down_write' operation is
      queued in front of it.
      
      Solve this by replacing it with a hand-coded semaphore, using
      atomic counters and wait_var_event().  This allows a
      special down_read_nowait which ignores waiting down_write
      operations.  This combined with waking up all waiters at
      once guarantees that down_read_nowait can always 'join'
      another down_read, guaranteeing our ability to take the
      semaphore twice for read and avoiding the deadlock.
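
The policy difference between an ordinary read acquisition and the special down_read_nowait can be modelled with two atomic counters. This is a non-blocking userspace sketch with hypothetical names; it shows only who is admitted, and does not model the sleeping and wake-up that the patch implements with wait_var_event().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct trunc_sem_sketch {
	atomic_int readers;		/* >0 readers, 0 free, -1 write-held */
	atomic_int waiting_writers;	/* writers queued for the semaphore */
};

static void sketch_sem_init(struct trunc_sem_sketch *s)
{
	atomic_init(&s->readers, 0);
	atomic_init(&s->waiting_writers, 0);
}

/* Join the readers unless the semaphore is held for write. */
static bool sketch_take_read(struct trunc_sem_sketch *s)
{
	int cur = atomic_load(&s->readers);

	while (cur >= 0) {
		/* On failure, cur is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&s->readers, &cur, cur + 1))
			return true;
	}
	return false;
}

/* Ordinary read: yields to any queued writer. */
static bool sketch_try_down_read(struct trunc_sem_sketch *s)
{
	if (atomic_load(&s->waiting_writers) > 0)
		return false;
	return sketch_take_read(s);
}

/* Special read: ignores queued writers, so it can join another reader. */
static bool sketch_try_down_read_nowait(struct trunc_sem_sketch *s)
{
	return sketch_take_read(s);
}

static void sketch_up_read(struct trunc_sem_sketch *s)
{
	int prev = atomic_fetch_sub(&s->readers, 1);

	assert(prev > 0);
	(void)prev;
}

/* A writer announces itself, then acquires once no reader remains. */
static void sketch_writer_enqueue(struct trunc_sem_sketch *s)
{
	atomic_fetch_add(&s->waiting_writers, 1);
}

static bool sketch_writer_acquire(struct trunc_sem_sketch *s)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&s->readers, &expected, -1))
		return false;
	atomic_fetch_sub(&s->waiting_writers, 1);
	return true;
}
```

In the deadlock scenario above, the second read-side acquisition would use the nowait form: it succeeds even though a truncate is queued, so the first reader can finish and release the semaphore.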
      
      I'd like there to be a better way to fix this, but I
      haven't found it yet.
      
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
      Change-Id: Ibd3abf4df1f1f6f45e440733a364999bd608b191
      Reviewed-on: https://review.whamcloud.com/35271
      Reviewed-by: Neil Brown <neilb@suse.de>
      Tested-by: jenkins <devops@whamcloud.com>
      Tested-by: Maloo <maloo@whamcloud.com>
      Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
      Reviewed-by: Oleg Drokin <green@whamcloud.com>
  24. 18 Jan, 2020 1 commit
    • LU-9679 llite: fix possible race with module unload. · 89aff2f3
      Mr NeilBrown authored
      
      
      lustre_fill_super() calls client_fill_super() without holding a
      reference to the module containing client_fill_super.  If that
      module is unloaded at a bad time, this can crash.
      
      To be able to get a reference to the module using
      try_module_get(), we need a pointer to the module.
      
      So replace
        lustre_register_client_fill_super() and
        lustre_register_kill_super_cb()
      with a single
        lustre_register_super_ops()
      which also passes a module pointer.
      
      Then use a spinlock to ensure the module pointer isn't removed
      while try_module_get() is running, and use try_module_get() to
      ensure we have a reference before calling client_fill_super().
      
      Now that we take the reference to the module before calling
      lustre_fill_super(), we don't need to take one inside
      lustre_fill_super().
      
      Linux-commit: d487fe31f49e78f3cdd826923bf0c340a839ffd8
      
      Signed-off-by: Mr NeilBrown <neilb@suse.de>
      Change-Id: I9474622f2a253d9882eae3f0578c50782dd11ad4
      Reviewed-on: https://review.whamcloud.co...
  25. 10 Jan, 2020 1 commit
  26. 20 Dec, 2019 1 commit
  27. 06 Dec, 2019 2 commits
  28. 05 Dec, 2019 1 commit
  29. 03 Dec, 2019 1 commit
    • LU-13030 pcc: auto attach not work after client cache clear · a5ef2d6e
      Qian Yingjin authored
      When the inode of a PCC cached file in unused state is evicted
      from icache due to memory pressure or manual icache cleanup (i.e.
      "echo 3 > /proc/sys/vm/drop_caches"), the file is also detached
      from PCC, and all PCC state for this file is cleared.
      In the current design, PCC only tries to auto attach a file
      that was once attached into PCC, according to the in-memory PCC
      state. Thus later IO for the file is not directed to PCC and will
      trigger a data restore.
      
      If this is not the desired result for the user, then we need to try
      to auto attach a file that was never attached into PCC, or was once
      attached but then detached as a result of shrinking its inode from
      icache.
      
      Although the set of candidates for auto attach is larger, only
      files in HSM released state (which can be determined directly from
      the file layout) will be checked.
      
      This bug is easily reproduced on rhel8. It seems that the command
      "echo 3 > /proc/sys/vm/drop_caches" will drop all unused inodes
      from icache,...
  30. 12 Nov, 2019 2 commits
  31. 22 Oct, 2019 1 commit