  1. 07 Aug, 2020 1 commit
  2. 20 Jul, 2020 2 commits
  3. 10 Jun, 2020 1 commit
  4. 06 Jun, 2020 1 commit
  5. 02 Jun, 2020 1 commit
    • LU-11025 dne: directory restripe and auto split · a336d7c7
      Lai Siyao authored
      A dedicated restriper thread is created for each MDT; it performs
      three tasks in a loop:
      1. If a directory's total number of sub-files exceeds a threshold
         (50000 by default, adjustable with "lctl set_param
         mdt.*.dir_split_count=N"), split the directory by adding new
         stripes (4 by default, adjustable with
         "lctl set_param mdt.*.dir_split_delta=N").
      2. If a directory stripe's LMV is marked 'MIGRATION', migrate the
         sub-file at the current offset, then advance the offset to the
         next file.
      3. If a directory's master LMV is marked 'RESTRIPING', check whether
         the 'MIGRATION' flag has been cleared on all stripe LMVs; if so,
         clear the 'RESTRIPING' flag and update the directory LMV.
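      The three-step loop above can be sketched as a single pass over a toy
      directory model. This is a minimal C sketch under stated assumptions,
      not the Lustre implementation: struct toy_dir, restriper_pass() and
      the TOY_LMV_RESTRIPING bit are hypothetical, migration is collapsed to
      one whole stripe per pass, and the split test compares the per-stripe
      average (an assumption) so a freshly split directory is not split
      again.

```c
#include <assert.h>
#include <stdbool.h>

/* Defaults quoted in the commit message (tunable via lctl set_param). */
#define DIR_SPLIT_COUNT 50000L /* mdt.*.dir_split_count */
#define DIR_SPLIT_DELTA 4      /* mdt.*.dir_split_delta */

/* Hypothetical flag bit; the real LMV flag values are not reproduced. */
#define TOY_LMV_RESTRIPING 0x1u

struct toy_dir {
	unsigned lmv_flags;    /* flags on the master LMV */
	long nr_entries;       /* dirent count reported by getattr() */
	int stripe_count;
	int migrating_stripes; /* stripes whose LMV still has MIGRATION set */
};

/* One iteration of the restriper loop; returns true if any work was done. */
static bool restriper_pass(struct toy_dir *d)
{
	bool worked = false;

	assert(d->stripe_count > 0);

	/* Task 1: auto-split.  Toy assumption: compare the per-stripe average
	 * so that a directory that was just split is not split again. */
	if (!(d->lmv_flags & TOY_LMV_RESTRIPING) &&
	    d->nr_entries / d->stripe_count > DIR_SPLIT_COUNT) {
		/* every existing stripe must hand entries to the new stripes */
		d->migrating_stripes = d->stripe_count;
		d->stripe_count += DIR_SPLIT_DELTA;
		d->lmv_flags |= TOY_LMV_RESTRIPING;
		worked = true;
	}

	/* Task 2: drain one stripe marked MIGRATION.  The real thread moves
	 * one sub-file at a time and advances a saved offset instead. */
	if (d->migrating_stripes > 0) {
		d->migrating_stripes--;
		worked = true;
	}

	/* Task 3: once no stripe is still migrating, clear RESTRIPING
	 * (the real code also updates the directory LMV here). */
	if ((d->lmv_flags & TOY_LMV_RESTRIPING) && d->migrating_stripes == 0) {
		d->lmv_flags &= ~TOY_LMV_RESTRIPING;
		worked = true;
	}
	return worked;
}
```

      For a 2-stripe directory with 120000 entries, the first pass splits it
      to 6 stripes and drains one old stripe; the second pass drains the
      other and clears RESTRIPING.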
      The previous patch implemented the first part of manual directory
      restriping; this patch implements sub-file migration and directory
      layout update. Directory auto-split works in a similar way, except
      that the first step is also done by this thread.
      Directory auto-split can be enabled or disabled with "lctl set_param
      mdt.*.enable_dir_auto_split=[0|1]"; it is enabled by default.
      Auto-split is triggered at the end of getattr(): since the attributes
      now contain the dirent count, check whether it exceeds the threshold,
      and if so, add the directory to the mdr_auto_split list and wake up
      the directory restriper thread.
      Restripe migration is also triggered in getattr(): if the object is
      a directory stripe and its LMV 'MIGRATION' flag is set, add the
      object to the mdr_restripe_migrate list and wake up the restriper
      thread.
      Directory layout update works the same way: if the current directory
      is striped and its LMV 'RESTRIPING' flag is set, add the directory
      to the mdr_restripe_update list and wake up the restriper thread.
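      The three getattr() triggers above amount to classifying a directory
      object and queuing it on one work list. A minimal sketch, assuming a
      hypothetical struct toy_dir_attr and toy_classify(); the list names in
      the comments come from the commit message, while the order of the
      checks and returning a single list are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Which restriper work list getattr() would queue the object on. */
enum toy_restripe_list {
	TOY_LIST_NONE,
	TOY_LIST_AUTO_SPLIT,       /* mdr_auto_split */
	TOY_LIST_RESTRIPE_MIGRATE, /* mdr_restripe_migrate */
	TOY_LIST_RESTRIPE_UPDATE,  /* mdr_restripe_update */
};

/* Hypothetical summary of what getattr() knows about a directory object. */
struct toy_dir_attr {
	bool is_stripe;         /* object is one stripe of a striped dir */
	bool is_striped_master; /* object is the master of a striped dir */
	bool lmv_migration;     /* LMV 'MIGRATION' flag */
	bool lmv_restriping;    /* LMV 'RESTRIPING' flag */
	long dirent_count;      /* dirent count now returned in the attrs */
};

/* Classify at the end of getattr(); the check order is an assumption. */
static enum toy_restripe_list
toy_classify(const struct toy_dir_attr *a, bool auto_split_enabled,
	     long split_count)
{
	assert(split_count > 0);

	if (a->is_stripe && a->lmv_migration)
		return TOY_LIST_RESTRIPE_MIGRATE;
	if (a->is_striped_master && a->lmv_restriping)
		return TOY_LIST_RESTRIPE_UPDATE;
	if (auto_split_enabled && a->dirent_count > split_count)
		return TOY_LIST_AUTO_SPLIT;
	return TOY_LIST_NONE;
}
```

      After queuing, the caller would wake the restriper thread; keeping the
      classification separate from the queuing keeps getattr() cheap.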
      By default, restriping migrates only the dirent and leaves the inode
      unchanged; this can be changed with
      "lctl set_param mdt.*.dir_restripe_nsonly=[0|1]".
      DoM file inode migration is not currently supported, so only the
      dirent is migrated for such files, to avoid leaving dir migration/restripe
      Add sanity.sh tests 230o, 230p and 230q; adjust 230j since DoM files migrate
      Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
      Change-Id: I8c83b42e4acbaab067d0092d0b232de37f956588
      Reviewed-on: https://review.whamcloud.com/37284
      Tested-by: jenkins <devops@whamcloud.com>
      Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
      Tested-by: Maloo <maloo@whamcloud.com>
      Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
      Reviewed-by: Oleg Drokin <green@whamcloud.com>
  6. 20 May, 2020 1 commit
  7. 14 May, 2020 2 commits
  8. 07 May, 2020 2 commits
  9. 23 Apr, 2020 1 commit
  10. 14 Apr, 2020 1 commit
  11. 06 Apr, 2020 2 commits
  12. 31 Mar, 2020 2 commits
    • LU-11025 lmv: simplify name to stripe mapping · 703afd15
      Lai Siyao authored
      Handle layout changes internally when mapping a name to a stripe:
      * Move layout-change code into the LMV name-to-stripe mapping so
        that callers don't need to deal with the internals.
      * lmv_name_to_stripe_index() maps a name using the new layout, and
        lmv_name_to_stripe_index_old() using the old layout.
      * Rename lmv_migrate_existence_check() to lmv_old_layout_lookup()
        to support directory restriping in the future.
      * Support directories whose layout is changing in LFSCK.
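      The split between new-layout and old-layout mapping can be sketched
      as a lookup that tries the new layout first and falls back to the old
      one while a layout change is in progress, roughly the role the commit
      assigns to lmv_old_layout_lookup(). Everything here is a hypothetical
      toy: the toy_* names, the entry table, simple modulo placement for
      both layouts, and FNV-1a standing in for the real name hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: old_count is 0 when no change is in progress. */
struct toy_lmv_layout {
	unsigned new_count;
	unsigned old_count;
};

/* A dirent stored on a given stripe (stand-in for the stripe dirs). */
struct toy_entry {
	unsigned stripe;
	const char *name;
};

/* 32-bit FNV-1a, standing in for the real directory name hash. */
static uint32_t toy_name_hash(const char *name)
{
	uint32_t h = 2166136261u;

	for (; *name != '\0'; name++) {
		h ^= (unsigned char)*name;
		h *= 16777619u;
	}
	return h;
}

/* Analogue of lmv_name_to_stripe_index(): map name with the new layout. */
static unsigned toy_name_to_stripe_index(const struct toy_lmv_layout *l,
					 const char *name)
{
	assert(l->new_count > 0);
	return toy_name_hash(name) % l->new_count;
}

/* Analogue of lmv_name_to_stripe_index_old(): map with the old layout. */
static unsigned toy_name_to_stripe_index_old(const struct toy_lmv_layout *l,
					     const char *name)
{
	assert(l->old_count > 0);
	return toy_name_hash(name) % l->old_count;
}

static int toy_exists(const struct toy_entry *e, size_t n, unsigned stripe,
		      const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (e[i].stripe == stripe && strcmp(e[i].name, name) == 0)
			return 1;
	return 0;
}

/* New layout first, then the old layout if one exists.
 * Returns the stripe index holding the name, or -1 if not found. */
static int toy_lookup(const struct toy_lmv_layout *l,
		      const struct toy_entry *e, size_t n, const char *name)
{
	unsigned idx = toy_name_to_stripe_index(l, name);

	if (toy_exists(e, n, idx, name))
		return (int)idx;
	if (l->old_count == 0)
		return -1;
	idx = toy_name_to_stripe_index_old(l, name);
	return toy_exists(e, n, idx, name) ? (int)idx : -1;
}
```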
      Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
      Change-Id: Icf8bda5db884784f761a2d373a6f81d7e13f525f
      Reviewed-on: https://review.whamcloud.com/37711
      Tested-by: jenkins <devops@whamcloud.com>
      Tested-by: Maloo <maloo@whamcloud.com>
      Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
      Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
      Reviewed-by: Oleg Drokin <green@whamcloud.com>
    • LU-11025 dne: introduce new directory hash type: "crush" · 0a1cf8da
      Lai Siyao authored
      The current directory striping strategy is fixed striping, i.e., it
      computes (hash(filename) % stripe_count) to choose the stripe for a
      file. The problem with this approach is that if stripe_count
      changes, most files need to be relocated between MDTs, which makes
      directory split/merge quite expensive.
      This patch introduces a consistent-hash striping strategy:
      it computes (hash(filename) % LMV_CRUSH_PG_COUNT) to find the PG_ID
      (placement group index), then calls crush_hash(PG_ID, stripe_index)
      to get a straw for each stripe, and the stripe with the highest
      straw is used to place this file.
      It uses the CRUSH algorithm, but only to map placement groups
      pseudo-randomly among all stripes; it does not use CRUSH to choose
      MDTs when no MDT is specified. MDT selection is done by MDT object
      QoS allocation in LMV and LOD (LMV decides the starting stripe's
      MDT, while LOD decides the remaining stripes).
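      The difference between the two strategies can be measured by counting
      how many names change stripe when the stripe count grows. A minimal
      sketch under stated assumptions: TOY_PG_COUNT is a hypothetical
      stand-in for LMV_CRUSH_PG_COUNT, toy_straw_hash() is a MurmurHash3
      fmix32-style mixer standing in for crush_hash(), and the sample name
      hashes are simply 0..n-1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value; the real LMV_CRUSH_PG_COUNT is not shown here. */
#define TOY_PG_COUNT 4096u

/* fmix32-style mixer, standing in for crush_hash(). */
static uint32_t toy_straw_hash(uint32_t pg, uint32_t stripe)
{
	uint32_t h = pg * 0x9e3779b1u ^ stripe;

	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/* Fixed striping: hash(filename) % stripe_count. */
static unsigned fixed_stripe(uint32_t name_hash, unsigned stripe_count)
{
	assert(stripe_count > 0);
	return name_hash % stripe_count;
}

/* Consistent striping: name -> placement group, then the stripe that draws
 * the highest straw for that PG wins (ties go to the lower index). */
static unsigned straw_stripe(uint32_t name_hash, unsigned stripe_count)
{
	uint32_t pg = name_hash % TOY_PG_COUNT;
	unsigned best = 0;
	uint32_t best_straw = toy_straw_hash(pg, 0);

	assert(stripe_count > 0);
	for (unsigned i = 1; i < stripe_count; i++) {
		uint32_t straw = toy_straw_hash(pg, i);

		if (straw > best_straw) {
			best = i;
			best_straw = straw;
		}
	}
	return best;
}

/* How many of the name hashes 0..n-1 change stripe when the stripe count
 * goes from 'from' to 'to' under the given mapping. */
static unsigned moved(unsigned (*map)(uint32_t, unsigned), unsigned from,
		      unsigned to, unsigned n)
{
	unsigned m = 0;

	for (uint32_t h = 0; h < n; h++)
		if (map(h, from) != map(h, to))
			m++;
	return m;
}
```

      Growing from 4 to 5 stripes, fixed striping moves 16000 of 20000
      sample hashes (every h with h mod 20 >= 4). With straws, the existing
      stripes' straws do not change, so a name can only move to the new
      stripe 4, and only about a fifth of them should move.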
      This implementation contains the following changes:
  13. 24 Mar, 2020 1 commit
  14. 11 Mar, 2020 1 commit
  15. 08 Feb, 2020 1 commit
  16. 28 Jan, 2020 3 commits
  17. 14 Dec, 2019 2 commits
  18. 22 Oct, 2019 2 commits
  19. 30 Sep, 2019 1 commit
  20. 20 Sep, 2019 1 commit
    • LU-11367 som: integrate LSOM with lfs find · 11aa7f87
      Qian Yingjin authored
      This patch integrates LSOM functionality with lfs find so that LSOM
      data can be used directly on the client. The MDS fills in the
      mbo_size and mbo_blocks fields from the LSOM xattr if the actual
      size/blocks are not available, and then sets the new
      OBD_MD_FLLSIZE and OBD_MD_FLLBLOCKS flags in the reply so that the
      client knows these fields are valid.
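      On the client, the valid flags decide whether the returned size can
      be used at all, and whether it is strict or lazy. A minimal sketch,
      assuming hypothetical bit values (the real OBD_MD_FLSIZE and
      OBD_MD_FLLSIZE constants live in the Lustre headers) and a toy reply
      struct; blocks would be handled the same way with the *BLOCKS flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bits standing in for OBD_MD_FLSIZE and OBD_MD_FLLSIZE. */
#define TOY_MD_FLSIZE  (1ULL << 0)
#define TOY_MD_FLLSIZE (1ULL << 2)

struct toy_reply_body {
	uint64_t valid; /* which fields below the MDS filled in */
	uint64_t size;
};

enum toy_size_src {
	TOY_SIZE_UNKNOWN, /* client must get the size elsewhere (OSTs) */
	TOY_SIZE_STRICT,  /* actual size returned by the MDS */
	TOY_SIZE_LAZY,    /* size taken from the LSOM xattr */
};

/* Decide whether the reply's size can be used; lazy_ok models the
 * "lfs find --lazy" option. */
static enum toy_size_src toy_size_source(const struct toy_reply_body *b,
					 bool lazy_ok)
{
	assert(b != NULL);
	if (b->valid & TOY_MD_FLSIZE)
		return TOY_SIZE_STRICT;
	if (lazy_ok && (b->valid & TOY_MD_FLLSIZE))
		return TOY_SIZE_LAZY;
	return TOY_SIZE_UNKNOWN;
}
```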
      The lfs find command gains a "-l|--lazy" option that allows the use
      of LSOM data from the MDS.
      Add a new version of ioctl(LL_IOC_MDC_GETINFO) call that also returns
      valid flags from the MDS RPC to userspace in struct lov_user_mds_data
      so that it is possible to determine whether the size and blocks are
      returned by the call.  The old LL_IOC_MDC_GETINFO ioctl number is
      renamed to LL_IOC_MDC_GETINFO_OLD and is binary compatible, but
      newly-compiled applications will use the new struct lov_user_mds_data.
      New llapi interfaces llapi_get_lum_file(), llapi_get_lum_dir(),
      llapi_get_lum_file_fd(), llapi_get_lum_dir_fd(...
  21. 07 Sep, 2019 1 commit
  22. 27 Jul, 2019 1 commit
  23. 12 Jul, 2019 3 commits
  24. 03 Jul, 2019 1 commit
  25. 16 Jun, 2019 1 commit
  26. 13 Jun, 2019 2 commits
    • LU-10092 llite: Add persistent cache on client · f172b116
      Li Xi authored
      PCC is a new framework that provides a group of local caches on
      the Lustre client side. PCC does not provide a global namespace:
      each client uses its own local storage as a cache for itself, and
      a local file system manages the data in the local cache. Cached
      I/O is directed to the local file system, while normal I/O is
      directed to OSTs.
      PCC uses HSM for data synchronization: an HSM copytool restores
      files from local caches to Lustre OSTs. Each PCC has a copytool
      instance running with a unique archive number.
      Any access from another Lustre client triggers data
      synchronization. If a client with PCC goes offline, the cached
      data becomes temporarily inaccessible to other clients; after the
      PCC client reboots and the copytool restarts, the data becomes
      accessible again.
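      The access rules above can be sketched as a routing decision per I/O.
      This is a toy model, not the llite code: struct toy_pcc_file and
      toy_pcc_route() are hypothetical, "owner online" lumps together the
      client and its copytool, and detaching the file from the cache after
      the restore is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_io_path { TOY_PATH_LOCAL_CACHE, TOY_PATH_OST, TOY_PATH_UNAVAILABLE };

struct toy_pcc_file {
	int cache_owner;   /* client id holding the PCC copy, -1 if none */
	bool owner_online; /* owner client and its copytool are running */
};

/* Decide where I/O from 'client' goes. */
static enum toy_io_path toy_pcc_route(struct toy_pcc_file *f, int client)
{
	assert(client >= 0);
	if (f->cache_owner < 0)
		return TOY_PATH_OST;         /* not cached: normal I/O */
	if (f->cache_owner == client)
		return TOY_PATH_LOCAL_CACHE; /* cached I/O on the owner */
	if (!f->owner_online)
		return TOY_PATH_UNAVAILABLE; /* until owner and copytool return */
	/* Remote access: the copytool restores data to the OSTs first. */
	f->cache_owner = -1;
	return TOY_PATH_OST;
}
```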
      1) Make PCC exclusive with HSM.
      2) Strong size consistency for PCC-cached files among clients.
      3) Support caching partial content of a file.
      Change-Id: I188ed36c48aae22...
    • LU-11213 lmv: reuse object alloc QoS code from LOD · b601eb35
      Lai Siyao authored
      Reuse the same object allocation QoS code as LOD. The QoS code is
      not moved to a lower-layer module but copied into LMV, because
      moving it would touch almost all LMV code; that change is too big
      and should be done separately in the future.
      For LMV round-robin object allocation, because only one object
      needs to be allocated, use the saved MDT index and update it to the next
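      Round-robin allocation of a single object only needs a saved index
      that is advanced after each allocation. A minimal sketch, assuming a
      hypothetical struct toy_rr and toy_rr_alloc(); skipping inactive MDTs
      is an assumption added for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_MDTS 8

struct toy_rr {
	unsigned next;               /* saved MDT index for the next object */
	unsigned count;              /* number of MDTs */
	bool active[TOY_MAX_MDTS];   /* whether each MDT can take objects */
};

/* Return the MDT for one new object and advance the saved index;
 * -1 if no MDT is active. */
static int toy_rr_alloc(struct toy_rr *rr)
{
	assert(rr->count > 0 && rr->count <= TOY_MAX_MDTS);
	for (unsigned i = 0; i < rr->count; i++) {
		unsigned idx = (rr->next + i) % rr->count;

		if (rr->active[idx]) {
			rr->next = (idx + 1) % rr->count;
			return (int)idx;
		}
	}
	return -1;
}
```

      With MDTs {0, 2} active out of 3, successive calls return 0, 2, 0.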
      Add sanity 413b.
      Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
      Change-Id: I53c3d863dafda534eebb6b95da205b395071cd25
      Reviewed-on: https://review.whamcloud.com/34657
      Tested-by: Jenkins
      Tested-by: Maloo <maloo@whamcloud.com>
      Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
      Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
  27. 01 Jun, 2019 2 commits
    • LU-9846 lod: Add overstriping support · 591a9b4c
      Patrick Farrell authored
      Each stripe in a shared file in Lustre corresponds to a
      single LDLM extent locking domain and also to a single
      object on disk (and in the OSS page cache).  LDLM locks are
      extent locks, but there are still significant issues with
      false sharing with multiple writers.  On-disk file systems
      also have per-object performance limitations for both read
      and write.
      The LDLM limitation means it is best to have a single
      writer per stripe, but modern OSTs can be faster than a
      single client, so this restricts maximum performance unless
      special methods are used (e.g., Lustre lockahead).
      The on disk file system limitations mean that even if LDLM
      locking is not an issue (read and not write, or lockahead),
      OST performance in a shared file is still limited by having
      only one object per OST.
      These limitations make it impossible to get the full
      performance of a modern Lustre FS with a single shared file.
      This patch makes it possible to have >1 stripe on a given
      OST in each layout component.  This is known as
      overstriping.  It works exactly like a normally striped
      file, and is largely transparent to users.
      By raising the object count per OST, this avoids the single
      object limits, and by creating more stripes, also avoids
      the "single effective writer per stripe" LDLM limitation.
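      Overstriping changes only the stripe-to-OST table, not the offset
      arithmetic. A minimal sketch, assuming RAID-0 style round-robin
      placement of stripe_size chunks; struct toy_layout, toy_map() and
      toy_objects_on_ost() are hypothetical. With OSTs {0, 1, 0, 1}, each
      OST holds two objects of the same file.

```c
#include <assert.h>
#include <stdint.h>

struct toy_layout {
	uint64_t stripe_size;     /* bytes per chunk */
	unsigned stripe_count;
	const unsigned *ost_idx;  /* OST of each stripe; may repeat */
};

struct toy_loc {
	unsigned stripe;          /* which stripe (object) holds the byte */
	unsigned ost;             /* OST holding that object */
	uint64_t obj_off;         /* offset within the object */
};

/* Map a file offset to stripe, OST and object offset. */
static struct toy_loc toy_map(const struct toy_layout *l, uint64_t off)
{
	struct toy_loc loc;
	uint64_t chunk;

	assert(l->stripe_size > 0 && l->stripe_count > 0);
	chunk = off / l->stripe_size;
	loc.stripe = (unsigned)(chunk % l->stripe_count);
	loc.ost = l->ost_idx[loc.stripe];
	loc.obj_off = (chunk / l->stripe_count) * l->stripe_size +
		      off % l->stripe_size;
	return loc;
}

/* Count the objects (stripes) a layout places on one OST. */
static unsigned toy_objects_on_ost(const struct toy_layout *l, unsigned ost)
{
	unsigned n = 0;

	for (unsigned i = 0; i < l->stripe_count; i++)
		if (l->ost_idx[i] == ost)
			n++;
	return n;
}
```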
      However, it is only desirable in some situations, so users
      must request it with a special setstripe command:
      lfs setstripe -C [count] [file]
      Users can also access overstriping using the standard '-o'
      option to manually select OSTs:
      lfs setstripe -o [ost_indices] [file]
      Overstriping also makes it easy to test layout size limits, so we
      add a test for that.
      Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
      Change-Id: I14bb94b05642b3542a965e84fda4615b997a4dea
      Reviewed-on: https://review.whamcloud.com/28425
      Tested-by: Jenkins
      Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
      Tested-by: Maloo <maloo@whamcloud.com>
      Reviewed-by: Bobi Jam <bobijam@hotmail.com>
      Reviewed-by: Oleg Drokin <green@whamcloud.com>
    • LU-11359 mdt: fix mdt_dom_discard_data() timeouts · 9c028e74
      Mikhail Pershin authored
      mdt_dom_discard_data() issues a new lock to force data discard
      for all conflicting client locks. This was done in the context
      of unlink RPC processing, which could get stuck waiting for
      clients to cancel their locks, leading to cascading timeouts
      for any other locks waiting on the same resource and parent
      directory.
      This patch avoids waiting for the discard lock in the current
      context by using its own completion (CP) callback, which doesn't
      wait for blocking locks. They are finished later by LDLM and
      cleaned up in that completion callback. The current thread only
      makes sure the discard locks are taken and the BL ASTs are sent,
      but doesn't wait for the locks to be granted, which fixes the
      original problem.
      At the same time, this opens a window for a race with data being
      flushed on the client: new I/O from a client may arrive for a
      just-unlinked object, causing an error message, and that case
      cannot be distinguished from other, possibly critical, situations.
      To solve this, the unlinked object is pinned in memory until the
      discard lock is granted, so such objects can easily be identified
      as stale and any I/O against them can be silently ignored.
      Older clients are not fully compatible with async DoM discard, so
      the patch also adds a new connection flag, ASYNC_DISCARD, to
      identify old clients and use the old blocking discard for them.
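      The async discard scheme above can be sketched as a small state model:
      pick the discard mode from the client's connect flags, pin the object
      on unlink, unpin it in the completion callback, and ignore I/O against
      a stale object. All toy_* names and the flag bit value are
      hypothetical; this is not the MDT code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit standing in for the new ASYNC_DISCARD connect flag. */
#define TOY_CONNECT_ASYNC_DISCARD (1ULL << 0)

struct toy_dom_obj {
	int refcount;
	bool stale;     /* unlinked; discard lock not yet granted */
};

enum toy_discard_mode { TOY_DISCARD_BLOCKING, TOY_DISCARD_ASYNC };
enum toy_io_result { TOY_IO_DONE, TOY_IO_IGNORED };

/* Old clients (without the flag) keep the old blocking discard. */
static enum toy_discard_mode toy_pick_discard_mode(uint64_t connect_flags)
{
	return (connect_flags & TOY_CONNECT_ASYNC_DISCARD) ?
	       TOY_DISCARD_ASYNC : TOY_DISCARD_BLOCKING;
}

/* Unlink path, async case: pin the object and return without waiting. */
static void toy_unlink_async_discard(struct toy_dom_obj *o)
{
	o->refcount++;  /* pinned until the discard lock is granted */
	o->stale = true;
}

/* Completion (CP) callback: the discard lock was granted, drop the pin. */
static void toy_discard_granted(struct toy_dom_obj *o)
{
	assert(o->refcount > 0);
	o->refcount--;
}

/* I/O arriving for a pinned stale object is silently ignored. */
static enum toy_io_result toy_handle_io(const struct toy_dom_obj *o)
{
	return o->stale ? TOY_IO_IGNORED : TOY_IO_DONE;
}
```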
      Test-Parameters: testlist=racer,racer,racer
      Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
      Change-Id: I419677af43c33e365a246fe12205b506209deace
      Reviewed-on: https://review.whamcloud.com/34071
      Tested-by: Jenkins
      Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
      Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
      Tested-by: Maloo <maloo@whamcloud.com>
      Reviewed-by: Oleg Drokin <green@whamcloud.com>