    LU-11025 dne: introduce new directory hash type: "crush" · 0a1cf8da
    Lai Siyao authored
    The current directory striping strategy is fixed striping, i.e., it
    computes (hash(filename) % stripe_count) to pick the stripe for a
    file. The problem with this approach is that if stripe_count changes,
    most of the files need to be relocated between MDTs. This makes
    directory split/merge quite expensive.
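    The relocation cost of fixed striping can be illustrated with a small
    standalone sketch. It is not Lustre code: fnv1a64(), fixed_stripe() and
    count_moved() are hypothetical helpers, the file names are synthetic, and
    FNV-1a is used only as a stand-in for hash(filename).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a, used here only as a stand-in for hash(filename). */
static uint64_t fnv1a64(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */

    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/* Fixed striping: hash(filename) % stripe_count. */
static unsigned fixed_stripe(const char *name, unsigned stripe_count)
{
    assert(stripe_count > 0);
    return (unsigned)(fnv1a64(name) % stripe_count);
}

/* Count how many of n synthetic names change stripe when the count changes. */
static unsigned count_moved(unsigned n, unsigned old_count, unsigned new_count)
{
    unsigned i, moved = 0;
    char name[32];

    for (i = 0; i < n; i++) {
        snprintf(name, sizeof(name), "file%u", i);
        if (fixed_stripe(name, old_count) != fixed_stripe(name, new_count))
            moved++;
    }
    return moved;
}
```

    For a uniformly distributed hash, growing from 4 to 5 stripes keeps a
    name in place only when its hash modulo 20 is 0, 1, 2 or 3, so roughly
    four out of five names would be expected to move.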
    This patch introduces a consistent hash striping strategy:
    it computes (hash(filename) % LMV_CRUSH_PG_COUNT) to locate PG_ID
    (placement group index), and then calls
    crush_hash(PG_ID, stripe_index) to get a straw for each stripe;
    the stripe with the highest straw is used to place this file.
    As we can see, it uses the CRUSH algorithm, but only to map
    placement groups pseudo-randomly among all stripes; it does not
    use it to choose MDTs when no MDT is specified. The latter
    is done by MDT object QoS allocation in LMV and LOD (LMV decides
    the starting stripe MDT, while LOD decides the remaining stripes).
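    The placement-group and straw selection described above can be sketched
    as follows. This is an illustration under stated assumptions, not the
    patch's code: the LMV_CRUSH_PG_COUNT value of 4096 is assumed, fnv1a64()
    stands in for hash(filename), and straw_for() is a hypothetical
    replacement for crush_hash() built on a generic 64-bit mixer.

```c
#include <assert.h>
#include <stdint.h>

#define LMV_CRUSH_PG_COUNT 4096u   /* assumed value, for illustration only */

/* 64-bit FNV-1a, a stand-in for hash(filename). */
static uint64_t fnv1a64(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;

    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Generic 64-bit mixer (splitmix64 finalizer); NOT the real crush_hash(). */
static uint64_t mix64(uint64_t x)
{
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Hypothetical straw: depends only on the PG and the stripe index. */
static uint64_t straw_for(uint32_t pg_id, uint32_t stripe_index)
{
    return mix64(((uint64_t)pg_id << 32) | stripe_index);
}

/* Map a name to a PG, then pick the stripe that draws the longest straw. */
static unsigned crush_stripe(const char *name, unsigned stripe_count)
{
    uint32_t pg_id;
    uint64_t best_straw;
    unsigned i, best = 0;

    assert(stripe_count > 0);
    pg_id = (uint32_t)(fnv1a64(name) % LMV_CRUSH_PG_COUNT);
    best_straw = straw_for(pg_id, 0);
    for (i = 1; i < stripe_count; i++) {
        uint64_t straw = straw_for(pg_id, i);

        if (straw > best_straw) {      /* ties keep the lower index */
            best_straw = straw;
            best = i;
        }
    }
    return best;
}
```

    Because a stripe's straw does not depend on stripe_count, adding a
    stripe changes the result only for placement groups where the new stripe
    draws the longest straw. In this sketch a name therefore either stays on
    its stripe or moves to the newly added one.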
    This implementation contains the following changes:
    * new hash type "crush", and lmv_name_to_stripe_index() will take
      care of it in mapping name to stripe index.
    * add "mdt_hash" sysfs tunable for LOD, which will be the default LMV
      hash type used in mkdir.
    * if 'lfs setdirstripe' doesn't set hash type, server will choose
      the default LMV hash type.
    * place temp file on MDT where target is: map temporary files created
      by rsync or mpifileutils dstripe to the same MDT where the target is
      located, so a later rename won't make the target a remote entry. The
      same applies to backup files ending with .bak, .sav, .orig or ~.
    * if server doesn't support OBD_CONNECT2_CRUSH, client will switch
      to "fnv_1a_64" hash type if "crush" is specified.
    * if client doesn't know "crush" hash type, it will try all stripes
      in lookup.
    * client will set hash type to LMV_HASH_TYPE_UNKNOWN if setdirstripe
      doesn't set it explicitly, and server will use the default hash
      type to create the striped directory. NB: for backward compatibility,
      client will set it to LMV_HASH_TYPE_DEFAULT if server < 2.14.
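    The temp and backup file mapping in the list above amounts to hashing
    the target's name instead of the full name. A minimal sketch of that
    idea follows. The helper names are hypothetical, and the rsync temp
    pattern ".<target>.XXXXXX" (six trailing characters) is an assumption
    made here for illustration; the commit message does not spell out the
    patterns the patch matches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Backup suffixes named in the commit message. */
static const char *const backup_suffixes[] = { ".bak", ".sav", ".orig", "~" };

/*
 * Return how many leading bytes of 'name' to hash so that a backup file
 * maps like its target. Names without a known suffix are hashed whole.
 */
static size_t backup_target_len(const char *name)
{
    size_t len, i;

    assert(name != NULL);
    len = strlen(name);
    for (i = 0; i < sizeof(backup_suffixes) / sizeof(backup_suffixes[0]); i++) {
        size_t slen = strlen(backup_suffixes[i]);

        /* Require a non-empty target name in front of the suffix. */
        if (len > slen && strcmp(name + len - slen, backup_suffixes[i]) == 0)
            return len - slen;
    }
    return len;
}

/*
 * Assumed rsync temp pattern: ".<target>.XXXXXX". On a match, point
 * *target at the embedded target name and return 1; otherwise return 0.
 */
static int rsync_temp_target(const char *name, const char **target,
                             size_t *target_len)
{
    size_t len;

    assert(name != NULL && target != NULL && target_len != NULL);
    len = strlen(name);
    /* Shortest match: '.' + 1-char target + '.' + 6 characters = 9 bytes. */
    if (len < 9 || name[0] != '.' || name[len - 7] != '.')
        return 0;
    *target = name + 1;
    *target_len = len - 8;
    return 1;
}
```

    Hashing only the returned span would place the temporary or backup file
    on the same stripe as the target, which is what lets the later rename
    stay local to one MDT.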
    test_mkdir() in the sanity tests will create a striped directory with
    a random hash type if none is specified.
    sanity tests 160g, 160h and 160i are fragile because they require a
    certain number of files to be created under each stripe; use the
    'fnv_1a_64' hash type to fulfill this requirement.
    sanity-lfsck test 31b has the same limitation; create more subdirs to
    make it more robust.
    Add sanity 33h for temp file mapping.
    Test-Parameters: trivial
    Signed-off-by: Lai Siyao <>
    Change-Id: I669e561c667f926c35cf1338f4c6604249e1ee51
    Tested-by: jenkins <>
    Tested-by: Maloo <>
    Reviewed-by: Andreas Dilger <>
    Reviewed-by: Hongchao Zhang <>