Lai Siyao authored
The current directory striping strategy is fixed striping: it computes (hash(filename) % stripe_count) to choose the stripe for a file. The problem with this approach is that if stripe_count changes, most files must be relocated between MDTs, which makes directory split/merge quite expensive.

This patch introduces a consistent hash striping strategy: it computes (hash(filename) % LMV_CRUSH_PG_COUNT) to locate PG_ID (the placement group index), then calls crush_hash(PG_ID, stripe_index) to get a straw for each stripe, and the stripe with the highest straw is used to place the file.

This uses the CRUSH algorithm, but only to map placement groups pseudo-randomly among all stripes; it is not used to choose MDTs when the MDT is not specified. The latter is done by MDT object QoS allocation in LMV and LOD (LMV decides the starting stripe MDT, while LOD decides the remaining stripes).

This implementation contains the following changes:
* new hash type "crush"; lmv_name_to_stripe_index() handles it when mapping a name to a stripe index.
* add "mdt_hash" sysfs tunable for LOD, which is the default LMV hash type used in mkdir.
* if 'lfs setdirstripe' doesn't set the hash type, the server chooses the default LMV hash type.
* place temp file on the MDT where the target is: map a temporary file created by rsync or mpifileutils dstripe to the same MDT where the target is located, so a later rename won't make the target a remote entry. The same applies to backup files ending with .bak, .sav, .orig or ~.

Compatibility:
* if the server doesn't support OBD_CONNECT2_CRUSH, the client switches to the "fnv_1a_64" hash type if "crush" is specified.
* if the client doesn't know the "crush" hash type, it tries all stripes in lookup.
* the client sets the hash type to LMV_HASH_TYPE_UNKNOWN if setdirstripe doesn't set it explicitly, and the server uses the default hash type to create the striped directory. NB: for backward compatibility, the client sets it to LMV_HASH_TYPE_DEFAULT if server < 2.14.
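
The difference between fixed striping and the straw-based placement described above can be illustrated with a small Python sketch. This is not the Lustre code: the hash function (BLAKE2b), the PG_COUNT value and the function names fixed_stripe(), crush_stripe() and moved() are assumptions chosen for illustration, and the real crush_hash() differs. It only shows why picking the stripe with the highest straw limits relocation when stripe_count grows.

```python
import hashlib

# Placeholder for LMV_CRUSH_PG_COUNT; the value here is an arbitrary assumption.
PG_COUNT = 4096


def _hash64(data: bytes) -> int:
    """Stand-in 64-bit hash (assumption; Lustre uses its own hash functions)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def fixed_stripe(name: str, stripe_count: int) -> int:
    """Fixed striping: hash(filename) % stripe_count."""
    return _hash64(name.encode()) % stripe_count


def crush_stripe(name: str, stripe_count: int) -> int:
    """Straw-based striping: map the name to a placement group, draw one
    straw per stripe from (pg_id, stripe_index), and keep the highest straw."""
    pg_id = _hash64(name.encode()) % PG_COUNT
    return max(
        range(stripe_count),
        key=lambda idx: _hash64(f"{pg_id}:{idx}".encode()),
    )


def moved(strategy, names, old_count: int, new_count: int) -> int:
    """Count the names whose stripe changes when stripe_count changes."""
    return sum(1 for n in names if strategy(n, old_count) != strategy(n, new_count))


if __name__ == "__main__":
    names = [f"file{i}" for i in range(2000)]
    print("fixed striping, names moved:", moved(fixed_stripe, names, 4, 5))
    print("straw striping, names moved:", moved(crush_stripe, names, 4, 5))
```

In this sketch, adding a fifth stripe leaves every existing straw unchanged, so a name either stays on its old stripe or moves to the new one. With the modulo scheme, a name stays put only when both remainders happen to agree, so most names change stripe.
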

test_mkdir() in the sanity tests creates a striped directory with a random hash type if none is specified. sanity tests 160g, 160h and 160i are fragile because they require a certain number of files created under each stripe; use the 'fnv_1a_64' hash type to fulfill this requirement. sanity-lfsck test 31b has the same limitation; create more subdirs to make it more robust. Add sanity 33h for temp file mapping.

Test-Parameters: trivial
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I669e561c667f926c35cf1338f4c6604249e1ee51
Reviewed-on: https://review.whamcloud.com/36775
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
0a1cf8da