colossalai.zero.shard_utils.bucket_tensor_shard_strategy

class colossalai.zero.shard_utils.bucket_tensor_shard_strategy.BucketTensorShardStrategy

Uses the same sharding scheme as TensorShardStrategy, but gathers the tensors of a sub-module together, which makes better use of network bandwidth. This is especially useful when a sub-module contains a bias: a bias tensor is usually small, so gathering it on its own uses network bandwidth poorly.
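The idea of bucketed gathering can be illustrated with a minimal pure-Python sketch. It is not ColossalAI's implementation: `FakeComm`, `shard`, `gather_one_by_one`, and `gather_bucketed` are hypothetical names introduced here, tensors are modelled as flat lists, and the collective is simulated in a single process. The sketch assumes each tensor is flattened, padded, and split evenly across ranks, and it compares one all-gather per tensor with a single all-gather over a per-rank bucket.

```python
from typing import List


def shard(flat: List[float], world_size: int) -> List[List[float]]:
    """Pad a flat tensor with zeros and split it into one equal chunk per rank."""
    chunk = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [0.0] * (chunk * world_size - len(flat))
    return [padded[r * chunk:(r + 1) * chunk] for r in range(world_size)]


class FakeComm:
    """Hypothetical stand-in for a collective backend; it only counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def all_gather(self, per_rank_buffers: List[List[float]]) -> List[float]:
        # Concatenate every rank's buffer in rank order.
        self.calls += 1
        return [x for buf in per_rank_buffers for x in buf]


def gather_one_by_one(comm, shards_per_tensor, numels):
    """One all-gather per tensor: a small bias costs a full collective call."""
    return [comm.all_gather(s)[:n] for s, n in zip(shards_per_tensor, numels)]


def gather_bucketed(comm, shards_per_tensor, numels):
    """Pack each rank's local shards into one buffer, then gather once."""
    world_size = len(shards_per_tensor[0])
    buckets = [
        [x for shards in shards_per_tensor for x in shards[r]]
        for r in range(world_size)
    ]
    gathered = comm.all_gather(buckets)
    bucket_len = len(buckets[0])

    tensors, offset = [], 0
    for shards, n in zip(shards_per_tensor, numels):
        chunk = len(shards[0])
        full = []
        for r in range(world_size):
            start = r * bucket_len + offset  # this tensor's slice in rank r's bucket
            full.extend(gathered[start:start + chunk])
        tensors.append(full[:n])  # drop the padding
        offset += chunk
    return tensors


# A sub-module with a larger weight and a small bias, sharded over 4 ranks.
weight = [float(i) for i in range(10)]
bias = [100.0, 101.0, 102.0]
params = [shard(weight, 4), shard(bias, 4)]
numels = [len(weight), len(bias)]

naive_comm, bucket_comm = FakeComm(), FakeComm()
naive = gather_one_by_one(naive_comm, params, numels)
bucketed = gather_bucketed(bucket_comm, params, numels)
```

Both paths reconstruct the same tensors, but the bucketed path issues one collective call for the whole sub-module, so the small bias shares a transfer with the weight instead of paying for a separate one.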