colossalai.zero.shard_utils

class colossalai.zero.shard_utils.TensorShardStrategy

A naive implementation that shards each tensor evenly across all ranks.
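To make "evenly" concrete, here is a minimal pure-Python sketch of the idea, not the library's code. It assumes the tensor is flattened first and that the tail is zero-padded so every rank holds a shard of the same length; the helper names `shard_evenly` and `unshard` are hypothetical and plain lists stand in for tensors.

```python
from typing import List


def shard_evenly(flat: List[float], world_size: int) -> List[List[float]]:
    """Split a flattened tensor into `world_size` equal-length shards.

    Assumption: the tail is zero-padded so all shards have the same length.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    shard_len = -(-len(flat) // world_size)  # ceiling division
    padded = list(flat) + [0.0] * (shard_len * world_size - len(flat))
    # Rank r keeps the r-th contiguous slice of the padded buffer.
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def unshard(shards: List[List[float]], numel: int) -> List[float]:
    """Concatenate the shards in rank order and drop the padding."""
    flat = [x for s in shards for x in s]
    return flat[:numel]
```

With five elements and two ranks, each rank holds three values, the last of which on rank 1 is padding; `unshard` recovers the original five elements.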

class colossalai.zero.shard_utils.BucketTensorShardStrategy

Uses the same sharding scheme as TensorShardStrategy, but gathers all tensors of a sub-module together, which makes better use of network bandwidth. This is especially useful when a sub-module contains a bias: a bias tensor is usually small, so gathering it on its own utilizes the network poorly.
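The bucketing idea can be illustrated with a small simulation. This is a sketch under stated assumptions, not the library's implementation: plain lists stand in for tensors, `simulated_all_gather` is a hypothetical stand-in for a real collective, and each rank is assumed to pack its shards of every parameter into one contiguous buffer before a single gather.

```python
from typing import Dict, List


def shard(flat: List[float], world_size: int) -> List[List[float]]:
    """Evenly shard a flattened tensor, zero-padding the tail (assumed scheme)."""
    shard_len = -(-len(flat) // world_size)  # ceiling division
    padded = list(flat) + [0.0] * (shard_len * world_size - len(flat))
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def simulated_all_gather(per_rank_buffers: List[List[float]]) -> List[List[float]]:
    """Hypothetical stand-in for a collective: returns every rank's buffer."""
    return [list(buf) for buf in per_rank_buffers]


def bucketed_gather(params: Dict[str, List[float]], world_size: int) -> Dict[str, List[float]]:
    """Gather all parameters of one sub-module with a single collective call."""
    shards = {name: shard(flat, world_size) for name, flat in params.items()}
    # Each rank packs its local shards of all parameters into one buffer.
    buffers = [
        [x for name in params for x in shards[name][r]]
        for r in range(world_size)
    ]
    gathered = simulated_all_gather(buffers)  # one call instead of one per tensor
    # Unpack: for each parameter, stitch its slice from every rank's buffer.
    result: Dict[str, List[float]] = {}
    offset = 0
    for name, flat in params.items():
        shard_len = len(shards[name][0])
        full = [x for r in range(world_size)
                for x in gathered[r][offset:offset + shard_len]]
        result[name] = full[:len(flat)]  # drop the padding
        offset += shard_len
    return result
```

In this sketch a small bias travels in the same buffer as the weight, so the per-call overhead of a collective is paid once per sub-module rather than once per tensor.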