colossalai.gemini
- class colossalai.gemini.StatefulTensorMgr(tensor_placement_policy)[source]
Stateful Tensor Manager, inspired by PatrickStar
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management https://arxiv.org/abs/2108.05818
- class colossalai.gemini.GeminiManager(placement_policy, chunk_manager, memstats=None)[source]
Stateful Tensor Manager, inspired by PatrickStar
PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management https://arxiv.org/abs/2108.05818
- Parameters:
placement_policy (str) – Which device to place held tensors. It can be ‘cpu’, ‘cuda’ or ‘auto’. If ‘cpu’, parameters, gradients and optimizer states are offloaded to CPU, so the minimum amount of CUDA memory is used. If ‘cuda’, they are never offloaded, so the maximum amount of CUDA memory is used. If ‘auto’, they are moved dynamically based on CPU and CUDA memory usage, making even use of the heterogeneous memory space. Note that the ‘auto’ policy only works well when no other process uses CUDA during training.
chunk_manager (ChunkManager) – A ChunkManager instance.
memstats (MemStats, optional) – memory statistics collected by a runtime memory tracer. If None, GeminiManager will collect them during a warmup iteration.
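A minimal wiring sketch under the signatures documented on this page, assuming a distributed process group has already been initialized (so the data-parallel degree can be determined); the model and all variable names are illustrative only.

```python
import torch.nn as nn

from colossalai.gemini import ChunkManager, GeminiManager, search_chunk_configuration

model = nn.Sequential(nn.Linear(1024, 4096), nn.Linear(4096, 1024))

# Search a chunk configuration for the model (see search_chunk_configuration below).
config_dict, wasted_bytes = search_chunk_configuration(
    model,
    search_range_mb=64,
    search_interval_byte=1024,
)

# Build the chunk manager from the searched configuration and hand it to GeminiManager
# together with a placement policy ('cpu', 'cuda' or 'auto').
chunk_manager = ChunkManager(config_dict)
gemini_manager = GeminiManager('auto', chunk_manager)
```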
- class colossalai.gemini.TensorInfo(state: colossalai.gemini.chunk.chunk.TensorState, offset: int, end: int)[source]
- class colossalai.gemini.ChunkManager(chunk_configuration, init_device=None)[source]
A manager class to manipulate the tensors in chunks.
- Parameters:
chunk_configuration (Dict[int, Dict]) – the configuration dictionary of this chunk manager.
init_device (torch.device) – optional, the device on which the chunk is initialized. The default is None.
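The configuration maps a data-parallel degree to the init arguments of that chunk group. The inner key shown below (‘chunk_size’) is an illustrative assumption; in practice, pass through the dict returned by search_chunk_configuration (documented below) unchanged.

```python
import torch

from colossalai.gemini import ChunkManager

# Hypothetical hand-written configuration: dp_degree -> chunk init args.
# The inner key 'chunk_size' is assumed for illustration only; prefer the dict
# produced by search_chunk_configuration.
chunk_configuration = {
    8: dict(chunk_size=64 * 1024 * 1024),
}

chunk_manager = ChunkManager(chunk_configuration, init_device=torch.device('cpu'))
```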
- register_tensor(tensor, group_type, config_key, cpu_offload=False, pin_memory=False)[source]
Register a tensor with the chunk manager. After registration, the tensor should be accessed through get_chunks.
- Parameters:
tensor – the tensor to be appended to the chunk
group_type – the data type of the group
config_key – the key of the group’s name, typically the size of the dp world
cpu_offload – if True, the chunk will be closed on CPU
pin_memory – whether the chunk is pinned in CPU memory
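A hedged registration sketch, continuing from a chunk_manager constructed as above. The group type value is a placeholder, since its concrete type depends on how the caller groups tensors, and dp_degree must match a key of the chunk configuration.

```python
import torch

param = torch.nn.Parameter(torch.randn(1024, 1024))

# Placeholder group type: substitute whatever your setup uses to bucket tensors.
param_group_type = 'fp32_param'
dp_degree = 8  # must be a key of the chunk configuration

chunk_manager.register_tensor(
    param,
    group_type=param_group_type,
    config_key=dp_degree,
    cpu_offload=False,
    pin_memory=True,
)
```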
- move_chunk(chunk, device, force_copy=False)[source]
Move the shard of the chunk to the target device.
- trans_tensor_state(tensor, state)[source]
Transition the tensor’s state according to the pre-defined state machine.
- copy_tensor_to_chunk_slice(tensor, data)[source]
Copy data to the chunk.
- Parameters:
tensor (torch.Tensor) – the tensor used to retrieve meta information
data (torch.Tensor) – the tensor to be copied to the chunk
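Assuming param was registered as in the sketch above, new data can be written into its chunk slice like so:

```python
import torch

# The first argument locates the slice via the registered tensor's metadata;
# the second argument is the data copied into that slice.
new_data = torch.randn_like(param)
chunk_manager.copy_tensor_to_chunk_slice(param, new_data)
```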
- get_chunk(tensor)[source]
Return the chunk owning the tensor.
- Parameters:
tensor (torch.Tensor) – a torch tensor object
- get_chunks(tensors)[source]
Get all chunks owning the input tensors.
- Parameters:
tensors (Iterable[torch.Tensor]) – the tensors used to look for chunks
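Continuing the same sketch, chunk lookup combines the two getters above, and a looked-up chunk can then be passed to move_chunk:

```python
import torch

chunk = chunk_manager.get_chunk(param)        # chunk owning a single tensor
chunks = chunk_manager.get_chunks([param])    # chunks owning several tensors

# Move the shard of a chunk to another device (see move_chunk above).
chunk_manager.move_chunk(chunk, torch.device('cuda'))
```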
- add_extern_static_tensor(tensor)[source]
Add an external static tensor to the chunk manager. Such tensors are not managed by the chunk manager, but their memory usage is still monitored. They are “static” in that their shape, dtype and device never change, so their memory usage never changes either.
- Parameters:
tensor (torch.Tensor) – An extern static tensor. E.g. optimizer state.
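A short sketch of registering an external static tensor, e.g. an optimizer state buffer that should only be counted in the memory statistics:

```python
import torch

# Shape, dtype and device of this buffer never change, so only its (constant)
# memory usage is tracked; the chunk manager never moves or resizes it.
exp_avg = torch.zeros(1024, 1024, device='cuda')
chunk_manager.add_extern_static_tensor(exp_avg)
```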
- colossalai.gemini.search_chunk_configuration(model, search_range_mb, search_interval_byte, min_chunk_size_mb=32, filter_exlarge_params=True, strict_ddp_flag=False, memstas=None)[source]
- Parameters:
model (nn.Module) – torch module
search_range_mb (float) – the search range in megabytes (MB).
search_interval_byte (int) – the search interval in bytes.
min_chunk_size_mb (float, optional) – the minimum size of a distributed chunk in MB. Defaults to 32.
filter_exlarge_params (bool, optional) – filter out extremely large parameters. Defaults to True.
strict_ddp_flag (bool, optional) – whether to enable strict DDP mode, in which all parameters are kept replicated. Defaults to False.
- Returns:
the chunk configuration (a dict mapping dp_degree -> chunk init args) and the chunk memory waste in bytes.
- Return type:
Tuple[Dict, int]
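A hedged example of calling the search directly and inspecting its result; the comments reflect the return type documented above, and strict_ddp_flag is passed only to illustrate the option.

```python
import torch.nn as nn

from colossalai.gemini import search_chunk_configuration

model = nn.Sequential(nn.Linear(1024, 4096), nn.Linear(4096, 1024))

config_dict, wasted_bytes = search_chunk_configuration(
    model,
    search_range_mb=64,          # width of the chunk-size search window, in MB
    search_interval_byte=1024,   # step between candidate chunk sizes, in bytes
    min_chunk_size_mb=32,
    strict_ddp_flag=False,
)

# config_dict maps dp_degree -> chunk init args; wasted_bytes is the chunk memory waste.
print(config_dict, wasted_bytes)
```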
- colossalai.gemini.chunk
- colossalai.gemini.chunk_mgr
- colossalai.gemini.gemini_context
- colossalai.gemini.gemini_mgr
- colossalai.gemini.placement_policy
- colossalai.gemini.stateful_tensor
- colossalai.gemini.stateful_tensor_container
- colossalai.gemini.stateful_tensor_mgr
- colossalai.gemini.tensor_placement_policy
- colossalai.gemini.tensor_utils