colossalai.nn.init
- colossalai.nn.init.zeros_()[source]
Return the initializer filling the input Tensor with the scalar value 0.
- colossalai.nn.init.ones_()[source]
Return the initializer filling the input Tensor with the scalar value 1.
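Unlike their torch.nn.init counterparts, these functions do not fill a tensor directly; each returns an initializer that is applied to a tensor afterwards. A minimal sketch of the calling pattern, assuming the returned initializer takes the tensor as its argument:

```python
import torch
from colossalai.nn import init

# Each function returns an initializer instead of filling a tensor itself.
# Assumption: the returned initializer is a callable applied to the tensor
# in place, mirroring the torch.nn.init functions it wraps.
weight = torch.empty(4, 8)
init.zeros_()(weight)   # weight is now all zeros
init.ones_()(weight)    # weight is now all ones
```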
- colossalai.nn.init.uniform_(a=0.0, b=1.0)[source]
Return the initializer filling the input Tensor with values drawn from the uniform distribution \(\mathcal{U}(a, b)\).
- Parameters:
a (float) – the lower bound of the uniform distribution. Defaults to 0.0.
b (float) – the upper bound of the uniform distribution. Defaults to 1.0.
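For example, uniform_ with a symmetric custom range (same calling-pattern assumption as above):

```python
import torch
from colossalai.nn import init

weight = torch.empty(4, 8)
init.uniform_(a=-0.5, b=0.5)(weight)   # values drawn from U(-0.5, 0.5)
assert weight.min() >= -0.5 and weight.max() <= 0.5
```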
- colossalai.nn.init.normal_(mean=0.0, std=1.0)[source]
Return the initializer filling the input Tensor with values drawn from the normal distribution
\[\mathcal{N}(\text{mean}, \text{std}^2)\]
- Parameters:
mean (float) – the mean of the normal distribution. Defaults to 0.0.
std (float) – the standard deviation of the normal distribution. Defaults to 1.0.
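A quick empirical check of normal_ (the tensor shape is hypothetical, chosen large so the sample statistics land near the targets):

```python
import torch
from colossalai.nn import init

weight = torch.empty(1000, 1000)
init.normal_(mean=0.0, std=0.02)(weight)
print(weight.mean().item(), weight.std().item())  # ≈ 0.0 and ≈ 0.02
```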
- colossalai.nn.init.trunc_normal_(mean=0.0, std=1.0, a=-2.0, b=2.0)[source]
Return the initializer filling the input Tensor with values drawn from a truncated normal distribution. The values are effectively drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\) with values outside \([a, b]\) redrawn until they are within the bounds. The method used for generating the random values works best when \(a \leq \text{mean} \leq b\).
- Parameters:
mean (float) – the mean of the normal distribution. Defaults to 0.0.
std (float) – the standard deviation of the normal distribution. Defaults to 1.0.
a (float) – the minimum cutoff value. Defaults to -2.0.
b (float) – the maximum cutoff value. Defaults to 2.0.
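Note that a and b are absolute cutoff values, not multiples of std. A sketch verifying the bounds of trunc_normal_, under the same calling-pattern assumption:

```python
import torch
from colossalai.nn import init

weight = torch.empty(512, 512)
init.trunc_normal_(mean=0.0, std=1.0, a=-2.0, b=2.0)(weight)
assert weight.min() >= -2.0 and weight.max() <= 2.0  # all values lie in [a, b]
```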
- colossalai.nn.init.kaiming_uniform_(a=0, mode='fan_in', nonlinearity='leaky_relu')[source]
Return the initializer filling the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a uniform distribution. The resulting tensor will have values sampled from \(\mathcal{U}(-\text{bound}, \text{bound})\) where
\[\text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan_mode}}}\]
Also known as ‘He initialization’.
- Parameters:
a (int) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu'). Defaults to 0.
mode (str, optional) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass; choosing 'fan_out' preserves the magnitudes in the backward pass.
nonlinearity (str, optional) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).
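As a worked example of the bound, using the standard leaky_relu gain \(\sqrt{2 / (1 + a^2)}\) and a hypothetical fan_in of 256:

```python
import math

a = 0.0                                 # default negative slope
gain = math.sqrt(2.0 / (1.0 + a ** 2))  # leaky_relu gain; sqrt(2) when a = 0
fan_in = 256                            # hypothetical fan value
bound = gain * math.sqrt(3.0 / fan_in)  # half-width of U(-bound, bound)
print(bound)                            # ≈ 0.153
```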
- colossalai.nn.init.kaiming_normal_(a=0, mode='fan_in', nonlinearity='leaky_relu')[source]
Return the initializer filling the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a normal distribution. The resulting tensor will have values sampled from \(\mathcal{N}(0, \text{std}^2)\) where
\[\text{std} = \frac{\text{gain}}{\sqrt{\text{fan_mode}}}\]
Also known as ‘He initialization’.
- Parameters:
a (int) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu'). Defaults to 0.
mode (str, optional) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass; choosing 'fan_out' preserves the magnitudes in the backward pass.
nonlinearity (str, optional) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).
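The same numbers for the normal variant, checked empirically against the equivalent PyTorch initializer:

```python
import math
import torch

fan_in = 256                            # hypothetical fan value
gain = math.sqrt(2.0)                   # leaky_relu gain at a = 0
std = gain / math.sqrt(fan_in)
print(std)                              # ≈ 0.0884

# torch.nn.init.kaiming_normal_ uses the same formula, so the sample
# standard deviation of a (128, 256) weight should match.
w = torch.nn.init.kaiming_normal_(torch.empty(128, fan_in))
print(w.std().item())                   # ≈ 0.0884
```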
- colossalai.nn.init.xavier_uniform_(a=1.7320508075688772, scale=2.0, gain=1.0)[source]
Return the initializer filling the input Tensor with values according to the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010), using a uniform distribution. The resulting tensor will have values sampled from \(\mathcal{U}(-a, a)\) where
\[a = \text{gain} \times \sqrt{\frac{6}{\text{fan_in} + \text{fan_out}}}\]
Also known as ‘Glorot initialization’.
- Parameters:
a (float, optional) – an optional scaling factor used to calculate uniform bounds from standard deviation. Defaults to math.sqrt(3.).
scale (float, optional) – an optional scaling factor used to calculate standard deviation. Defaults to 2.0.
gain (float, optional) – an optional scaling factor. Defaults to 1.0.
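How the three factors interact: assuming the bound is computed as \(a \times \text{std}\) with \(\text{std} = \text{gain} \times \sqrt{\text{scale} / (\text{fan_in} + \text{fan_out})}\), the documented defaults reproduce the formula above:

```python
import math

fan_in, fan_out = 256, 128                  # hypothetical fan values
a, scale, gain = math.sqrt(3.0), 2.0, 1.0   # documented defaults
std = gain * math.sqrt(scale / (fan_in + fan_out))
bound = a * std                             # assumed combination of the factors
print(bound, gain * math.sqrt(6.0 / (fan_in + fan_out)))  # both = 0.125
```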
- colossalai.nn.init.xavier_normal_(scale=2.0, gain=1.0)[source]
Return the initializer filling the input Tensor with values according to the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010), using a normal distribution. The resulting tensor will have values sampled from \(\mathcal{N}(0, \text{std}^2)\) where
\[\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan_in} + \text{fan_out}}}\]
Also known as ‘Glorot initialization’.
- Parameters:
scale (float, optional) – an optional scaling factor used to calculate standard deviation. Defaults to 2.0.
gain (float, optional) – an optional scaling factor. Defaults to 1.0.
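And a sanity check of xavier_normal_'s std at the defaults (hypothetical fan values; the sketch assumes the formula above):

```python
import math
import torch

fan_in, fan_out = 512, 512                        # hypothetical fan values
std = 1.0 * math.sqrt(2.0 / (fan_in + fan_out))   # gain = 1.0, scale = 2.0
w = torch.randn(fan_out, fan_in) * std            # samples from N(0, std^2)
print(std, w.std().item())                        # both ≈ 0.0442
```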