colossalai.nn.init

colossalai.nn.init.zeros_()

Return an initializer that fills the input Tensor with zeros.

colossalai.nn.init.ones_()

Return an initializer that fills the input Tensor with ones.

colossalai.nn.init.uniform_(a=0.0, b=1.0)

Return an initializer that fills the input Tensor with values drawn from the uniform distribution $$\mathcal{U}(a, b)$$.

Parameters:
• a (float) – the lower bound of the uniform distribution. Defaults to 0.0.

• b (float) – the upper bound of the uniform distribution. Defaults to 1.0.
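
Every function in this module returns an initializer instead of filling a tensor directly. The following is a minimal pure-Python sketch of that factory pattern, with a plain list standing in for a tensor. The name `make_uniform_initializer` and its `rng` argument are hypothetical and are not part of ColossalAI.

```python
import random


def make_uniform_initializer(a=0.0, b=1.0):
    """Hypothetical stand-in for uniform_: build and return an initializer."""

    def initializer(values, rng=random):
        # Overwrite every element in place with a draw from U(a, b).
        for i in range(len(values)):
            values[i] = rng.uniform(a, b)
        return values

    return initializer


# The factory is configured once; the returned callable can be reused.
init = make_uniform_initializer(a=-0.5, b=0.5)
weights = init([0.0] * 8, rng=random.Random(0))
```

Separating configuration (the bounds) from application (the fill) lets a layer accept one initializer object and apply it to whichever parameters it owns.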

colossalai.nn.init.normal_(mean=0.0, std=1.0)

Return an initializer that fills the input Tensor with values drawn from the normal distribution $$\mathcal{N}(\text{mean}, \text{std}^2)$$.

Parameters:
• mean (float) – the mean of the normal distribution. Defaults to 0.0.

• std (float) – the standard deviation of the normal distribution. Defaults to 1.0.

colossalai.nn.init.trunc_normal_(mean=0.0, std=1.0, a=-2.0, b=2.0)

Return an initializer that fills the input Tensor with values drawn from a truncated normal distribution. The values are effectively drawn from the normal distribution $$\mathcal{N}(\text{mean}, \text{std}^2)$$, with values outside $$[a, b]$$ redrawn until they fall within the bounds. The method used to generate the random values works best when $$a \leq \text{mean} \leq b$$.

Parameters:
• mean (float) – the mean of the normal distribution. Defaults to 0.0.

• std (float) – the standard deviation of the normal distribution. Defaults to 1.0.

• a (float) – the minimum cutoff value. Defaults to -2.0.

• b (float) – the maximum cutoff value. Defaults to 2.0.
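
The "redrawn until they are within the bounds" behavior can be illustrated with literal rejection sampling. This is only a sketch of the distribution being described: the library may use a different, more efficient sampling method, and `sample_trunc_normal` is a hypothetical helper.

```python
import random


def sample_trunc_normal(n, mean=0.0, std=1.0, a=-2.0, b=2.0, seed=None):
    """Draw n values from N(mean, std^2), redrawing any value outside [a, b]."""
    if a > b:
        raise ValueError("a must not exceed b")
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.gauss(mean, std)
        # Keep the draw only if it lies inside the cutoff interval.
        if a <= x <= b:
            samples.append(x)
    return samples


values = sample_trunc_normal(1000, seed=0)
```

When the mean lies far outside $$[a, b]$$, almost every draw is rejected and the loop becomes slow, which is one reason to keep $$a \leq \text{mean} \leq b$$.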

colossalai.nn.init.kaiming_uniform_(a=0, mode='fan_in', nonlinearity='leaky_relu')

Return an initializer that fills the input Tensor with values according to the method described in "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification" (He, K. et al., 2015), using a uniform distribution. The resulting tensor will have values sampled from $$\mathcal{U}(-\text{bound}, \text{bound})$$ where

$\text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}$

Also known as ‘He initialization’.

Parameters:
• a (int) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu').

• mode (str, optional) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backward pass.

• nonlinearity (str, optional) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).
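
As a worked example of the bound formula, the sketch below computes the bound for a given fan and negative slope. It assumes the gain follows the convention of PyTorch's `torch.nn.init.calculate_gain` for 'leaky_relu', namely sqrt(2 / (1 + a^2)); this section does not define the gain itself, and both helper names are hypothetical.

```python
import math


def leaky_relu_gain(negative_slope=0.0):
    # Assumed gain convention (as in torch.nn.init.calculate_gain).
    return math.sqrt(2.0 / (1.0 + negative_slope ** 2))


def kaiming_uniform_bound(fan, a=0.0):
    # bound = gain * sqrt(3 / fan_mode)
    return leaky_relu_gain(a) * math.sqrt(3.0 / fan)


# With a = 0 the gain is sqrt(2), so the bound reduces to sqrt(6 / fan).
bound = kaiming_uniform_bound(fan=512, a=0.0)
```

A larger negative slope lowers the gain and therefore narrows the sampling interval.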

colossalai.nn.init.kaiming_normal_(a=0, mode='fan_in', nonlinearity='leaky_relu')

Return an initializer that fills the input Tensor with values according to the method described in "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification" (He, K. et al., 2015), using a normal distribution. The resulting tensor will have values sampled from $$\mathcal{N}(0, \text{std}^2)$$ where

$\text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}$

Also known as ‘He initialization’.

Parameters:
• a (int) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu').

• mode (str, optional) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backward pass.

• nonlinearity (str, optional) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' (default).
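
To show how the mode choice changes the standard deviation, the sketch below derives fan_in and fan_out from a weight shape and applies the formula above. The (out, in, *kernel) layout is the PyTorch convention and is an assumption here; `compute_fans` and `kaiming_normal_std` are hypothetical helpers, and the default gain of sqrt(2) corresponds to a negative slope of 0.

```python
import math


def compute_fans(shape):
    """Fan-in and fan-out for a weight shaped (out, in, *kernel) (assumed layout)."""
    if len(shape) < 2:
        raise ValueError("fan computation needs at least 2 dimensions")
    receptive_field = math.prod(shape[2:])  # 1 for a plain linear weight
    fan_in = shape[1] * receptive_field
    fan_out = shape[0] * receptive_field
    return fan_in, fan_out


def kaiming_normal_std(shape, mode="fan_in", gain=math.sqrt(2.0)):
    if mode not in ("fan_in", "fan_out"):
        raise ValueError("mode must be 'fan_in' or 'fan_out'")
    fan_in, fan_out = compute_fans(shape)
    fan = fan_in if mode == "fan_in" else fan_out
    # std = gain / sqrt(fan_mode)
    return gain / math.sqrt(fan)


conv_shape = (64, 32, 3, 3)  # 64 output channels, 32 input channels, 3x3 kernel
std_in = kaiming_normal_std(conv_shape, mode="fan_in")
std_out = kaiming_normal_std(conv_shape, mode="fan_out")
```

For this shape fan_out is twice fan_in, so 'fan_out' gives a smaller standard deviation than 'fan_in'.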

colossalai.nn.init.xavier_uniform_(a=1.7320508075688772, scale=2.0, gain=1.0)

Return an initializer that fills the input Tensor with values according to the method described in "Understanding the difficulty of training deep feedforward neural networks" (Glorot, X. & Bengio, Y., 2010), using a uniform distribution. With the default a and scale, the resulting tensor will have values sampled from $$\mathcal{U}(-\text{bound}, \text{bound})$$ where

$\text{bound} = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}$

Also known as ‘Glorot initialization’.

Parameters:
• a (float, optional) – an optional scaling factor used to calculate uniform bounds from standard deviation. Defaults to math.sqrt(3.).

• scale (float, optional) – an optional scaling factor used to calculate standard deviation. Defaults to 2.0.

• gain (float, optional) – an optional scaling factor. Defaults to 1.0.
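
The three parameters can be related to the documented formula with a short calculation. The sketch assumes, based on the parameter descriptions above, that the standard deviation is gain × sqrt(scale / (fan_in + fan_out)) and that the uniform bound is a times that standard deviation. `xavier_uniform_bound` is a hypothetical helper, not the library function.

```python
import math


def xavier_uniform_bound(fan_in, fan_out, a=math.sqrt(3.0), scale=2.0, gain=1.0):
    # Assumed relationship: std from scale and gain, then bound = a * std.
    std = gain * math.sqrt(scale / (fan_in + fan_out))
    return a * std


# Defaults give sqrt(3) * sqrt(2 / 384) = sqrt(6 / 384) = 0.125.
bound = xavier_uniform_bound(fan_in=256, fan_out=128)
```

A uniform distribution on (-bound, bound) has standard deviation bound / sqrt(3), which is why a factor of sqrt(3) converts a target standard deviation into a bound.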

colossalai.nn.init.xavier_normal_(scale=2.0, gain=1.0)

Return an initializer that fills the input Tensor with values according to the method described in "Understanding the difficulty of training deep feedforward neural networks" (Glorot, X. & Bengio, Y., 2010), using a normal distribution. With the default scale, the resulting tensor will have values sampled from $$\mathcal{N}(0, \text{std}^2)$$ where

$\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}$

Also known as ‘Glorot initialization’.

Parameters:
• scale (float, optional) – an optional scaling factor used to calculate standard deviation. Defaults to 2.0.

• gain (float, optional) – an optional scaling factor. Defaults to 1.0.