colossalai.nn.init

colossalai.nn.init.zeros_()[source]

Return an initializer that fills the input Tensor with the scalar value 0.

colossalai.nn.init.ones_()[source]

Return an initializer that fills the input Tensor with the scalar value 1.
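
Unlike torch.nn.init, each function in this module is a factory: it returns an initializer rather than filling a tensor itself. A minimal usage sketch, assuming the returned callable takes the tensor to initialize and fills it in place:

    import torch
    from colossalai.nn import init

    weight = torch.empty(4, 8)
    bias = torch.empty(4)

    # Each factory returns a callable; calling it fills the tensor in place.
    init.ones_()(weight)   # weight is now all 1.0
    init.zeros_()(bias)    # bias is now all 0.0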

colossalai.nn.init.uniform_(a=0.0, b=1.0)[source]

Return an initializer that fills the input Tensor with values drawn from the uniform distribution \(\mathcal{U}(a, b)\).

Parameters:
  • a (float) – the lower bound of the uniform distribution. Defaults to 0.0.

  • b (float) – the upper bound of the uniform distribution. Defaults to 1.0.
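
A short sketch of uniform_, under the same factory-style usage assumed above:

    import torch
    from colossalai.nn import init

    weight = torch.empty(3, 5)
    initializer = init.uniform_(a=-0.1, b=0.1)  # samples from U(-0.1, 0.1)
    initializer(weight)
    assert weight.min() >= -0.1 and weight.max() <= 0.1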

colossalai.nn.init.normal_(mean=0.0, std=1.0)[source]

Return an initializer that fills the input Tensor with values drawn from the normal distribution

\[\mathcal{N}(\text{mean}, \text{std}^2)\]
Parameters:
  • mean (float) – the mean of the normal distribution. Defaults to 0.0.

  • std (float) – the standard deviation of the normal distribution. Defaults to 1.0.
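
For example, a common transformer-style scheme draws weights from \(\mathcal{N}(0, 0.02^2)\). A sketch, again assuming the factory-style usage shown above:

    import torch
    from colossalai.nn import init

    weight = torch.empty(1000, 1000)
    init.normal_(mean=0.0, std=0.02)(weight)
    # For a tensor this large, the empirical moments should be close
    # to the requested ones.
    print(weight.mean().item(), weight.std().item())  # ~0.0, ~0.02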

colossalai.nn.init.trunc_normal_(mean=0.0, std=1.0, a=-2.0, b=2.0)[source]

Return an initializer that fills the input Tensor with values drawn from a truncated normal distribution. The values are effectively drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\) with values outside \([a, b]\) redrawn until they are within the bounds. The method used for generating the random values works best when \(a \leq \text{mean} \leq b\).

Parameters:
  • mean (float) – the mean of the normal distribution. Defaults to 0.0.

  • std (float) – the standard deviation of the normal distribution. Defaults to 1.0.

  • a (float) – the minimum cutoff value. Defaults to -2.0.

  • b (float) – the maximum cutoff value. Defaults to 2.0.
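
Because out-of-range values are redrawn rather than clipped, every element of the filled tensor lies within the cutoffs. A sketch, assuming the factory-style usage shown earlier:

    import torch
    from colossalai.nn import init

    weight = torch.empty(256, 256)
    init.trunc_normal_(mean=0.0, std=1.0, a=-2.0, b=2.0)(weight)
    # No value escapes the cutoff interval [a, b].
    assert weight.min() >= -2.0 and weight.max() <= 2.0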

colossalai.nn.init.kaiming_uniform_(a=0, mode='fan_in', nonlinearity='leaky_relu')[source]

Return an initializer that fills the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a uniform distribution. The resulting tensor will have values sampled from \(\mathcal{U}(-\text{bound}, \text{bound})\) where

\[\text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}\]

Also known as ‘He initialization’.

Parameters:
  • a (float, optional) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu'). Defaults to 0.

  • mode (str, optional) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backward pass.

  • nonlinearity (str, optional) – the non-linear function (nn.functional name), recommended for use only with 'relu' or 'leaky_relu' (default).
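
The factory never sees the tensor, so the fan dimensions must reach the returned initializer separately; the sketch below assumes it accepts fan_in and fan_out keyword arguments, with mode selecting which one enters the formula:

    import torch
    from colossalai.nn import init

    fan_in, fan_out = 512, 256
    weight = torch.empty(fan_out, fan_in)
    initializer = init.kaiming_uniform_(a=0, mode='fan_in',
                                        nonlinearity='leaky_relu')
    # Assumption: the fan dimensions are passed at call time, since the
    # factory itself cannot infer them.
    initializer(weight, fan_in=fan_in, fan_out=fan_out)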

colossalai.nn.init.kaiming_normal_(a=0, mode='fan_in', nonlinearity='leaky_relu')[source]

Return an initializer that fills the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a normal distribution. The resulting tensor will have values sampled from \(\mathcal{N}(0, \text{std}^2)\) where

\[\text{std} = \frac{\text{gain}}{\sqrt{\text{fan\_mode}}}\]

Also known as ‘He initialization’.

Parameters:
  • a (float, optional) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu'). Defaults to 0.

  • mode (str, optional) – either 'fan_in' (default) or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backward pass.

  • nonlinearity (str, optional) – the non-linear function (nn.functional name), recommended for use only with 'relu' or 'leaky_relu' (default).
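
As a sanity check on the formula: torch.nn.init.calculate_gain('relu') is \(\sqrt{2}\), so with mode='fan_in' the standard deviation should be \(\sqrt{2 / \text{fan\_in}}\). A sketch under the same fan-passing assumption as above:

    import math

    import torch
    from colossalai.nn import init

    fan_in, fan_out = 1024, 512
    weight = torch.empty(fan_out, fan_in)
    init.kaiming_normal_(mode='fan_in', nonlinearity='relu')(
        weight, fan_in=fan_in, fan_out=fan_out)
    # gain('relu') = sqrt(2), so std = sqrt(2 / fan_in).
    expected_std = math.sqrt(2.0 / fan_in)
    print(weight.std().item(), expected_std)  # close for a tensor this size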

colossalai.nn.init.xavier_uniform_(a=1.7320508075688772, scale=2.0, gain=1.0)[source]

Return an initializer that fills the input Tensor with values according to the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010), using a uniform distribution. The resulting tensor will have values sampled from \(\mathcal{U}(-\text{bound}, \text{bound})\) where

\[\text{bound} = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}}\]

Also known as ‘Glorot initialization’.

Parameters:
  • a (float, optional) – an optional scaling factor used to calculate the uniform bounds from the standard deviation. Defaults to math.sqrt(3.).

  • scale (float, optional) – an optional scaling factor used to calculate the standard deviation. Defaults to 2.0.

  • gain (float, optional) – an optional scaling factor. Defaults to 1.0.
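
Reading the parameters together suggests the bound is computed as \(a \times \text{gain} \times \sqrt{\text{scale} / (\text{fan\_in} + \text{fan\_out})}\); with the defaults \(a = \sqrt{3}\), scale = 2.0 and gain = 1.0 this reduces to the Glorot bound above, since \(\sqrt{3} \times \sqrt{2/n} = \sqrt{6/n}\). A sketch under that reading and the same fan-passing assumption:

    import math

    import torch
    from colossalai.nn import init

    fan_in, fan_out = 300, 100
    weight = torch.empty(fan_out, fan_in)
    init.xavier_uniform_()(weight, fan_in=fan_in, fan_out=fan_out)
    # Default bound: sqrt(3) * 1.0 * sqrt(2 / (fan_in + fan_out))
    #              = sqrt(6 / (fan_in + fan_out))
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    assert weight.abs().max() <= bound + 1e-6  # tolerance for float rounding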

colossalai.nn.init.xavier_normal_(scale=2.0, gain=1.0)[source]

Return an initializer that fills the input Tensor with values according to the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010), using a normal distribution. The resulting tensor will have values sampled from \(\mathcal{N}(0, \text{std}^2)\) where

\[\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}}\]

Also known as ‘Glorot initialization’.

Parameters:
  • scale (float, optional) – an optional scaling factor used to calculate the standard deviation. Defaults to 2.0.

  • gain (float, optional) – an optional scaling factor. Defaults to 1.0.
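
With the defaults scale = 2.0 and gain = 1.0, the standard deviation reduces to \(\sqrt{2 / (\text{fan\_in} + \text{fan\_out})}\). A final sketch under the same fan-passing assumption:

    import math

    import torch
    from colossalai.nn import init

    fan_in, fan_out = 800, 200
    weight = torch.empty(fan_out, fan_in)
    init.xavier_normal_()(weight, fan_in=fan_in, fan_out=fan_out)
    # Defaults: std = 1.0 * sqrt(2 / (fan_in + fan_out))
    expected_std = math.sqrt(2.0 / (fan_in + fan_out))
    print(weight.std().item(), expected_std)  # close for a tensor this size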