abstract class MXNet::Optimizer

Overview

The base class inherited by all optimizers.

Custom optimizers can be created by subclassing Optimizer and implementing the required method #update. By default, the created optimizer is registered under its simplified class name (class.name.split("::").last.downcase), but it may also be registered under another name by calling .register.

class MyOptimizer < MXNet::Optimizer
  register :myopt

  # A no-op update that returns the weight unchanged.
  def update(index, weight, gradient, state)
    weight
  end
end

Defined in:

mxnet/optimizer.cr

Constructor Detail

def self.new(rescale_grad = 1.0, clip_gradient = -1.0, lr = 0.01, wd = 0.0) #

Creates a new instance.

Parameters

  • rescale_grad (Float, optional) Before updating, multiply the gradient by rescale_grad. Often chosen to be 1.0 / batch_size.
  • clip_gradient (Float, optional) Clip the gradient by projecting onto the box [-clip_gradient, clip_gradient].
  • lr (Float, optional) The initial learning rate.
  • wd (Float, optional) The weight decay (or L2 regularization) coefficient. Modifies the objective by adding a penalty for having large weights.

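For illustration, the MyOptimizer class from the overview could be constructed like this (the values, including the batch size of 128, are arbitrary):

opt = MyOptimizer.new(lr: 0.05, wd: 0.0001, rescale_grad: 1.0 / 128)
opt.lr           # => 0.05
opt.rescale_grad # => 0.0078125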

Class Method Detail

def self.create(optimizer, **kwargs) #


Instance Method Detail

def create_state(index, weight) #

Creates auxiliary state for a given weight.

Some optimizers require additional state (e.g. momentum) in addition to gradients in order to update weights. This method creates the state for a given weight, which is then passed to #update. It is called only once for each weight.

Parameters

  • index (Int) A unique index to identify the weight.
  • weight (NDArray) The weight.

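As a sketch, a momentum-style optimizer might allocate one zero-filled buffer per weight. The MyMomentum class below is hypothetical, and MXNet::NDArray.zeros is assumed to accept a shape; check the NDArray documentation for the exact signature.

class MyMomentum < MXNet::Optimizer
  register :mymomentum

  # Called once per weight; the returned buffer is passed back to
  # #update as `state` on every subsequent call for that weight.
  def create_state(index, weight)
    MXNet::NDArray.zeros(weight.shape)
  end

  def update(index, weight, gradient, state)
    # combine `state` and `gradient` here to produce the updated weight
    weight
  end
end
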
def lr : Float64 #

def rescale_grad : Float64 #

def rescale_grad=(rescale_grad) #

def set_lr_mult(lr_mult) #

Sets an individual learning rate multiplier for each parameter.

If you specify a learning rate multiplier for a parameter, then the learning rate for that parameter will be set as the product of the global learning rate and its multiplier.

Parameters

  • lr_mult (Hash(Int, Float)) For each entry, the learning rate multiplier for the parameter with the given index will be set to the given value.

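For example, to freeze one parameter and slow another down (the indices are hypothetical and correspond to whatever indices are passed to #update):

opt = MyOptimizer.new(lr: 0.1)
opt.set_lr_mult({0 => 0.0, 3 => 0.5}) # parameter 0 is frozen, parameter 3 trains at 0.1 * 0.5
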
def set_wd_mult(wd_mult) #

Sets an individual weight decay multiplier for each parameter.

If you specify a weight decay multiplier for a parameter, then the weight decay for that parameter will be set as the product of the global weight decay and its multiplier.

Parameters

  • wd_mult (Hash(Int, Float)) For each entry, the weight decay multiplier for the parameter with the given index will be set to the given value.

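For example, weight decay is commonly disabled for bias parameters; assuming index 1 identifies a bias:

opt = MyOptimizer.new(wd: 0.0001)
opt.set_wd_mult({1 => 0.0}) # no weight decay is applied to parameter 1
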
abstract def update(index, weight, gradient, state) #

Updates the given parameter using the corresponding gradient and state.

Parameters

  • index (Int) The unique index of the parameter, used to look up its individual learning rate and weight decay multipliers. These multipliers may be set via #set_lr_mult and #set_wd_mult, respectively.
  • weight (NDArray) The parameter to be updated.
  • gradient (NDArray) The gradient of the objective with respect to this parameter.
  • state (any) The state returned by #create_state.

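As a minimal sketch, a plain SGD step with weight decay could be implemented as follows, ignoring gradient clipping and the per-parameter multipliers for brevity. The MySGD class is hypothetical, and the NDArray arithmetic operators used here are assumptions; treat this as an outline rather than the library's implementation.

class MySGD < MXNet::Optimizer
  register :mysgd

  def update(index, weight, gradient, state)
    # scale the raw gradient, add the weight decay term, and take a step
    grad = gradient * rescale_grad
    weight - (grad + weight * wd) * lr
  end
end
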
def wd : Float64 #
