abstract class MXNet::Optimizer
Inheritance chain:
- MXNet::Optimizer
- Reference
- Object
Overview
The base class inherited by all optimizers.
Custom optimizers can be created by subclassing Optimizer and implementing the required method #update. By default, the created optimizer will be registered under its simplified class name (class.name.split("::").last.downcase) but it may also be registered under another name by calling #register.
class MyOptimizer < MXNet::Optimizer
  # Register under the name :myopt instead of the default :myoptimizer.
  register :myopt

  # A no-op update that returns the weight unchanged.
  def update(index, weight, gradient, state)
    weight
  end
end
Defined in:
mxnet/optimizer.cr
Constructors
- .new(rescale_grad = 1.0, clip_gradient = -1.0, lr = 0.01, wd = 0.0)
  Creates a new instance.
Instance Method Summary
- #create_state(index, weight)
  Creates auxiliary state for a given weight.
- #lr : Float64
- #rescale_grad : Float64
- #rescale_grad=(rescale_grad)
- #set_lr_mult(lr_mult)
  Sets an individual learning rate multiplier for each parameter.
- #set_wd_mult(wd_mult)
  Sets an individual weight decay multiplier for each parameter.
- #update(index, weight, gradient, state)
  Updates the given parameter using the corresponding gradient and state.
- #wd : Float64
Constructor Detail
.new(rescale_grad = 1.0, clip_gradient = -1.0, lr = 0.01, wd = 0.0)
Creates a new instance.
Parameters
- rescale_grad (Float, optional) Before updating, multiply the gradient by rescale_grad. Often chosen to be 1.0 / batch_size.
- clip_gradient (Float, optional) Clip the gradient by projecting onto the box [-clip_gradient, clip_gradient].
- lr (Float, optional) The initial learning rate.
- wd (Float, optional) The weight decay (or L2 regularization) coefficient. Modifies the objective by adding a penalty for having large weights.
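For illustration, the keyword-style call below constructs the MyOptimizer subclass from the overview (MXNet::Optimizer itself is abstract); the argument names and defaults follow the constructor signature above:

# A sketch reusing the MyOptimizer subclass defined in the overview.
# Gradients accumulated over a batch are rescaled by 1.0 / batch_size.
batch_size = 32
opt = MyOptimizer.new(lr: 0.05, rescale_grad: 1.0 / batch_size)
opt.lr           # => 0.05
opt.rescale_grad # => 0.03125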
Instance Method Detail
#create_state(index, weight)
Creates auxiliary state for a given weight.
Some optimizers require additional states (e.g. momentum) in addition to gradients in order to update weights. This function creates state for a given weight which will be used in update. This function is called only once for each weight.
Parameters
- index (Int) A unique index to identify the weight.
- weight (NDArray) The weight.
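For example, a momentum-style optimizer would allocate one buffer per weight here. The sketch below is hypothetical and assumes MXNet::NDArray.zeros and NDArray#shape behave as in MXNet's Python API:

class MomentumOptimizer < MXNet::Optimizer
  register :mymomentum

  # Allocate a zero-filled buffer shaped like the weight; it is
  # passed back to #update as `state` on every subsequent call.
  def create_state(index, weight)
    MXNet::NDArray.zeros(weight.shape)
  end
end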
#set_lr_mult(lr_mult)
Sets an individual learning rate multiplier for each parameter.
If you specify a learning rate multiplier for a parameter, then the learning rate for that parameter will be set to the product of the global learning rate and its multiplier.
Parameters
- lr_mult (Hash(Int, Float)) For each of the entries, the learning rate multiplier for the parameter specified will be set to the given value.
#set_wd_mult(wd_mult)
Sets an individual weight decay multiplier for each parameter.
If you specify a weight decay multiplier for a parameter, then the weight decay for that parameter will be set to the product of the global weight decay and its multiplier.
Parameters
- wd_mult (Hash(Int, Float)) For each of the entries, the weight decay multiplier for the parameter specified will be set to the given value.
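As a usage sketch (reusing MyOptimizer from the overview; the integer keys are the parameter indices later passed to #update):

opt = MyOptimizer.new(lr: 0.01, wd: 0.001)
# Parameter 0 trains at one tenth of the global rate (0.001 effective).
opt.set_lr_mult({0 => 0.1})
# Parameter 1 is exempted from weight decay entirely.
opt.set_wd_mult({1 => 0.0})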
#update(index, weight, gradient, state)
Updates the given parameter using the corresponding gradient and state.
Parameters
- index (Int) The unique index of the parameter into the individual learning rates and weight decays. Learning rates and weight decay may be set via #set_lr_mult and #set_wd_mult, respectively.
- weight (NDArray) The parameter to be updated.
- gradient (NDArray) The gradient of the objective with respect to this parameter.
- state (any) The state returned by #create_state.
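Putting the pieces together, a minimal stateless gradient-descent optimizer might look like the sketch below. The class and registered name are hypothetical, and element-wise NDArray arithmetic with scalars is assumed to work as in the Python API; like the overview example, it returns the stepped weight rather than writing it back in place:

class SimpleSGD < MXNet::Optimizer
  register :simplesgd

  # Plain SGD with L2 weight decay:
  #   w <- w - lr * (g * rescale_grad + wd * w)
  # No auxiliary state is needed, so `state` is ignored.
  def update(index, weight, gradient, state)
    grad = gradient * rescale_grad
    weight - (grad + weight * wd) * lr
  end
end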