abstract class MXNet::Optimizer
Inheritance chain:
- MXNet::Optimizer
- Reference
- Object
Overview
The base class inherited by all optimizers.
Custom optimizers can be created by subclassing Optimizer and implementing the required method #update. By default, the created optimizer will be registered under its simplified class name (class.name.split("::").last.downcase) but it may also be registered under another name by calling #register.
class MyOptimizer < MXNet::Optimizer
  # Register under the name :myopt instead of the default :myoptimizer.
  register :myopt

  # A no-op update that returns the weight unchanged.
  def update(index, weight, gradient, state)
    weight
  end
end
Defined in:
mxnet/optimizer.cr
Constructors
- .new(rescale_grad = 1.0, clip_gradient = -1.0, lr = 0.01, wd = 0.0)
  Creates a new instance.
Instance Method Summary
- #create_state(index, weight)
  Creates auxiliary state for a given weight.
- #lr : Float64
- #rescale_grad : Float64
- #rescale_grad=(rescale_grad)
- #set_lr_mult(lr_mult)
  Sets an individual learning rate multiplier for each parameter.
- #set_wd_mult(wd_mult)
  Sets an individual weight decay multiplier for each parameter.
- #update(index, weight, gradient, state)
  Updates the given parameter using the corresponding gradient and state.
- #wd : Float64
Constructor Detail
.new(rescale_grad = 1.0, clip_gradient = -1.0, lr = 0.01, wd = 0.0)
Creates a new instance.
Parameters
- rescale_grad (Float, optional) Before updating, multiply the gradient by rescale_grad. Often chosen to be 1.0 / batch_size.
- clip_gradient (Float, optional) Clip the gradient by projecting onto the box [-clip_gradient, clip_gradient].
- lr (Float, optional) The initial learning rate.
- wd (Float, optional) The weight decay (or L2 regularization) coefficient. Modifies the objective by adding a penalty for having large weights.
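For illustration, the keyword-style call below constructs the MyOptimizer subclass from the overview (MXNet::Optimizer itself is abstract); the argument names and defaults follow the constructor signature above:

# A sketch reusing the MyOptimizer subclass defined in the overview.
# Gradients accumulated over a batch are rescaled by 1.0 / batch_size.
batch_size = 32
opt = MyOptimizer.new(lr: 0.05, rescale_grad: 1.0 / batch_size)
opt.lr           # => 0.05
opt.rescale_grad # => 0.03125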
Instance Method Detail
#create_state(index, weight)
Creates auxiliary state for a given weight.
Some optimizers require additional states (e.g. momentum) in addition to gradients in order to update weights. This function creates state for a given weight which will be used in update. This function is called only once for each weight.
Parameters
- index (Int) A unique index to identify the weight.
- weight (NDArray) The weight.
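For example, a momentum-style optimizer would allocate one buffer per weight here. The sketch below is hypothetical and assumes MXNet::NDArray.zeros and NDArray#shape behave as in MXNet's Python API:

class MomentumOptimizer < MXNet::Optimizer
  register :mymomentum

  # Allocate a zero-filled buffer shaped like the weight; it is
  # passed back to #update as `state` on every subsequent call.
  def create_state(index, weight)
    MXNet::NDArray.zeros(weight.shape)
  end
end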
#set_lr_mult(lr_mult)
Sets an individual learning rate multiplier for each parameter.
If you specify a learning rate multiplier for a parameter, then the learning rate for that parameter will be set to the product of the global learning rate and its multiplier.
Parameters
- lr_mult (Hash(Int, Float)) For each of the entries, the learning rate multiplier for the parameter specified will be set to the given value.
#set_wd_mult(wd_mult)
Sets an individual weight decay multiplier for each parameter.
If you specify a weight decay multiplier for a parameter, then the weight decay for that parameter will be set to the product of the global weight decay and its multiplier.
Parameters
- wd_mult (Hash(Int, Float)) For each of the entries, the weight decay multiplier for the parameter specified will be set to the given value.
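As a usage sketch (reusing MyOptimizer from the overview; the integer keys are the parameter indices later passed to #update):

opt = MyOptimizer.new(lr: 0.01, wd: 0.001)
# Parameter 0 trains at one tenth of the global rate (0.001 effective).
opt.set_lr_mult({0 => 0.1})
# Parameter 1 is exempted from weight decay entirely.
opt.set_wd_mult({1 => 0.0})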
#update(index, weight, gradient, state)
Updates the given parameter using the corresponding gradient and state.
Parameters
- index (Int) The unique index of the parameter into the individual learning rates and weight decays. Learning rates and weight decay may be set via #set_lr_mult and #set_wd_mult, respectively.
- weight (NDArray) The parameter to be updated.
- gradient (NDArray) The gradient of the objective with respect to this parameter.
- state (any) The state returned by #create_state.
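Putting the pieces together, a minimal stateless gradient-descent optimizer might look like the sketch below. The class and registered name are hypothetical, and element-wise NDArray arithmetic with scalars is assumed to work as in the Python API; like the overview example, it returns the stepped weight rather than writing it back in place:

class SimpleSGD < MXNet::Optimizer
  register :simplesgd

  # Plain SGD with L2 weight decay:
  #   w <- w - lr * (g * rescale_grad + wd * w)
  # No auxiliary state is needed, so `state` is ignored.
  def update(index, weight, gradient, state)
    grad = gradient * rescale_grad
    weight - (grad + weight * wd) * lr
  end
end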