Optimization Overview in Machine Learning


Optimization theory is a branch of mathematics that encompasses many diverse areas of minimization and optimization. It is the more modern term for operations research, and it includes the calculus of variations, control theory, convex optimization theory, decision theory, game theory, linear programming, Markov chains, network analysis, queuing systems, etc.


[1]. http://mathworld.wolfram.com/OptimizationTheory.html

[2]. http://www.convexoptimization.com/

[3]. http://wireless.egr.uh.edu/Optimization/index.htm

[4]. https://ocw.mit.edu/courses/sloan-school-of-management/15-093j-optimization-methods-fall-2009/index.htm

Optimization in Deep Learning



Common Methods

In learning problems, the objective function (loss function) and the transformation (activation function) are usually non-linear, so closed-form solutions are rarely available in practice. In addition, the monotonicity of the loss function is hard to analyze.

Iterative search methods are therefore a more practical way to solve learning problems. The most widely adopted optimization methods are listed here.

  1. Stochastic Gradient Descent
  2. Adagrad
  3. Adadelta
  4. RMSprop
  5. Momentum
  6. Adam
  7. Adamax
  8. Nesterov
  9. Nadam
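
To make the idea of an iterative search concrete, here is a minimal sketch of three of the update rules listed above (plain SGD, Momentum, and Adam) in pure Python. The one-dimensional loss f(w) = (w - 3)^2, the function names, and all hyperparameter values are assumptions chosen only for illustration; a real model would supply mini-batch gradients from its own loss.

```python
import math


def grad(w):
    # Gradient of the illustrative loss f(w) = (w - 3)^2 (an assumption for this demo).
    return 2.0 * (w - 3.0)


def sgd_step(w, g, lr=0.1):
    # Plain gradient step: move against the gradient.
    return w - lr * g


def momentum_step(w, v, g, lr=0.1, beta=0.9):
    # Accumulate a decaying sum of past gradients, then step along it.
    v = beta * v + g
    return w - lr * v, v


def adam_step(w, m, v, g, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


def run(steps=500):
    # All three optimizers start from w = 0 and minimize the same loss.
    w_sgd = w_mom = w_adam = 0.0
    v_mom = 0.0
    m_adam = v_adam = 0.0
    for t in range(1, steps + 1):
        w_sgd = sgd_step(w_sgd, grad(w_sgd))
        w_mom, v_mom = momentum_step(w_mom, v_mom, grad(w_mom))
        w_adam, m_adam, v_adam = adam_step(w_adam, m_adam, v_adam, grad(w_adam), t)
    return w_sgd, w_mom, w_adam


if __name__ == "__main__":
    print(run())
```

On this toy problem all three rules should approach the minimizer w = 3. The other methods in the list differ mainly in how they scale or accumulate the gradient before taking the step.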


[1]. https://zhuanlan.zhihu.com/p/22252270

[2]. http://blog.csdn.net/muyu709287760/article/details/62531509

[3]. http://blog.csdn.net/shenxiaoming77/article/details/41444269

More Discussion on SGD

  1. Shuffling and Curriculum Learning
  2. Batch Normalization
  3. Early stopping
  4. Gradient Noise
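
Three of the items above (shuffling, early stopping, and gradient noise) can be combined in one small training loop. The following is a hedged sketch in pure Python, assuming a toy linear model y = w*x + b with squared error. The function name `train`, the patience rule, and the noise schedule sigma_t^2 = eta / (1 + t)^gamma are illustrative choices, not a prescribed recipe. Batch normalization is omitted because it needs a multi-layer model to be meaningful.

```python
import random


def train(data, val, epochs=200, lr=0.05, patience=5,
          eta=0.01, gamma=0.55, seed=0):
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    best_val, best_params = float("inf"), (w, b)
    bad_epochs, t, epochs_run = 0, 0, 0
    for epoch in range(epochs):
        epochs_run = epoch + 1
        # Shuffling: visit the training examples in a new order every epoch.
        rng.shuffle(data)
        for x, y in data:
            t += 1
            err = (w * x + b) - y
            gw, gb = 2.0 * err * x, 2.0 * err
            # Gradient noise: add Gaussian noise whose variance decays with t.
            sigma = (eta / (1.0 + t) ** gamma) ** 0.5
            gw += rng.gauss(0.0, sigma)
            gb += rng.gauss(0.0, sigma)
            w -= lr * gw
            b -= lr * gb
        # Early stopping: track validation loss and stop after `patience`
        # consecutive epochs without improvement.
        val_loss = sum(((w * x + b) - y) ** 2 for x, y in val) / len(val)
        if val_loss < best_val:
            best_val, best_params, bad_epochs = val_loss, (w, b), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_params, best_val, epochs_run


if __name__ == "__main__":
    # Hypothetical noise-free data drawn from y = 2x + 1.
    train_set = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]
    val_set = [(x, 2.0 * x + 1.0) for x in (-0.75, -0.25, 0.25, 0.75)]
    print(train(train_set, val_set))
```

The function returns the parameters with the best validation loss rather than the last ones, which is the usual reason for pairing early stopping with a saved checkpoint.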

Optimization Method Selection in DL
