Paper Stack
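Because the paper stack had become too messy, this post has been updated and rewritten as a stack of notes; the paper stack itself will be updated in stack.md under papers in WhatIveRead.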
20200317 MCMC
 Experiment: estimating an area by scattering beans
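
The bean-scattering experiment can be sketched as below. This is plain Monte Carlo integration (the building block that MCMC extends), not a Markov chain sampler. The function names, the unit-disk region, and the sample count are illustrative assumptions, not something taken from the original note.

```python
import random

def estimate_area(inside, n_samples, low=-1.0, high=1.0, seed=0):
    """Estimate the area of a 2-D region by scattering random points ("beans")."""
    rng = random.Random(seed)
    side = high - low
    hits = 0
    for _ in range(n_samples):
        # Drop one bean uniformly inside the bounding square.
        x = rng.uniform(low, high)
        y = rng.uniform(low, high)
        if inside(x, y):
            hits += 1
    # Fraction of beans inside the region times the area of the square.
    return side * side * hits / n_samples

def in_unit_disk(x, y):
    """Hypothetical test region: the disk of radius 1 centered at the origin."""
    return x * x + y * y <= 1.0

area = estimate_area(in_unit_disk, 200_000)
print(area)  # should be close to pi, the area of the unit disk
```

The error of such an estimate shrinks roughly as one over the square root of the number of beans, so more samples buy accuracy slowly.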
20200320 DeConv
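20200324 Gumbel Softmax
 Starting from the VAE (differentiating through sampling from a continuous distribution), add a Gumbel noise term (which carries the information about the parameters): first sample from a Gaussian, then plug in the sampled value
 For a discrete distribution, relax the Argmax into a Softmax (Argmax -> Softmax)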
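
A minimal sketch of the two steps, assuming the usual formulation: Gumbel noise is added to the log-probabilities, the Argmax gives an exact discrete sample, and a temperature-scaled Softmax gives the differentiable relaxation. Note that the Gumbel noise here is drawn from a uniform sample by inverse transform, not from a Gaussian as in the VAE case. The names `sample_gumbel`, `gumbel_argmax`, `gumbel_softmax` and the temperature `tau` are illustrative.

```python
import math
import random

def sample_gumbel(rng):
    """Draw Gumbel(0, 1) noise by inverse transform of a uniform sample."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return -math.log(-math.log(u))

def gumbel_argmax(logits, rng):
    """Exact sample from the categorical distribution (Gumbel-max trick)."""
    noisy = [l + sample_gumbel(rng) for l in logits]
    return max(range(len(logits)), key=lambda i: noisy[i])

def gumbel_softmax(logits, tau, rng):
    """Relaxed sample: Softmax of the noisy logits at temperature tau."""
    noisy = [(l + sample_gumbel(rng)) / tau for l in logits]
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
probs = [0.2, 0.5, 0.3]
logits = [math.log(p) for p in probs]

soft = gumbel_softmax(logits, tau=0.5, rng=rng)  # a point on the simplex

# The Argmax samples should reproduce the original probabilities.
n = 20_000
counts = [0] * len(probs)
for _ in range(n):
    counts[gumbel_argmax(logits, rng)] += 1
freqs = [c / n for c in counts]
print(soft, freqs)
```

As `tau` goes to zero the relaxed sample approaches a one-hot vector; larger `tau` gives smoother samples with lower-variance gradients.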
20200405
20210103 NES
 Keys
A family of black-box optimization algorithms that use the natural gradient to update a parameterized search distribution in the direction of higher expected fitness.

Source  Jurgen

Motivation

Methodology
Iteratively update the search distribution using the estimated gradient with respect to its distribution parameters (e.g. \mu and \sigma of a Gaussian); the iteration stops when the stopping criterion is met.
 Search Gradient method
what is search space? what is fitness?
However, to locate a quadratic optimum, the method should be at least quadratic (second-order), since first-order methods will be unstable.
The update is not scale-invariant.
This does not occur in the general gradient-based case; it arises here because the gradient controls both the position and the variance of the distribution over the same search-space dimension.
This problem is solved with the natural gradient.
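
A minimal sketch of the plain search gradient for a 1-D Gaussian search distribution, assuming the log-likelihood-trick estimator (average of fitness times the gradient of the log-density). The quadratic fitness with optimum at 3, the sample count, and the learning rate are illustrative assumptions. For this fitness the expected fitness is J = -((\mu - 3)^2 + \sigma^2), so the estimate can be compared with the analytic gradient.

```python
import random

def fitness(x):
    """Hypothetical fitness: a quadratic with its optimum at x = 3."""
    return -(x - 3.0) ** 2

def search_gradient(f, mu, sigma, n_samples, rng):
    """Monte Carlo estimate of dJ/dmu and dJ/dsigma for x ~ N(mu, sigma^2)."""
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        x = rng.gauss(mu, sigma)
        fx = f(x)
        # Fitness times the derivatives of log N(x | mu, sigma^2).
        g_mu += fx * (x - mu) / sigma ** 2
        g_sigma += fx * ((x - mu) ** 2 - sigma ** 2) / sigma ** 3
    return g_mu / n_samples, g_sigma / n_samples

rng = random.Random(0)
mu, sigma = 0.0, 1.0
g_mu, g_sigma = search_gradient(fitness, mu, sigma, 200_000, rng)
print(g_mu, g_sigma)  # analytic values at (0, 1) are 6 and -2

# One plain gradient-ascent step on the distribution parameters.
lr = 0.01
mu_new = mu + lr * g_mu
sigma_new = sigma + lr * g_sigma
```

Both per-sample terms carry a factor of 1/\sigma, which is where the scale dependence noted above comes from.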
 Natural gradient
How to turn it into a constrained optimization?
The natural gradient was proposed in the ML field to help mitigate slow convergence on plateaus and ridges of the fitness landscape.
The plain gradient \nabla J is the direction of steepest ascent in the space of the actual parameters \theta: when the learning step \epsilon is small, the problem can be reformulated as finding a new distribution whose parameters are chosen from the hypersphere of radius \epsilon centered at \theta so as to maximize J.
This means the Euclidean distance is used to measure the distance between distributions, so the update depends on the parameterization of the distribution.
The key idea of the natural gradient is to remove this dependence: measure the distance between distributions with a natural distance (e.g. the KL divergence) and reduce the problem to a constrained optimization. **Use the natural gradient instead of the steepest (plain) gradient for optimization.**
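
A minimal sketch of what the natural gradient does for a 1-D Gaussian with parameters (\mu, \sigma), where the Fisher information matrix is diagonal with entries 1/\sigma^2 and 2/\sigma^2. The quadratic fitness, its analytic gradient, and the `scale` argument (which stretches the search space by a constant factor) are illustrative assumptions used only to show the scaling behavior.

```python
def plain_gradient(mu, sigma, scale=1.0, target=3.0):
    """Analytic gradient of J = E[-(x / scale - target)^2] for x ~ N(mu, sigma^2)."""
    g_mu = -2.0 * (mu / scale - target) / scale
    g_sigma = -2.0 * sigma / scale ** 2
    return g_mu, g_sigma

def natural_gradient(g_mu, g_sigma, sigma):
    """Multiply by the inverse Fisher matrix, diag(sigma^2, sigma^2 / 2)."""
    return g_mu * sigma ** 2, g_sigma * sigma ** 2 / 2.0

# Original search space: x ~ N(0, 1).
plain = plain_gradient(0.0, 1.0)
nat = natural_gradient(*plain, sigma=1.0)

# Same problem with the search space stretched by 10: x' = 10 x ~ N(0, 10^2).
plain_scaled = plain_gradient(0.0, 10.0, scale=10.0)
nat_scaled = natural_gradient(*plain_scaled, sigma=10.0)

print(plain, plain_scaled)  # the plain gradient shrinks when the space is stretched
print(nat, nat_scaled)      # the natural gradient grows by the same factor as the space
```

In the stretched space the natural-gradient step is ten times larger, so it corresponds to the same move as in the original space, whereas the plain-gradient step becomes ten times smaller and does not.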
 Fitness shaping  make the algorithm invariant to arbitrary but order-preserving transformations of the fitness function
What is order? Does it simply mean that the ranking of the population by fitness does not change?
not so crucial.
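
A minimal sketch of rank-based fitness shaping, followed by a small 1-D loop that uses the shaped utilities in a natural-gradient update. The logarithmic utility formula, the multiplicative \sigma update, the learning rates, and the population size are common choices assumed here for illustration; they are not taken from these notes.

```python
import math
import random

def rank_utilities(n):
    """Utility for each rank (0 = best); depends only on the ranking."""
    raw = [max(0.0, math.log(n / 2 + 1) - math.log(rank + 1)) for rank in range(n)]
    total = sum(raw)
    return [r / total - 1.0 / n for r in raw]

def shape_fitness(fitness):
    """Replace raw fitness values by the utility of their rank."""
    n = len(fitness)
    utilities = rank_utilities(n)
    order = sorted(range(n), key=lambda k: fitness[k], reverse=True)
    shaped = [0.0] * n
    for rank, k in enumerate(order):
        shaped[k] = utilities[rank]
    return shaped

raw_fitness = [1.0, 5.0, 3.0, -2.0]
# exp is order-preserving, so the ranking of the population is unchanged.
transformed = [math.exp(v) for v in raw_fitness]
same = shape_fitness(raw_fitness) == shape_fitness(transformed)
print(same)  # True

def nes_1d(f, mu, sigma, n_iters=200, pop_size=20, lr_mu=1.0, lr_sigma=0.3, seed=0):
    """Illustrative 1-D loop: natural-gradient steps on (mu, sigma) with shaped fitness."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        z = [rng.gauss(0.0, 1.0) for _ in range(pop_size)]
        u = shape_fitness([f(mu + sigma * zk) for zk in z])
        g_mu = sum(uk * zk for uk, zk in zip(u, z))
        g_sigma = sum(uk * (zk * zk - 1.0) for uk, zk in zip(u, z))
        mu += lr_mu * sigma * g_mu
        sigma *= math.exp(0.5 * lr_sigma * g_sigma)  # keeps sigma positive
    return mu, sigma

mu_final, sigma_final = nes_1d(lambda x: -(x - 3.0) ** 2, mu=0.0, sigma=1.0)
print(mu_final, sigma_final)
```

Because only ranks enter the update, replacing the fitness by any strictly increasing function of it leaves the whole optimization trajectory unchanged for a fixed seed.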
 Adaptive sampling  adjust the lr online
Meta-learning based (sample a new \theta; if its quality is significantly better, continue with \theta_hat); applies hill-climbing and the Mann-Whitney U test.
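
A minimal sketch of the significance check only: a plain Mann-Whitney U test with a normal approximation (no tie correction), applied to two samples of fitness values. How the two samples are obtained from the current and candidate settings is not covered here, and the function names and the one-sided threshold are illustrative assumptions.

```python
import math

def mann_whitney_u(a, b):
    """U statistic of sample a versus b, plus a normal-approximation z-score."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0   # a "wins" this pair
            elif x == y:
                u += 0.5   # ties count as half a win
    n1, n2 = len(a), len(b)
    mean = n1 * n2 / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return u, (u - mean) / std

def significantly_better(candidate, current, z_threshold=1.645):
    """One-sided check: is the candidate fitness sample stochastically larger?"""
    _, z = mann_whitney_u(candidate, current)
    return z > z_threshold

candidate_fitness = [5.0, 6.0, 7.0, 8.0]
current_fitness = [1.0, 2.0, 3.0, 4.0]
u, z = mann_whitney_u(candidate_fitness, current_fitness)
print(u, z)
print(significantly_better(candidate_fitness, current_fitness))
```

The test uses only the ordering of the fitness values, which fits the rank-based treatment of fitness elsewhere in these notes. For small samples an exact test is preferable to the normal approximation.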
 rotation-invariant distribution
local natural coordinates; sampling from a radial distribution

Experiment

Ideas
multivariate Gaussian, Fisher information matrix
 closely related to CMA-ES (Covariance Matrix Adaptation Evolution Strategy)