Understanding and Improving One-Shot NAS
- Setting is one-shot NAS: one-shot training saves resources, but its ranking correlation with stand-alone training is poor
- The paper attributes the problem to the discrepancy between one-shot training and full (stand-alone) training
- Analyzes why the gap exists and proposes NAO-V2 to reduce it, with three fixes (see the sketch after this list):
  - Increase the average number of updates each individual architecture receives
  - Give more complex architectures more updates
  - Make the one-shot training of the supernet independent at each iteration
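A hypothetical sketch of the first two fixes (an illustration under my own assumptions, not the paper's exact procedure): spend the per-round step budget on fewer sampled architectures so each gets more updates on average, and allocate steps in proportion to parameter count so more complex architectures are trained more. The helper `allocate_steps` is made up for this note.

```python
# Hypothetical helper (not from the paper): split a training-step budget
# across sampled sub-nets proportionally to their parameter counts, so
# more complex architectures receive more updates.
def allocate_steps(param_counts, total_steps):
    total_params = sum(param_counts)
    return [max(1, round(total_steps * p / total_params)) for p in param_counts]

# Example: three sampled sub-nets with 0.1M, 0.5M and 2.0M parameters
# sharing a budget of 300 updates for this round.
print(allocate_steps([100_000, 500_000, 2_000_000], 300))
# -> roughly [12, 58, 231]; the largest architecture gets the most updates
```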
Related Fields
- One-shot NAS
  - Hyper-network (supernet) trained once
  - Sample sub-nets that share the supernet's weights (see the sketch after this list)
- ENAS: weight-sharing search driven by RL
- NAO: Neural Architecture Optimization
- DARTS: differentiable search through the supernet's gradients
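A minimal PyTorch sketch of weight-sharing one-shot NAS, assuming a toy `MixedLayer`/`SuperNet` design (names and candidate ops are my own, not from any specific paper): the supernet holds every candidate op once, and each sampled sub-net is just a list of op indices that reuses those shared weights.

```python
import random
import torch
import torch.nn as nn

class MixedLayer(nn.Module):
    """One supernet layer holding several candidate operations."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),   # parametric op
            nn.Conv2d(channels, channels, 5, padding=2),   # parametric op
            nn.Identity(),                                 # non-parametric op
            nn.MaxPool2d(3, stride=1, padding=1),          # non-parametric op
        ])

    def forward(self, x, op_idx):
        # Only the sampled op runs; its weights are shared by every
        # architecture that happens to pick it.
        return self.ops[op_idx](x)

class SuperNet(nn.Module):
    def __init__(self, channels=16, depth=4, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.layers = nn.ModuleList(MixedLayer(channels) for _ in range(depth))
        self.head = nn.Linear(channels, num_classes)

    def sample_arch(self):
        # A sub-net is just one op index per layer.
        return [random.randrange(len(layer.ops)) for layer in self.layers]

    def forward(self, x, arch):
        x = self.stem(x)
        for layer, op_idx in zip(self.layers, arch):
            x = layer(x, op_idx)
        x = x.mean(dim=(2, 3))        # global average pooling
        return self.head(x)
```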
Insufficient Optimization of Individual Architectures
- The supernet is trained for 100-200 epochs
- At each step a different architecture is sampled and trained, so each individual architecture receives only a few updates on average
- Empirically, the search tends to select small and slim architectures (usually containing many non-parametric ops)
- The authors argue this is because small networks train up more easily within a few steps, so they are more likely to be selected
- The updates an individual architecture receives are cut out of the middle of the supernet's training trajectory
- As a result, its momentum and lr decay depend on the previously sampled architectures (see the sketch below)
- This also degrades the final results
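A small sketch (reusing the toy `SuperNet` above) of the dependence described in these notes: a single SGD optimizer with momentum and a single lr schedule are shared by all sampled sub-nets, so the update applied to one architecture inherits momentum buffers and a decayed learning rate from whatever architectures were sampled before it, rather than being optimized independently.

```python
import torch

supernet = SuperNet()
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(supernet.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for step in range(200):
    x = torch.randn(8, 3, 32, 32)                  # dummy batch
    y = torch.randint(0, 10, (8,))

    arch = supernet.sample_arch()                  # a different sub-net each step
    loss = criterion(supernet(x, arch), y)

    optimizer.zero_grad()
    loss.backward()
    # The momentum buffers of the shared weights still hold gradients from the
    # architectures sampled at earlier steps, and the lr has already decayed
    # along the global schedule; neither is specific to `arch`.
    optimizer.step()
    scheduler.step()
```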
Imbalanced Training
Independent Training
Experiments
- Why are there no experimental results on the ranking correlation?