Paper Reading: "Understanding and Improving One-Shot NAS"

Posted by tianchen on February 1, 2020

Understanding and Improving One-Shot NAS

  • The setting is one-shot NAS: one-shot training saves resources, but its ranking correlation with stand-alone training is poor
  • This paper attributes the problem to the gap between one-shot training and full (stand-alone) training
  • It analyzes why the gap exists and proposes NAO-V2 to reduce it
    • Increase the average number of updates each individual architecture receives
    • Give more complex architectures more updates
    • Make the one-shot training of the supernet independent at each iteration
  • One-shot NAS
    • The hyper-network (supernet) is trained only once
    • Sub-networks are sampled from it and share its weights (see the sketch after this list)
      • ENAS - Weight Sharing Search by RL
      • NAO
      • DARTS - search through the supernet’s gradient
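
To make the weight-sharing idea above concrete, here is a minimal PyTorch sketch of one-shot supernet training on a toy search space: at every step a random sub-network is sampled and only its shared weights receive gradients. `MixedOp`, `Supernet`, and `sample_arch` are hypothetical names for illustration only, not the paper's code.

```python
# Minimal one-shot weight-sharing sketch (hypothetical toy code, not the paper's).
import random
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """One supernet layer: every candidate op keeps its own shared weights here."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),  # parameterized candidate
            nn.Conv2d(channels, channels, 5, padding=2),  # parameterized candidate
            nn.Identity(),                                # parameter-free candidate
        ])

    def forward(self, x, choice):
        # Only the chosen candidate runs, so only its weights get gradients this step.
        return self.ops[choice](x)

class Supernet(nn.Module):
    def __init__(self, channels=16, num_layers=4, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.layers = nn.ModuleList([MixedOp(channels) for _ in range(num_layers)])
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x, arch):
        x = torch.relu(self.stem(x))
        for layer, choice in zip(self.layers, arch):
            x = layer(x, choice)
        return self.head(x.mean(dim=(2, 3)))

def sample_arch(num_layers, num_choices=3):
    # One op index per layer defines a sub-network of the supernet.
    return [random.randrange(num_choices) for _ in range(num_layers)]

net = Supernet()
criterion = nn.CrossEntropyLoss()
opt = torch.optim.SGD(net.parameters(), lr=0.05, momentum=0.9)

for step in range(100):                    # stands in for 100-200 epochs of real training
    arch = sample_arch(len(net.layers))    # a different architecture at each step
    x = torch.randn(8, 3, 32, 32)          # stand-in for a real data batch
    y = torch.randint(0, 10, (8,))
    loss = criterion(net(x, arch), y)
    opt.zero_grad()
    loss.backward()
    opt.step()                             # the shared weights get one partial update
```

Because every step trains a different sampled architecture, any single architecture only receives a small share of the total updates, which is exactly the insufficient-optimization issue discussed next.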

Insufficient Optimization of Individual Architectures

  • The supernet is trained for 100-200 epochs
    • At each step a different sampled architecture is trained
  • Empirically, the search tends to pick small and slim architectures (usually ones with many parameter-free ops)
    • The authors argue that small networks are easier to train up within only a few steps, so they are more likely to be selected
  • Each architecture's updates are truncated out of the middle of the overall training run (one possible fix is sketched after this list)
    • This makes the momentum and lr-decay state depend on the preceding updates of other architectures
    • It also hurts the final results
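
One way to read the listed problems and the NAO-V2 modifications is in terms of optimizer state: give each sampled architecture a short block of consecutive updates with freshly initialized momentum and its own lr schedule, so its training no longer depends on whichever architectures were trained before it. The sketch below continues the toy `net`, `sample_arch`, and `criterion` from the previous block; it is only a hedged, hypothetical realization of that idea, not the paper's actual implementation.

```python
# Hedged sketch of "independent training at each iteration" (one possible reading,
# not the paper's implementation). Reuses the toy net, sample_arch, and criterion
# defined in the previous sketch.
import torch

steps_per_arch = 4                         # more consecutive updates per sampled arch

for it in range(50):
    arch = sample_arch(len(net.layers))
    # A fresh optimizer per architecture: momentum accumulated while training
    # other architectures cannot leak into this architecture's updates.
    opt = torch.optim.SGD(net.parameters(), lr=0.05, momentum=0.9)
    # The lr schedule is driven by a per-architecture step count rather than
    # being truncated out of the middle of one shared global schedule.
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=steps_per_arch)
    for _ in range(steps_per_arch):
        x = torch.randn(8, 3, 32, 32)      # stand-in for a real data batch
        y = torch.randint(0, 10, (8,))
        loss = criterion(net(x, arch), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        sched.step()
```

Giving more complex architectures a larger `steps_per_arch` would similarly correspond to the second modification listed at the top, but the exact rule the paper uses is not reproduced here.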

Imbalanced Training

Independent Training

Experiments

  • Why are there no experimental results on the ranking correlation?