Scaling Laws for Neural Language Models: A Brief Reading

# Scaling Laws for Neural Language Models


Throughout we will observe precise power-law scalings for performance as a function of training time, context length, dataset size, model size, and compute budget.

Natural Language Models Power-law Relationship

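The paper mainly models how model performance relates to the number of non-embedding parameters N, the dataset size D, and the amount of compute C. The main findings:

  • Performance depends mainly on model size and only weakly on model architecture
  • Performance follows a close power-law relationship with each of the three factors above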
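
To make the power-law claim concrete, here is a minimal sketch that assumes a single-factor law of the form `loss = (x_c / x) ** alpha`, where `x` stands for any one of N, D, or C. Such a law is a straight line in log-log space, so a linear fit recovers the exponent and the scale constant. The function name `fit_power_law` and the numbers `true_alpha` and `true_x_c` are hypothetical and chosen only for illustration. The data is synthetic, not taken from the paper.

```python
import numpy as np

def fit_power_law(x, loss):
    """Fit loss = (x_c / x) ** alpha by least squares in log-log space."""
    log_x = np.log(np.asarray(x, dtype=float))
    log_loss = np.log(np.asarray(loss, dtype=float))
    # log(loss) = alpha * log(x_c) - alpha * log(x), i.e. a line in log-log space
    slope, intercept = np.polyfit(log_x, log_loss, 1)
    alpha = -slope
    x_c = np.exp(intercept / alpha)
    return alpha, x_c

# Synthetic, noise-free data generated from assumed (hypothetical) constants.
true_alpha, true_x_c = 0.08, 1e13
n_params = np.array([1e5, 1e6, 1e7, 1e8, 1e9])
losses = (true_x_c / n_params) ** true_alpha

alpha, x_c = fit_power_law(n_params, losses)
print(f"alpha = {alpha:.4f}, x_c = {x_c:.3e}")
```

With real measurements the points would scatter around the line, and how tightly they follow it is what "a close power-law relationship" refers to.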



Our results strongly suggest that larger models will continue to perform better, and will also be much more sample efficient than has been previously appreciated. Big models may be more important than big data.