Overtraining Harms AI: Less Pre-Training Boosts Model Performance

Researchers from leading US universities, including Carnegie Mellon, Stanford, Harvard, and Princeton, have discovered that excessive pre-training of large language AI models can lead to diminished performance rather than improvements. This phenomenon, termed “catastrophic overtraining,” challenges the prevailing belief that more pre-training data invariably enhances AI performance.

Their study revealed that prolonged pre-training makes a model progressively more sensitive to small changes in its parameters, similar to the butterfly effect, so that subsequent modification degrades rather than improves performance. This was demonstrated by comparing two versions of the OLMo-1B model, trained on 2.3 trillion and 3 trillion tokens respectively. The more extensively trained model performed up to 3% worse on benchmarks such as AlpacaEval and ARC.

Researchers identified an “inflection point,” typically beyond 2.5 trillion tokens in smaller models, where additional pre-training induces internal instability that outweighs its benefits. By injecting Gaussian noise into pre-trained models, they observed that performance declined more sharply the longer a model had been trained.
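To make the noise-injection probe concrete, here is a minimal sketch (not the authors’ code) of perturbing a model’s weights with Gaussian noise and measuring the resulting loss increase; the checkpoint name, noise scale, and evaluation text are illustrative assumptions. Repeating this across checkpoints trained on progressively more tokens would, per the study, show larger degradation for the longer-trained models.

```python
# Sketch: probe a pre-trained causal LM's sensitivity to parameter perturbations
# by adding Gaussian noise to its weights and measuring the loss increase.
# The checkpoint, noise scale, and evaluation batch below are placeholders.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "allenai/OLMo-1B-hf"  # placeholder; swap in the checkpoint under study
NOISE_STD = 1e-3                   # standard deviation of the Gaussian perturbation

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

# A tiny evaluation batch; in practice this would be a held-out corpus.
batch = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")

def mean_loss(m):
    """Average next-token loss of model m on the evaluation batch."""
    with torch.no_grad():
        out = m(**batch, labels=batch["input_ids"])
    return out.loss.item()

baseline = mean_loss(model)

# Perturb every parameter with i.i.d. Gaussian noise and re-evaluate.
noisy = copy.deepcopy(model)
with torch.no_grad():
    for p in noisy.parameters():
        p.add_(torch.randn_like(p) * NOISE_STD)

degradation = mean_loss(noisy) - baseline
print(f"baseline loss: {baseline:.4f}, loss increase after noise: {degradation:.4f}")
```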

They caution that “catastrophic overtraining” might be unavoidable, especially when pre-training tasks do not align with fine-tuning tasks. Rather than stopping pre-training altogether, developers are urged to reconsider the extent of pre-training necessary.

The study concludes that decisions about model scaling should account for the entire training pipeline, not pre-training in isolation. The main takeaway is that in AI development, less pre-training can sometimes lead to better results, and that performance gains must be weighed against the risk of catastrophic overtraining.
