Unraveling the Knowledge Capacity of Language Models: A New Era of AI

A new study, “Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws,” measures how much factual knowledge large language models can store. Researchers from Mohamed bin Zayed University of AI and Meta/FAIR Labs find that models consistently store about 2 bits of knowledge per parameter, meaning a 7B-parameter model can hold roughly 14B bits of knowledge, which the authors estimate is more than the English Wikipedia and textbooks combined.

In this new research, several intriguing insights have emerged:

  • The GPT-2 architecture with rotary embeddings matches, and over shorter training durations can even exceed, the knowledge capacity of more recent architectures such as LLaMA and Mistral; the authors attribute part of the gap to the gated MLP in LLaMA/Mistral being less stable to train. This suggests that, at least for knowledge storage, architectural innovations are yielding diminishing returns and that data and training compute drive most of the remaining gains.
  • Prepending each training document with its source domain name, such as wikipedia.org, lets a language model identify and prioritize high-quality knowledge sources on its own. This simple technique significantly improves knowledge capacity when the pretraining mix contains noisy or junk data (a minimal sketch of the idea appears after this list).
  • Knowledge is stored in a compressed form spread across all layers of the model, not concentrated in specific “knowledge layers.” Quantizing a trained model to int8 preserves its stored knowledge, while quantizing to int4 degrades it, implying that the stored knowledge requires more than 4 but no more than 8 bits of precision per parameter (see the toy quantization sketch after this list). Understanding these mechanisms could lead to more efficient model architectures and training techniques.
  • Mixture-of-experts models retain nearly all of the knowledge capacity of comparable dense models while offering substantial inference speedups, challenging the notion that sparse models necessarily trade quality for efficiency. This suggests sparsity may be a practical way to keep scaling models without hitting diminishing returns.
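
To make the domain-prepending recipe concrete, here is a minimal sketch, assuming a web-crawl corpus whose documents carry `url` and `text` fields; the field names and helper function are illustrative, not taken from the paper's code.

```python
from urllib.parse import urlparse

def prepend_domain(example: dict) -> dict:
    """Prefix a pretraining document with its source domain name."""
    domain = urlparse(example["url"]).netloc   # e.g. "en.wikipedia.org"
    example["text"] = f"{domain}\n{example['text']}"
    return example

# Every document is tagged the same way; during pretraining the model can
# then learn for itself which domains tend to carry reliable knowledge.
docs = [
    {"url": "https://en.wikipedia.org/wiki/Alan_Turing",
     "text": "Alan Turing was an English mathematician ..."},
    {"url": "https://example-junk-site.com/post/123",
     "text": "Ten weird tricks doctors hate ..."},
]
docs = [prepend_domain(d) for d in docs]
print(docs[0]["text"].splitlines()[0])   # -> en.wikipedia.org
```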

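As a rough, independent illustration of the int8-versus-int4 gap (a toy round-trip experiment in PyTorch, not the paper's methodology), the sketch below quantizes a random weight matrix with symmetric per-tensor quantization at both bit widths and compares the reconstruction error.

```python
import torch

def quantize_roundtrip(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric per-tensor quantization to `bits` bits, then dequantization."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for int8, 7 for int4
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale

w = torch.randn(4096, 4096) * 0.02             # toy "weight matrix"
for bits in (8, 4):
    err = (quantize_roundtrip(w, bits) - w).abs().mean().item()
    print(f"int{bits}: mean absolute reconstruction error = {err:.6f}")
# int8's fine grid reproduces the weights almost exactly, while int4's much
# coarser grid does not, which is consistent with stored knowledge surviving
# int8 quantization but not int4.
```
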
This research is a significant step toward understanding how much knowledge AI systems can store and how they store it. As we continue pushing the boundaries of model scale and efficiency, these insights will help shape the next generation of language models.
