The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws

Publication
International Conference on Learning Representations (ICLR), Poster
Date