Less is More: Undertraining Experts Improves Model Upcycling

Publication: arXiv