1. Petrov N, Andersson S. Sparse Experts Scale Better in Efficient Mixture Architectures for Trillion Parameter Models. CPL. 2026;14(2):16-22. doi:10.54097/baczzj49