[1] N. Petrov and S. Andersson, “Sparse Experts Scale Better in Efficient Mixture Architectures for Trillion Parameter Models”, CPL, vol. 14, no. 2, pp. 16–22, May 2026, doi: 10.54097/baczzj49.