Sparse Experts Scale Better in Efficient Mixture Architectures for Trillion Parameter Models