Sparse Experts Scale Better in Efficient Mixture Architectures for Trillion Parameter Models