Review Articles

Efficient Sparse Mixture-of-Experts Models for Multilingual Low-Resource Machine Translation

Abstract

Low-resource machine translation (MT) for the world's 7,000+ languages remains a critical NLP challenge. Dense multilingual models trade per-language quality for breadth, while dedicated bilingual models are impractical at scale. We present PolyglotMoE, a sparse Mixture-of-Experts (MoE) Transformer with 64 experts (12B total parameters, 2.1B active per token) that dynamically routes tokens to language-family-specialized experts. Trained on OPUS-100 extended with 420 additional low-resource language pairs mined from web and religious texts, PolyglotMoE achieves +4.7 BLEU over NLLB-200 on the 50 lowest-resource directions while matching NLLB-200 on high-resource pairs. Expert utilization analysis reveals emergent linguistic clustering that aligns with typological language families.
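
To make the routing idea in the abstract concrete, the following is a minimal sketch of top-k token routing over 64 experts. It is an illustration under stated assumptions, not the PolyglotMoE router: the abstract does not give the number of experts selected per token, so `TOP_K = 2` is assumed here, and the names `softmax` and `route_token` are hypothetical. Random logits stand in for the output of a learned router.

```python
import math
import random

NUM_EXPERTS = 64  # expert count stated in the abstract
TOP_K = 2         # ASSUMPTION: the abstract does not state how many experts fire per token


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts for one token (hypothetical helper).

    Returns a list of (expert_index, weight) pairs. Weights are the softmax
    probabilities of the selected experts, renormalized to sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]


# Toy usage: random logits replace a learned router's scores for one token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route_token(logits)

# Share of parameters active per token, using the abstract's figures
# (2.1B active out of 12B total).
active_fraction = 2.1 / 12.0
```

With the figures reported in the abstract, `active_fraction` comes to 17.5%: each token uses roughly one sixth of the model's parameters, which is where the efficiency of a sparse MoE over a dense model of the same total size comes from.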

Author Biographies

  • Angela Fan Meta AI Research, Paris 75002, France
    Angela Fan is an associate professor at Meta AI Research, Paris 75002, France. Their research focuses on environmental engineering, with over 29 publications in peer-reviewed journals.
  • Holger Schwenk Meta AI Research, Paris 75002, France
    Holger Schwenk is an assistant professor at Meta AI Research, Paris 75002, France. Their research focuses on data analytics, with over 25 publications in peer-reviewed journals.
  • Daxin Jiang MSRA NLC Group, Microsoft Research Asia, Beijing 100080, China
    Daxin Jiang is a research fellow at MSRA NLC Group, Microsoft Research Asia, Beijing 100080, China. Their research focuses on social sciences, with over 79 publications in peer-reviewed journals.