Review Article
Efficient Sparse Mixture-of-Experts Models for Multilingual Low-Resource Machine Translation
Abstract
Low-resource machine translation (MT) for the world's 7,000+ languages remains a critical NLP challenge. Dense multilingual models sacrifice per-language quality for breadth, while dedicated bilingual models are impractical at scale. We present PolyglotMoE, a sparse Mixture-of-Experts (MoE) Transformer with 64 experts (12B total parameters, 2.1B active per token) that dynamically routes tokens to language-family-specialized experts. Trained on OPUS-100 extended with 420 additional low-resource language pairs mined from web and religious texts, PolyglotMoE achieves +4.7 BLEU over NLLB-200 on the 50 lowest-resource directions while matching NLLB-200 on high-resource pairs. Expert utilization analysis reveals emergent linguistic clustering that aligns with typological language families.
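The token-to-expert routing described in the abstract can be illustrated with a minimal sketch. It assumes a standard softmax router with top-k selection, which the abstract does not confirm; only the expert count of 64 comes from the text. The value `TOP_K = 2`, the 8-dimensional token vector, and the helper names `route_token` and `make_router` are hypothetical choices made for illustration, not PolyglotMoE's actual implementation.

```python
import math
import random

NUM_EXPERTS = 64  # stated in the abstract
TOP_K = 2         # assumption: the abstract does not say how many experts fire per token


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(token_vec, router_weights, top_k=TOP_K):
    """Pick the top_k experts for one token.

    router_weights holds one row per expert, each as long as token_vec.
    Returns [(expert_index, gate_weight), ...] sorted by descending gate,
    with the gate weights renormalized to sum to 1 over the chosen experts.
    """
    # One router logit per expert: dot product of the expert row and the token.
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in router_weights]
    probs = softmax(logits)
    # Keep only the top_k most probable experts; all others stay inactive.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def make_router(dim, num_experts=NUM_EXPERTS, seed=0):
    """Hypothetical randomly initialized router, for demonstration only."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]


if __name__ == "__main__":
    router = make_router(dim=8)
    token = [0.5, -1.0, 0.25, 0.0, 1.5, -0.75, 0.1, 0.9]
    for expert, gate in route_token(token, router):
        print(f"expert {expert:2d} gate {gate:.3f}")
```

Because only the selected experts run for each token, the active parameter count per token stays well below the total, which is consistent with the 2.1B active versus 12B total figures reported in the abstract.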
Author Biographies
- Angela Fan, Meta AI Research, Paris 75002, France
  Angela Fan is an associate professor at Meta AI Research, Paris 75002, France. Their research focuses on environmental engineering, with over 29 publications in peer-reviewed journals.
- Holger Schwenk, Meta AI Research, Paris 75002, France
  Holger Schwenk is an assistant professor at Meta AI Research, Paris 75002, France. Their research focuses on data analytics, with over 25 publications in peer-reviewed journals.
- Daxin Jiang, MSRA NLC Group, Microsoft Research Asia, Beijing 100080, China
  Daxin Jiang is a research fellow at MSRA NLC Group, Microsoft Research Asia, Beijing 100080, China. Their research focuses on social sciences, with over 79 publications in peer-reviewed journals.
How to Cite
Efficient Sparse Mixture-of-Experts Models for Multilingual Low-Resource Machine Translation. (2026). 人工智能与数据科学前沿, 1(3). https://doi.org/10.55001/faids.v1i3.82
Articles in the Same Issue
- Vision-Language Foundation Models for Zero-Shot Autonomous Driving Scene Understanding and Risk Assessment 2026-05-25
- Graph Neural Networks with Heterogeneous Message Passing for Multi-Scale Drug-Drug Interaction Prediction 2026-05-20
- Causal Transformer Networks for Counterfactual Reasoning in Large-Scale Recommendation Systems 2026-05-18