Research Article
Vision-Language Foundation Models for Zero-Shot Autonomous Driving Scene Understanding and Risk Assessment
Abstract
Autonomous driving perception systems trained on fixed taxonomies fail when they encounter novel objects or unusual scenarios not represented in their training data, a failure mode known as the "long tail" problem. We present DriveLM, a vision-language model (VLM) fine-tuned on 2.8 million driving scene-narration pairs that performs zero-shot risk assessment by generating natural language scene descriptions and structured risk scores. On the nuScenes-QA benchmark, DriveLM achieves 78.4% accuracy on novel-object-related questions (vs. 31.2% for CLIP-based baselines) and a Spearman correlation of 0.91 with human risk ratings. In closed-loop CARLA simulation, DriveLM-guided planning reduces the collision rate by 42% in rare-event scenarios compared to end-to-end learned planners.
Author Biographies
- Yue Wang, Department of Computer Science, University of Southern California, Los Angeles, CA 90089, USA. Yue Wang is an associate professor whose research focuses on computational science, with over 17 publications in peer-reviewed journals.
- Hang Zhao, Institute for AI Industry Research, Tsinghua University, Beijing 100084, China. Hang Zhao is a professor whose research focuses on energy systems, with over 64 publications in peer-reviewed journals.
- Laura Leal-Taixé, Department of Informatics, Technical University of Munich, 85748 Garching, Germany. Laura Leal-Taixé is a professor whose research focuses on machine learning, with over 22 publications in peer-reviewed journals.
How to Cite
Vision-Language Foundation Models for Zero-Shot Autonomous Driving Scene Understanding and Risk Assessment. (2026). 人工智能与数据科学前沿, 1(3). https://doi.org/10.55001/faids.v1i3.78
Articles in the Same Issue
- Graph Neural Networks with Heterogeneous Message Passing for Multi-Scale Drug-Drug Interaction Prediction (2026-05-20)
- Causal Transformer Networks for Counterfactual Reasoning in Large-Scale Recommendation Systems (2026-05-18)
- Efficient Sparse Mixture-of-Experts Models for Multilingual Low-Resource Machine Translation (2026-05-14)