Hardware-Aware Neural Architecture Search with Latency Constraints for Edge AI Deployment

Michael Stein; Raj Krishnamurthy; Lin Huang

doi:10.55001/faids.v1i2.56

Abstract

Deploying deep learning models on edge devices requires architectures that balance accuracy against strict latency, memory, and power constraints, a combinatorial design space that manual engineering cannot explore efficiently. We present HA-NAS, a hardware-aware neural architecture search framework that co-optimizes model topology and quantization policy under device-specific latency budgets. HA-NAS employs a pre-trained accuracy predictor and a differentiable latency estimator calibrated on the target hardware (ARM Cortex-A78, NVIDIA Jetson Orin, Intel Movidius VPU). Across ImageNet classification and COCO detection tasks, HA-NAS discovers architectures that reach 79.8% top-1 accuracy at 12 ms latency on Jetson Orin, matching MobileNetV3-Large accuracy at 3.2× lower latency. On ARM Cortex-A78 processors, HA-NAS finds models with 71.2% accuracy that run in 8 ms with a memory footprint of only 1.8 MB, enabling on-device inference for wearable health monitors.

About the Authors

Michael Stein, Department of Electrical Engineering, ETH Zürich, 8092 Zürich, Switzerland

Michael Stein is a senior researcher at the Department of Electrical Engineering, ETH Zürich, 8092 Zürich, Switzerland. Their research focuses on advanced materials, and they have authored over 50 publications in peer-reviewed journals.

Raj Krishnamurthy, Qualcomm AI Research, San Diego, CA 92121, USA

Raj Krishnamurthy is a research fellow at Qualcomm AI Research, San Diego, CA 92121, USA. Their research focuses on environmental engineering, and they have authored over 68 publications in peer-reviewed journals.

Lin Huang, School of Microelectronics, Fudan University, Shanghai 200433, China

Lin Huang is an associate professor at the School of Microelectronics, Fudan University, Shanghai 200433, China. Their research focuses on energy systems, and they have authored over 58 publications in peer-reviewed journals.