Research Paper
Hardware-Aware Neural Architecture Search with Latency Constraints for Edge AI Deployment
Abstract
Deploying deep learning models on edge devices requires architectures that balance accuracy against strict latency, memory, and power constraints. The resulting design space is combinatorial and too large to explore efficiently by manual engineering. We present HA-NAS, a hardware-aware neural architecture search framework that co-optimizes model topology and quantization policy under device-specific latency budgets. HA-NAS employs a pre-trained accuracy predictor and a differentiable latency estimator calibrated on each target hardware platform (ARM Cortex-A78, NVIDIA Jetson Orin, Intel Movidius VPU). Across ImageNet classification and COCO detection tasks, HA-NAS discovers architectures achieving 79.8% top-1 accuracy at 12 ms latency on Jetson Orin, matching MobileNetV3-Large accuracy at 3.2× lower latency. On ARM Cortex-A78 processors, HA-NAS finds models with 71.2% accuracy running at 8 ms with a memory footprint of only 1.8 MB, enabling on-device inference for wearable health monitors.
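To make the idea of a differentiable latency estimator under a latency budget concrete, the following is a minimal sketch and not the HA-NAS implementation. It assumes a common formulation in which each layer chooses among candidate operations via softmax-weighted architecture logits, and the expected latency is the probability-weighted sum of per-operation latencies measured on the target device. The function names, the latency table values, the hinge-style budget penalty, and `penalty_weight` are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_latency_ms(arch_logits, latency_table_ms):
    """Expected network latency under the current architecture distribution.

    arch_logits[i][j]      : logit for candidate op j at layer i (assumed parameterization)
    latency_table_ms[i][j] : latency of op j at layer i, assumed to be measured on-device
    """
    total = 0.0
    for logits, latencies in zip(arch_logits, latency_table_ms):
        probs = softmax(logits)
        total += sum(p * lat for p, lat in zip(probs, latencies))
    return total


def search_loss(task_loss, arch_logits, latency_table_ms, budget_ms, penalty_weight=0.1):
    """Task loss plus a hinge penalty on latency above the budget (assumed form)."""
    latency = expected_latency_ms(arch_logits, latency_table_ms)
    overshoot = max(0.0, latency - budget_ms)
    return task_loss + penalty_weight * overshoot


# Hypothetical two-layer search space with two candidate ops per layer.
example_logits = [[0.0, 0.0], [0.0, 0.0]]
example_table_ms = [[2.0, 4.0], [2.0, 4.0]]
example_latency = expected_latency_ms(example_logits, example_table_ms)
example_loss = search_loss(1.0, example_logits, example_table_ms, budget_ms=5.0)
```

Because the expected latency is a smooth function of the architecture logits, it can in principle be optimized by gradient descent together with the accuracy term. The penalty here is zero whenever the expected latency is within budget, so it only steers the search when the constraint is violated.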