Detection probability is the metric most vision AI vendors optimize for and most frequently cite. "99.2% detection accuracy on benchmark X" is a common marketing claim. But detection probability alone tells you almost nothing about operational utility.
In real-world deployment, the metric that determines whether operators trust and use a vision system is not detection rate but false alarm rate: specifically, the ratio of true alerts to false alerts that operators actually experience under operational conditions.
The False Alarm Problem
Operator Fatigue — When a system generates hundreds of false alarms per shift, operators develop alarm blindness. They stop investigating alerts because experience tells them most are false. When a genuine threat occurs, it is dismissed along with the noise. This is not a technology problem — it is a human factors engineering failure.
Resource Drain — Every false alarm that requires investigation consumes operator time, communication bandwidth, and response resources. In high-security environments, false alarm response may require physical deployment of personnel. The cumulative operational cost of false alarms frequently exceeds the cost of the system itself.
Trust Erosion — Once operators lose confidence in a detection system, reestablishing trust is extraordinarily difficult. The perception that the system "cries wolf" persists long after technical improvements are made.
Engineering for False Alarm Discipline
False alarm suppression is not a post-processing step — it must be engineered into the system architecture. This means multi-stage detection pipelines with confirmation logic, temporal consistency requirements before alert generation, contextual filtering that accounts for known environmental stimuli, and operator feedback integration for continuous model refinement.
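One of those architectural elements, temporal consistency, can be sketched briefly. The class below is a minimal illustration, not a production design: it gates alerts on a detection persisting across recent frames, so a single-frame spike never reaches the operator. The name TemporalConfirmer and all thresholds are hypothetical.

```python
from collections import deque

class TemporalConfirmer:
    """Illustrative temporal-consistency gate: raise an alert only when
    a detection persists across enough recent frames. All parameters
    are placeholder values, not recommended settings."""

    def __init__(self, window=10, min_hits=6, score_threshold=0.5):
        self.history = deque(maxlen=window)  # recent per-frame decisions
        self.min_hits = min_hits
        self.score_threshold = score_threshold

    def update(self, frame_score):
        # Stage 1: per-frame detector decision.
        self.history.append(frame_score >= self.score_threshold)
        # Stage 2: alert only if enough of the window confirms it.
        return sum(self.history) >= self.min_hits

confirmer = TemporalConfirmer()
scores = [0.9, 0.1, 0.8, 0.9, 0.85, 0.2, 0.9, 0.95]
alerts = [confirmer.update(s) for s in scores]
# A transient spike is absorbed; only sustained detections alert.
```

The same window structure extends naturally to the other stages mentioned above, such as contextual filters that veto alerts coinciding with known environmental stimuli.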
The Right Metric
The operationally relevant metric is not detection probability in isolation. It is the Receiver Operating Characteristic (ROC) curve: the trade-off between detection probability and false alarm rate as the system's decision threshold is swept across its full operating range. Clients should demand ROC curves validated under realistic field conditions, not just on curated benchmark datasets.
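Constructing an ROC curve from scored evaluation data is mechanical, which is why there is no excuse for a vendor not to provide one. The following sketch sweeps the decision threshold over every observed score (a simplified version of what evaluation libraries do; the function name and data are illustrative):

```python
def roc_points(scores, labels):
    """Return (false_alarm_rate, detection_rate) pairs obtained by
    sweeping the decision threshold over every observed score.
    labels: 1 = genuine event, 0 = benign scene."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Toy evaluation set: three genuine events, one benign scene.
curve = roc_points([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1])
```

Each point on the curve is a possible operating threshold; the question for procurement is which point the system will actually be run at, and what the false alarm rate is there.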
A system that detects 95% of threats with a 0.1% false alarm rate is operationally superior to a system that detects 99% of threats with a 5% false alarm rate. Engineering this discipline requires sensor fusion, model validation under environmental stress, and relentless attention to the human-system interface.
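The arithmetic behind that comparison is worth making explicit, because threats are rare relative to benign scenes. Under purely illustrative assumptions of 10 genuine threats and 10,000 benign scenes per shift:

```python
def alert_load(detection_rate, false_alarm_rate, threats=10, benign=10_000):
    """Expected alerts per shift and the fraction that are genuine.
    The shift volumes (10 threats, 10,000 benign scenes) are
    illustrative assumptions, not field data."""
    true_alerts = detection_rate * threats
    false_alerts = false_alarm_rate * benign
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts + false_alerts, precision

total_a, prec_a = alert_load(0.95, 0.001)  # ~19.5 alerts, ~49% genuine
total_b, prec_b = alert_load(0.99, 0.05)   # ~510 alerts, ~2% genuine
```

Under these assumptions the 99%-detection system buries its extra half-threat-per-shift of detections under roughly five hundred false alarms, while the 95% system keeps nearly half its alerts genuine. That is the difference between a queue an operator will work and one they will learn to ignore.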
