A Transformer-Enhanced Human-Centric Unsupervised Framework for Multi-Person Video Anomaly Detection
DOI:
https://doi.org/10.29304/jqcsm.2026.18.22700
Keywords:
Anomaly detection, Skeleton-based analysis, HuVAD, Video surveillance
Abstract
Video anomaly detection in surveillance environments remains difficult because abnormal events are rare, take many different forms, and depend on the complex nature of real scenes. In addition, methods that rely on visual appearance are sensitive to changes in lighting, camera angle, and background conditions. These issues can reduce detection accuracy and also raise privacy concerns. For this reason, recent studies focus more on motion-based representations that describe human behavior while reducing the effect of irrelevant visual detail.
In this work, a framework for video anomaly detection is proposed. Spatial motion features are extracted using a 2D convolutional neural network. These features are then passed to a GRU network to model motion over time. A Transformer module is also used to help capture longer temporal relationships in motion sequences.
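The pipeline described above can be illustrated with a minimal PyTorch sketch. The abstract does not give the input format, layer sizes, or how the modules are wired, so everything concrete here is an assumption: the input is taken to be a per-person window of 2D skeleton keypoints, and the class name `MotionEncoder` and all dimensions are hypothetical. It is not the authors' implementation.

```python
import torch
import torch.nn as nn


class MotionEncoder(nn.Module):
    """Illustrative 2D CNN -> GRU -> Transformer stack (all sizes assumed)."""

    def __init__(self, coords=2, feat=32, hidden=64, heads=4):
        super().__init__()
        # 2D convolution over the (time, joint) grid; coordinates act as channels.
        self.cnn = nn.Sequential(
            nn.Conv2d(coords, feat, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # GRU models how the per-frame features evolve over time.
        self.gru = nn.GRU(feat, hidden, batch_first=True)
        # Self-attention lets every frame attend to every other frame in the window.
        layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=heads, dim_feedforward=128, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, poses):
        # poses: (batch, time, joints, coords)
        x = poses.permute(0, 3, 1, 2)      # (batch, coords, time, joints)
        x = self.cnn(x)                    # (batch, feat, time, joints)
        x = x.mean(dim=3).transpose(1, 2)  # average over joints -> (batch, time, feat)
        x, _ = self.gru(x)                 # (batch, time, hidden)
        return self.transformer(x)         # (batch, time, hidden)


encoder = MotionEncoder().eval()
window = torch.randn(3, 24, 17, 2)  # 3 persons, 24 frames, 17 joints, (x, y)
with torch.no_grad():
    features = encoder(window)
print(features.shape)
```

Placing the GRU before the Transformer follows the order in which the abstract lists the modules; other arrangements would also be consistent with the text.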
The proposed framework is able to handle scenes that include more than one person. During training and testing, all detected persons are considered within fixed-length temporal windows. Information from each person is then combined to produce an anomaly score that represents the overall scene behavior. This helps the model detect abnormal activities even in crowded surveillance scenes.
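As a rough illustration of this step, the sketch below splits a clip into fixed-length windows and merges per-person scores into one scene score per window. The abstract does not state how the per-person information is combined, so taking the maximum is an assumption, and the helper names `sliding_windows` and `scene_scores` are hypothetical.

```python
def sliding_windows(num_frames, length, stride=1):
    """Return (start, end) frame ranges of a fixed length."""
    return [(s, s + length) for s in range(0, num_frames - length + 1, stride)]


def scene_scores(person_scores, num_windows, reduce=max):
    """Combine per-person window scores into one score per window.

    person_scores maps a person id to {window_index: score}; a person is
    simply absent from windows in which they were not detected.
    """
    combined = []
    for w in range(num_windows):
        present = [s[w] for s in person_scores.values() if w in s]
        # Windows with no detected person get a neutral score of 0.0 (assumed).
        combined.append(reduce(present) if present else 0.0)
    return combined


windows = sliding_windows(num_frames=8, length=4, stride=2)
scores = {
    "p1": {0: 0.10, 1: 0.15, 2: 0.20},
    "p2": {1: 0.90, 2: 0.30},  # p2 enters the scene in the second window
}
print(windows)                             # [(0, 4), (2, 6), (4, 8)]
print(scene_scores(scores, len(windows)))  # [0.1, 0.9, 0.3]
```

With a maximum, one person behaving abnormally is enough to flag the window; passing a mean as `reduce` would instead dilute that person's score in a crowded scene.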
The proposed model uses an unsupervised one-class learning approach. Training is performed using normal motion data only, and abnormal events are identified by observing deviations from the learned patterns of normal behavior. The experiments were carried out on real surveillance datasets using standard evaluation metrics. The model achieved an AUC-ROC of 93% at the frame level, indicating stable and consistent performance across different cases. The integration of spatial and temporal features contributed to a more accurate representation of complex motion patterns and reduced the likelihood of confusing abnormal behavior with normal activity.
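For readers unfamiliar with the metric, frame-level AUC-ROC can be computed directly from per-frame anomaly scores and ground-truth labels. The sketch below uses the pairwise (Mann-Whitney) formulation; the function name `auc_roc` and the toy labels and scores are illustrative only and are not taken from the paper's experiments.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the probability that an abnormal frame outranks a normal one.

    labels: 1 for abnormal frames, 0 for normal frames.
    Ties between an abnormal and a normal score count as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one abnormal frame")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: higher score means a larger deviation from learned normal motion.
labels = [0, 0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.4, 0.8, 0.35, 0.3]
print(auc_roc(labels, scores))  # 0.875
```

Because the metric depends only on how frames are ranked, it needs no decision threshold, which suits a one-class model trained without abnormal examples.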
License
Copyright (c) 2026 Hiba Mohsin Abdulameer, Ali Mohsin Al-juboori

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.








