A Transformer-Enhanced Human-Centric Unsupervised Framework for Multi-Person Video Anomaly Detection

Hiba Mohsin Abdulameer; Ali Mohsin Al-juboori

doi:10.29304/jqcsm.2026.18.22700

Authors

Hiba Mohsin Abdulameer, College of Computer Science and Information Technology, University of Al-Qadisiyah, Iraq
Ali Mohsin Al-juboori, College of Computer Science and Information Technology, University of Al-Qadisiyah, Al-Qadisiyah, Iraq

DOI:

https://doi.org/10.29304/jqcsm.2026.18.22700

Keywords:

Anomaly detection, Skeleton-based analysis, HuVAD, Video surveillance

Abstract

Video anomaly detection in surveillance environments remains difficult because abnormal events are rare, take many forms, and depend on the complexity of real scenes. Methods that rely on visual appearance are also sensitive to changes in lighting, camera angle, and background conditions, which can reduce detection accuracy and raise privacy concerns. For this reason, recent studies focus increasingly on motion-based representations that describe human behavior while reducing the effect of unnecessary visual details.

In this work, a framework for video anomaly detection is proposed. Spatial motion features are extracted using a 2D convolutional neural network. These features are then passed to a GRU network to model motion over time. A Transformer module is also used to help capture longer temporal relationships in motion sequences.

The proposed framework is able to handle scenes that include more than one person. During training and testing, all detected persons are considered within fixed-length temporal windows. Information from each person is then combined to produce an anomaly score that represents the overall scene behavior. This helps the model detect abnormal activities even in crowded surveillance scenes.
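The multi-person handling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `sliding_windows` and `scene_scores` are hypothetical, and the use of a per-frame maximum to combine persons is an assumption, since the abstract does not state which aggregation rule is used.

```python
from typing import Dict, List, Sequence


def sliding_windows(seq: Sequence[float], length: int, stride: int = 1) -> List[List[float]]:
    """Split one person's per-frame values into fixed-length temporal windows."""
    return [list(seq[i:i + length]) for i in range(0, len(seq) - length + 1, stride)]


def scene_scores(
    person_scores: Dict[int, Dict[int, float]],
    num_frames: int,
    default: float = 0.0,
) -> List[float]:
    """Combine per-person, per-frame anomaly scores into one score per frame.

    person_scores maps person id -> {frame index -> score}.
    Assumption: the scene score is the maximum over all persons visible
    in the frame; frames with no detected person receive `default`.
    """
    combined = []
    for frame in range(num_frames):
        visible = [scores[frame] for scores in person_scores.values() if frame in scores]
        combined.append(max(visible) if visible else default)
    return combined


if __name__ == "__main__":
    # One person tracked over five frames, cut into windows of length 3.
    print(sliding_windows([1, 2, 3, 4, 5], length=3, stride=2))
    # Two persons; person 2 appears only in frame 1 and dominates that frame.
    print(scene_scores({1: {0: 0.1, 1: 0.2}, 2: {1: 0.9}}, num_frames=3))
```

Taking the maximum means a single person behaving abnormally is enough to flag the frame; a mean or a learned pooling would be equally plausible choices under this description.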

The proposed model uses an unsupervised one-class learning approach. Training is performed using normal motion data only. Abnormal events are identified by observing deviations from the learned patterns of normal behavior. The experiments were carried out on real surveillance datasets using standard evaluation metrics. The model achieved an AUC-ROC of 93% at the frame level, indicating stable and consistent performance across different cases. The integration of spatial and temporal features contributed to a more accurate representation of complex motion patterns and reduced the likelihood of confusing abnormal behavior with normal activity.
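As a rough illustration of one-class scoring and frame-level AUC-ROC, the sketch below fits simple statistics to normal data only, scores new samples by their deviation, and computes AUC-ROC as the fraction of correctly ordered (abnormal, normal) pairs. The one-dimensional feature, the z-score style deviation, and the names `fit_normal`, `deviation_score`, and `auc_roc` are all assumptions for illustration; the paper's model scores deviations with a learned network, not with these statistics.

```python
import statistics
from typing import Sequence, Tuple


def fit_normal(values: Sequence[float]) -> Tuple[float, float]:
    """Estimate mean and population standard deviation from normal-only data."""
    return statistics.fmean(values), statistics.pstdev(values)


def deviation_score(x: float, mean: float, std: float) -> float:
    """Anomaly score: absolute deviation from the normal mean, in std units."""
    return abs(x - mean) / std if std > 0 else abs(x - mean)


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Frame-level AUC-ROC via pairwise comparison (ties count as half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both normal (0) and abnormal (1) frames are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    mean, std = fit_normal([1.0, 2.0, 3.0])           # training: normal data only
    test_values = [2.0, 2.5, 5.0, 0.5]                # hypothetical test frames
    test_labels = [0, 0, 1, 1]                        # 1 marks an abnormal frame
    scores = [deviation_score(v, mean, std) for v in test_values]
    print(auc_roc(test_labels, scores))
```

The pairwise form makes the metric's meaning explicit: the AUC-ROC is the probability that a randomly chosen abnormal frame receives a higher score than a randomly chosen normal frame.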

Journal of Al-Qadisiyah for computer science and mathematics (JQCSM)

ISSN 2521-3504 (Online), ISSN 2074-0204 (Print)

It is a scientific journal issued by the College of Computer Science and IT, University of Al-Qadisiyah.