Abstract:
Preventing violence is essential to improving public safety. As the number of surveillance cameras grows, manually monitoring every feed becomes increasingly impractical. Consequently, technology-driven solutions that detect violence in real time and alert authorities so they can intervene have become necessary. This study proposes a novel deep learning approach to violence detection that addresses the limitations and complexities of previous studies. Notably, the proposed models and techniques are evaluated on real-life violence scenarios captured in Closed-Circuit Television (CCTV) footage, addressing the challenges identified in prior work and improving detection accuracy. Two models are proposed in this paper. The architecture follows a multimodal approach that integrates two deep learning techniques: Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM). The proposed model that combines VGG-16 with CNN layers and an LSTM achieved 89% accuracy on the Real Life Violence Situations dataset. This result demonstrates the effectiveness of multimodal deep learning for violence detection, outperforming similar studies in accuracy.