Non-Facial Video Spatiotemporal Forensic Analysis Using Deep Learning Techniques


  • Premanand Ghadekar, Department of Information Technology, Vishwakarma Institute of Technology, Pune, India
  • Vaibhavi Shetty, Department of Information Technology, Vishwakarma Institute of Technology, Pune, India
  • Prapti Maheshwari, Department of Information Technology, Vishwakarma Institute of Technology, Pune, India
  • Raj Shah, Department of Information Technology, Vishwakarma Institute of Technology, Pune, India
  • Anish Shaha, Department of Information Technology, Vishwakarma Institute of Technology, Pune, India
  • Vaishnav Sonawane, Department of Information Technology, Vishwakarma Institute of Technology, Pune, India



Keywords: transfer learning, mel-spectrogram, forgery, data augmentation


Digital content manipulation software makes it easy for anyone to edit recorded video or audio. To prevent the unethical use of such readily available editing tools, digital multimedia forensics is becoming increasingly important. This study therefore aims to identify whether the video and audio of a given piece of digital content are real or fake. For temporal video forgery detection, a model built from 3D convolutional layers identifies temporal forgeries with an average accuracy of 85% on the validation dataset. For audio forgery detection, a pre-trained ResNet-34 model is adapted using a transfer learning approach. The proposed model achieves an accuracy of 99% with a validation loss of 0.3% on the validation part of the logical access dataset, which improves on earlier models that reach 90-95% accuracy on the validation set.
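
The keywords name the mel-spectrogram as the audio representation, so a minimal sketch of how one spectrogram frame can be mapped onto the mel scale may help orient readers. The abstract does not give the authors' settings: the HTK-style mel formula, the filter count, the FFT size, the sample rate, and all function names below are illustrative assumptions, not the paper's implementation.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel scale (an assumed variant): m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Return n_mels triangular filters over the n_fft // 2 + 1 FFT bins."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 points equally spaced in mel: left edge, centres, right edge.
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)    # slope up to the centre
            falling = (hi - f) / (hi - mid)   # slope down from the centre
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

def mel_frame(power_spectrum, bank):
    """Project one power-spectrum frame onto the mel filters."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]

# Illustrative settings only (not taken from the paper).
bank = mel_filterbank(n_mels=8, n_fft=512, sample_rate=16000)
flat_frame = [1.0] * (512 // 2 + 1)
mel_energies = mel_frame(flat_frame, bank)
```

Stacking such frames over time gives a two-dimensional image, which is the kind of input an image classifier such as ResNet-34 can consume; in practice a log is usually applied to the mel energies first.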


How to Cite

Premanand Ghadekar, Vaibhavi Shetty, Prapti Maheshwari, Raj Shah, Anish Shaha, and Vaishnav Sonawane, “Non-Facial Video Spatiotemporal Forensic Analysis Using Deep Learning Techniques”, Proc. eng. technol. innov., vol. 23, pp. 01–14, Jan. 2023.