Volume 19, Issue 2
  • ISSN: 1872-2121
  • E-ISSN: 2212-4047

Abstract

Background

This work addresses the automated classification of videos using artificial neural networks. To explore the concepts and measure results, the UCF101 data set is used: a collection of video clips taken from YouTube and annotated for action recognition. The study is carried out with the authors' own resources in order to assess the feasibility of independent research in the area.

Methods

This work was developed in the Python programming language using the Keras library with TensorFlow as the backend. The objective is to develop a network whose performance in classifying videos according to the actions performed is compatible with the state of the art.
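The abstract does not give implementation details, but a common first step in Keras/TensorFlow video pipelines of this kind is to reduce each variable-length clip to a fixed number of frames before it is fed to the network. The sketch below illustrates uniform frame sampling with NumPy only; the function name, clip length, and frame size are illustrative, not taken from the paper:

```python
import numpy as np

def sample_frames(num_frames: int, clip_len: int) -> np.ndarray:
    """Return `clip_len` frame indices spread evenly over a video."""
    return np.linspace(0, num_frames - 1, clip_len).round().astype(int)

# A stand-in for a decoded UCF101 clip: 250 frames of 112x112 RGB.
video = np.zeros((250, 112, 112, 3), dtype=np.uint8)

idx = sample_frames(len(video), 16)
clip = video[idx]  # fixed-size input tensor, shape (16, 112, 112, 3)
```

Sampling indices rather than decoding every frame keeps memory use bounded regardless of clip length, which matters on the limited hardware the paper describes.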

Results

Given the hardware limitations, there is a considerable gap between what could be implemented in this work and what is known as the state of the art.

Conclusion

Throughout the work, some aspects in which this limitation influenced the development are presented, but it is shown that such an undertaking is feasible and that expressive results are attainable: 98.6% accuracy is obtained on the UCF101 data set, compared with the 98 percentage points of the best result ever reported, while using considerably fewer resources. In addition, the importance of transfer learning in achieving expressive results, as well as the different performance of each architecture, is reviewed. Thus, this work may open doors to patent-based outcomes.
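The conclusion credits transfer learning, i.e., reusing a pretrained network's features while training only a small classification head. The principle can be sketched with a frozen random projection standing in for a pretrained backbone (all names, sizes, and data are illustrative; this is not the authors' model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone": a fixed projection standing in for pretrained
# convolutional features (its weights are never updated).
W_frozen = rng.standard_normal((64, 16)) / np.sqrt(64)

def backbone(x):
    return np.maximum(x @ W_frozen, 0.0)  # ReLU features

# Toy two-class problem.
X = rng.standard_normal((200, 64))
y = (X @ rng.standard_normal(64) > 0).astype(float)
feats = backbone(X)  # computed once; only the head trains below

# Trainable head: logistic regression on the frozen features.
w, b, lr = np.zeros(16), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    g = (p - y) / len(y)        # gradient of mean cross-entropy
    w -= lr * feats.T @ g
    b -= lr * g.sum()

p = np.clip(1.0 / (1.0 + np.exp(-(feats @ w + b))), 1e-9, 1 - 1e-9)
loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

Because only the small head is optimized, training cost is a fraction of full fine-tuning, which is why transfer learning suits the resource-constrained setting the paper describes.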

/content/journals/eng/10.2174/0118722121248139231023111754
2025-02-01
2024-11-22
