SKELETON-BASED HUMAN ACTION RECOGNITION USING CNN+SOFTMAX WITH MULTI-DIMENSIONAL CONNECTED WEIGHTS
Authors
Avazjon Rakhimovich Marakhimov, Kabul Kadirbergenovich Khudaybergenov

Annotation
Skeleton-based human activity recognition through closed-circuit television (CCTV) surveillance systems has garnered substantial attention in artificial intelligence research, primarily owing to the rich feature representation inherent in skeletal data. Contemporary machine learning approaches predominantly employ joint-coordinate representations of the human skeleton, which limits their ability to discriminate between motion patterns. This work introduces a novel methodology that uses SoftMax classification enhanced with multi-dimensional connected weights to improve human action categorization accuracy. Our approach treats skeletal edge points as discriminative features and develops a skeleton-driven algorithmic framework that extracts robust deep feature representations from skeletal point vectors through convolutional neural network architectures integrated with the proposed multi-dimensional weighted SoftMax classifier. Empirical validation on established human action recognition benchmarks, including the PennAction and CSL datasets, demonstrates the superior performance of the proposed methodology.
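For a concrete picture of the pipeline described above, the following is a minimal PyTorch sketch of one plausible reading: a small 1-D CNN extracts per-joint deep features from skeleton point vectors over time, and a SoftMax head scores each class with a full (joints x channels) weight tensor rather than a single flat weight vector, which is one way to interpret "multi-dimensional connected weights". All names (SkeletonCNN, MultiDimWeightedSoftmax), layer sizes, and the 13-joint / 15-class PennAction-style dimensions are illustrative assumptions, not the authors' published implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiDimWeightedSoftmax(nn.Module):
    # Hypothetical reading of "multi-dimensional connected weights":
    # each class score is the inner product between a per-class weight
    # tensor and the (joints x channels) feature map, so every joint
    # is connected to every feature channel for every class.
    def __init__(self, num_joints, feat_dim, num_classes):
        super().__init__()
        self.weights = nn.Parameter(
            torch.randn(num_classes, num_joints, feat_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(num_classes))

    def forward(self, h):  # h: (batch, num_joints, feat_dim)
        logits = torch.einsum('bjd,cjd->bc', h, self.weights) + self.bias
        return F.log_softmax(logits, dim=-1)  # per-class log-probabilities

class SkeletonCNN(nn.Module):
    # Minimal 1-D CNN over skeleton point vectors along the time axis,
    # followed by the weighted SoftMax head above.
    def __init__(self, num_joints=13, coords=2, num_classes=15, feat_dim=64):
        super().__init__()
        in_ch = num_joints * coords  # flatten (joint, x/y) per frame
        self.conv = nn.Sequential(
            nn.Conv1d(in_ch, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(128, num_joints * feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over time
        )
        self.num_joints, self.feat_dim = num_joints, feat_dim
        self.head = MultiDimWeightedSoftmax(num_joints, feat_dim, num_classes)

    def forward(self, x):  # x: (batch, frames, joints, coords)
        b, t, j, c = x.shape
        x = x.reshape(b, t, j * c).transpose(1, 2)  # (batch, in_ch, frames)
        h = self.conv(x).squeeze(-1).reshape(b, self.num_joints, self.feat_dim)
        return self.head(h)

# Usage: 13 PennAction-style 2-D joints, 32-frame clips, batch of 4.
model = SkeletonCNN()
clip = torch.randn(4, 32, 13, 2)
print(model(clip).shape)  # torch.Size([4, 15])

Note that the einsum inner product reduces to an ordinary fully connected SoftMax layer if the joint axis is flattened; the sketch differs only in keeping the joint structure of the weights explicit, which is the property the abstract appears to emphasize.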