Convolutional Neural Networks (CNNs) are widely used for object detection in computer vision due to their simplicity and efficiency. The effectiveness of CNN-based object detection depends significantly on the choice of loss function, with localization precision being a critical determinant. To improve localization accuracy, we modified the CIoU loss function, producing a new loss function called Area-CIoU (ACIoU). ACIoU takes a comprehensive approach to aligning predicted and ground-truth bounding boxes by combining the relationship between their aspect ratios and their areas. When both boxes share the same aspect ratio, the area of the prediction box is also taken into account, since it still affects localization accuracy. This strengthens the penalty term and improves the network model's localization precision. Experimental results on a custom dataset covering the classes car, person, motorcycle, truck, and bus confirm the efficacy of ACIoU in improving the localization accuracy of network models, demonstrated through its application in the one-stage object detector YOLOv4. The experiments also show that while accuracy improved, FPS dropped because of the additional penalty term in the loss function. We achieved an AP of 88.48% and an average recall of 86.37% at 41 frames per second.
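The abstract does not give the closed-form ACIoU penalty. As a rough sketch, the snippet below implements the standard CIoU loss (IoU term, normalized center distance, and aspect-ratio consistency term from Zheng et al.'s Distance-IoU work) and adds a hypothetical area-mismatch term to illustrate where an area penalty could enter; the `area_term` and its `(1 - v)` weighting are assumptions for illustration, not the paper's formula.

```python
import math

def ciou_loss(pred, gt):
    """CIoU loss for axis-aligned boxes given as (x1, y1, x2, y2),
    with a hypothetical area penalty in the spirit of ACIoU."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Intersection over Union
    ix1, iy1 = max(px1, gx1), max(py1, gy1)
    ix2, iy2 = min(px2, gx2), min(py2, gy2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter)

    # Squared center distance rho^2 over squared diagonal c^2 of the
    # smallest box enclosing both pred and gt
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2
            + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw * cw + ch * ch

    # Aspect-ratio consistency term v and its trade-off weight alpha
    v = (4.0 / math.pi ** 2) * (
        math.atan((gx2 - gx1) / (gy2 - gy1))
        - math.atan((px2 - px1) / (py2 - py1))) ** 2
    denom = 1.0 - iou + v
    alpha = v / denom if denom > 0.0 else 0.0

    # Hypothetical area term (assumption): when aspect ratios match
    # (v near 0), still penalize area mismatch between the two boxes
    area_term = abs(area_p - area_g) / max(area_p, area_g)

    return 1.0 - iou + rho2 / c2 + alpha * v + (1.0 - v) * area_term
```

For identical boxes every term vanishes and the loss is 0; for disjoint boxes the IoU term saturates at 1 and the center-distance and area terms keep supplying a gradient, which is the property that motivates CIoU-style losses over plain IoU loss.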
Published in | Mathematics and Computer Science (Volume 8, Issue 5)
Published | 28 December 2023
DOI | 10.11648/j.mcs.20230805.11
Page(s) | 104-111
Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright | Copyright © The Author(s), 2023. Published by Science Publishing Group
Keywords | Object Detection, Loss Function, Real-Time, YOLOv4
APA Style
Saleem, M., Sheikh, N., Rehman, A., Rafiq, M., & Jahan, S. (2023). Real-Time Object Identification Through Convolution Neural Network Based on YOLO Algorithm. Mathematics and Computer Science, 8(5), 104-111. https://doi.org/10.11648/j.mcs.20230805.11