Litchi detection in the field using an improved YOLOv3 model
Keywords: deep learning; residual network; dense connection; feature pyramid network

Abstract
Due to illumination changes, complex backgrounds, and occlusion of the litchi fruits, the accurate detection of litchi in the field is extremely challenging. To address the low recognition rate of litchi-picking robots under field conditions, this study, inspired by the ideas of ResNet and dense convolution, proposed an improved feature-extraction network named "YOLOv3_Litchi" that combines dense connections and residuals for litchi detection. Firstly, based on the traditional YOLOv3 deep convolutional neural network and its regression-based detection, the idea of residuals was introduced into the feature-extraction network to effectively avoid the decrease in detection accuracy caused by excessive network depth. Secondly, on the premise of preserving a good receptive field and high detection accuracy, the large convolution kernels in the shallow layers of the network were replaced by small convolution kernels, thereby effectively reducing the model parameters. Finally, the idea of the feature pyramid was used to design the network to identify small litchi targets, ensuring that shallow features were not lost while simultaneously reducing the model parameters. Experimental results show that the improved YOLOv3_Litchi model achieved better results than the classic YOLOv3_DarkNet-53 model and the YOLOv3_Tiny model. Its mean average precision (mAP) score was 97.07%, higher than the 95.18% mAP of the YOLOv3_DarkNet-53 model and the 94.48% mAP of the YOLOv3_Tiny model. Its frame rate was 58 fps, higher than the 29 fps of the YOLOv3_DarkNet-53 model. Compared with the classic Faster R-CNN model with the VGG16 feature-extraction network, the mAP was increased by 1%, and the fps advantage was obvious. Compared with the classic single shot multibox detector (SSD) model, both accuracy and running efficiency were improved. The results show that the improved YOLOv3_Litchi model offers stronger robustness, higher detection accuracy, and lower computational complexity for identifying litchi under field conditions, which should be helpful for litchi orchard precision management.

DOI: 10.25165/j.ijabe.20221502.6541

Citation: Peng H X, Xue C, Shao Y Y, Chen K Y, Liu H N, Xiong J T, et al. Litchi detection in the field using an improved YOLOv3 model. Int J Agric & Biol Eng, 2022; 15(2): 211–220.
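To make the building blocks named in the abstract concrete, the following is a minimal PyTorch-style sketch of the two ideas the network combines: a residual unit (shortcut addition, after He et al. [9]) and a dense connection (channel concatenation, after Huang et al. [22]), both built from small 3x3 kernels. The class names, channel counts, and activation choice are illustrative assumptions, not the authors' released YOLOv3_Litchi code.

import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    # 1x1 bottleneck followed by a 3x3 convolution, with a shortcut addition.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels // 2, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels // 2)
        self.conv2 = nn.Conv2d(channels // 2, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.act(self.bn2(self.conv2(out)))
        return x + out  # shortcut keeps gradients flowing as the network deepens

class DenseUnit(nn.Module):
    # 3x3 convolution whose output is concatenated with its input (dense link).
    def __init__(self, in_channels, growth):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, growth, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(growth)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        out = self.act(self.bn(self.conv(x)))
        return torch.cat([x, out], dim=1)  # shallow features are reused downstream

# Example: a 64-channel feature map passes through one unit of each kind.
# x = torch.randn(1, 64, 208, 208)
# y = DenseUnit(64, 32)(ResidualUnit(64)(x))  # y has 64 + 32 = 96 channels

Stacking such units lets gradients and shallow features propagate through a deep backbone, which is the mechanism the abstract credits for preserving accuracy as layers are added, while the small kernels keep the parameter count down.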
References
[1] Wei X Q, Ji K, Lan J H, Li Y W, Zeng Y L, Wang C M. Automatic method of fruit object extraction under complex agricultural background for vision system of fruit picking robot. Optik, 2014; 125(19): 5684–5689.
[2] Zhuang J J, Luo S M, Hou C J, Tang Y, He Y, Xue X Y. Detection of orchard citrus fruits using a monocular machine vision-based method for automatic fruit picking applications. Computers and Electronics in Agriculture, 2018; 152: 64–73.
[3] Tao Y T, Zhou J. Automatic apple recognition based on the fusion of color and 3D feature for robotic fruit picking. Computers and Electronics in Agriculture, 2017; 142(Part A): 388–396.
[4] Ni X D, Wang X, Wang S M, Wang S B, Yao Z, Ma Y B. Structure design and image recognition research of a picking device on the apple picking robot. IFAC-PapersOnLine, 2018; 51(17): 489–494.
[5] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 2012; 60: 84–90.
[6] Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, USA: IEEE, 2015; pp.1–9. doi: 10.1109/CVPR.2015.7298594.
[7] Peng H X, Xue C, Shao Y Y, Chen K Y, Xiong J T, Xie Z H, et al. Semantic segmentation of litchi branches using DeepLabV3+ model. IEEE Access, 2020; 8: 164546–164555.
[8] Kang H W, Zhou H Y, Wang X, Chen C. Real-time fruit recognition and grasping estimation for robotic apple harvesting. Sensors, 2020; 20(19): 5670. doi: 10.3390/s20195670.
[9] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA: IEEE, 2016; pp.770–778. doi: 10.1109/CVPR.2016.90.
[10] Sa I, Ge Z Y, Dayoub F, Upcroft B, Perez T, McCool C. DeepFruits: A fruit detection system using deep neural networks. Sensors, 2016; 16(8): 1222. doi: 10.3390/s16081222.
[11] Bargoti S, Underwood J P. Image segmentation for fruit detection and yield estimation in apple orchards. Journal of Field Robotics, 2017; 34(6): 1039–1060.
[12] Zhang Y D, Dong Z C, Chen X Q, Jia W J, Du S D, Muhammad K, et al. Image based fruit category classification by 13-layer deep convolutional neural network and data augmentation. Multimedia Tools and Applications, 2019; 78: 3613–3632.
[13] Kestur R, Meduri A, Narasipura O. MangoNet: A deep semantic segmentation architecture for a method to detect and count mangoes in an open orchard. Engineering Applications of Artificial Intelligence, 2019; 77: 59–69.
[14] Ren S Q, He K M, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016; 39(6): 1137–1149.
[15] Redmon J, Divvala S, Girshick R, Farhadi A. You Only Look Once: Unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2016; pp.779–788.
[16] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, et al. SSD: Single Shot MultiBox Detector. In: European Conference on Computer Vision-ECCV 2016, Springer, 2016; 9905: 21–37. doi: 10.1007/978-3-319-46448-0_2.
[17] Tian Y N, Yang G D, Wang Z, Wang H, Li E, Liang Z Z. Apple detection during different growth stages in orchards using the improved YOLO-v3 model. Computers and Electronics in Agriculture, 2019; 157: 417–426.
[18] Zhao J Y, Qu J H. Healthy and diseased tomatoes detection based on YOLOv2. In: International Conference on Human Centered Computing-HCC 2018, Springer, 2018; 11354: 347–353. doi: 10.1007/978-3-030-15127-0_34.
[19] Liu G X, Nouaze J C, Touko Mbouembe P L, Kim J H. YOLO-Tomato: A robust algorithm for tomato detection based on YOLOv3. Sensors, 2020; 20(7): 2145. doi: 10.3390/s20072145.
[20] Neubeck A, Van Gool L. Efficient non-maximum suppression. In: 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, China: IEEE, 2006; pp.850–855. doi: 10.1109/ICPR.2006.479.
[21] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA: IEEE, 2016; pp.770–778. doi: 10.1109/CVPR.2016.90.
[22] Huang G, Liu Z, Van Der Maaten L, Weinberger K Q. Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA: IEEE, 2017; pp.2261–2269. doi: 10.1109/CVPR.2017.243.
[23] Lin T Y, Dollár P, Girshick R, He K M, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA: IEEE, 2017; pp.936–944. doi: 10.1109/CVPR.2017.106.
[24] Ye M, Liu G Y. Facial expression recognition method based on shallow small convolution kernel capsule network. Journal of Circuits, Systems and Computers, 2020; 30(10): 2150177. doi: 10.1142/S0218126621501772.
[25] Tek F B, Çam I, Karlı D. Adaptive convolution kernel for artificial neural networks. Journal of Visual Communication and Image Representation, 2021; 75: 103015. arXiv: 2009.06385.
[26] Goyal P, Dollár P, Girshick R, Noordhuis P, Wesolowski L, Kyrola A, et al. Accurate, large minibatch SGD: Training ImageNet in 1 hour. 2017. arXiv: 1706.02677v1.