Litchi detection in the field using an improved YOLOv3 model
Keywords: deep learning; residual network; dense connection; feature pyramid network

Abstract
Due to illumination changes, complex backgrounds, and occlusion of the litchi fruits, the accurate detection of litchi in the field is extremely challenging. To address the low recognition rate of litchi-picking robots under field conditions, this study, inspired by the ideas of ResNet and dense convolution, proposed an improved feature-extraction network named "YOLOv3_Litchi" that combines dense connections and residuals for litchi detection. Firstly, based on the traditional YOLOv3 deep convolutional neural network and its regression-based detection, the idea of residuals was introduced into the feature-extraction network to effectively avoid the decrease in detection accuracy caused by excessive network depth. Secondly, on the premise of preserving a good receptive field and high detection accuracy, the large convolution kernels in the shallow layers of the network were replaced by small convolution kernels, thereby effectively reducing the model parameters. Finally, the idea of the feature pyramid was used to design the network to identify small litchi targets, ensuring that shallow features were not lost while simultaneously reducing the model parameters. Experimental results show that the improved YOLOv3_Litchi model achieved better results than the classic YOLOv3_DarkNet-53 model and the YOLOv3_Tiny model. Its mean average precision (mAP) score was 97.07%, higher than the 95.18% mAP of the YOLOv3_DarkNet-53 model and the 94.48% mAP of the YOLOv3_Tiny model. Its frame rate was 58 fps, higher than the 29 fps of the YOLOv3_DarkNet-53 model. Compared with the classic Faster R-CNN model with the VGG16 feature-extraction network, the mAP was increased by 1%, and the fps advantage was obvious. Compared with the classic single shot multibox detector (SSD) model, both accuracy and running efficiency were improved. The results show that the improved YOLOv3_Litchi model offers stronger robustness, higher detection accuracy, and lower computational complexity for identifying litchi under field conditions, which should be helpful for litchi orchard precision management.

DOI: 10.25165/j.ijabe.20221502.6541

Citation: Peng H X, Xue C, Shao Y Y, Chen K Y, Liu H N, Xiong J T, et al. Litchi detection in the field using an improved YOLOv3 model. Int J Agric & Biol Eng, 2022; 15(2): 211–220.
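To make the building blocks named in the abstract concrete, the following is a minimal PyTorch-style sketch of the two ideas the network combines: a residual unit (shortcut addition, after He et al. [9]) and a dense connection (channel concatenation, after Huang et al. [22]), both built from small 3x3 kernels. The class names, channel counts, and activation choice are illustrative assumptions, not the authors' released YOLOv3_Litchi code.

import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    # 1x1 bottleneck followed by a 3x3 convolution, with a shortcut addition.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels // 2, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels // 2)
        self.conv2 = nn.Conv2d(channels // 2, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.act(self.bn2(self.conv2(out)))
        return x + out  # shortcut keeps gradients flowing as the network deepens

class DenseUnit(nn.Module):
    # 3x3 convolution whose output is concatenated with its input (dense link).
    def __init__(self, in_channels, growth):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, growth, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(growth)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        out = self.act(self.bn(self.conv(x)))
        return torch.cat([x, out], dim=1)  # shallow features are reused downstream

# Example: a 64-channel feature map passes through one unit of each kind.
# x = torch.randn(1, 64, 208, 208)
# y = DenseUnit(64, 32)(ResidualUnit(64)(x))  # y has 64 + 32 = 96 channels

Stacking such units lets gradients and shallow features propagate through a deep backbone, which is the mechanism the abstract credits for preserving accuracy as layers are added, while the small kernels keep the parameter count down.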
References
[1] Wei X Q, Ji K, Lan J H, Li Y W, Zeng Y L, Wang C M. Automatic method of fruit object extraction under complex agricultural background for vision system of fruit picking robot. Optik, 2014; 125(19): 5684–5689.
[2] Zhuang J J, Luo S M, Hou C J, Tang Y, He Y, Xue X Y. Detection of orchard citrus fruits using a monocular machine vision-based method for automatic fruit picking applications. Computers and Electronics in Agriculture, 2018; 152: 64–73.
[3] Tao Y T, Zhou J. Automatic apple recognition based on the fusion of color and 3D feature for robotic fruit picking. Computers and Electronics in Agriculture, 2017; 142(Part A): 388–396.
[4] Ni X D, Wang X, Wang S M, Wang S B, Yao Z, Ma Y B. Structure design and image recognition research of a picking device on the apple picking robot. IFAC-PapersOnLine, 2018; 51(17): 489–494.
[5] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 2012; 60: 84–90.
[6] Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, USA: IEEE, 2015; pp.1–9. doi: 10.1109/CVPR.2015.7298594.
[7] Peng H X, Xue C, Shao Y Y, Chen K Y, Xiong J T, Xie Z H, et al. Semantic segmentation of litchi branches using DeepLabV3+ model. IEEE Access, 2020; 8: 164546–164555.
[8] Kang H W, Zhou H Y, Wang X, Chen C. Real-time fruit recognition and grasping estimation for robotic apple harvesting. Sensors, 2020; 20(19): 5670. doi: 10.3390/s20195670.
[9] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA: IEEE, 2016; pp.770–778. doi: 10.1109/CVPR.2016.90.
[10] Sa I, Ge Z Y, Dayoub F, Upcroft B, Perez T, McCool C. DeepFruits: A fruit detection system using deep neural networks. Sensors, 2016; 16(8): 1222. doi: 10.3390/s16081222.
[11] Bargoti S, Underwood J P. Image segmentation for fruit detection and yield estimation in apple orchards. Journal of Field Robotics, 2017; 34(6): 1039–1060.
[12] Zhang Y D, Dong Z C, Chen X Q, Jia W J, Du S D, Muhammad K, et al. Image based fruit category classification by 13-layer deep convolutional neural network and data augmentation. Multimedia Tools and Applications, 2019; 78: 3613–3632.
[13] Kestur R, Meduri A, Narasipura O. MangoNet: A deep semantic segmentation architecture for a method to detect and count mangoes in an open orchard. Engineering Applications of Artificial Intelligence, 2019; 77: 59–69.
[14] Ren S Q, He K M, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016; 39(6): 1137–1149.
[15] Redmon J, Divvala S, Girshick R, Farhadi A. You Only Look Once: Unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2016; pp.779–788.
[16] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, et al. SSD: Single Shot MultiBox Detector. In: European Conference on Computer Vision-ECCV 2016, Springer, 2016; 9905: 21–37. doi: 10.1007/978-3-319-46448-0_2.
[17] Tian Y N, Yang G D, Wang Z, Wang H, Li E, Liang Z Z. Apple detection during different growth stages in orchards using the improved YOLO-v3 model. Computers and Electronics in Agriculture, 2019; 157: 417–426.
[18] Zhao J Y, Qu J H. Healthy and diseased tomatoes detection based on YOLOv2. In: International Conference on Human Centered Computing-HCC 2018, Springer, 2018; 11354: 347–353. doi: 10.1007/978-3-030-15127-0_34.
[19] Liu G X, Nouaze J C, Touko Mbouembe P L, Kim J H. YOLO-Tomato: A robust algorithm for tomato detection based on YOLOv3. Sensors, 2020; 20(7): 2145. doi: 10.3390/s20072145.
[20] Neubeck A, Van Gool L. Efficient non-maximum suppression. In: 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, China: IEEE, 2006; pp.850–855. doi: 10.1109/ICPR.2006.479.
[21] He K M, Zhang X Y, Ren S Q, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA: IEEE, 2016; pp.770–778. doi: 10.1109/CVPR.2016.90.
[22] Huang G, Liu Z, Van Der Maaten L, Weinberger K Q. Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA: IEEE, 2017; pp.2261–2269. doi: 10.1109/CVPR.2017.243.
[23] Lin T Y, Dollár P, Girshick R, He K M, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA: IEEE, 2017; pp.936–944. doi: 10.1109/CVPR.2017.106.
[24] Ye M, Liu G Y. Facial expression recognition method based on shallow small convolution kernel capsule network. Journal of Circuits, Systems and Computers, 2020; 30(10): 2150177. doi: 10.1142/S0218126621501772.
[25] Tek F B, Çam I, Karlı D. Adaptive convolution kernel for artificial neural networks. Journal of Visual Communication and Image Representation, 2021; 75: 103015. arXiv: 2009.06385.
[26] Goyal P, Dollár P, Girshick R, Noordhuis P, Wesolowski L, Kyrola A, et al. Accurate, large minibatch SGD: Training ImageNet in 1 hour. 2017. arXiv: 1706.02677v1.