Comparison of CSPDarkNet53, CSPResNeXt-50, and EfficientNet-B0 Backbones on YOLO V4 as Object Detector

Marsa Mahasin, Irma Amelia Dewi


YOLO v4 consists of three parts: backbone, neck, and head. The backbone serves as the feature extractor for the input image; it is a convolutional neural network and can be replaced with another convolutional neural network. Previous research recommends several backbones, such as CSPDarkNet53, CSPResNeXt-50, and EfficientNet-B0. Research is therefore needed to determine the effect of different backbones on the YOLO v4 model. One suitable research object is the microfossil. Detecting microfossils helps paleontologists identify microfossil species, which serve as determinants of rock age, and distinguish between similar microfossils. In this research, three backbones (CSPDarkNet53, CSPResNeXt-50, and EfficientNet-B0) were used to train on and detect image sets of 5 species of foraminiferal microfossils. The results were evaluated to determine the advantages of each backbone. The evaluation metrics were precision, recall, F1-score, average precision (AP), mean average precision (mAP), frames per second (FPS), and model size. The mAP of the CSPDarkNet53 model reached 83.41%, the highest of the three, compared with 81.00% for CSPResNeXt-50 and 81.76% for EfficientNet-B0. The CSPResNeXt-50 model has a precision of 75.60%, a recall of 81.10%, and an F1-score of 78%. The CSPDarkNet53 model also achieved the highest speed, 33.4 FPS. However, the YOLO v4 model with the EfficientNet-B0 backbone is the lightest, at only 156.8 MB.
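As a rough illustration of how two of the reported metrics relate, the sketch below computes an F1-score as the harmonic mean of precision and recall, and mAP as the unweighted mean of per-class AP values. This is the standard textbook definition and is assumed here, not taken from the authors' evaluation code; the function names `f1_score` and `mean_average_precision` are hypothetical. Applied to the precision and recall reported for CSPResNeXt-50, it gives roughly 0.78, in line with the stated F1-score.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard definition, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_average_precision(per_class_ap: list[float]) -> float:
    """Unweighted mean of per-class average precision values."""
    if not per_class_ap:
        raise ValueError("need at least one per-class AP value")
    return sum(per_class_ap) / len(per_class_ap)


if __name__ == "__main__":
    # Precision and recall reported in the abstract for CSPResNeXt-50.
    f1 = f1_score(0.7560, 0.8110)
    print(f"F1-score: {f1:.2f}")
```

The per-class AP values for the five foraminifera species are not given in the abstract, so `mean_average_precision` is shown without sample data.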


YOLO, CSPDarkNet53, CSPResNeXt-50, EfficientNet-B0, Microfossil







Copyright (c) 2022 Marsa Mahasin, Irma Amelia Dewi

International Journal of Engineering, Science and Information Technology (IJESTY) eISSN 2775-2674