Abstract: Accurate detection of tea buds is key to automated, intelligent tea bud harvesting. Two problems make this detection difficult: the color of tea buds is similar to that of the background, and the targets are very small. To address them, this article studies a YOLOv5s model based on a hybrid attention mechanism and applies it to tea bud detection. The model is optimized in two respects. First, a hybrid attention mechanism (HAM) is proposed and added to the YOLOv5s backbone network, which enables the network to focus on the target area, extract features more fully, and improve recognition accuracy. Second, the normalized Wasserstein distance (NWD) is introduced as a new metric and combined with the existing CIoU loss function. The NWD loss measures the similarity between bounding boxes based on their corresponding Gaussian distributions, thereby improving the model's accuracy in detecting small targets. Experimental results show that, compared with the original YOLOv5s model, the improved model increases mAP0.5 by 0.9% and mAP0.5:0.95 by 1.3%, while the number of parameters increases by only 0.044×10^6. These results confirm the effectiveness of the proposed method for precise tea bud recognition and provide a technical reference for intelligent tea picking in practical scenarios.
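As an illustration of the NWD term described above, the following is a minimal sketch and not the implementation used in this article. It assumes the commonly used formulation in which a box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), so that the squared 2-Wasserstein distance between two boxes has a closed form. The function names, the normalizing constant `c`, and the mixing weight `ratio` are hypothetical; the abstract does not state how the CIoU and NWD terms are weighted.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein distance between two boxes given as (cx, cy, w, h).

    Each box is treated as a Gaussian N((cx, cy), diag(w^2/4, h^2/4)).
    `c` is a dataset-dependent normalizing constant; 12.8 is only a
    placeholder assumption, not a value taken from this article.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Closed-form squared 2-Wasserstein distance between the two Gaussians.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    # Exponential normalization maps the distance into (0, 1]; 1 means identical boxes.
    return math.exp(-math.sqrt(w2_squared) / c)


def combined_box_loss(ciou, box_pred, box_gt, ratio=0.5, c=12.8):
    """Weighted sum of a CIoU loss and an NWD loss.

    `ciou` is a precomputed CIoU value for the same pair of boxes.
    `ratio` is a hypothetical mixing weight, assumed here for illustration.
    """
    ciou_loss = 1.0 - ciou
    nwd_loss = 1.0 - nwd(box_pred, box_gt, c)
    return ratio * ciou_loss + (1.0 - ratio) * nwd_loss
```

Unlike IoU-based terms, the NWD value changes smoothly even when two small boxes do not overlap at all, which is why it is often preferred for small targets.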