Abstract:Fine-grained image classification is a fundamental and important task in field of computer vision.The purpose of the task is to distinguish between object categories that have subtle inter-class differences (e.g.,birds,flowers,or animals of different sub-categories).Different from traditional image classification tasks that can employ a large number of common people for image annotations,fine-grained image classification usually requires expert-level knowledge.In addition to the common classification challenges of pose,lighting,and viewing changes,fine-grained datasets have larger inter-class similarity and intra-class variability.Therefore,it puts a high demand on the models to capture the subtle visual differences between classes and common intra-class characteristics.Furthermore,owing to the difficulty in obtaining samples of different categories,fine-grained datasets suffer from long-tail distribution problem.In summary,fine-grained data distribution has the characteristics of small,non-uniform,and indistinguishable inter-class differences,which also poses a huge challenge to the powerful deep learning algorithms.In this paper,we first introduce the formulation and challenges of fine-grained visual categorization tasks,and then illustrate two mainstream methods about local features and global features,as well as their advantages and disadvantages.Finally,we compare the performance of related works on common used datasets,and we make the required summarization and forecast.