Abstract: This paper proposes a data-driven climate regionalization algorithm based on machine learning and modern statistical methods. It addresses three shortcomings of current approaches: too few regionalization variables, underutilized information, and insufficient consideration of climate change impacts. First, we use the Mann-Kendall test and the sliding t-test to identify change points in the time series of the primary variables, and we segment the study period accordingly. Next, we employ BP canonical correlation analysis to select covariates and build a multivariate Self-Organizing Map (SOM) clustering algorithm that performs climate regionalization for each stage. Finally, we analyze the practical significance of the regionalization results in combination with climate zone profiles and assess the impact of climate change on climate regionalization. Experimental results demonstrate that the proposed algorithm is driven by data rather than by contour lines of primary variables or manually set thresholds. As a result, it makes fuller use of the available data and yields a more objective and rational regionalization process. By incorporating multiple covariates and climate change impacts, the algorithm effectively improves the efficiency and reliability of regionalization without requiring the climate background to be considered during the regionalization process.
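The change-point step that precedes the SOM clustering can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses the sequential (UF/UB) form of the Mann-Kendall test without tie correction and a sliding t-test with two adjacent windows of equal length `w`. The critical value is user-supplied; 2.101 is the two-sided 5% value for 18 degrees of freedom. The function names `mk_uf`, `mk_ub`, `mk_crossings`, and `sliding_t` and the synthetic series are hypothetical.

```python
import math
from statistics import mean, variance

# Hypothetical helper names; a sketch of the two change-point tests, not the paper's code.


def mk_uf(x):
    """Forward sequential Mann-Kendall statistic UF for each position of x."""
    n = len(x)
    uf = [0.0] * n  # UF at the first position is defined as 0
    s = 0
    for k in range(1, n):
        # r_k: how many earlier values are smaller than x[k]
        s += sum(1 for j in range(k) if x[k] > x[j])
        m = k + 1  # number of observations seen so far
        e = m * (m - 1) / 4
        var = m * (m - 1) * (2 * m + 5) / 72  # variance without tie correction
        uf[k] = (s - e) / math.sqrt(var)
    return uf


def mk_ub(x):
    """Backward statistic UB: UF of the reversed series, negated and re-aligned."""
    return [-v for v in reversed(mk_uf(list(reversed(x))))]


def mk_crossings(x, z_crit=1.96):
    """Indices where UF and UB cross inside the +/- z_crit band (candidate change points)."""
    uf, ub = mk_uf(x), mk_ub(x)
    hits = []
    for i in range(1, len(x)):
        d_prev, d_curr = uf[i - 1] - ub[i - 1], uf[i] - ub[i]
        if (d_prev * d_curr < 0 or d_curr == 0) and abs(uf[i]) <= z_crit:
            hits.append(i)
    return hits


def sliding_t(x, w, t_crit):
    """Sliding t-test with two adjacent windows of length w.

    Returns (i, t) for every split point i where x[i-w:i] and x[i:i+w]
    differ significantly (|t| > t_crit).
    """
    hits = []
    for i in range(w, len(x) - w + 1):
        before, after = x[i - w:i], x[i:i + w]
        # Pooled standard deviation of the two equal-length windows
        pooled = math.sqrt(((w - 1) * variance(before) + (w - 1) * variance(after)) / (2 * w - 2))
        if pooled == 0:
            continue  # both windows constant: t is undefined, skip this split
        t = (mean(after) - mean(before)) / (pooled * math.sqrt(2 / w))
        if abs(t) > t_crit:
            hits.append((i, t))
    return hits


# Synthetic series with a mean shift of +5 at index 20
series = [i % 2 + (5 if i >= 20 else 0) for i in range(40)]
hits = sliding_t(series, w=10, t_crit=2.101)  # two-sided 5% critical value for df = 18
best_split = max(hits, key=lambda h: abs(h[1]))[0]
print(best_split)
print(mk_crossings(series))
```

On this synthetic series the largest sliding-t statistic falls at the true split, index 20. Candidate points from the two tests can then be cross-checked before the study period is segmented.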