3. Methodology and Data
3.1. Methodology
3.1.1. GW-RF Model
The Random Forest (RF) model is a non-parametric ensemble learning method that constructs multiple decision trees to capture complex nonlinear relationships, offering an effective solution to challenges faced by government statistical systems in the digital economy era. Unlike traditional parametric models, RF does not require prior assumptions about variable correlations and is capable of efficiently handling high-dimensional data. However, conventional RF models generally treat spatial data as homogeneously distributed, assigning equal weights to all observations and thereby overlooking spatial heterogeneity. This limitation hinders their ability to detect localized differences across geographical contexts
[24] | Quiñones, S.; Goyal, A.; Ahmed, Z. U. Geographically Weighted Machine Learning Model for Untangling Spatial Heterogeneity of Type 2 Diabetes Mellitus (T2D) Prevalence in the USA. Sci Rep 2021, 11, https://doi.org/10.1038/s41598-021-85381-5 |
[24]
. The Geographically Weighted Regression (GWR) model, by contrast, is a local regression technique that incorporates a spatial weight matrix (SWM) to assign localized regression coefficients to each spatial unit (e.g., counties), thereby effectively capturing spatial heterogeneity. GWR reveals how the influence of explanatory variables on the dependent variable varies across geographic space. Nonetheless, it typically relies on linear assumptions and is thus unable to model nonlinear relationships among variables effectively
. To overcome the respective limitations of RF and GWR, this study employs the Geographically Weighted Random Forest (GW-RF) model. GW-RF integrates the nonlinear modeling capabilities of RF with the spatial weighting mechanism of GWR, enabling simultaneous accommodation of spatial heterogeneity and nonlinear interactions among variables. Specifically, GW-RF trains local random forest models using spatially weighted data, thereby providing a more accurate representation of spatial characteristics and nonlinear effects across different regions.
Specifically, the Geographically Weighted Random Forest (GW-RF) model constructs a localized forest regression model at each County Unit (CU) location based on its neighborhood data. This process centers around the Spatial Weight Matrix (SWM), integrating geographic distance information into the sample weighting mechanism. For each geographic unit, the weight between it and the sample is defined as. In this study, an adaptive kernel and bi-square weighting kernel function are employed to compute the spatial weights, with the calculation formula as follows:
(1) Here,

denotes the distance between County Unit

and

, while

represents the bandwidth parameter. Based on this weighting function, a local training set can be formed for each geographic unit

by selecting its neighboring samples, upon which a localized random forest model is constructed
[26] | Yu, Z.; Li, Z.; Yang, H.; Wang, Y.; Cui, Y.; Lei, G.; Ye, S. Contrasting Responses of Spatiotemporal Patterns of Cropland to Climate Change in Northeast China. Food Sec. 2023, 15, 1197–1214, https://doi.org/10.1007/s12571-023-01379-z |
[26]
. Therefore, the prediction result of GW-RF at location

can be expressed as

, where represents the GW-RF prediction for I, and

denotes the coordinates of center point

:
(2) 3.1.2. SHAP Model
To enhance the interpretability of the model, this study adopts SHAP (Shapley Additive Explanations) to quantify the contribution of each influencing factor to cropland fragmentation and to reveal its direction of effect
[27] | Antwarg, L.; Miller, R. M.; Shapira, B.; Rokach, L. Explaining Anomalies Detected by Autoencoders Using Shapley Additive Explanations. Expert Systems with Applications 2021, 186, 115736, https://doi.org/10.1016/j.eswa.2021.115736 |
[27]
. SHAP values are derived from the Shapley value in cooperative game theory, aiming to assign each feature a contribution value to the model's prediction. For a modelwith a feature set and an input sample, the SHAP value of a feature is defined as:
(3) where

is a subset of features that does not include feature

,

denotes the model output when only the features in subset

are used for prediction, and

represents the total number of features.
3.1.3. Getis-Ord Gi* Hotspot Analysis
Hotspot analysis (Getis-Ord Gi*) is a spatial statistical technique used to detect clustering patterns in geographic data. When analyzing the spatial distribution of cultivated land fragmentation in China, this method helps identify areas with significant concentrations of high or low values, thereby distinguishing “hotspots” and “cold spots.” By calculating the

statistic and its corresponding

-score for each county unit, one can determine whether a given region exhibits significant clustering of cultivated land fragmentation
[28] | Chen, E.; Ye, Z.; Wu, H. Nonlinear Effects of Built Environment on Intermodal Transit Trips Considering Spatial Heterogeneity. Transportation Research Part D: Transport and Environment 2021, 90, 102677, https://doi.org/10.1016/j.trd.2020.102677 |
[28]
. The calculation formulas are as follows (
4) and (
5):
(4)
(5) In equation (
5),

denotes the clustering index of county unit

;

represents the standardized value, reflecting the deviation between the expected value and the variance. When the

-value is greater than 0 and the

-value is less than 0.1, the location is identified as a hotspot; conversely, when the

-value is less than 0 and the

-value is less than 0.1, the location is identified as a cold spot.
3.2. Indicator Selection and Data Sources
The cropland fragmentation index, as an important tool for measuring the degree of cropland fragmentation, has been widely validated in previous studies. In the context of data governance and the modernization of government statistics, this study adopts a multi-indicator comprehensive evaluation approach, emphasizing the shared-economy-oriented statistical monitoring methods. Drawing on both domestic and international literature, the study constructs a composite index system by selecting nine representative indicators, including the PLAND, LPI, SHAPE_MN, FRAC_MN, PARA_MN, PLADJ, COHESION, DIVISION, AI. All indicators are calculated at the county-unit level using Fragstats 4.2 software.
To simplify the modeling and interpretation of the complex nonlinear relationships between cropland fragmentation and multiple factors, principal component analysis (PCA) was introduced to reduce the dimensionality of landscape indices. After extracting the principal components, they were weighted and integrated to form the comprehensive cropland fragmentation index

[29] | Liu, J.; Jin, X.; Xu, W.; Sun, R.; Han, B.; Yang, X.; Gu, Z.; Xu, C.; Sui, X.; Zhou, Y. Influential Factors and Classification of Cultivated Land Fragmentation, and Implications for Future Land Consolidation: A Case Study of Jiangsu Province in Eastern China. Land Use Policy 2019, 88, 104185, https://doi.org/10.1016/j.landusepol.2019.104185 |
[29]
. The calculation formula is as follows, where

represents the

extracted principal component, and its corresponding weight is the variance contribution rate of that component:
(6) The data used in this study come from the following sources: 1 km resolution land use remote sensing monitoring data of China for the years 1995, 2000, 2005, 2010, 2015, and 2020; the nine major agricultural zoning datasets of China; China’s SRTM 3 arc-second (90m) DEM data; annual precipitation and temperature data; population and GDP data sourced from the Resource and Environmental Science and Data Center of the Chinese Academy of Sciences (http://www.resdc.cn/).
Table 1. Typical indicators of Landscape Fragmentation.
Indicator level | Indicator meaning |
PLAND | proportion of cropland area in the entire landscape (%) |
LPI | degree of cropland concentration (%) |
SHAPE_MN | shape complexity of cropland patches |
FRAC_MN | morphological complexity of cropland patches |
PARA_MN | irregularity of patches |
PLADJ | adjacency of same-type cropland |
COHESION | cropland connectivity (%) |
DIVISION | degree of landscape fragmentation |
AI | continuity of cropland distribution (%) |
4. Results
4.1. Spatiotemporal Pattern Analysis of Farmland Fragmentation
4.1.1. Spatiotemporal Evolution Characteristics of Farmland Fragmentation in China
From 1995 to 2020, the cultivated land landscape in China exhibited a staged evolution characterized by “initial concentration followed by subsequent fragmentation” (
Table 2). However, the direction and magnitude of changes across different dimensional indicators varied significantly, reflecting the complexity and regional heterogeneity of the fragmentation process. The composite fragmentation index

increased from 0.0139 in 1995 to a peak of 0.0208 in 2005, before gradually declining to 0.0112 by 2020. Although the overall trend of fragmentation at the national scale has moderated, localized fragmentation pressures have continued to intensify. During this period, the trajectories of cultivated land concentration, morphological complexity, and spatial connectivity diverged, resulting in a governance dilemma characterized by “macro-level improvement and micro-level deterioration.”
In terms of cultivated land quantity and concentration, both the percentage of landscape (PLAND) and the largest patch index (LPI) showed slight increases from 40.2% and 32.1% in 1995 to 40.6% and 32.6% in 2000, indicating initial progress in large-scale land use. However, both indicators declined steadily after 2000, reaching 37.4% and 28.7%, respectively, reflecting intensified patch fragmentation. Regarding morphological complexity, the mean shape index (SHAPE\_MN) decreased from 1.92 to 1.80, suggesting that high-standard farmland construction contributed to more regular macro-level boundaries. In contrast, the mean perimeter-area ratio (PARA\_MN) increased from 30.1 to 31.5, indicating a worsening of edge fragmentation, such as jagged boundaries and nested micro-patches. The mean fractal dimension (FRAC\_MN) remained relatively stable between 1.04 and 1.05, implying low-level fluctuations in geometric boundary complexity. In terms of spatial connectivity, the percentage of like adjacencies (PLADJ) declined from 54.2 to 51.7, the aggregation index (AI) fell from 57.8 to 55.0, and the landscape cohesion index (COHESION) dropped from 86.0 to 84.4, all of which indicate increasing spatial dispersion of patches. Meanwhile, the landscape division index (DIVISION) rose from 0.808 to 0.839, further suggesting a heightened degree of patch isolation. This multifaceted pattern—characterized by “scale contraction, edge fragmentation, and spatial dispersion”—underscores the need for integrated governance approaches that coordinate both quantitative control and morphological optimization of cultivated land.
Table 2. The degree of Landscape Fragmentation in China from 1995 to 2020.
Year | PLAND | LPI | SHAPE_MN | FRAC_MN | PARA_MN | PLADJ | COHESION | DIVISION | AI | LF |
1995 | 40.2 | 32.1 | 1.91 | 1.05 | 30.1 | 54.2 | 86.0 | 0.808 | 57.8 | 0.0139 |
2000 | 40.6 | 32.6 | 1.92 | 1.05 | 30.2 | 54.2 | 86.3 | 0.805 | 57.5 | 0.0194 |
2005 | 39.9 | 31.7 | 1.89 | 1.05 | 30.5 | 53.6 | 85.8 | 0.813 | 56.9 | 0.0208 |
2010 | 39.5 | 31.2 | 1.87 | 1.04 | 30.7 | 53.2 | 85.5 | 0.818 | 56.5 | 0.0205 |
2015 | 38.9 | 30.4 | 1.84 | 1.04 | 31.0 | 52.5 | 85.1 | 0.824 | 55.8 | 0.0196 |
2020 | 37.4 | 28.7 | 1.80 | 1.04 | 31.5 | 51.7 | 84.4 | 0.839 | 55.0 | 0.0112 |
1995 | 40.2 | 32.1 | 1.91 | 1.05 | 30.1 | 54.2 | 86.0 | 0.808 | 57.8 | 0.0139 |
Notably, despite a later moderation in the overall fragmentation trend, spatially continuous cold and hot spot clusters of cultivated land fragmentation emerged across all six periods from 1995 to 2020 (
Figure 1). Hot spots were predominantly distributed in the central-eastern Northern Arid and Semi-Arid Region, the southern Qinghai-Tibet Plateau Region, the Yunnan-Guizhou Plateau, South China, and the southern Yangtze River Middle-Lower Reaches, while cold spots were primarily concentrated in the central Northeast China Plain, the Huang-Huai-Hai Plain, the eastern Loess Plateau, the northern Yangtze River Middle-Lower Reaches, and the Sichuan Basin. Further analysis revealed significant spatio-temporal dynamics in the fragmentation process within these zones: Hot spots in western and northeastern Xinjiang (Northern Arid and Semi-Arid Region) fluctuated through phases of "expansion–contraction–insignificance", contrasting with an "initial intensification followed by mitigation" trend observed in western Inner Mongolia. The Songnen Plain (Northeast China Plain) remained dominated by cold spots with peripheral hot spots, exhibiting marked cold spot expansion during 1995–2000 and 2015–2020. Qinghai (Qinghai-Tibet Plateau) experienced hot spots that "initially contracted before expanding later"; concurrently in Tibet, increased cultivated land converted some counties into hot spots alongside an expansion of insignificant areas. A noticeable reduction in cultivated land occurred in parts of the eastern Loess Plateau and Huang-Huai-Hai Plain around 2005. In the Sichuan Basin, cold spots shrank and partially transitioned to hot spots during 1995–2000, re-expanded by 2005, and subsequently stabilized with minor fluctuations. The Yangtze River Middle-Lower Reaches displayed relatively stable patterns, bisected by an insignificant zone with persistent hot spots in the south and cold spots in the north, showing minimal overall change. Throughout the study period, the Yunnan-Guizhou Plateau and South China consistently maintained concentrated hot spot distributions.
Figure 1. Spatiotemporal Evolution of Farmland Fragmentation in Major Agricultural Regions of China.
4.1.2. Spatiotemporal Evolution of Farmland Fragmentation in Major Agricultural Regions of China
Based on this foundation, this study investigated the cultivated land fragmentation patterns across China's nine major agricultural regions from 1995 to 2020, revealing significant regional differentiation and phased evolutionary characteristics. The Percentage of Landscape (PLAND) exhibited an inverted V-shaped pattern in the Northeast China Plain, Huang-Huai-Hai Plain, Loess Plateau, Yunnan-Guizhou Plateau, and South China, rising from 1995 to a peak of 40.6% in 2000 before gradually declining to 37.4% by 2020. Conversely, PLAND in the Yangtze River Middle-Lower Reaches and Sichuan Basin showed a continuous decrease, while the Northern Arid and Semi-Arid Region experienced steady growth from 32.1% to 35.8%. The trend of the Largest Patch Index (LPI) was highly coupled with PLAND; the Qinghai-Tibet Plateau displayed a distinct fluctuating pattern of "increase–decrease–recovery", declining from 32.6% to 28.7% before rebounding to 30.2%, whereas LPI in the Yangtze River Middle-Lower Reaches decreased continuously from 33.5% to 26.8%. Among morphological complexity metrics, the Mean Shape Index (SHAPE_MN) in the Northeast China Plain underwent two cycles of fluctuation, rising from 1.82% to 1.85% before falling to 1.79%; SHAPE_MN in the Huang-Huai-Hai Plain increased from 1.89% to 1.93% and then decreased to 1.78%. The Mean Perimeter-Area Ratio (PARA_MN) increased in the Northeast China Plain (from 30.1% to 31.5%) and the Yangtze River Middle-Lower Reaches (from 29.3% to 32.1%), indicating intensified edge fragmentation. Regarding spatial connectivity, the Percentage of Like Adjacencies (PLADJ) decreased in the Huang-Huai-Hai Plain (from 54.2% to 49.4%) and the Yangtze River Middle-Lower Reaches (from 53.1% to 48.9%). Landscape Cohesion (COHESION) decreased from a peak of around 86.0% to 84.4% in most regions, while it increased in the Qinghai-Tibet Plateau (from 83.5% to 85.2%). The Landscape Division Index (DIVISION) increased in the Huang-Huai-Hai Plain (from 0.808 to 0.839) and the Yangtze River Middle-Lower Reaches (from 0.795 to 0.827). The Aggregation Index (AI) decreased in the Huang-Huai-Hai Plain (from 57.8% to 53.2%) and fluctuated in the Qinghai-Tibet Plateau (from 51.5% down to 49.8% before recovering to 50.6%). Overall, the period around 2000 marked a critical turning point, with declining concentration and intensifying fragmentation in most regions. The Northern Arid and Semi-Arid Region exhibited contrary growth in PLAND (+3.7%), while the Qinghai-Tibet Plateau demonstrated a unique fluctuating trajectory driven by ecological policy interventions.
Figure 2. Landscape fragmentation in nine major agricultural areas of China from 1995 to 2020.
4.1.3. Spatiotemporal Evolution of Farmland Fragmentation at the County Level
To gain deeper insights into the spatial heterogeneity and regional evolutionary characteristics of cultivated land fragmentation in China,
Figure 3 systematically depicts its spatial distribution patterns and dynamic trends based on the change rates of the cultivated land fragmentation index at the county/district scale from 1995 to 2020. Nationally, 1,321 counties/districts exhibited decreased fragmentation levels, with Yunnan (-18.7%), Sichuan (-15.2%), Guangxi (-12.1%), Guangdong (-10.8%), and Hunan (-9.3%) showing the most significant reductions. However, this improvement displayed spatial imbalance: Sichuan (+8.4%), Henan (+7.9%), Hebei (+7.2%), Heilongjiang (+6.5%), and Anhui (+5.8%) simultaneously ranked among provinces with the most severe fragmentation intensification, forming a concentrated intensification cluster centered on the Huang-Huai-Hai Plain (annual increase: 4.1%). This contradictory pattern reflects asynchronous land-use transitions across regions—slope-to-terrace conversion projects reduced average slope gradients by 15° in southwestern hilly areas, directly driving fragmentation down, while urbanization (annual growth: 1.2%) in the Huang-Huai-Hai Plain triggered farmland fragmentation through non-agricultural conversion.
Temporally, the proportion of counties experiencing fragmentation intensification showed a "rise-fall-rise" fluctuation across three periods: intensification occurred in 21.3% (1995–2000), 18.7% (2000–2010), and 23.4% (2010–2020) of counties, with 68.2% of intensifying counties exceeding a 5% increase. This indicates phased reversals in fragmentation control effectiveness, and the post-2010 intensification trend exhibited spatio-temporal coupling with slowing land transfer growth (annual decline: 2.1%).
The Qinghai-Tibet Plateau showed fragmentation expansion post-2010 (annual rise: 3.7%), with pronounced increases in northern Tibet and Qinghai, reflecting unique challenges in balancing cultivation with ecological constraints under a 15.6% increase in ecological protection redline coverage that compressed marginal farmland. Although southwestern and northeastern Xinjiang maintained high fragmentation levels (>0.35), the region saw an overall 12.6% reduction by 2020, particularly in northwestern counties (-18.9%), demonstrating combined effects of border security policies and land consolidation initiatives (implemented since 2010, covering 870,000 ha). High-fragmentation zones (>0.4) in western and northeastern Inner Mongolia showed marginal improvement (-9.4%) during 2010–2020, potentially linked to the Grassland Subsidy Policy (annual subsidy growth: 12%) and scaled farming promotion (land transfer increase: 15.7%). Southwestern provinces (Yunnan, Guangxi, Guangdong, Hunan) achieved systemic improvement through slope-to-terrace conversion (>3 million ha cumulative implementation) and land consolidation, while the Huang-Huai-Hai Plain (Henan, Hebei, Anhui) and Northeast China Plain (Heilongjiang) experienced intensification due to strained human-land relations (per capita farmland decreased to 0.08 ha). This provincial divergence—"southern alleviation versus northern intensification"—reflects both natural constraints (63% of southwestern farmland has slopes >15°) and regional disparities in agricultural development strategies (e.g., lagging scaled operations in northern grain-producing regions).
Figure 3. The rate of landscape fragmentation at county level in China from 1995 to 2020.
4.2. In-Depth Analysis of Influencing Factors of Farmland Fragmentation
4.2.1. Attribution Analysis of Farmland Fragmentation in China
(i). Selection of Influencing Factors
To comprehensively analyze the combined effects and interactions of factors influencing cultivated land fragmentation (LF), this study selects LF as the dependent variable and identifies six representative independent variables reflecting natural environmental and socioeconomic influences. First, annual precipitation is chosen as a critical natural factor affecting agricultural production. Precipitation directly influences soil moisture, crop growth cycles, and agricultural management practices, thereby impacting land parceling and utilization efficiency. Second, the annual average temperature serves as a vital climatic variable. Suitable temperature conditions facilitate healthy crop growth, whereas unfavorable temperatures can reduce yields or degrade quality, consequently affecting land use patterns and fragmentation levels. Third, mean elevation significantly affects natural conditions such as climate, soil, and vegetation, which in turn influence agricultural practices and land use efficiency. High-altitude areas typically experience lower temperatures and uneven precipitation distribution, leading to more scattered land distribution and higher fragmentation degrees. Fourth, slope represents a key topographic factor influencing land use and agricultural production. Steep slopes restrict the application of mechanized farming, increase cultivation difficulty and costs, thereby affecting land use efficiency and fragmentation. Fifth, per capita regional GDP reflects the level of economic development, which strongly impacts land use and agricultural structure. Higher GDP generally implies stronger economic capacity and greater agricultural investment, potentially promoting large-scale land operations and land consolidation, thus reducing fragmentation. Sixth, population density directly affects land use and agricultural activities. Densely populated areas often face greater land pressure, leading to increased land parceling and higher fragmentation levels.
(ii). Spatial Autocorrelation Analysis
To assess the spatial clustering characteristics of cultivated land fragmentation (LF), Moran’s I index was employed for spatial autocorrelation analysis. Moran’s I is a widely used statistic for measuring the degree of spatial clustering in a dataset, with values ranging from -1 to 1. Positive values indicate spatial positive autocorrelation, negative values indicate spatial negative autocorrelation, and values near zero suggest spatial randomness. The analysis results reveal that throughout the study period from 1995 to 2020, LF values consistently exhibited significant positive spatial autocorrelation (p < 0.01).
Table 3. Spatial autocorrelation test results.
Year | Moran’s I | P | Spatial autocorrelation diagnosis |
1995 | 0.8756 | 0.0000 | Significantly positive correlation |
2000 | 0.8736 | 0.0000 | Significantly positive correlation |
2005 | 0.8689 | 0.0000 | Significantly positive correlation |
2010 | 0.8650 | 0.0000 | Significantly positive correlation |
2015 | 0.8597 | 0.0000 | Significantly positive correlation |
2020 | 0.8527 | 0.0000 | Significantly positive correlation |
(iii). Multicollinearity Diagnosis
By calculating and analyzing the Variance Inflation Factors (VIF) of the independent variables across different years (1995–2020), the results indicate that the VIF values for all variables in each year remain well below the commonly accepted threshold of 10. Specifically, the VIF values across years are relatively stable and consistently low, suggesting the absence of serious multicollinearity among the explanatory variables. This confirms the appropriateness of variable selection and provides a robust foundation for subsequent model construction and analysis, effectively avoiding issues such as unstable coefficient estimates, inflated standard errors, and reduced model interpretability caused by multicollinearity.
Table 4. Variable multicollinearity diagnostic results.
project | Precipitation | Temperature | Elevation | Slope | GDP | Population |
VIF1995 | 2.7267 | 3.3090 | 3.3956 | 2.7152 | 2.0951 | 2.1705 |
VIF2000 | 3.1507 | 3.5970 | 3.2616 | 2.8004 | 1.7909 | 1.8573 |
VIF2005 | 2.8242 | 3.0799 | 3.2943 | 2.7278 | 2.0352 | 1.9849 |
VIF2010 | 2.3174 | 2.4466 | 3.2502 | 2.7448 | 2.0890 | 1.9632 |
VIF2015 | 2.7539 | 3.2584 | 3.3385 | 2.6461 | 3.2871 | 3.3652 |
VIF2020 | 2.6272 | 2.9979 | 3.4849 | 2.6846 | 2.9722 | 3.0861 |
(iv). Model Selection
The dataset was divided into training, validation, and test sets in an 8:1:1 ratio, and the GW-RF model was employed for training and prediction. Scatter plots were generated for each dataset to visually assess the model’s fitting performance, illustrating the relationship between predicted and actual values in both the training and test sets. In the plots, training data points are represented by light blue circles, while test data points are shown as dark blue circles. Performance metrics, including R2, RMSE, and MAE, are also displayed to provide a comprehensive evaluation of the model’s performance. As shown in
Figure 4, for all six time points, the models achieved R2 values of approximately 0.8, and the differences in performance metrics between the training and test sets were minimal. This indicates that the models did not suffer from overfitting. These results further confirm the effectiveness of the GW-RF model as a robust approach for capturing spatial heterogeneity and nonlinear relationships in the analysis of cultivated land fragmentation in China.
Figure 4. The fitted scatter plot of the GW-RF model.
(v). Descriptive Analysis
As shown in
Table 5, variable importance in the GW-RF model is measured by the percentage increase in mean squared error (%IncMSE). The results for 2020 indicate that slope (180.65) and population density (178.34) contributed the most to model prediction, suggesting that topographic complexity and the intensity of human activity are the primary driving factors. Elevation (172.50) and precipitation (115.78) followed in importance, highlighting the significant influence of natural geographical conditions on regional fragmentation patterns. In contrast, temperature (81.25) and GDP (60.36) showed lower importance scores, possibly due to interactions with other variables or region-specific effects. The descriptive statistics of these variables further reveal strong spatial heterogeneity. Slope and population density exhibited large standard deviations (10.22 and 13.56, respectively), indicating highly uneven spatial distributions and the presence of local extremes. Precipitation and temperature had moderate standard deviations (33.94 and 25.30), reflecting regional climatic differences with relatively stable variability. GDP displayed the smallest standard deviation (13.08), which may suggest a more balanced level of regional economic development or the presence of spatial smoothing effects in the data.
Table 5. Variable multicollinearity diagnostic results.
Variables/Indicators | %IncMSE | Minimum | Maximum | Mean | Standard |
Precipitation | 115.78 | 2.34 | 180.69 | 80.56 | 33.94 |
Temperature | 81.25 | 6.78 | 35.89 | 56.95 | 25.30 |
Elevation | 172.50 | -14.32 | 56.78 | 14.35 | 11.68 |
Slope | 180.65 | -3.85 | 55.06 | 21.94 | 10.22 |
GDP | 60.36 | 0.75 | 119.08 | 84.48 | 13.08 |
Population | 178.34 | 0.06 | 98.06 | 54.32 | 13.56 |
4.2.2. Analysis of Nonlinear Effects of Influencing Factors
Figure 5 combines bee swarm plots and feature dependence plots to illustrate the SHAP values of the influencing factors of cultivated land fragmentation from 1995 to 2020. This visualization reveals both the positive and negative effects of each factor on fragmentation, as well as their relative importance rankings in the GW-RF model. The bar charts show that slope and population density consistently ranked as the top two contributors to cultivated land fragmentation throughout the study period, with slope being the most important factor, followed by population density. From 1995 to 2015, precipitation had a greater influence on fragmentation than elevation; however, from 2015 to 2020, elevation surpassed precipitation in importance. In contrast, GDP and temperature had relatively minor effects on fragmentation. Specifically, before 2000, GDP exerted a greater influence than temperature. Between 2000 and 2015, temperature played a more significant role than GDP, while in the 2015–2020 period, GDP once again became more influential than temperature.
Figure 5. Global interpretation graph.
The bee swarm plots visualize the SHAP values of each variable across all county units (CUs), indicating both the direction and magnitude of their effects. In the plots, red dots represent higher feature values, while blue dots denote lower ones. A SHAP value above zero indicates a positive contribution to cultivated land fragmentation, whereas a value below zero indicates a negative contribution. Overall, slope, population density, precipitation, elevation, GDP, and temperature all show a positive correlation with land fragmentation. Regions with higher levels of fragmentation correspond to higher SHAP values, while areas with lower fragmentation levels show lower SHAP values—demonstrating that, from 1995 to 2020, regions with higher values of these factors tended to experience greater degrees of fragmentation. Specifically, for slope, the SHAP values of low-slope regions extend further left to below –2.0, suggesting a strong mitigating effect of gentle terrain on fragmentation, promoting land contiguity. In contrast, high-slope regions show positive SHAP values, with cluster centers shifting rightward over time—from left of 1.0 to beyond—indicating that the impact of steep terrain on fragmentation intensified year by year. For population density, SHAP value clusters in sparsely populated areas gradually decreased over time, while those in densely populated regions remained consistently higher and extended rightward, reaching up to 1.5 by 2020. This trend confirms that the influence of sparsely populated areas on fragmentation has weakened over time, whereas the impact of densely populated areas has grown, especially by 2020, when higher population density became a more significant contributor to fragmentation. Regarding precipitation, a clustering pattern is observed in low-precipitation areas, while SHAP values in high-precipitation areas initially extend rightward. After 2015, this extension diminishes and clustering emerges. This suggests that prior to 2015, the intensifying effect of high precipitation on fragmentation outweighed the mitigating effect of low precipitation. However, after 2015, the effects of both high and low precipitation levels on fragmentation tended to balance out. In terms of elevation, high-elevation areas show SHAP value clusters between 0 and 0.5, extending up to around 1.0. In contrast, low-elevation areas show fewer clusters, with values extending only to about –0.1, indicating a more pronounced effect of high elevations on fragmentation. For GDP, SHAP values in high-GDP areas first exhibit a reduced rightward extension, then increase again, while those in low-GDP areas cluster between –0.5 and 0. This suggests that from 1995 to 2005, the effects of GDP increases and decreases on fragmentation were roughly balanced. However, from 2005 to 2020, the effect of rising GDP in intensifying fragmentation clearly exceeded the mitigating effect of GDP declines. Lastly, for temperature, between 1995 and 2010, the rightward extension of SHAP values in high-temperature areas remained around 0.5, while that of low-temperature areas remained flat. From 2015 to 2020, the extension in high-temperature zones increased significantly, indicating a growing intensifying effect of rising temperatures on cultivated land fragmentation.
Figure 6. SHAP-PDP graph.
The marginal effects of the key influencing factors on cultivated land fragmentation (LF) reveal distinct nonlinear patterns, as captured by SHAP value analyses: Elevation exhibits a pronounced nonlinear positive relationship with LF. In low-elevation areas (approximately 0–200 m), SHAP values are largely negative, indicating a suppressive effect on fragmentation. As elevation increases to 200–1000 m, SHAP values rise sharply into positive territory, suggesting increasing fragmentation pressure. Beyond 1000 m, SHAP values continue to increase, implying that higher elevations significantly exacerbate land fragmentation due to harsher natural conditions and scattered land use patterns. Slope shows a steep increase in marginal contribution to fragmentation at lower degrees, followed by a stabilization. When slope is low (0–3°), SHAP values rapidly shift from negative to positive, indicating that even small increases in slope can significantly reduce land suitability and increase fragmentation. Between 3° and 5°, SHAP values plateau, and beyond 5°, they remain high but relatively stable, suggesting diminishing marginal effects in steeper regions—likely because such areas are already highly fragmented. Temperature demonstrates a U-shaped nonlinear relationship. In low-temperature zones (0–10°C), the SHAP values fluctuate slightly around zero, indicating weak influence. Starting around 10°C, the marginal impact becomes increasingly negative, peaking at approximately 18°C—implying that moderate warming enhances land use efficiency and reduces fragmentation. However, beyond 20°C, SHAP values rise sharply into positive territory, peaking near 25°C. This suggests that in high-temperature (tropical/subtropical) zones, elevated temperatures may increase fragmentation due to natural hazards and limited agricultural viability. Precipitation shows a sine-like nonlinear pattern: an initial decline, followed by an increase, and then another decline. In areas with low precipitation (<625 mm), SHAP values decrease as rainfall increases, reaching a minimum around 625 mm, suggesting improved land cohesion with better moisture availability. Between 625 mm and 1200 mm, SHAP values rise and turn positive, indicating that excess rainfall starts contributing to fragmentation—potentially due to erosion or reduced land usability. SHAP values peak around 2000 mm, then decline again at extremely high precipitation levels, reflecting complex hydrological impacts and diminishing marginal effects. GDP follows a "check-mark" shaped nonlinear path. In less developed areas (GDP < 12.5 million CNY), GDP and SHAP values are negatively correlated, indicating that economic development helps consolidate land and reduce fragmentation. Around 12.5 million CNY, the relationship reverses, and fragmentation increases with rising GDP—peaking around 43 million CNY—likely due to urban expansion and land-use transformation. However, as GDP exceeds 116.5 million CNY, SHAP values begin to decline slightly, suggesting that advanced economies may mitigate fragmentation through land-use planning and policy interventions. Population shows a combined "check-mark" and inverted U-shaped relationship. In counties with populations under 1.25 million, SHAP values decline with increasing population, suggesting that moderate population growth supports land consolidation. After 1.25 million, SHAP values begin to rise, turning positive around 2.5 million, and peaking at nearly 10 million—indicating intensifying fragmentation due to high population pressure. Beyond this point, SHAP values gradually decrease, implying that in extremely high-population regions, fragmentation effects may taper off—potentially due to urban land consolidation or saturation effects. These findings underscore the complex, threshold-driven nature of fragmentation drivers and highlight the importance of region-specific policies for sustainable land management.
4.2.3. Analysis of Spatial Heterogeneity of Influencing Factors
This study further investigates the spatial heterogeneity in the effects of precipitation, temperature, elevation, slope, GDP, and population on cultivated land fragmentation. Spatial distribution maps of local regression coefficients reveal significant regional disparities in how each factor influences fragmentation. Among natural factors, precipitation shows a strong positive influence in the central Northeast Plain and southern Qinghai–Tibet Plateau, suggesting that increased rainfall may exacerbate land fragmentation in these areas. Temperature, by contrast, generally exhibits a negative effect, especially in western Sichuan, the Qinghai–Tibet Plateau, Inner Mongolia, and parts of Hunan, where large negative coefficients are observed. However, in the northern Yangtze River Delta and southern Huang-Huai-Hai region, temperature has a positive influence, reflecting differing regional climate adaptations. Regarding topographical factors, elevation predominantly has a positive impact on fragmentation, indicating that higher altitudes are generally unfavorable to land consolidation. Yet, in northern Northeast China, the northern Huang-Huai-Hai Plain, and eastern middle-lower reaches of the Yangtze River, elevation effects are negative, suggesting that high-standard farmland construction or policy interventions may mitigate the negative impacts of elevation in these regions. Slope tends to have a negative effect overall, with particularly large negative coefficients in western parts of the Northeast Plain, possibly due to the constraints slope places on agricultural mechanization, which leads to increased fragmentation. Conversely, some areas in the central-northern Huang-Huai-Hai Plain show positive coefficients, underscoring regional variability in slope impacts. Among socioeconomic factors, GDP generally has a positive effect on fragmentation, indicating that economic development may promote land subdivision and fragmented land use. Population, however, shows a more pronounced and widespread influence—especially across major plain regions—highlighting population pressure as a key driver of land fragmentation. In contrast, in the western Yunnan-Guizhou Plateau, both GDP and population exhibit negative effects, likely due to successful land consolidation policies or terrain constraints that limit development intensity in these mountainous areas.
Figure 7. Local coefficient diagram of GW-RF model.