|
| Interpretable analysis of rice cadmium prediction based on ensemble learning |
| Received: February 6, 2025 |
| Keywords: rice cadmium; machine learning; ensemble learning; SHAP analysis; interaction analysis |
| Author Name | Affiliation | E-mail |
| DONG Zexin | Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China; Xiangtan Experimental Station of Chinese Academy of Agricultural Sciences, Xiangtan 411100, China | |
| AN Yi | Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China; Xiangtan Experimental Station of Chinese Academy of Agricultural Sciences, Xiangtan 411100, China | simon8601@126.com |
| DONG Qi | Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China; Xiangtan Experimental Station of Chinese Academy of Agricultural Sciences, Xiangtan 411100, China | |
| ZHANG Chenchen | Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China; Xiangtan Experimental Station of Chinese Academy of Agricultural Sciences, Xiangtan 411100, China | |
| SUN Sijia | Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China; Xiangtan Experimental Station of Chinese Academy of Agricultural Sciences, Xiangtan 411100, China; School of Resources and Environment, Northeast Agricultural University, Harbin 150030, China | |
|
| Abstract: |
| This study focused on the major rice-growing provinces of southern China, where 303 paired soil-rice samples were collected. To address the generalization bottleneck of traditional methods on large-scale heterogeneous data, a framework of "multi-scale feature engineering, hybrid evaluation strategy, explainability verification" was proposed. First, eight core indicators were selected using variance filtering and recursive feature elimination (RFE). Second, the extreme gradient boosting (XGB) algorithm was employed, and its classification performance was compared with that of other models such as random forest (RF) and support vector machine (SVM). Finally, Shapley additive explanations (SHAP) were introduced to interpret the model's decision-making mechanism, revealing feature interaction effects and non-linear threshold rules. The results showed that the model achieved an accuracy of 84.1% and an F1 score of 85.3%, a metric that balances recall and precision. This study demonstrates that ensemble learning can provide a scientific basis and technical support for the precise prediction of cadmium content in rice. |
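The idea behind Shapley additive explanations can be illustrated with a minimal sketch that computes exact Shapley values by averaging each feature's marginal contribution over all feature orderings. Everything in it is an assumption made for illustration: the scoring rule `toy_risk_model`, the feature names `soil_cd`, `ph`, and `organic_matter`, and the thresholds are hypothetical and are not the paper's model, indicators, or results.

```python
from itertools import permutations
from math import factorial


def toy_risk_model(features):
    """Hypothetical threshold rule with an interaction term (illustration only)."""
    high_cd = features["soil_cd"] > 0.3   # assumed threshold, not from the paper
    acidic = features["ph"] < 5.5         # assumed threshold, not from the paper
    score = 0.0
    if high_cd:
        score += 0.4
    if acidic:
        score += 0.2
    if high_cd and acidic:                # interaction: both conditions together
        score += 0.3
    return score


def exact_shapley(model, sample, baseline):
    """Average each feature's marginal contribution over all feature orderings.

    Features not yet "switched on" keep their baseline value. The cost grows
    factorially with the number of features, so this is only for tiny examples.
    """
    names = list(sample)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = sample[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


# Hypothetical sample and reference point (values are made up).
sample = {"soil_cd": 0.6, "ph": 5.0, "organic_matter": 30.0}
baseline = {"soil_cd": 0.2, "ph": 6.5, "organic_matter": 30.0}

phi = exact_shapley(toy_risk_model, sample, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

In this toy case the attributions sum to the difference between the model output for the sample and for the baseline, and the interaction bonus is split evenly between the two features that trigger it. Libraries that implement SHAP for tree ensembles use faster algorithms than this brute-force enumeration.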
|
|
|