Interpretable analysis of rice cadmium prediction based on ensemble learning
Received: February 06, 2025
Keywords: rice cadmium; machine learning; ensemble learning; SHAP analysis; interaction analysis
Authors and affiliations:
DONG Zexin: Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China; Xiangtan Experimental Station of Chinese Academy of Agricultural Sciences, Xiangtan 411100, China
AN Yi (simon8601@126.com): Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China; Xiangtan Experimental Station of Chinese Academy of Agricultural Sciences, Xiangtan 411100, China
DONG Qi: Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China; Xiangtan Experimental Station of Chinese Academy of Agricultural Sciences, Xiangtan 411100, China
ZHANG Chenchen: Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China; Xiangtan Experimental Station of Chinese Academy of Agricultural Sciences, Xiangtan 411100, China
SUN Sijia: Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs, Tianjin 300191, China; Xiangtan Experimental Station of Chinese Academy of Agricultural Sciences, Xiangtan 411100, China; School of Resources and Environment, Northeast Agricultural University, Harbin 150030, China
Abstract:
      This study collected 303 paired soil-rice samples from the major rice-growing provinces of southern China. To address the limited generalization of traditional methods on large-scale heterogeneous data, an innovative framework combining multi-scale feature engineering, a hybrid evaluation strategy, and explainability verification was proposed. First, eight core indicators were selected using variance filtering and recursive feature elimination (RFE). Second, the extreme gradient boosting (XGB) algorithm was employed, and its classification performance was compared with that of other models, such as random forest (RF) and support vector machine (SVM). Finally, Shapley additive explanations (SHAP) were introduced to interpret the model's decision-making mechanism, revealing feature interaction effects and non-linear threshold rules. The model achieved an accuracy of 84.1% and, balancing recall against precision, an F1 score of 85.3%. These results indicate that ensemble learning can provide a scientific basis and technical support for the precise prediction of cadmium content in rice.
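
The abstract reports accuracy alongside an F1 score that balances recall and precision. As a minimal sketch of how these metrics relate for a binary classifier, the Python below computes them from confusion-matrix counts. The `ConfusionCounts` container, the `classification_metrics` helper, and the example counts are all hypothetical illustrations; they are not the study's data and are not meant to reproduce its reported figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container, not from the paper)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, precision, recall, and F1 for the given counts.

    Any ratio with a zero denominator is reported as 0.0.
    """
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only (assumed values, not the study's results).
example = ConfusionCounts(tp=40, fp=10, fn=5, tn=45)
metrics = classification_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is a harmonic mean, it falls toward the lower of precision and recall, so a model cannot score well on it by maximizing one at the expense of the other.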