Prediction of cadmium bioconcentration factor for peanuts based on machine-learning methods
Received: December 18, 2023
Keywords: soil; peanut; cadmium; random forest; prediction model
Authors and affiliations:
BI Weidong: State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China; University of Chinese Academy of Sciences, Beijing 100049, China
DING Changfeng: State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China; University of Chinese Academy of Sciences, Beijing 100049, China. E-mail: cfding@issas.ac.cn
ZHOU Zhigao: State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China
WANG Xingxiang: State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008, China; University of Chinese Academy of Sciences, Beijing 100049, China; Experimental Station of Red Soil, Chinese Academy of Sciences, Yingtan 335211, China
Abstract:
In this study, 100 paired soil and peanut samples were collected from 14 provinces in China. The cadmium (Cd) contamination characteristics of the soil-peanut system and the physicochemical properties of the soils were analyzed. Models for predicting the Cd bioconcentration factor of peanuts were built using machine-learning methods, and the key factors influencing Cd enrichment in peanuts were identified. The results showed that the sampled soils were mainly acidic, with 60% having pH < 6.5. The average Cd content of peanut kernels was 0.27 mg·kg⁻¹, and the average bioconcentration factor was 2.42. Random forest models built on the nationwide dataset and on separate datasets for the northern and southern producing areas performed significantly better (R² = 0.930–0.966) than the corresponding multiple linear regression models (R² = 0.471–0.657). Random forest analysis showed that the variables with the highest relative importance differed between regions. In the northern producing areas, the most important variables for predicting Cd bioconcentration were the soil free manganese oxide content, free iron oxide content, and pH; in the southern producing areas, they were the soil free manganese oxide, clay, free iron oxide, and organic matter contents. Compared with traditional multiple linear regression models, random forest models therefore predicted the Cd bioconcentration factor of peanuts more accurately. This provides a new perspective and approach for predicting Cd transfer in soil-peanut systems at large field scales.
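The abstract compares models through two quantities it does not define: the bioconcentration factor (BCF) and the coefficient of determination (R²). The following is a minimal Python sketch, assuming the common definition of BCF as kernel Cd divided by soil Cd (both in mg·kg⁻¹, dry weight); the abstract does not state whether total or available soil Cd was used. The function names and sample values are hypothetical and illustrate only the arithmetic, not the authors' data or pipeline.

```python
from statistics import mean


def bioconcentration_factor(kernel_cd: float, soil_cd: float) -> float:
    """BCF = Cd in peanut kernel / Cd in soil (assumed: both mg/kg, dry weight)."""
    if soil_cd <= 0:
        raise ValueError("soil Cd must be positive to compute a BCF")
    return kernel_cd / soil_cd


def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    y_bar = mean(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - y_bar) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical paired samples (mg/kg), not values from the study.
    kernel_cd = [0.30, 0.12, 0.45]
    soil_cd = [0.10, 0.08, 0.25]
    bcf = [bioconcentration_factor(k, s) for k, s in zip(kernel_cd, soil_cd)]
    print([round(b, 2) for b in bcf])  # [3.0, 1.5, 1.8]

    # Hypothetical model predictions of BCF, scored against the observed BCFs.
    predicted_bcf = [2.8, 1.6, 1.9]
    print(round(r_squared(bcf, predicted_bcf), 3))
```

Because BCF is averaged per sample, the reported mean BCF (2.42) is not, in general, the ratio of mean kernel Cd to mean soil Cd. The mean soil Cd therefore cannot be recovered from the two averages in the abstract.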