Machine learning detection of manipulative environmental disclosures in corporate reports

Detecting manipulative environmental disclosures remains a critical yet unresolved challenge for regulators and investors. This study proposes a machine learning framework that integrates financial indicators, textual sentiment, and public attention data to identify potential manipulation among Chinese listed firms. A Random Forest model is trained on multi-source features derived from corporate reports and Baidu Index trends. Under severe class imbalance, the optimized model shows strong discriminatory ability (ROC-AUC = 0.94, PR-AUC = 0.78, Balanced Accuracy = 0.86, MCC = 0.72), indicating reliable performance on both the majority and minority classes. Evaluation with these imbalance-robust metrics further confirms that the model has genuine predictive capacity rather than overfitting to the training data. SHAP-based interpretation reveals that financial pressure, abnormal public attention, and sentiment deviation are the primary determinants of manipulation risk. Overall, the framework highlights how interpretable machine learning can strengthen data-driven environmental supervision. Because the framework relies on Baidu-based indicators, the findings are specific to the Chinese market and warrant validation in other regulatory contexts in future research.
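The abstract reports ROC-AUC, PR-AUC, balanced accuracy and MCC rather than plain accuracy because manipulating firms are rare, so a model that never flags anyone can still look accurate. The sketch below is a minimal, dependency-free illustration of how these four metrics are computed. It uses hypothetical toy labels and scores (not the study's data, model, or code), and the function names and the 0.5 decision threshold are assumptions chosen for illustration.

```python
"""Imbalance-aware evaluation metrics on hypothetical toy data (illustrative only)."""
import math
from typing import Sequence


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels where 1 = suspected manipulation."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred) -> float:
    """Mean of the recall on the positive class and the recall on the negative class."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def mcc(y_true, y_pred) -> float:
    """Matthews correlation coefficient; returns 0.0 when any confusion margin is empty."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores) -> float:
    """Step-wise PR-AUC: mean precision measured at the rank of each true positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical toy sample: 10 firms, only 2 of which are labelled as manipulating.
Y_TRUE = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
SCORES = [0.05, 0.10, 0.20, 0.15, 0.30, 0.40, 0.25, 0.70, 0.80, 0.60]
Y_PRED = [1 if s >= 0.5 else 0 for s in SCORES]  # assumed 0.5 threshold
ALL_ZERO = [0] * len(Y_TRUE)  # naive baseline that never flags manipulation

if __name__ == "__main__":
    print("balanced accuracy:", balanced_accuracy(Y_TRUE, Y_PRED))  # 0.9375
    print("MCC:", round(mcc(Y_TRUE, Y_PRED), 4))                    # 0.7638
    print("ROC-AUC:", roc_auc(Y_TRUE, SCORES))                      # 0.9375
    print("PR-AUC (AP):", round(average_precision(Y_TRUE, SCORES), 4))  # 0.8333
    # The all-zero baseline reaches 80% plain accuracy but is exposed here:
    print("baseline balanced accuracy:", balanced_accuracy(Y_TRUE, ALL_ZERO))  # 0.5
    print("baseline MCC:", mcc(Y_TRUE, ALL_ZERO))                               # 0.0
```

The baseline lines show the design rationale: a predictor that labels every firm as compliant scores 0.8 on plain accuracy in this toy sample but only 0.5 balanced accuracy and 0.0 MCC. In practice, `sklearn.metrics` offers equivalent functions (`roc_auc_score`, `average_precision_score`, `balanced_accuracy_score`, `matthews_corrcoef`).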

Description

Publication status: Published

Funder: Nanjing Tech University Joint Corporate Postdoctoral Research Program

Keywords

Machine learning, Corporate transparency, Manipulative behavior detection, Environmental information disclosure, Data imbalance techniques

Journal Title

Scientific Reports

Journal ISSN

2045-2322

Volume Title

15

Publisher

Nature Publishing Group UK

Publisher DOI

https://doi.org/10.1038/s41598-025-29621-y

Rights and licensing

Except where otherwise noted, this item's license is described as http://creativecommons.org/licenses/by-nc-nd/4.0/

Collections

Jisc Publications Router