South China Journal of Preventive Medicine, 2026, Vol. 52, Issue (3): 264-268. doi: 10.12183/j.scjpm.2026.0264

• Original Article •

基于LSTM网络模型、LSTM-XGBoost模型的新疆地区肺结核发病趋势评价

马晓薇1, 古丽娜·巴德尔汗2, 依帕尔·艾海提2, 祖丽呼玛尔·艾尔肯2, 王森路2, 王希江2   

  1. 新疆医科大学公共卫生学院,新疆 乌鲁木齐 830011;
  2. 新疆维吾尔自治区疾病预防控制中心
  • 收稿日期:2025-04-21 出版日期:2026-03-20 发布日期:2026-04-07
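  • Corresponding author: Wang Xijiang, E-mail: 2500681917@qq.com
  • About the first author: Ma Xiaowei (born 1999), female, master's degree candidate; research interest: tuberculosis prevention and control
  • Funding:
    Key Research and Development Program of Xinjiang Uygur Autonomous Region (2024B03021-3); National Science and Technology Major Project for the Prevention and Control of Emerging, Sudden-Onset, and Major Infectious Diseases (2025ZD01901000); Scientific Research Fund of the Xinjiang Uygur Autonomous Region Center for Disease Control and Prevention (XJJK-2024010, XJJK-2024011)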

An evaluation of tuberculosis incidence trends in Xinjiang using LSTM and LSTM-XGBoost models

Ma Xiaowei1, Gulina Badeerhan2, Yipaer Aiheiti2, Zulihumaer Aierken2, Wang Senlu2, Wang Xijiang2   

  1. School of Public Health, Xinjiang Medical University, Urumqi, Xinjiang 830017, China;
  2. Xinjiang Uygur Autonomous Region Center for Disease Control and Prevention
  • Received: 2025-04-21  Online: 2026-03-20  Published: 2026-04-07

摘要: 目的 基于长短期记忆(LSTM)神经网络模型、长短期记忆-极致梯度提升树(LSTM-XGBoost)模型预测新疆维吾尔自治区(简称“新疆”)5个县市肺结核发病趋势,为各县市结核病防控策略提供科学依据。方法 描述2011—2023年哈巴河县、尼勒克县、库尔勒市、皮山县和洛浦县结核病流行特征,分别对2011—2022年5个县市的肺结核年报告发病率建立LSTM神经网络模型和LSTM-XGBoost混合模型,评价模型对各县市2017—2023年预测效果并进行比较,预测各县市2024—2030年发病率趋势。结果 2011—2023年哈巴河县、尼勒克县、库尔勒市、皮山县和洛浦县平均报告发病率分别为112.21/10万、101.85/10万、56.86/10万、249.79/10万、359.78/10万。2017—2023年5个县市肺结核年发病率的报告实际值与LSTM模型、LSTM-XGBoost模型预测值的误差对比显示,LSTM-XGBoost模型在多个县市的MAE、MAPE、RMSE较LSTM模型低。LSTM-XGBoost模型对2024—2030年的预测趋势与LSTM模型类似。至2030年,各县市的预测发病率分别为:哈巴河县30.48/10万(95% CI:25.0/10万~36.0/10万)、尼勒克县3.90/10万(95% CI:3.2/10万~4.6/10万)、库尔勒市24.46/10万(95% CI:20.1/10万~28.9/10万)、皮山县89.43/10万(95% CI:73.3/10万~105.5/10万)、洛浦县89.92/10万(95% CI:73.7/10万~106.1/10万)。与2015年实际值比较,2030年预测降幅分别为哈巴河县75.2%、尼勒克县96.7%、库尔勒市61.8%、皮山县68.4%、洛浦县78.8%。结论 LSTM神经网络模型、LSTM-XGBoost模型均可预测结核病发病趋势,LSTM-XGBoost模型在大多数县市的主要指标上表现出更优的预测性能,5个县市未来肺结核年发病率预测结果总体呈现下降趋势,为实现2030规划目标,各县市特别是高疫情县需采取更加针对性、综合性的防控措施。

关键词: 肺结核, LSTM神经网络, LSTM-XGBoost模型

Abstract: Objective To forecast pulmonary tuberculosis incidence trends in five counties and cities of the Xinjiang Uygur Autonomous Region (Xinjiang) using a Long Short-Term Memory (LSTM) neural network model and a hybrid LSTM-eXtreme Gradient Boosting (LSTM-XGBoost) model, and thereby provide a scientific basis for local tuberculosis prevention and control strategies. Methods The epidemiological characteristics of tuberculosis in Habahe County, Nilek County, Korla City, Pishan County, and Luopu County from 2011 to 2023 were described. An LSTM neural network model and an LSTM-XGBoost hybrid model were each established using the annual reported incidence rates of pulmonary tuberculosis in the five localities for 2011-2022. The models' predictions for 2017-2023 were evaluated and compared, and the models were then used to project incidence trends for 2024-2030. Results The average reported incidence rates from 2011 to 2023 were 112.21/100 000 in Habahe County, 101.85/100 000 in Nilek County, 56.86/100 000 in Korla City, 249.79/100 000 in Pishan County, and 359.78/100 000 in Luopu County. Comparing the reported annual incidence for 2017-2023 with the values predicted by each model showed that the LSTM-XGBoost model had a lower Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE) than the standalone LSTM model in several localities. The trends forecast for 2024-2030 by the LSTM-XGBoost model were similar to those from the LSTM model. The projected incidence rates for 2030 are: Habahe County, 30.48/100 000 (95% CI: 25.0/100 000 to 36.0/100 000); Nilek County, 3.90/100 000 (95% CI: 3.2/100 000 to 4.6/100 000); Korla City, 24.46/100 000 (95% CI: 20.1/100 000 to 28.9/100 000); Pishan County, 89.43/100 000 (95% CI: 73.3/100 000 to 105.5/100 000); and Luopu County, 89.92/100 000 (95% CI: 73.7/100 000 to 106.1/100 000). Relative to the actual incidence rates in 2015, the projected reductions by 2030 are 75.2% for Habahe County, 96.7% for Nilek County, 61.8% for Korla City, 68.4% for Pishan County, and 78.8% for Luopu County. Conclusion Both the LSTM neural network model and the LSTM-XGBoost model can predict tuberculosis incidence trends. The LSTM-XGBoost model showed better predictive performance on the main metrics in most of the counties and cities. The projections indicate an overall downward trend in the annual incidence of pulmonary tuberculosis across the five localities. To meet the 2030 planning targets, each county and city, particularly those with a high epidemic burden, needs to adopt more targeted and comprehensive prevention and control measures.
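The three error metrics and the percentage reduction named in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of MAE, MAPE, and RMSE, which may differ in detail from the authors' implementation. The function names and the sample numbers are hypothetical and are not taken from the study.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def percent_reduction(baseline, projected):
    """Relative decline of a projected rate against a baseline rate, in percent."""
    return 100.0 * (baseline - projected) / baseline


# Hypothetical incidence rates per 100 000 (illustrative only, not study data).
actual = [100.0, 80.0, 50.0]
predicted = [90.0, 84.0, 55.0]

print(f"MAE  = {mae(actual, predicted):.2f}")
print(f"MAPE = {mape(actual, predicted):.2f}%")
print(f"RMSE = {rmse(actual, predicted):.2f}")
print(f"Reduction = {percent_reduction(120.0, 30.0):.1f}%")
```

On all three error metrics a lower value means predictions closer to the reported incidence, which is the sense in which one model is compared with the other above. MAPE is scale-free, so it can be compared across localities whose incidence levels differ widely, whereas MAE and RMSE are expressed in the same units as the incidence rate.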

Key words: Pulmonary tuberculosis, LSTM neural network, LSTM-XGBoost model

CLC Number: 

  • R183.3