South China Journal of Preventive Medicine, 2025, Vol. 51, Issue (2): 142-147. doi: 10.12183/j.scjpm.2025.0142

• Original Article •

Prediction models of the following-year hospitalization risk and high medical cost for patients with hepatitis B cirrhosis

CHEN Ge1, LI Guanhai2, YANG Shuo1, JIA Weidong3, LI Yueping3, GAO Yanhui4, LIANG Xiaofeng4   

  1. School of Public Health, Sun Yat-sen University, Guangzhou, Guangdong 510080, China;
    2. Centre for Tuberculosis Control of Guangdong Province;
    3. Guangzhou Eighth People's Hospital;
    4. School of Basic Medicine and Public Health, Jinan University
  • Received: 2024-05-09 Published: 2025-03-18
  • Corresponding authors: GAO Yanhui, E-mail: gao_yanhui@163.com; LIANG Xiaofeng, E-mail: liangxf@jnu.edu.cn
  • About the first author: CHEN Ge (born 1992), male, doctoral candidate, mainly engaged in research on epidemiology and health statistics
  • Funding:
    2022 Kangtai Fund Innovation Project for Vaccines and Infectious Disease Prevention and Control (KTR002)

Abstract: Objective To construct prediction models of following-year hospitalization risk and high medical cost for patients with hepatitis B cirrhosis, so as to strengthen the scientific basis of patient management and clinical decision-making.
Methods The study sample comprised patients diagnosed with hepatitis B cirrhosis at an infectious disease specialty hospital in Guangzhou. The data were randomly divided into a training set (70%) and a validation set (30%). To address class imbalance, the synthetic minority over-sampling technique (SMOTE) was used to balance the training set, after which prediction models of hospitalization risk and high medical cost in the following year were built with the random forest algorithm combined with logistic regression. The models were then validated on the class-balanced dataset and the validation dataset to evaluate their predictive performance.
Results A total of 7 022 patients with hepatitis B cirrhosis were included, of whom 602 (8.57%) were hospitalized and 179 (2.55%) incurred high medical costs in the following year. The random forest and logistic regression models showed that hospitalization, abnormal total protein, and low albumin in the current year were risk factors for both hospitalization and high cost in the following year (95% CIs of all ORs > 1), whereas high glutamic-pyruvic transaminase and entecavir use were protective factors for following-year hospitalization and following-year high cost, respectively (95% CIs of both ORs < 1). In the class-balanced dataset, the AUC (area under the receiver operating characteristic curve) was 0.944 for the following-year hospitalization risk model and 0.962 for the high medical cost model; in the validation dataset, the corresponding AUCs were 0.787 and 0.857, indicating good predictive performance.
Conclusion The prediction models constructed in this study performed well in predicting following-year hospitalization risk and high medical cost among patients with hepatitis B cirrhosis, and are of considerable value for optimizing patient management, reducing medical costs, and improving the quality of medical services.

Key words: Hepatitis B cirrhosis, Random forest, Hospitalization risk, High medical cost
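The Methods balance the training set with SMOTE before fitting the models. As a rough illustration only, the Python sketch below shows the core SMOTE interpolation step: pick a minority-class sample, pick one of its k nearest minority-class neighbours, and create a synthetic point at a random position on the segment joining them. The function name `smote`, its parameters (`n_new`, `k`, `seed`) and the toy data are hypothetical and not taken from the study; the abstract does not state the authors' software or SMOTE settings.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class samples by SMOTE-style interpolation.

    minority: feature vectors (lists of floats) from the minority class only.
    Hypothetical illustration; not the study's implementation or settings.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority-class samples")
    rng = random.Random(seed)           # fixed seed for reproducibility
    k = min(k, len(minority) - 1)       # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # the k nearest other minority samples, by Euclidean distance
        others = sorted(
            (m for j, m in enumerate(minority) if j != i),
            key=lambda m: math.dist(x, m),
        )
        neighbour = rng.choice(others[:k])
        # random point on the segment from x to the chosen neighbour
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy usage: four minority points at the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=6, k=2)
```

Consistent with the order described in the Methods, oversampling of this kind would be applied to the 70% training set only, after the random split, so that the 30% validation set keeps its original, imbalanced class distribution.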
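The Results call a covariate a risk factor when the 95% CI of its odds ratio lies entirely above 1 and a protective factor when it lies entirely below 1. The sketch below applies that rule, assuming the usual Wald interval exp(β ± 1.96·SE) for a logistic-regression coefficient β with standard error SE; the abstract does not say which interval was used. The helper names `odds_ratio_ci` and `classify_factor` and the example coefficients are hypothetical, not the study's estimates.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic-regression coefficient.

    beta is on the log-odds scale; z=1.96 gives an approximate 95% interval.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def classify_factor(beta, se):
    """Label a covariate by where its OR 95% CI lies relative to 1 (hypothetical helper)."""
    _, lower, upper = odds_ratio_ci(beta, se)
    if lower > 1:
        return "risk factor"        # whole CI above 1
    if upper < 1:
        return "protective factor"  # whole CI below 1
    return "not significant"        # CI contains 1


# Made-up coefficients for illustration only (not the study's estimates).
for beta, se in [(0.8, 0.2), (-0.5, 0.1), (0.1, 0.2)]:
    odds, lower, upper = odds_ratio_ci(beta, se)
    print(f"OR={odds:.2f} (95% CI {lower:.2f}-{upper:.2f}): {classify_factor(beta, se)}")
```

For example, a coefficient of -0.5 with SE 0.1 gives an OR of about 0.61 with a 95% CI of about 0.50 to 0.74, so the covariate is labelled protective.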
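The AUCs reported for the class-balanced and validation datasets summarize discrimination: the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. The sketch below computes this pairwise (Mann-Whitney) form directly; the function name `auc` and the toy scores are illustrative assumptions, not the study's data or code.

```python
def auc(labels, scores):
    """ROC AUC as the Mann-Whitney probability that a positive case outranks a negative one.

    labels: 1 for patients with the outcome, 0 otherwise; scores: predicted risks.
    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with made-up predicted risks.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Here `auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` returns 0.75, because three of the four positive-negative pairs are ranked correctly. The double loop is O(n_pos × n_neg), which is fine for illustration; a rank-based computation scales better to a cohort of 7 022 patients.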

CLC number: R195.4