Gastric cancer prediction model based on C5.0 classification algorithm
Authors: HUANG Zhigang, LIU Hong, LIU Juan, ZHANG Qishan
Funding: National Natural Science Foundation of China (71473039)
    Abstract:

    China has a high incidence of gastric cancer: its new cases each year account for 42% of the world's annual total, which makes gastric cancer a priority for malignant tumor prevention and control in China. This paper presents data preprocessing and data integration methods suited to the unbalanced characteristics of gastric cancer record data, builds a gastric cancer survival prediction model with the C5.0 classification algorithm, and runs the prediction experiments on the SEER database of the U.S. National Cancer Institute, which the authors describe as a first use of that database for this task. The experimental results show that C5.0 achieves higher accuracy and specificity than the BP neural network algorithm, and that a patient's place of birth correlates strongly with final survival status. The study is a practical application of data mining in medicine; it has some reference value for the clinical diagnosis of gastric cancer and can help doctors formulate reasonable treatment and prevention plans.
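    The abstract compares the two classifiers on accuracy and specificity. As a reminder of what those two figures measure, the following minimal sketch computes them from binary predictions. The label coding (1 = alive, 0 = deceased), the function names, and the toy labels are assumptions made for illustration; none of them come from the paper or from the SEER data.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, with 1 as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all cases that were classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true negatives among all actual negatives: tn / (tn + fp)."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    if tn + fp == 0:
        raise ValueError("specificity is undefined without negative cases")
    return tn / (tn + fp)


# Toy labels, invented for illustration only (assumed coding: 1 = alive, 0 = deceased).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

print("accuracy:", accuracy(y_true, y_pred))
print("specificity:", specificity(y_true, y_pred))
```

    On unbalanced data, accuracy alone can look good for a model that mostly predicts the majority class, which is one reason to report specificity alongside it.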

Cite this article

HUANG Zhigang, LIU Hong, LIU Juan, ZHANG Qishan. Gastric cancer prediction model based on C5.0 classification algorithm[J]. Journal of Nanjing University of Information Science & Technology (Natural Science Edition), 2017, 9(4): 406-410

History
  • Received: 2017-06-28
  • Published online: 2017-07-11
