Corpus construction and named entity recognition for landslide geological hazards
Author:
Affiliation:

1. School of Remote Sensing and Geomatics Engineering, Nanjing University of Information Science and Technology; 2. Beijing Hazard of Geological Disaster Prevention; 3. School of Geographical Sciences, Nanjing University of Information Science and Technology, Nanjing, China; 4. Anhui University of Science and Technology; 5. School of Geography, Nanjing Normal University, Nanjing

    Abstract:

    Extracting landslide geohazard entities from large volumes of text describing landslide geological hazards is the basis for constructing a landslide geohazard knowledge graph. Drawing on unstructured text such as landslide geological hazard exploration reports, this paper analyzes the linguistic characteristics of landslide hazard descriptions in light of the mechanism of landslide geological hazards, formulates an annotation scheme and annotation specifications for the semantic information of landslide geological hazards, and constructs a corpus for the landslide geological hazard domain. Entity recognition experiments on the corpus show that the accuracy, recall, and precision of the named entity recognition model all exceed 90%, which verifies the applicability of the corpus and provides data support for subsequent research on landslide geological knowledge graphs.
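    The abstract reports recall and precision for the recognition model without stating how they are computed. As a minimal sketch, assuming the corpus is labeled with BIO tags and that scores are computed over exact entity spans (neither is confirmed by the abstract), the calculation could look like the following. The tag names `LOC` and `CAUSE` and both function names are hypothetical and are not taken from the paper's annotation scheme. Accuracy is left out because its definition is not given.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sequence (hypothetical helper)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        else:
            # "O" or a stray "I-" tag: no entity is open.
            start, label = None, None
    return spans


def precision_recall_f1(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Exact-match, entity-level precision, recall, and F1 over a corpus."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with the sentence index so spans stay distinct.
        gold_spans |= {(idx, *s) for s in bio_to_spans(g)}
        pred_spans |= {(idx, *s) for s in bio_to_spans(p)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative tags only; not drawn from the paper's corpus.
gold = [["B-LOC", "I-LOC", "O", "B-CAUSE", "I-CAUSE"]]
pred = [["B-LOC", "I-LOC", "O", "B-CAUSE", "O"]]
print(precision_recall_f1(gold, pred))
```

    In this toy case the predicted `CAUSE` entity is one token too short, so under exact-span matching it counts as both a false positive and a false negative. A token-level metric would score the same prediction more leniently, which is why the matching criterion matters when comparing reported figures.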

History
  • Received: April 29, 2024
  • Revised: May 21, 2024
  • Accepted: May 27, 2024

Address: No. 219, Ningliu Road, Nanjing, Jiangsu Province

Postcode: 210044

Phone: 025-58731025