Adaptive recognition of complex invoices based on Tesseract-OCR
Author: SUN Ruibin, QIAN Kui, XU Weimin, LU Hong
CLC Number: TP391

    Abstract:

    An adaptive recognition method based on the Tesseract-OCR engine is proposed for extracting and recognizing, in real time, specific table items located in arbitrary regions of complex invoices. First, the invoice image is preprocessed with OpenCV (filtering, adaptive thresholding, etc.) to obtain a binary image. Next, the morphological opening operation is used to extract the table's line segments and their positions. The coordinates of the table's intersection points are then combined with a custom template to adaptively match each table header to its content. jTessBoxEditor is then used to train and optimize recognition of the table-item content, and finally character recognition is performed with Tesseract-OCR. Experimental results show that the method achieves high recognition accuracy, supports adaptive recognition of ROIs (regions of interest), and is highly available.
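
    The line-extraction and intersection steps described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the tiny synthetic image, and the assumption of a regular grid with one-pixel-thick lines are all hypothetical choices made for illustration. Opening with a 1 x k line-shaped structuring element keeps exactly those foreground runs that are at least k pixels long, so long table rules survive while short text strokes are removed. With OpenCV, the same operation would typically be written as `cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)` with a kernel from `cv2.getStructuringElement(cv2.MORPH_RECT, (k, 1))`.

```python
from typing import List, Tuple

Grid = List[List[int]]  # binary image: 1 = foreground (ink), 0 = background


def open_horizontal(img: Grid, k: int) -> Grid:
    """Opening with a 1 x k line element: keep horizontal runs of >= k pixels."""
    out = [[0] * len(row) for row in img]
    for y, row in enumerate(img):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= k:  # run is long enough to contain the element
                    for i in range(start, x):
                        out[y][i] = 1
            else:
                x += 1
    return out


def transpose(img: Grid) -> Grid:
    return [list(col) for col in zip(*img)]


def open_vertical(img: Grid, k: int) -> Grid:
    """Opening with a k x 1 line element, done via transposition."""
    return transpose(open_horizontal(transpose(img), k))


def intersections(h: Grid, v: Grid) -> List[Tuple[int, int]]:
    """(x, y) points where a horizontal and a vertical table line cross."""
    return [(x, y) for y, row in enumerate(h)
            for x, val in enumerate(row) if val and v[y][x]]


def cells_from_corners(points: List[Tuple[int, int]]) -> List[Tuple[int, int, int, int]]:
    """Cell boxes (left, top, right, bottom); assumes a regular, complete grid."""
    xs = sorted({x for x, _ in points})
    ys = sorted({y for _, y in points})
    return [(x0, y0, x1, y1)
            for y0, y1 in zip(ys, ys[1:])
            for x0, x1 in zip(xs, xs[1:])]


def make_demo() -> Grid:
    """Synthetic 7 x 9 'invoice': a 2 x 2 table plus a few text pixels."""
    height, width = 7, 9
    img = [[0] * width for _ in range(height)]
    for y in (0, 3, 6):          # horizontal rules
        for x in range(width):
            img[y][x] = 1
    for x in (0, 4, 8):          # vertical rules
        for y in range(height):
            img[y][x] = 1
    img[1][2] = 1                # short strokes standing in for text
    img[4][5] = img[4][6] = 1
    return img


if __name__ == "__main__":
    binary = make_demo()
    h_lines = open_horizontal(binary, 5)
    v_lines = open_vertical(binary, 5)
    corners = intersections(h_lines, v_lines)
    for box in cells_from_corners(corners):
        print(box)  # each box is a candidate ROI to crop and pass to the OCR engine
```

    In a full pipeline, each resulting box would be matched against the custom template to decide which header it belongs to, and only the selected regions would be cropped and sent to Tesseract for recognition.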

Get Citation

SUN Ruibin, QIAN Kui, XU Weimin, LU Hong. Adaptive recognition of complex invoices based on Tesseract-OCR[J]. Journal of Nanjing University of Information Science & Technology, 2021, 13(3): 349-354

History
  • Received: March 18, 2021
  • Online: June 25, 2021
