Degraded Document Image Binarization Using Optical Character Recognition

Manimaraboopathy, M.; Anto Bennet, M.; Kalpana, M.; Premalatha, S.; Gayathri, G.

Volume 22, Issue 2, April 2016, Pages 304–311



@article{IJISR-16-112-06,
author = {M. Manimaraboopathy and M. Anto Bennet and M. Kalpana and S. Premalatha and G. Gayathri},
title = {{Degraded Document Image Binarization Using Optical Character Recognition}},
journal = {International Journal of Innovation and Scientific Research},
volume = {22},
year = {2016},
pages = {304--311},
number = {2},
month = apr,
issn = {2351-8014},
url = {http://www.ijisr.issr-journals.org/abstract.php?article=IJISR-16-112-06},
abstract_html_url = {http://www.ijisr.issr-journals.org/abstract.php?article=IJISR-16-112-06},
pdf_url = {http://www.issr-journals.org/links/papers.php?journal=ijisr&application=pdf&article=IJISR-16-112-06},
document_type={Article},
source={www.issr-journals.org}
}



M. Manimaraboopathy¹, M. Anto Bennet², M. Kalpana³, S. Premalatha⁴, and G. Gayathri⁵

¹ Assistant Professor, Department of Electronics and Communication Engineering, VELTECH, Chennai-600062, India
² Professor, Department of Electronics and Communication Engineering, VELTECH, Chennai-600062, India
³ UG Student, Department of Electronics and Communication Engineering, VELTECH, Chennai-600062, India
⁴ UG Student, Department of Electronics and Communication Engineering, VELTECH, Chennai-600062, India
⁵ UG Student, Department of Electronics and Communication Engineering, VELTECH, Chennai-600062, India

Original language: English

Copyright © 2016 ISSR Journals. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This paper proposes an OCR-based algorithm for retrieving text from scanned document images. The text detection stage relies on two machine-learning classifiers: one generates candidate word regions and the other filters out non-text regions. Connected components (CCs) are extracted from the image with the maximally stable extremal regions (MSER) algorithm, and during CC clustering AdaBoost classifiers determine whether each region contains text. A binarization method then converts the grayscale image into a binary image. The binarization output is passed to OCR, and the result is evaluated in terms of character and word accuracy. As more and more text documents are scanned, this processing must be both fast and accurate. Additional performance metrics are the percentage rates of broken and missed text, false alarms, background noise, character enlargement, and character merging. The effectiveness of the proposed method is further confirmed by tests on realistic document images. The algorithm is implemented in MATLAB version 13.
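The abstract extracts connected components with the maximally stable extremal regions algorithm but gives no parameters. The Python sketch below illustrates only MSER's stability criterion for one chain of nested dark regions: Q(t) is the 4-connected set of pixels with gray value at most t that contains a chosen seed pixel, and q(t) = (|Q(t+Δ)| - |Q(t-Δ)|) / |Q(t)| measures how fast that region grows as the threshold rises. The seed, the 4-connectivity, `delta=10`, and the toy patch are assumptions for illustration, not the authors' settings.

```python
from collections import deque


def component_area(image, seed, t):
    """Area of the 4-connected region of pixels <= t containing `seed` (0 if the seed is brighter than t)."""
    rows, cols = len(image), len(image[0])
    r0, c0 = seed
    if image[r0][c0] > t:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:  # breadth-first flood fill over dark pixels
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and image[nr][nc] <= t):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


def stability(image, seed, t, delta):
    """MSER stability q(t) = (|Q(t+delta)| - |Q(t-delta)|) / |Q(t)|; smaller means more stable."""
    area = component_area(image, seed, t)
    if area == 0:
        return float("inf")
    grown = component_area(image, seed, t + delta)
    shrunk = component_area(image, seed, t - delta)
    return (grown - shrunk) / area


def most_stable_threshold(image, seed, delta=10):
    """First threshold in [delta, 255 - delta] where the seed's region changes least."""
    return min(range(delta, 256 - delta),
               key=lambda t: stability(image, seed, t, delta))


# Hypothetical 5x5 patch: a dark stroke (40) inside a gray halo (120) on paper (230).
patch = [
    [230, 230, 230, 230, 230],
    [230, 120, 120, 120, 230],
    [230, 120,  40, 120, 230],
    [230, 120, 120, 120, 230],
    [230, 230, 230, 230, 230],
]
print(most_stable_threshold(patch, (2, 2)))  # 50: the stroke alone is stable from t = 50 to 109
```

Here both the stroke alone (t from 50 to 109) and the stroke plus halo (t from 130 to 219) have q = 0. A complete MSER detector reports every local minimum of q over the whole component tree rather than a single threshold for one seed, and it usually runs on the inverted image as well to catch light text on a dark background.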
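The abstract states that AdaBoost classifiers decide whether a CC cluster contains text, but it does not list the features or the weak learners. The following is a minimal sketch of discrete AdaBoost with single-feature decision stumps, one standard formulation and not necessarily the authors'. The two region features (aspect ratio and stroke-width variation) and all feature values are hypothetical.

```python
import math


def stump_predict(x, feature, threshold, polarity):
    """Decision stump: +polarity if x[feature] <= threshold, else -polarity."""
    return polarity if x[feature] <= threshold else -polarity


def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, f, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; labels are +1 (text) and -1 (non-text)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, f, thr, pol = best_stump(X, y, w)
        if err >= 0.5:          # no stump beats chance: stop
            break
        err = max(err, 1e-10)   # keep the log finite for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, pol))
        if err <= 1e-10:        # training set already separated
            break
        # Up-weight misclassified regions, down-weight correctly classified ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def classify(model, x):
    """Sign of the alpha-weighted stump vote (+1 = text)."""
    score = sum(a * stump_predict(x, f, thr, pol) for a, f, thr, pol in model)
    return 1 if score >= 0 else -1


# Hypothetical CC features: [aspect ratio, stroke-width variation].
X = [[0.5, 0.10], [0.8, 0.15], [1.2, 0.20], [0.3, 0.90], [5.0, 0.80], [0.9, 0.70]]
y = [1, 1, 1, -1, -1, -1]
model = adaboost(X, y)
print([classify(model, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

This toy set is separable by one stump on stroke-width variation, so training stops after a single round; on realistic region features several rounds are normally needed, and each later stump concentrates on the regions the earlier ones misclassified.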
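The abstract does not name the binarization method that turns the grayscale image into a binary one. As a stand-in, the sketch below uses Otsu's global threshold, which picks the gray level that maximizes the between-class variance of the dark and light pixel classes. It assumes 8-bit grayscale stored as nested lists and dark text on a light background; the sample values are made up.

```python
def otsu_threshold(pixels):
    """Otsu's global threshold for 8-bit gray values: maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_dark = sum_dark = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_dark += hist[t]            # number of pixels with value <= t
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0 or w_light == 0:
            continue
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Unnormalized between-class variance; dividing by total**2 would not change the argmax.
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image):
    """Map dark (text) pixels to 1 and light (background) pixels to 0."""
    t = otsu_threshold(p for row in image for p in row)
    return [[1 if p <= t else 0 for p in row] for row in image], t


gray = [[30, 40, 200],
        [210, 35, 220]]
binary, t = binarize(gray)
print(t, binary)  # 40 [[1, 1, 0], [0, 1, 0]]
```

A single global threshold can fail on degraded pages with uneven illumination, stains, or bleed-through, which is why locally adaptive thresholds (for example Sauvola's method) are often preferred for such documents.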

Author Keywords: Maximally Stable Extremal Regions (MSER), Optical Character Recognition (OCR).
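The OCR output is scored by character and word accuracy, but the abstract gives no formula. A common convention, assumed here, counts errors as the Levenshtein edit distance between the ground truth and the OCR output and reports (n - errors) / n, where n is the number of ground-truth characters or words. The sample strings are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def accuracy(truth, ocr):
    """(n - errors) / n over ground-truth units, clamped at 0 when insertions exceed n."""
    if not truth:
        return 1.0 if not ocr else 0.0
    errors = levenshtein(truth, ocr)
    return max(0.0, (len(truth) - errors) / len(truth))


def character_accuracy(truth, ocr):
    return accuracy(truth, ocr)


def word_accuracy(truth, ocr):
    return accuracy(truth.split(), ocr.split())


truth = "degraded document"
ocr = "degradcd docurnent"
print(levenshtein(truth, ocr))         # 3
print(character_accuracy(truth, ocr))  # 0.8235... (14 of 17 characters)
print(word_accuracy(truth, ocr))       # 0.0
```

The example shows why both metrics are reported: three character errors leave character accuracy above 80%, yet every word is misread, so word accuracy drops to zero.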

How to Cite this Article

M. Manimaraboopathy, M. Anto Bennet, M. Kalpana, S. Premalatha, and G. Gayathri, “Degraded Document Image Binarization Using Optical Character Recognition,” International Journal of Innovation and Scientific Research, vol. 22, no. 2, pp. 304–311, April 2016.
