Habeeb, Imad Q. and Mohd Yusof, Shahrul Azmi and Ahmad, Faudziah (2014) Two bigrams based language model for auto correction of Arabic OCR errors. International Journal of Digital Content Technology and its Applications (JDCTA), 8 (1). pp. 72-80. ISSN 2233-9310
Abstract
In optical character recognition (OCR), the characteristics of Arabic text cause more errors than in English text. In this paper, a language model based on two bigrams and built from Wikipedia's database is presented. The method can automatically detect and correct non-word errors in Arabic OCR text, and automatically detect real-word errors. The method consists of two parts: extracting context information from Wikipedia's database, and implementing the automatic detection and correction of incorrect words. This method can be applied to any language with little modification. The experimental results show successful extraction of context information from Wikipedia's articles. Furthermore, they show that using this method can reduce the error rate of Arabic OCR text.
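The abstract does not give the details of the model, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that "two bigrams" means scoring a candidate word by the bigram it forms with the preceding word and the bigram it forms with the following word. The function names, the edit-distance candidate filter, and the toy English corpus (standing in for context extracted from Wikipedia) are all illustrative assumptions; tokens are plain strings, so Arabic tokens work the same way.

```python
from collections import Counter


def build_model(sentences):
    """Count words and adjacent word pairs in a tokenized corpus."""
    vocab, bigrams = Counter(), Counter()
    for tokens in sentences:
        vocab.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return vocab, bigrams


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def neighbours(tokens, i):
    """Return the words to the left and right of position i (None at the edges)."""
    left = tokens[i - 1] if i > 0 else None
    right = tokens[i + 1] if i + 1 < len(tokens) else None
    return left, right


def correct_non_words(tokens, vocab, bigrams, max_dist=2):
    """Replace out-of-vocabulary tokens with the best-scoring candidate.

    Assumption: a candidate's score is the sum of its left and right
    bigram counts, with ties broken by smaller edit distance.
    """
    out = list(tokens)
    for i, word in enumerate(out):
        if word in vocab:
            continue  # only non-word errors are corrected here
        left, right = neighbours(out, i)
        candidates = [w for w in vocab if edit_distance(word, w) <= max_dist]
        if not candidates:
            continue  # leave the token unchanged if nothing is close enough
        out[i] = max(
            candidates,
            key=lambda c: (bigrams[(left, c)] + bigrams[(c, right)],
                           -edit_distance(word, c)),
        )
    return out


def flag_real_word_errors(tokens, vocab, bigrams):
    """Return indices of in-vocabulary words never seen with either neighbour."""
    flagged = []
    for i, word in enumerate(tokens):
        if word not in vocab:
            continue
        left, right = neighbours(tokens, i)
        if bigrams[(left, word)] + bigrams[(word, right)] == 0:
            flagged.append(i)
    return flagged


# Toy corpus standing in for text extracted from Wikipedia (hypothetical data).
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "cat", "ate", "the", "fish"]]
vocab, bigrams = build_model(corpus)
print(correct_non_words(["the", "cot", "sat"], vocab, bigrams))
print(flag_real_word_errors(["the", "fish", "sat"], vocab, bigrams))
```

In this sketch a non-word is any token absent from the corpus vocabulary, and a real-word error is only flagged, not corrected, which mirrors the split described in the abstract. A realistic system would likely need smoothing and a much larger corpus to avoid flagging valid but unseen word pairs.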
Item Type: | Article |
---|---|
Subjects: | Q Science > QA Mathematics > QA76 Computer software |
Divisions: | School of Computing |
Depositing User: | Dr. Shahrul Azmi Mohd. Yusof |
Date Deposited: | 11 Nov 2014 09:03 |
Last Modified: | 15 May 2016 01:07 |
URI: | https://repo.uum.edu.my/id/eprint/12602 |