Develop an Arabic Omnifont OCR System

Type of Service: 
Software Development and Customization
Country: 
Saudi Arabia
Industry: 
Technical & Scientific Research Services
Duration: 
6 months
Challenge: 

To develop an Arabic optical character recognition (OCR) application based on HTK, a hidden Markov model (HMM) toolkit. The system had to recognize printed Arabic text automatically and flexibly across different fonts and sizes; handling multiple fonts in particular has traditionally been the harder problem. The system was also required to recognize handwritten Arabic text. The open-source HTK toolkit was expected to be customized to support these recognition tasks.

The client was the largest dedicated research center in the country, which explains the final objective: to explore new directions and tracks in Arabic OCR in order to take part in an international competition and contribute to well-known conferences and journals.

Solution: 

Systematic Computer Science built a standalone application to meet the client's requirements. The application supported both training and recognition. It also reported supplementary performance information, such as the time consumed by each process and statistical parameters useful for comparison with other OCR systems.
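As an illustration of the kind of performance reporting described above, here is a minimal sketch in Python. It is not taken from the delivered application: the names `timed`, `timings`, `edit_distance` and `character_error_rate` are hypothetical, and character error rate is assumed here as one common statistic for comparing OCR systems; the source does not say which statistics were actually reported.

```python
import time
from contextlib import contextmanager

# Hypothetical store of accumulated wall-clock time per processing stage.
timings = {}

@contextmanager
def timed(stage):
    """Add the elapsed time of the enclosed block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage] = timings.get(stage, 0.0) + elapsed

def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, counted in code points."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]

def character_error_rate(reference, hypothesis):
    """Edit distance divided by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

if __name__ == "__main__":
    # Assumed example: ground-truth text versus a recognizer's output.
    with timed("scoring"):
        rate = character_error_rate("كتاب", "كتب")
    print(f"character error rate: {rate:.2f}")
    print(f"time spent scoring: {timings['scoring']:.6f} s")
```

In a real pipeline each stage (for example feature extraction, training, and decoding) would be wrapped in its own `timed(...)` block so that the per-stage totals can be reported side by side.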

The system was submitted to an international competition for Arabic OCR. The results were interesting, and many tracks appeared promising for further development.

Benefits: 

The key benefits of this development/research task for the client:

  • we delivered a satisfactory software system that is easily customizable to recognize other fonts and sizes
  • we made it easier for new researchers to join the project and follow up on our work
  • we provided a documented system to help other researchers understand the existing systems