An introduction to optical character recognition (OCR)

Suppose you want to digitise a printed sales receipt, contract or newspaper article. You could sit at your computer and spend hours retyping the document and then correcting typing mistakes. Or you could convert the printed material into digital form within a few minutes using Optical Character Recognition software, abbreviated as OCR.


Optical Character Recognition (OCR) is a technology in the field of computer science that converts different types of printed records or documents into an editable, searchable digital form that a computer can manipulate. Typical inputs include scanned paper documents, PDF files, mail, receipts, invoices and images from a digital camera.

OCR is commonly used to digitise printed text and turn page images into electronic form, so that paper files can be displayed on the internet, stored more compactly, searched electronically and used in machine processes such as text mining, text-to-speech and machine translation.

How it works

Generally, OCR systems and applications recognise text by first analysing the structure of the document or image. This entails dividing the page into basic elements, such as tables, blocks of text and images. The text is then divided into words and the words into characters, and each character is singled out and compared against a set of predetermined patterns.
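
The splitting and pattern-comparison steps described above can be sketched in a few lines of Python. This is only a toy illustration, not how any particular OCR product works: the 5x3 bitmaps in TEMPLATES, the sample LINE and the function names are all made up for the example, and the sketch assumes every character has the same size and is separated from its neighbours by a blank column.

```python
# Hypothetical 5x3 character patterns ('#' = ink, '.' = blank).
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

# A made-up scanned line reading "TIL"; the T has lost one pixel to noise.
LINE = [
    "###.###.#..",
    ".#...#..#..",
    ".#...#..#..",
    ".....#..#..",
    ".#..###.###",
]

def split_characters(line):
    """Cut a line bitmap into character bitmaps at fully blank columns."""
    width = len(line[0])
    glyphs, start = [], None
    for x in range(width + 1):
        blank = x == width or all(row[x] == "." for row in line)
        if not blank and start is None:
            start = x                      # a character begins here
        elif blank and start is not None:
            glyphs.append([row[start:x] for row in line])
            start = None                   # the character has ended
    return glyphs

def match_score(glyph, template):
    """Fraction of pixels on which the glyph and the template agree."""
    total = sum(len(row) for row in template)
    same = sum(g == t
               for grow, trow in zip(glyph, template)
               for g, t in zip(grow, trow))
    return same / total

def recognise(glyph):
    """Return the best-matching character and its score."""
    scored = ((ch, match_score(glyph, tpl)) for ch, tpl in TEMPLATES.items())
    return max(scored, key=lambda pair: pair[1])

text = "".join(recognise(g)[0] for g in split_characters(LINE))
print(text)
```

Even though the first character is damaged, it still agrees with the "T" pattern on more pixels than with any other pattern, so the line is read as intended.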

The OCR program then advances different hypotheses about what each character is. It weighs the alternative ways of breaking sentences into words and words into characters, scoring each alternative with a probabilistic model. After processing the data, the program settles on the most likely interpretation and presents your text and images in a recognisable form.
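
One simple way to picture this weighing of hypotheses is shown below. The sketch assumes the recogniser has already produced, for each character position, a few candidate characters with confidence values; the numbers in candidates are invented for illustration, and treating the positions as independent (multiplying their confidences) is a simplification that real systems refine considerably.

```python
from itertools import product
from math import prod

# Hypothetical per-position candidates with made-up confidence values.
candidates = [
    {"w": 0.8, "v": 0.2},
    {"o": 0.55, "0": 0.45},
    {"r": 0.7, "n": 0.3},
    {"d": 0.9, "a": 0.1},
]

def ranked_hypotheses(candidates, top=3):
    """Score every combination of candidates and return the best few."""
    hypotheses = []
    for combo in product(*(c.items() for c in candidates)):
        word = "".join(ch for ch, _ in combo)
        score = prod(p for _, p in combo)   # assumes independent positions
        hypotheses.append((word, score))
    hypotheses.sort(key=lambda h: h[1], reverse=True)
    return hypotheses[:top]

for word, score in ranked_hypotheses(candidates):
    print(f"{word}: {score:.4f}")
```

The highest-scoring combination is the one the program would present, while the runners-up show which characters it was least sure about.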


OCR finds applications in many fields, including artificial intelligence, computer vision and pattern recognition research. It is particularly widely used in the legal profession, where it allows searches that once required hours or even days to be completed in a few minutes.

An advanced OCR program with dictionary support provides even greater accuracy in document recognition and analysis, and simplifies verification of the results.
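
As a rough illustration of dictionary support, the sketch below snaps a recognised word to the closest entry in a word list using difflib from the Python standard library. The tiny LEXICON, the correct function and the 0.8 similarity cutoff are assumptions chosen for the example; a real program would use a full dictionary and would likely combine it with the recogniser's own confidence values.

```python
from difflib import get_close_matches

# Hypothetical miniature dictionary; a real one holds many thousands of words.
LEXICON = ["optical", "character", "recognition"]

def correct(word, lexicon=LEXICON, cutoff=0.8):
    """Replace a word with its closest dictionary entry, if one is close enough."""
    matches = get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word   # leave unknown words untouched

raw = ["0ptical", "charactcr", "rec0gnition", "xyz"]
print([correct(w) for w in raw])
```

Words that resemble no dictionary entry are passed through unchanged, which is also a convenient way to flag them for manual verification.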


OCR systems allow you to save a lot of time and effort when digitising, processing and repurposing printed documents. You can scan documents for further editing or for sharing with friends and business colleagues. Moreover, with an OCR system you can capture images from print magazines, posters and even banners and use them for your own purposes. The entire process of digitising a printed file produces a copy that closely resembles the original and can take less than a minute.

