Optical character Recognition (OCR) refers to the process of converting printed tamil text documents into software translated Unicode tamil text. The printed documents available in the form of books, projects, magazines etc are scanned using standard scanners which produce an image of the scanned documents. As part of the preprocessing phase the image like is checked for skewing. If the image is skewed, it is corrected by a simple rotation technique in the appropriate direction. Then the image is passed through a noise elimination phase and is binarized. The preprocessed image is segmented using an algorithm which decomposes the scanned text into paragraphs using special space detection technique and then the paragraphs into lines using vertical histograms, and lines into words using horizontal histograms, and words into character image glyphs using horizontal histograms. Each image glyph is comprised of 32 x 32 pixels, thus a data base of character image glyphs is created out of the segmentation phase. Then all the image glyphs are considered for recognition using Unicode mapping. Each image glyph is passed through various routines which extract the features of the glyph. The various features that are considered for classification are the character height, character width, then number of horizontal lines (Long and short), the number of vertical lines (long and short), the horizontally oriented curves, the vertically oriented curves, the number of circles, number of slope lines, image centroid and special dots. The glyphs are now set ready for classification based on these features. The extracted features are passed to a support vector machine (SVM) where the characters are classified by supervised learning algorithm. These classes are mapped into Unicode for recognition. Then the text is reconstructed using Unicode fonts.
📚 Over 50,000 Research Thesis
📱 100% Offline: No internet needed
📝 Over 98 Departments
🔍 Thesis-to-Journal Publication
🎓 Undergraduate/Postgraduate Thesis
📥 Instant Whatsapp/Email Delivery
This research focuses on developing a clear and practical framework for how courts and judges can better include digital evidence when making legal decisions. D...
This research focuses on developing a new way to evaluate risks in insurance by bringing together concepts from behavioral economics. Traditionally, insurance c...
This research aims to develop a comprehensive framework that helps manufacturing companies optimize their systems for sustainability while maintaining high effi...
This research aims to create a comprehensive and personalized approach to dietary interventions for people with diabetes. Diabetes management often involves rec...
This research focuses on understanding how post-colonial countries’ stories and perspectives have influenced international diplomacy during the 20th century. ...
This research focuses on creating a comprehensive model to help increase physical activity among teenagers. Adolescents often engage less in physical activity t...
This research aims to develop a comprehensive framework to improve how adolescents make career choices. Many young people face difficulty in selecting suitable ...
This research explores how to combine two different geophysical methods—seismic and electromagnetic (EM) surveys—to better understand what lies beneath the ...
This research aims to develop a structured framework to better combine mineralogical and geochemical data to improve understanding and modeling of ore deposits....