Neural network for unicode optical character recognition

 

Table Of Contents


Title page
Certification
Approval page
Dedication
Acknowledgement
Abstract
Table of contents

Chapter ONE
1.0 INTRODUCTION
1.1 Statement of the problem
1.2 Purpose of the study
1.3 Aims and objectives
1.4 Scope of study
1.5 Limitations of the study
1.6 Definition of terms

Chapter TWO
2.0 LITERATURE REVIEW

Chapter THREE
3.0 Methods for fact finding and detailed discussions on the subject matter
3.1 Methodologies for fact finding
3.2 Discussions

Chapter FOUR
4.0 Futures, implications and challenges of the subject matter for the society
4.1 Futures
4.2 Implications
4.3 Challenges

Chapter FIVE
5.0 SUMMARY, RECOMMENDATION AND CONCLUSION
5.1 Summary
5.2 Recommendation
5.3 Conclusion

References


Thesis Abstract

Neural networks have shown tremendous potential in the field of Optical Character Recognition (OCR) for various scripts and languages. Unicode, a standard for consistent encoding, encompasses a wide range of characters from different writing systems globally. Recognition of Unicode characters poses a significant challenge due to the vast diversity and complexity of characters. This research focuses on developing a neural network for Unicode OCR, specifically targeting the recognition of characters from different languages and scripts encoded in Unicode. The neural network architecture consists of multiple layers, including convolutional layers for feature extraction, followed by recurrent layers for sequence modeling. The use of convolutional layers allows the network to capture spatial patterns within the characters, while recurrent layers enable the model to learn the sequential dependencies within the characters. Training the neural network involves feeding it with labeled Unicode character images to learn the mapping between the input images and their corresponding Unicode labels. The network is trained using a large dataset of annotated Unicode characters to ensure robustness and generalization. Data augmentation techniques are employed to increase the diversity of the training dataset and improve the model's performance on unseen data. The proposed neural network architecture is optimized using various techniques such as dropout regularization to prevent overfitting and gradient descent optimization for efficient training. Hyperparameter tuning is performed to enhance the model's performance and convergence speed. The network is evaluated on a separate test dataset to measure its accuracy and performance metrics such as precision, recall, and F1 score. Experimental results demonstrate the effectiveness of the neural network in recognizing Unicode characters with high accuracy across different languages and scripts. 
The model shows robustness to variations in font styles, sizes, and noise levels, making it suitable for real-world applications where diverse Unicode characters are encountered. The use of neural networks for Unicode OCR opens up new possibilities for multilingual text recognition and document processing. By leveraging the power of deep learning, this research contributes to advancing the field of OCR for Unicode characters and paves the way for developing more sophisticated systems capable of handling complex scripts and languages with high accuracy and efficiency.

Thesis Overview

1.0 INTRODUCTION

A character is the basic building block of any language and is used to build the language's larger structures. Characters are the letters of the alphabet; the structures are the words, strings and sentences.

Optical Character Recognition (OCR) is the process of converting an image of text, such as a scanned paper document or electronic fax file, into computer-editable text. The text in an image is not editable: the characters are made of tiny dots (pixels) that together form a picture of text. During OCR, the software analyzes the image and converts the pictures of the characters to editable text based on the patterns of the pixels in the image. After OCR, you can export the converted text and use it with a variety of word-processing, page layout and spreadsheet applications. OCR also enables screen readers and refreshable braille displays to read the text contained in images.

OCR deals with machine recognition of the characters present in an input image obtained by scanning. It refers to the process by which scanned images are electronically processed and converted to editable text. The need for OCR arises in the context of digitizing Tamil documents, from the ancient and old eras to the most recent, which helps in sharing the data through the internet.

A properly printed document is chosen for scanning and placed on the scanner. Scanner software is then invoked to scan the document. The scan is sent to a program that saves it, preferably in TIF, JPG or GIF format, so that the image of the document can be obtained when needed. This is the first step in OCR (Vijaya Kumar, 2001). The size of the input image is as specified by the user and can be of any length, but it is inherently restricted by the scope of the vision and by the scanner software length.

Pre-processing is the first step in the processing of the scanned image. The scanned image is checked for skew, since the image may be skewed to either the left or the right.

Here, the image is first brightened and binarized. The skew-detection function then checks for an angle of orientation within ±15 degrees. If skew is detected, a simple image rotation is carried out until the text lines match the true horizontal axis, which produces a skew-corrected image.
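The binarize-then-detect-skew step can be illustrated with a minimal sketch. The source does not say how the angle is found, so everything here is an assumption: the image is a list of rows of grey levels from 0 to 255, the threshold of 128 is arbitrary, the search range is taken to mean -15 to +15 degrees, and the projection-profile scoring is a swapped-in, commonly used technique rather than the author's method. The function names `binarize`, `profile_score` and `estimate_skew` are hypothetical.

```python
import math

def binarize(gray, threshold=128):
    """Turn grey levels (0-255) into 1 for dark (ink) pixels and 0 for background."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def profile_score(pixels, angle_deg):
    """Sum of squared row counts after rotating ink pixels by angle_deg.

    When text lines become horizontal, ink concentrates in few rows,
    so this score peaks near the true skew angle.
    """
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    counts = {}
    for x, y in pixels:
        row = round(y * cos_a - x * sin_a)
        counts[row] = counts.get(row, 0) + 1
    return sum(c * c for c in counts.values())

def estimate_skew(binary, max_angle=15.0, step=0.5):
    """Try every angle in [-max_angle, +max_angle] and return the best-scoring one."""
    pixels = [(x, y) for y, row in enumerate(binary)
              for x, v in enumerate(row) if v]
    best_angle, best_score = 0.0, -1
    for i in range(int(round(2 * max_angle / step)) + 1):
        angle = -max_angle + i * step
        score = profile_score(pixels, angle)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle

def synthetic_page(skew_deg, width=200, height=120, baselines=(20, 50, 80)):
    """Build a white page with three dark one-pixel lines tilted by skew_deg."""
    page = [[255] * width for _ in range(height)]
    slope = math.tan(math.radians(skew_deg))
    for y0 in baselines:
        for x in range(width):
            page[round(y0 + x * slope)][x] = 0
    return page
```

Calling `estimate_skew(binarize(synthetic_page(5.0)))` should return an angle close to 5 degrees; rotating the image back by that amount would then give the skew-corrected image. A real scanner image would normally be handled with an imaging library rather than nested lists.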

After pre-processing, the noise-free image is passed to the segmentation phase, where the image is decomposed into individual characters.
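One common way to decompose a binarized page into characters is projection-profile segmentation. The sketch below assumes that approach and is not taken from the source: it assumes a skew-corrected image stored as rows of 0/1 values (1 for ink), text lines separated by blank rows, and characters separated by blank columns. The names `runs` and `segment_characters` are hypothetical.

```python
def runs(profile):
    """Return (start, end) spans of consecutive non-zero entries; end is exclusive."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def segment_characters(binary):
    """Return one (top, bottom, left, right) box per character, in reading order."""
    boxes = []
    row_profile = [sum(row) for row in binary]          # ink per row
    for top, bottom in runs(row_profile):               # each run is a text line
        line = binary[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]  # ink per column in the line
        for left, right in runs(col_profile):           # each run is a character
            boxes.append((top, bottom, left, right))
    return boxes

page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
print(segment_characters(page))
# prints [(1, 3, 1, 3), (1, 3, 4, 5), (4, 5, 2, 5)]
```

Touching or overlapping characters, which are frequent in cursive and complex scripts, would be merged into one box by this simple scheme and need a more elaborate method.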

Algorithm for Segmentation

