ABSTRACT
The major aim of this work is to develop an efficient and effective k-means algorithm to cluster malaria microarray data to enable the extraction of a functional relationship of genes for malaria treatment discovery. However, traditional k-means and most k-means variants are still computationally expensive for large datasets such as microarray data, which have large datasets with a large dimension size d. Huge data is generated and biologists have the challenge of extracting useful information from volumes of microarray data. Firstly, in this work, we develop a novel k-means algorithm, which is simple but more efficient than the traditional k-means and the recent enhanced k-means. Using our method, the new k-means algorithm is able to save significant computation time at each iteration and thus arrive at an O(nk2) expected run time. Our new algorithm is based on the recently established relationship between principal component analysis and the k-means clustering. We further prove that our algorithm is correct theoretically. Results obtained from testing the algorithm on three biological data and three non-biological data also indicate that our algorithm is empirically faster than other known k-means algorithms. We assessed the quality of our algorithm clusters against the clusters of known structure using the Hubert-Arabie Adjusted Rand index (ARIHA), we found that when k is close to d, the quality is good (ARIHA > 0.8) and when k is not close to d, the quality of our new k-means algorithm is excellent (ARIHA > 0.9). We compare three different k-means algorithms including our novel Metric Matrics k-means (MMk-means), results from an in-vitro microarray data with the classification from an in-vivo microarray data in order to perform a comparative functional classification of P. falciparum genes and further validate the effectiveness of our MMk-means algorithm. Results from this study indicate that the resulting distribution of the comparison of the three algorithmsβ in- vitro clusters against the in-vivo clusters is similar, thereby authenticating our MMk-means method and its effectiveness. Lastly using clustering, R programming (with Wilcoxon statistical test on this platform) and the new microarray data of P. yoelli at the liver stage and the P. falciparum microarray data at the blood stages, we extracted twenty nine (29) viable P. falciparum and P. yoelli genes that can be used for designing a Polymerase Chain Reaction (PCR) primer experiment for the detection of malaria at the liver stage. Due to the intellectual property right, we are unable to list these genes here.
π Over 50,000 Research Thesis
π± 100% Offline: No internet needed
π Over 98 Departments
π Thesis-to-Journal Publication
π Undergraduate/Postgraduate Thesis
π₯ Instant Whatsapp/Email Delivery
The project titled "Applying Machine Learning Techniques to Detect Financial Fraud in Online Transactions" aims to address the critical issue of detec...
The project titled "Anomaly Detection in IoT Networks Using Machine Learning Algorithms" focuses on addressing the critical challenge of detecting ano...
The project titled "Applying Machine Learning Algorithms for Predicting Stock Market Trends" aims to explore the application of machine learning algor...
The project titled "Applying Machine Learning Algorithms for Sentiment Analysis in Social Media Data" focuses on utilizing machine learning algorithms...
The project titled "Applying Machine Learning for Predictive Maintenance in Industrial IoT Systems" focuses on leveraging machine learning techniques ...
The project, "Implementation of a Machine Learning Algorithm for Predicting Stock Prices," aims to leverage the power of machine learning techniques t...
The project titled "Development of an Intelligent Traffic Management System using Machine Learning Algorithms" aims to revolutionize the traditional t...
No response received....
The project titled "Applying Machine Learning for Intrusion Detection in IoT Networks" aims to address the increasing cybersecurity threats targeting ...