Using Bi-Objective Genetic Algorithm Fitness Function for Feature Selection on Microbiome Datasets
University: Munster Technological University.
Program: Master's in Artificial Intelligence - 2021.
Role: Supervisor.
Level: Master.
Location: Cork, Ireland.
Status: Finished.
Description:
This thesis evaluated and demonstrated the effectiveness of a bi-objective Genetic Algorithm (GA) fitness function for selecting subsets of highly predictive features (microbes) on four microbiome datasets for disease or host condition classification (Colorectal Cancer, Bacterial Vaginosis, Vagina Black/White, Liver Cirrhosis). A fitness function that merges the actual classification performance and the chromosome size of an individual into a single metric is essential for guiding the GA search towards high-performance feature subsets. The proposed fitness function made it possible to build classifiers that outperformed the baseline classification performance while using only 0.13% to 1.87% of the total dataset features. Where classifiers without such feature selection already achieved almost perfect results, the proposed method performed on par. The results of this study also show that working with absolute or relative microbe abundance counts was sufficient to achieve the best results for each classification task. In contrast, Centered Log Ratio transformed datasets yielded lower-performing feature subsets across all experiments and tasks analyzed in this research project. Selecting the most important microbes for predicting a host condition also contributes to more explainable and interpretable prediction results.
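The idea of merging classification performance and chromosome size into a single metric can be illustrated with a minimal sketch. The description does not give the thesis's actual formula, so the weighted-sum form, the function name `bi_objective_fitness`, and the `weight` parameter below are all assumptions made for illustration only.

```python
from typing import Sequence


def bi_objective_fitness(
    chromosome: Sequence[int],
    score: float,
    weight: float = 0.9,
) -> float:
    """Combine classification performance and subset size into one value.

    chromosome: binary mask over all features, 1 = microbe selected.
    score: classification performance in [0, 1] obtained with the selected
        features (for example a cross-validated F1 or AUC), computed elsewhere.
    weight: hypothetical trade-off between performance and sparsity
        (an assumption, not a value taken from the thesis).
    """
    if len(chromosome) == 0:
        raise ValueError("chromosome must contain at least one gene")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    n_selected = sum(chromosome)
    if n_selected == 0:
        # An empty feature subset cannot be used to train a classifier.
        return 0.0

    # Fraction of features left out: closer to 1 means a smaller subset.
    sparsity = 1.0 - n_selected / len(chromosome)

    # Assumed weighted sum: reward high performance and small subsets.
    return weight * score + (1.0 - weight) * sparsity
```

Under this assumed form, two individuals with the same classification score are ranked by size, so the GA is nudged towards the smaller feature subset. A higher `weight` makes classification performance dominate the ranking.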