Francisco J. Moreno-Barea, Leonardo Franco, David Elizondo, Martin Grootveld

School of Computer Science and Informatics, Faculty of Technology, De Montfort University, Leicester, United Kingdom

Leicester School of Pharmacy, Faculty of Health and Life Sciences, De Montfort University, Leicester, United Kingdom

Niemann-Pick Class 1 (NPC1) disease is a rare neurodegenerative disease, and metabolomics datasets from NPC1 patients are often limited in the number of samples and severely imbalanced. To improve predictive capability and identify new biomarkers in an NPC1 disease urinary dataset, data augmentation (DA) techniques based on computational intelligence are employed to create additional synthetic samples. This paper presents DA techniques based on the addition of noise, on oversampling, and on conditional generative adversarial networks, and evaluates their predictive capacities on a set of nuclear magnetic resonance (NMR) profiles of urine samples. The prediction results show increases in sensitivity (30%) and in F score (20%). In addition, multivariate data analysis and variable importance in projection (VIP) scores have been applied. These analyses show the ability of the DA methods to replicate the information of the metabolites and indicate that selected metabolites (such as 3-aminoisobutyrate, 3-hydroxyvaleric acid, quinolinate and trimethylamine) may be valuable biomarkers for the diagnosis of NPC1 disease.
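
As an illustration of the simplest of the three DA families named above, noise addition, the following minimal sketch creates synthetic minority-class samples by perturbing existing ones with Gaussian noise. It is not the authors' implementation: the function name `augment_with_noise`, the `noise_scale` parameter, and the choice to scale the noise by each feature's standard deviation within the minority class are all assumptions made for this example.

```python
import random
import statistics


def augment_with_noise(samples, labels, minority_label, n_new,
                       noise_scale=0.05, seed=0):
    """Return (samples, labels) extended with n_new noisy minority copies.

    Hypothetical helper: each synthetic row is a randomly chosen
    minority-class row plus zero-mean Gaussian noise whose standard
    deviation is noise_scale times that feature's (population) standard
    deviation within the minority class. These choices are assumptions.
    """
    rng = random.Random(seed)
    minority = [row for row, lab in zip(samples, labels)
                if lab == minority_label]
    if not minority:
        raise ValueError("no samples carry the minority label")

    n_features = len(minority[0])
    # Per-feature spread of the minority class sets the noise magnitude.
    feature_std = [statistics.pstdev([row[j] for row in minority])
                   for j in range(n_features)]

    new_rows = []
    for _ in range(n_new):
        base = rng.choice(minority)
        new_rows.append([value + rng.gauss(0.0, noise_scale * std)
                         for value, std in zip(base, feature_std)])

    # The original samples are kept unchanged; synthetic rows are appended.
    return list(samples) + new_rows, list(labels) + [minority_label] * n_new
```

In a classification experiment, such augmentation would normally be applied to the training split only, so that sensitivity and F score are still measured on real, unseen samples.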