Francisco J. Moreno-Barea, José M. Jerez, Leonardo Franco
Within bioinformatics, Deep Learning (DL) models have shown exceptional results in applications that use histological images, scans and tomographies. With gene expression data, however, their performance often falls short of expectations, because these datasets commonly combine high dimensionality with a small number of samples. Data Augmentation (DA) techniques, which generate synthetic samples from the original data to increase the size of the dataset, can improve results on this type of data. In this work, three different DA techniques were developed and tested on six different cancer datasets. The results show that DA techniques can improve classification on cancer gene expression datasets, with significant gains in sensitivity, specificity and F1-score.
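To make the idea of generating synthetic samples concrete, the following is a minimal sketch of one generic DA approach, Gaussian noise injection. The abstract does not name the three techniques developed in this work, so this should be read as an illustrative stand-in and not as the authors' method. The function name `augment_with_gaussian_noise`, the parameters `n_copies` and `noise_scale`, and the toy data are all assumptions introduced for illustration.

```python
import random
import statistics


def augment_with_gaussian_noise(samples, labels, n_copies=2, noise_scale=0.05, seed=0):
    """Return the original samples plus noisy synthetic copies (hypothetical helper).

    Each synthetic sample is an original sample with zero-mean Gaussian noise
    added to every feature; the noise standard deviation is noise_scale times
    that feature's standard deviation across the dataset.
    """
    rng = random.Random(seed)
    n_features = len(samples[0])

    # Per-feature (per-gene) population standard deviation across all samples.
    feature_std = [
        statistics.pstdev([row[j] for row in samples]) for j in range(n_features)
    ]

    # Start from copies of the original data, then append synthetic samples.
    aug_samples = [list(row) for row in samples]
    aug_labels = list(labels)
    for _ in range(n_copies):
        for row, label in zip(samples, labels):
            noisy = [
                x + rng.gauss(0.0, noise_scale * s)
                for x, s in zip(row, feature_std)
            ]
            aug_samples.append(noisy)
            aug_labels.append(label)  # a synthetic sample keeps its source label
    return aug_samples, aug_labels


# Toy stand-in for a gene expression matrix: few samples, many features.
toy_rng = random.Random(42)
X_toy = [[toy_rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(6)]
y_toy = [0, 0, 0, 1, 1, 1]

X_aug, y_aug = augment_with_gaussian_noise(X_toy, y_toy, n_copies=3)
print(len(X_aug), len(y_aug))  # 24 24
```

In a sketch like this, augmentation would normally be applied to the training split only, so that evaluation metrics such as sensitivity, specificity and F1-score are computed on original samples.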