Generative augmentation to improve lung nodules detection in resource-limited settings
Keywords:
lung nodules classification, data augmentation, generative adversarial networks, StyleGAN, CT-image
Abstract
Introduction: Lung cancer is among the most formidable cancers. The use of neural network technologies in its diagnosis is promising, but datasets collected in real clinical practice cannot cover the full variety of lung cancer manifestations. Purpose: To assess whether the classification of pulmonary nodules can be improved by generative augmentation of available datasets under resource constraints. Methods: We used part of the LIDC-IDRI dataset, the StyleGAN architecture to generate artificial lung nodules, and the VGG11 model as a classifier. We generated pulmonary nodules using the proposed pipeline and invited four experts to evaluate them visually. We formed four experimental datasets with different types of augmentation, including the use of synthesized data, and compared the classification performance of the VGG11 network trained on each dataset. Results: Ten generated nodules in each group of characteristics were presented for assessment. In all cases, positive expert assessments were obtained, with a Fleiss' kappa coefficient κ = 0.6–0.9. The proposed generative augmentation approach gave the best values, ROC AUC = 0.9604 and PR AUC = 0.9625. Discussion: The obtained performance metrics are superior to baseline results obtained with comparably small training datasets and slightly below the best results achieved with much more powerful computational resources. We have thus shown that a combination of StyleGAN and VGG11, which requires neither large computing resources nor a large initial training dataset, can be used effectively to augment an unbalanced dataset.
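For readers unfamiliar with the agreement statistic reported in the Results, the following is a minimal sketch of how Fleiss' kappa can be computed for a fixed number of raters. It is an illustration only, not the evaluation code used in this study: the function name `fleiss_kappa`, the two-category rating scheme (realistic / not realistic), and the example counts are all assumptions made for the sketch.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_items == 0 or n_raters < 2:
        raise ValueError("need at least one item and two raters")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])

    # Share of all assignments that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each item: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    if p_e == 1.0:
        return float("nan")             # undefined when only one category is used
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical ratings (not data from this study): four experts judge four
# generated nodules as [realistic, not realistic].
example_counts = [[4, 0], [4, 0], [2, 2], [0, 4]]
print(f"Fleiss' kappa: {fleiss_kappa(example_counts):.3f}")
```

Each row must sum to the number of raters (four in the sketch, matching the four experts mentioned above). Values near 1 indicate agreement well above chance, and values near 0 indicate agreement no better than chance.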