Login / Signup

On the generation of realistic synthetic petrographic datasets using a style-based GAN.

Ivan FerreiraLuis OchoaArdiansyah Koeshidayatullah
Published in: Scientific reports (2022)
Deep learning architectures have transformed data analytics in geosciences, complementing traditional approaches to geological problems. Although deep learning applications in geosciences show encouraging signs, their potential remains untapped due to limited data availability and the required in-depth knowledge to provide a high-quality labeled dataset. We approached these issues by developing a novel style-based deep generative adversarial network (GAN) model, PetroGAN, to create the first realistic synthetic petrographic datasets across different rock types. PetroGAN adopts the architecture of StyleGAN2 with adaptive discriminator augmentation (ADA) to allow robust replication of statistical and esthetical characteristics and improve the internal variance of petrographic data. In this study, the training dataset consists of > 10,000 thin section images both under plane- and cross-polarized lights. Here, using our proposed novel approach, the model reached a state-of-the-art Fréchet Inception Distance (FID) score of 12.49 for petrographic images. We further observed that the FID values vary with lithology type and image resolution. The generated images were validated through a survey where the participants have various backgrounds and level of expertise in geosciences. The survey established that even a subject matter expert observed the generated images were indistinguishable from real images. This study highlights that GANs are a powerful method for generating realistic synthetic data in geosciences. Moreover, they are a future tool for image self-labeling, reducing the effort in producing big, high-quality labeled geoscience datasets. Furthermore, our study shows that PetroGAN can be applied to other geoscience datasets, opening new research horizons in the application of deep learning to various fields in geosciences, particularly with the presence of limited datasets.
Keyphrases